Hello there,
I would like somebody who has done binary classification machine learning in the past to predict income (0-1). This problem must be easy for the individual, since the task must be completed in a short time.
I would like the task to be done in a Python Jupyter notebook.
There are NaN values in both categorical and numerical columns in the dataset. Columns and rows containing only NaN values must be dropped. Including one duplicate column.
You should be left with NaN values in only two of the columns. The missing values for the categorical column and numerical column must then be imputed using an appropriate machine learning models. (not mean/mode etc.)
The categorical features should also be encoded in the most "intelligent" way. Taking nominal/ordinal features into consideration and deciding on the best route for the dataset (or more than one method can be used and the best one decided on based on model performance at the end when changing between the two). Perhaps one-hot encoding will cause high dimensional for features with more than 4 categories? If you are an experienced data scientist, you should be able to gauge what is best here.
Feature selection should then be done.
And then the prepared data set should be fitted to different appropriate model/algorithm based on what would be best in this case.
The dataset should be split for training, testing and validation.
About 4-5 different models should be applied and the output checked/validated with various metrics and a visual to compare.
Some text should be added to note why decisions were made and why a certain model/algorithm was decided on. Also shortly discuss the metrics considered to evaluate the performance of the model.
Comment should be made throughout so that the user would understand the solution of the tutorial.
The original tutorial came with a rubric, but no solutions manual. Please use the rubric to guide you.
Thank you.
PS: The Max file size was exceeded when adding the Rubric. I will share that once you have been selected for the task.
Hi! Hope you are doing great. I can help you in this project.
Please send me a message to discuss more.
I am available to start working on the project immediately
Hey
I am a professional data scientist with 5 years of experience. I hold an MBA and first Degree in statistics which provides me with the necessary background to handle your project.
Having done various projects using spss, R, python, I can deliver quality and superior work at a price we are both comfortable with and within the agreed timeline.
Kindly send me text