Data munging, cleansing, building a predictive model, local regressors and clustering
Budget £20-250 GBP
1 Part 1 - Building up a basic predictive model
Load the dataset [login to view URL] into a pandas dataframe and carry out the following tasks. Organise your code bearing in mind robustness and maintainability:
1. Data cleaning and transformation:
If you take a closer look at the dataset, you will see that there are lots of missing values. They need to be treated appropriately, but in the first instance we will take an aggressive approach to dealing with them.
• Show the shape of the dataset
• Rename incorrectly formatted column names (e.g. SALE\nPRICE)
• Create list of categorical variables and another for the numerical variables
• For each numerical column, remove the ',' (and the '$' for the sale price), and then convert the values to numeric.
• Convert the 'SALE DATE' to datetime.
• For each categorical variable, remove the spaces, and then replace the empty string '' with NaN.
• Replace the zeros in prices, land square feet, etc. with NaN
• Show a summary of all missing values as well as the summary statistics
• Drop the columns 'BOROUGH', 'EASE-MENT', 'APARTMENT NUMBER'
• Drop duplicates if any
• Drop rows with NaN values
• Identify and remove outliers if any
• Show the shape of the resulting dataframe.
• Consider the log of the prices and normalise the data.
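The cleaning steps above can be sketched roughly as follows. This is only one possible reading of the brief, not a reference solution. The column names 'NEIGHBORHOOD' and 'LAND SQUARE FEET', the helper names `standardise` and `aggressive_clean`, the 1.5 × IQR outlier rule and the z-score normalisation are assumptions made for illustration. Because the dataset URL is hidden, a tiny made-up frame stands in for the real data.

```python
import numpy as np
import pandas as pd

DROP_COLS = ["BOROUGH", "EASE-MENT", "APARTMENT NUMBER"]


def standardise(df, numeric_cols, zero_is_missing, date_col="SALE DATE"):
    """Fix headers and dtypes without dropping anything yet."""
    df = df.copy()
    # Collapse newlines and repeated spaces: 'SALE\nPRICE' -> 'SALE PRICE'.
    df.columns = [" ".join(str(c).split()) for c in df.columns]
    categorical_cols = [c for c in df.columns
                        if c not in numeric_cols and c != date_col]
    for col in numeric_cols:
        # Remove ',' and '$', then coerce; unparseable entries become NaN.
        text = df[col].astype(str).str.replace(r"[,$]", "", regex=True).str.strip()
        df[col] = pd.to_numeric(text, errors="coerce")
    for col in zero_is_missing:
        # A zero price or area is treated as a missing value.
        df[col] = df[col].mask(df[col] == 0)
    df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
    for col in categorical_cols:
        # Strip surrounding spaces; empty strings become NaN.
        stripped = df[col].astype(str).str.strip()
        df[col] = stripped.mask(df[col].isna() | (stripped == ""))
    return df


def aggressive_clean(df, numeric_cols, price_col="SALE PRICE"):
    """Drop columns, duplicates, NaN rows and price outliers; add log price."""
    df = df.drop(columns=[c for c in DROP_COLS if c in df.columns])
    df = df.drop_duplicates().dropna()
    # Assumed outlier rule: keep prices within 1.5 * IQR of the quartiles.
    q1, q3 = df[price_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df = df[df[price_col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].copy()
    df["LOG PRICE"] = np.log(df[price_col])
    # Assumed normalisation: z-score the non-price numeric features.
    features = [c for c in numeric_cols if c != price_col]
    if features:
        std = df[features].std().replace(0, 1)
        df[features] = (df[features] - df[features].mean()) / std
    return df


# Made-up rows standing in for the real file.
raw = pd.DataFrame({
    "BOROUGH": [1] * 6,
    "NEIGHBORHOOD": [" CHELSEA ", "CHELSEA", "SOHO", "SOHO ", " ", "SOHO"],
    "EASE-MENT": [" "] * 6,
    "APARTMENT NUMBER": [" "] * 6,
    "LAND SQUARE FEET": ["1,200", "1,500", "2,000", "0", "1,800", "2,000"],
    "SALE\nPRICE": ["$1,000,000", "$1,250,000", "$2,000,000",
                    "$900,000", "$1,100,000", "$2,000,000"],
    "SALE DATE": ["2016-09-01", "2016-10-15", "2017-01-20",
                  "2017-02-01", "2017-03-05", "2017-01-20"],
})
numeric_cols = ["LAND SQUARE FEET", "SALE PRICE"]

print(raw.shape)
typed = standardise(raw, numeric_cols, zero_is_missing=numeric_cols)
print(typed.isna().sum())   # summary of missing values
print(typed.describe())     # summary statistics
cleaned = aggressive_clean(typed, numeric_cols)
print(cleaned.shape)
```

Keeping the type conversion separate from the row-dropping step means the missing-value summary can be inspected before any data is discarded, and the first function can be reused later when imputation replaces dropping.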
2. Data Exploration. Consider the resulting dataframe. This first aggressive cleaning should give a smaller dataset, which you can start exploring by looking at the relationships between its various features.
• Visualise the prices across neighbourhoods
• Visualise the prices over time
• Show the scatter matrix plot and the correlation matrix
• Any further plots, which demonstrate your understanding of the data
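As a hedged sketch of the exploration step, the aggregations behind these plots could be computed as below. The column names 'NEIGHBORHOOD' and 'GROSS SQUARE FEET', the helper function names and the toy data are assumptions. `plot_overview` is defined but not called here, and it assumes matplotlib is installed.

```python
import pandas as pd


def price_by_neighbourhood(df, group_col="NEIGHBORHOOD", price_col="SALE PRICE"):
    """Median price per neighbourhood, most expensive first."""
    return df.groupby(group_col)[price_col].median().sort_values(ascending=False)


def price_over_time(df, date_col="SALE DATE", price_col="SALE PRICE"):
    """Median price per calendar month."""
    return df.groupby(df[date_col].dt.to_period("M"))[price_col].median()


def correlations(df):
    """Pearson correlation matrix of the numeric columns."""
    return df.select_dtypes("number").corr()


def plot_overview(df):
    """Draw the bar chart, time series and scatter matrix (needs matplotlib)."""
    import matplotlib.pyplot as plt
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    price_by_neighbourhood(df).plot.bar(ax=axes[0], title="Median price by neighbourhood")
    price_over_time(df).plot(ax=axes[1], title="Median price by month")
    pd.plotting.scatter_matrix(df.select_dtypes("number"), figsize=(8, 8))
    plt.show()


# Toy data for illustration only.
demo = pd.DataFrame({
    "NEIGHBORHOOD": ["CHELSEA", "CHELSEA", "SOHO", "SOHO"],
    "SALE DATE": pd.to_datetime(["2016-09-01", "2016-09-20",
                                 "2016-10-05", "2016-10-25"]),
    "GROSS SQUARE FEET": [1000.0, 1200.0, 2000.0, 2400.0],
    "SALE PRICE": [1.0e6, 1.2e6, 2.0e6, 2.4e6],
})
by_hood = price_by_neighbourhood(demo)
by_month = price_over_time(demo)
corr = correlations(demo)
print(by_hood, by_month, corr, sep="\n\n")
```

Medians are used rather than means because sale prices are typically right-skewed; that is a design choice, not something the brief prescribes.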
3. Model building. Consider the resulting dataframe.
• Select the predictors that would have impact in predicting house prices.
• Build a first linear model with appropriate predictors and evaluate it: split the data into training and test sets, build the model, and then show a histogram of the residuals. Evaluate your model using a cross-validation procedure.
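A minimal sketch of the modelling step with scikit-learn is shown below. The synthetic data, the predictor names ('GROSS SQUARE FEET' and the hypothetical 'BUILDING AGE'), and the choice of 5-fold cross-validation with R² scoring are all assumptions. The residual histogram is computed with `np.histogram`; the same residuals could be passed to a plotting call instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataframe (log price as the target).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "GROSS SQUARE FEET": rng.uniform(500, 5000, n),
    "BUILDING AGE": rng.uniform(0, 120, n),   # hypothetical predictor
})
y = (12.0 + 0.0004 * X["GROSS SQUARE FEET"]
     - 0.002 * X["BUILDING AGE"] + rng.normal(0, 0.2, n))

# Hold out a test set, fit on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# Residuals on the held-out set and their histogram counts.
residuals = y_test - model.predict(X_test)
counts, edges = np.histogram(residuals, bins=20)

# 5-fold cross-validation on the full data; the pipeline is refit per fold,
# so the scaler never sees the validation fold.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")

print(f"test R^2: {test_r2:.3f}")
print(f"CV R^2: {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```

Putting the scaler inside the pipeline keeps the normalisation from leaking information across the train/validation boundary.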
2 Part 2 - Improved model
This is an open-ended question and you are free to push your problem-solving skills in order to build up a useful model with higher performance.
1. Consider the entire dataset given in this assignment. Develop an improved predictive model that predicts the sale prices of houses. Make sure to validate your model. You should aim for a model with higher performance while using as many data points as possible. This implies treating missing values differently, for example through imputation rather than dropping them.
2. Use the K-Means algorithm to cluster your cleansed dataset and compare the obtained clusters with the distribution found in the data. Justify your clustering and visualise your clusters as appropriate.
3. Build local regressors based on your clustering and discuss how this cluster-based regression compares to your regression model obtained in Part 2.1.
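The three Part 2 steps can be sketched together as follows: median imputation instead of dropping rows, K-Means on the standardised features, and one linear regressor per cluster compared against a single global regressor. Everything here is an assumption made for illustration: the synthetic two-segment data, the feature meanings, k = 2 and the use of `SimpleImputer`. The brief does not prescribe any of it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import adjusted_rand_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: two market segments with different price/size slopes.
rng = np.random.default_rng(0)
n = 600
segment = rng.integers(0, 2, n)
gross = np.where(segment == 0, rng.normal(1000, 150, n), rng.normal(4000, 400, n))
land = np.where(segment == 0, rng.normal(800, 100, n), rng.normal(3000, 300, n))
age = rng.uniform(0, 100, n)
y = (np.where(segment == 0, 11 + 0.002 * gross, 13 + 0.0002 * gross)
     - 0.002 * age + rng.normal(0, 0.1, n))

X = np.column_stack([gross, land, age])
X[rng.random(n) < 0.1, 2] = np.nan   # make about 10% of the ages missing

X_train, X_test, y_train, y_test, seg_train, seg_test = train_test_split(
    X, y, segment, test_size=0.25, random_state=0)

# Impute with the training median, then standardise (fitted on training data only).
prep = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
Z_train = prep.fit_transform(X_train)
Z_test = prep.transform(X_test)

# Baseline: one global linear model on all (imputed) rows.
global_model = LinearRegression().fit(Z_train, y_train)
global_r2 = r2_score(y_test, global_model.predict(Z_test))

# K-Means on the standardised features; compare with the known segments.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Z_train)
train_labels = kmeans.labels_
test_labels = kmeans.predict(Z_test)
ari = adjusted_rand_score(seg_train, train_labels)

# Local regressors: one linear model per cluster, routed by cluster label.
local_pred = np.empty_like(y_test)
for k in range(kmeans.n_clusters):
    model_k = LinearRegression().fit(Z_train[train_labels == k],
                                     y_train[train_labels == k])
    in_k = test_labels == k
    if in_k.any():
        local_pred[in_k] = model_k.predict(Z_test[in_k])
local_r2 = r2_score(y_test, local_pred)

print(f"cluster/segment agreement (ARI): {ari:.3f}")
print(f"global R^2: {global_r2:.3f}, cluster-based R^2: {local_r2:.3f}")
```

On real data the number of clusters would need justifying (for example with an elbow or silhouette analysis), and local models only help when the clusters differ in their price relationships, as they do by construction in this toy example.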
25 freelancers are bidding on average £121 for this job
Top 1% on Freelancer.com. Hi, greetings! ✅ Checked your project details. ✅ Completion time: within the project deadline. We have worked on 850+ projects. I have 6+ years of experience in the same kind of projects. I am a ...
Hi there! I have more than 10 years of experience as a Database Administrator and Analytics Engineer. I'd love to work together on your project building your Excel models. I'm very responsible and kind, and I'll always send ...
Hello, I am an engineer in statistics with good experience. I can help ASAP. Regards
Hey, I am an expert in Python and can help you with your project. Message me to discuss it so that we can start working on it.
Hi. How about you? I have just read your proposal and I am sure I can complete the project on time. I am an expert in ML/DL with 10+ years of experience. Please contact me to discuss the project in more detai...
Hi, I am a data analyst and have relevant experience for your project. I can provide a Jupyter notebook with the Part 1 requirements in one day and can complete the project in at most 3 days. Please get in touch ...
Hi there, I have read your project details. I can build you a predictive model for your Manhattan dataset. I will use all the data science techniques, such as data preprocessing and visualisations, to build the model for you. Message ...
I will be able to do this project because I am an expert in MS Office and all related work, so I would like you to give me this project. Thanks.
Hi, I can help you as I have done several similar jobs related to data processing, big data, data cleansing, pandas and predictive analytics. I have read the details and would like to discuss them further; please discuss with me ...
Hi, dear client, I have seen your description. I am an associate degree professional in data processing analysis and data extraction. I am also an expert Python data scraper. Please send a message to contact me to ...
Hi, I am new on the site, but what your project requires is basically my function in my job. I deal with large datasets and have finished nearly 70 projects in the last two years. Presently, I am working ...
Hi! I am a Junior Actuarial Data Scientist. I can help you with this project. I have done many data analyses similar to this project before. I believe that I will analyse this project very well.
I am thrilled to have the opportunity to offer my expertise in data science to help you achieve your goals. As a skilled data scientist with a passion for solving complex problems, I am confident that I can provide you ...
Experienced in dealing with data and data cleaning. I actively take part in many ML competitions to build and tune models, and can train your model for the best outcome. Recently worked on a project named Uber Lyft price prediction of ...
☑️FULL-TIME FREELANCER☑️ I am an expert in any Scraping, Leads, eCommerce Product Uploading, all kind of Data Entry, PDF Form Creation, Web Search Expert who knows the value of time, is very hard-working, and always d More
Hello, I’m a machine learning engineer and a certified data analyst. I’m a PhD student at the faculty of engineering and I have published papers in the fields of machine learning and computer vision. I have 9+ years ...