Project using the R programming language


The two files you are given contain data about a large number of patients (approximately 20,000).

Both files have an id that uniquely identifies each patient, so patient 34573 in the first file is the same patient as 34573 in the second. However, the id numbers are not in order, and a couple of patients are missing from each file (not the same patients in both). There are also some missing values that need to be estimated.
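A minimal sketch of why the id order does not matter and how to find the patients missing from each file, using base R's `merge()` on two tiny made-up data frames (`demo` and `tests` are hypothetical stand-ins for the two files, not their real contents):

```r
# Two tiny, made-up data frames standing in for the two files.
# The ids are deliberately out of order, and each file lacks one patient.
demo  <- data.frame(id = c(34573, 101, 2002, 777), age = c(45, 62, 30, 51))
tests <- data.frame(id = c(777, 34573, 9999, 101), testA = c(55, 71, 40, 88))

# merge() matches rows on the key column, so row order is irrelevant.
# The default (all = FALSE) is an inner join: only ids found in BOTH files remain.
inner <- merge(demo, tests, by = "id")

# all = TRUE gives a full outer join: unmatched patients are kept, padded with NA.
full <- merge(demo, tests, by = "id", all = TRUE)
print(full)

# Ids present in only one of the two files.
only_demo  <- setdiff(demo$id, tests$id)
only_tests <- setdiff(tests$id, demo$id)
```

Whether to analyze only the inner join or to keep the unmatched patients and estimate their missing values is one of the decisions the report should discuss.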

The first file contains some demographic data and simple measurements (height, weight, etc.).

variable    type         values
id          unique key
age         numeric      1-80
ethnic      factor       0 = white, 1 = black, 2 = hispanic, 3 = nativeAmerican,
                         4 = asian, 5 = other
income      factor       0 = 0 to 30,000
                         1 = 30,001 to 60,000
                         2 = 60,001 to 90,000
                         3 = 90,001 to 120,000
                         4 = above 120,000
marital     factor       0 = single, 1 = married
occGroup    factor       10 occupational groups
gender      factor       0 = female, 1 = male
weight      numeric      90-300
height      numeric      40-80
heartRate   numeric      30-110
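For step 1, one possible approach (an assumption, not something the assignment prescribes) is to treat any value outside the ranges in the table above as an outlier, set it to `NA`, and then impute: the median for numeric columns and the most frequent code for factor columns. In the sketch below, `valid_ranges`, `valid_codes`, `mode_level` and `clean_demo` are hypothetical names, occGroup is skipped because its codes are not listed, and the data frame is made up:

```r
# Valid ranges for the numeric columns, copied from the table above.
valid_ranges <- list(age = c(1, 80), weight = c(90, 300),
                     height = c(40, 80), heartRate = c(30, 110))
# Allowed codes for the factor columns (occGroup omitted: its codes are not given).
valid_codes <- list(ethnic = 0:5, income = 0:4, marital = 0:1, gender = 0:1)

# Most frequent non-missing value; on a tie, which.max() picks the first one.
mode_level <- function(x) {
  tab <- table(x)  # table() ignores NA by default
  names(tab)[which.max(tab)]
}

clean_demo <- function(df) {
  for (col in names(valid_ranges)) {
    r <- valid_ranges[[col]]
    # Out-of-range values are treated as outliers and set to NA ...
    bad <- !is.na(df[[col]]) & (df[[col]] < r[1] | df[[col]] > r[2])
    df[[col]][bad] <- NA
    # ... then every NA in the column is replaced by the column median.
    df[[col]][is.na(df[[col]])] <- median(df[[col]], na.rm = TRUE)
  }
  for (col in names(valid_codes)) {
    # Invalid codes become NA, then the most frequent code fills the gaps.
    bad <- !is.na(df[[col]]) & !(df[[col]] %in% valid_codes[[col]])
    df[[col]][bad] <- NA
    df[[col]][is.na(df[[col]])] <- as.numeric(mode_level(df[[col]]))
    df[[col]] <- factor(df[[col]], levels = valid_codes[[col]])
  }
  df
}

# Made-up rows for illustration only: an impossible age (250),
# a missing weight, and an invalid gender code (7).
demo <- data.frame(
  id = 1:5, age = c(34, 250, 51, 67, 22),
  ethnic = c(0, 1, 2, 4, 1), income = c(1, 2, 0, 4, 3),
  marital = c(0, 1, 1, 0, 1), gender = c(0, 1, 7, 1, 0),
  weight = c(150, NA, 180, 210, 130), height = c(65, 70, 72, 60, 68),
  heartRate = c(72, 80, 65, 90, 70)
)
cleaned <- clean_demo(demo)
print(cleaned)
```

Median and mode imputation are only one choice; the assignment leaves the estimation method open, so whichever method is used should be justified in the report.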

The second file contains the results of 5 medical tests, each ranging from 0 to 100. It also has an indicator of whether or not the patient has a disease (0 = yes, 1 = no).

variable    type         values
id          unique key
testA       numeric      0 to 100
testB       numeric      0 to 100
testC       numeric      0 to 100
testD       numeric      0 to 100
testE       numeric      0 to 100
disease     factor       0 = yes, 1 = no

Procedure

Follow the steps below to process the data files:

1. Do some preliminary analysis and clean up any outliers or missing data.

2. Merge the two data sets using the id field (R has a command called merge that will be helpful).

3. Mining: test all 5 classifiers that we used this semester (trees, naive Bayes, KNN, SVM, ANN) to find the best technique for classifying the data. Use disease as the class and 10-fold cross-validation to verify the results, and use accuracy to measure the quality of the classification.

4. Analyze the data further (plotting, clustering, statistics) to support your claim about which classifier is best. You may need to analyze only parts of the data by selecting certain values of variables.
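Step 3 can be organized as one cross-validation loop that gives every classifier the same ten folds, so their accuracies are directly comparable. The sketch below runs on simulated data standing in for the merged, cleaned file, with a factor column `disease`. It uses `rpart` (trees), `class::knn` and `nnet::nnet` (ANN), which ship with standard R installations, and adds naive Bayes and SVM from the `e1071` package only if it is installed. The helper names `scale_fold`, `classifiers` and `fit_predict`, as well as k = 5 for KNN and 5 hidden units for the network, are hypothetical placeholder choices, not tuned values:

```r
library(rpart)  # classification trees
library(class)  # knn()
library(nnet)   # single-hidden-layer neural network (ANN)
# naiveBayes() and svm() live in e1071, which is not part of base R.
have_e1071 <- requireNamespace("e1071", quietly = TRUE)

set.seed(42)
# Simulated stand-in for the merged, cleaned data (NOT the real files):
# five test scores and a disease label coded as in the assignment.
n <- 300
dat <- as.data.frame(matrix(runif(n * 5, 0, 100), ncol = 5,
                            dimnames = list(NULL, paste0("test", LETTERS[1:5]))))
dat$disease <- factor(ifelse(dat$testA + dat$testB + rnorm(n, 0, 15) > 100, 0, 1),
                      levels = c(0, 1), labels = c("yes", "no"))  # 0 = yes, 1 = no
predictors <- setdiff(names(dat), "disease")

# KNN and the ANN are sensitive to scale, so standardize each fold
# using the training rows' means and standard deviations only.
scale_fold <- function(train, test) {
  mu <- colMeans(train[predictors])
  s  <- apply(train[predictors], 2, sd)
  list(train = scale(train[predictors], mu, s),
       test  = scale(test[predictors], mu, s))
}

# Each entry takes (train, test) and returns predicted classes for test.
classifiers <- list(
  tree = function(tr, te) {
    predict(rpart(disease ~ ., data = tr, method = "class"), te, type = "class")
  },
  knn = function(tr, te) {
    z <- scale_fold(tr, te)
    knn(z$train, z$test, cl = tr$disease, k = 5)
  },
  ann = function(tr, te) {
    z <- scale_fold(tr, te)
    fit <- nnet(disease ~ ., data = data.frame(z$train, disease = tr$disease),
                size = 5, decay = 1e-3, maxit = 200, trace = FALSE)
    factor(predict(fit, data.frame(z$test), type = "class"),
           levels = levels(tr$disease))
  }
)
if (have_e1071) {
  classifiers$naiveBayes <- function(tr, te) predict(e1071::naiveBayes(disease ~ ., data = tr), te)
  classifiers$svm <- function(tr, te) predict(e1071::svm(disease ~ ., data = tr), te)
}

# Assign each row to one of 10 folds; every classifier uses the same folds.
folds <- sample(rep(1:10, length.out = nrow(dat)))

cv_accuracy <- sapply(classifiers, function(fit_predict) {
  correct <- 0
  for (f in 1:10) {
    tr <- dat[folds != f, ]
    te <- dat[folds == f, ]
    correct <- correct + sum(fit_predict(tr, te) == te$disease)
  }
  correct / nrow(dat)  # accuracy pooled over all held-out rows
})
print(round(sort(cv_accuracy, decreasing = TRUE), 3))
```

Sharing one fold assignment means differences in `cv_accuracy` come from the classifiers rather than from different random splits. With the real merged data, the demographic columns would be added as predictors; `knn()` needs a numeric matrix, so factor columns would have to be dummy-coded first (for example with `model.matrix()`).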

Hand in:

• a report of at most 4 pages that discusses the decisions you made in the procedure above and the results you obtained.

  Also, and most importantly, include the conclusions you reached and the support for the classifier chosen as best.

  Use charts and plots to help illustrate.

  In writing your report, try to use the ideas in the book by Knaflic (referenced above).

• the R commands and scripts that you used to do the analysis above.

Rubric (hours are approximate; some students will take longer than others):

task                                          hours   points
cleaning, merging and preliminary analysis    3-5     7
classification                                4-7     7
analysis                                      4-7     11
total                                                 25

Skills: R Programming Language, Report Writing


About the Employer:
(1 review) Walker, United States

Project ID: #24777187

18 freelancers are bidding an average of $80 for this job


1. I am an expert in R programming, statistics and regression analysis, using both Excel and SPSS; I am also an expert in Excel, including formulas, all Excel functions, macros, lookups, pivot tables and charts. […] done many p…

$150 USD in 3 days
(80 Reviews)

Hello, I can complete this project perfectly using R with the ggplot and plotly libraries. Please contact me for more details about the project. I have mastered R programming, statistics, data mining, linear regression, machine learning us…

$210 USD in 2 days
(50 Reviews)

I am a data scientist and have experience in machine learning and statistical analysis of data using R and Python. I have read your description and can deliver within the targeted deadline. Thanks.

$40 USD in 1 day
(34 Reviews)

Hi, I am a very experienced statistician, data scientist and academic writer. I have completed several PhD-level thesis projects involving advanced statistical analysis of data. I have worked with data from several comp…

$100 USD in 7 days
(36 Reviews)

A data scientist with experience in SPSS, calculus, advanced Excel, R programming, R Shiny, RStudio, Python and anything related to data science. Master in Engineering.

$111 USD in 1 day
(24 Reviews)

Hi, I'm interested in your project. I have wide experience in machine learning and I can learn all the classifiers you've mentioned. Just contact me for further details.

$111 USD in 3 days
(24 Reviews)

Dear client, I am an R programmer, statistical analyst and data scientist. I have experience in probability, statistical inference, hypothesis testing, statistical modelling and machine learning. I have wide experience in…

$50 USD in 3 days
(7 Reviews)

Dear Researcher, I am an experienced academic writer and an expert in R and SPSS for data analysis (regression, ANOVA, correlation, t-test, chi-square test, Block Plot, etc.). I always provide results along with descriptive…

$100 USD in 5 days
(11 Reviews)

Hello Sir/Madam, I am an engineer in statistics and applied economics as well as an IT development technician. Moreover, I am very proficient in R and SPSS as…

$20 USD in 7 days
(5 Reviews)

Hello, I am an independent, experienced R expert. I can help with this task with a quick turnaround. Looking forward to hearing from you. Kind regards, Rina B.

$90 USD in 3 days
(4 Reviews)

Hello there, I have gone through your project details. The idea is […] I am a data analyst with experience in machine learning, data cleaning and applying algorithms such as SVM, Naive Bayes, kNN, K-Means and Rando…

$80 USD in 7 days
(3 Reviews)

I am an experienced Data Scientist and Machine Learning Engineer. Deep learning, artificial intelligence, machine learning, data structures and algorithms are my major fields. I finished specializations on Data Scienc…

$20 USD in 3 days
(4 Reviews)

Hello, my total experience is 6.8 years in predictive modeling, statistical analysis, optimization and machine learning techniques. I have implemented data science solutions for the digital marketing & advertising, Tel…

$100 USD in 7 days
(3 Reviews)

Hi, I have gone through the description; it is very well written and understandable. I can help with the said project, with good accuracy for each classifier. I will also prepare the report for all the step-by-step analysi…

$100 USD in 3 days
(2 Reviews)

Hello, I'm an experienced statistician with extensive skills and knowledge in research and other areas of statistics. I am good with statistical software such as SPSS, R programming, Minitab and Excel. Having a Master's degree…

$30 USD in 2 days
(2 Reviews)
$25 USD in 1 day
(0 Reviews)

First of all, thank you for posting this work. I have 3 years of experience in R programming and machine learning algorithms. I can do your work ASAP. I have done many projects using R programming and published in Good co…

$78 USD in 5 days
(0 Reviews)

Hello! My name is Victor Clipa (29) and I am an engineer working in Germany. I possess excellent statistical analysis skills, acquired during my academic studies in England, UK, and I have several years of experience in st…

$20 USD in 7 days
(0 Reviews)