Peyton Data Mining

You are going to read some text files and classify them according to their labels. The Reuters corpus is one of the most famous datasets for text categorization tasks. We provide a subset of this dataset on Brightspace. Use these files to build your classifier. More information about this dataset is available at [login to view URL]

1- Download the zip file and extract it. Note that this data is a subset of the full Reuters corpus, so you can process it without a powerful server.

2- Each file contains some XML files. Explore the XML files and list all the fields available in them.
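
One way to approach step 2 is to walk every element of each XML document and record the tag and attribute names that occur. The sketch below uses only the standard library. The function name `list_xml_fields` and the sample document are made up for illustration; the real files may be structured differently.

```python
import xml.etree.ElementTree as ET


def list_xml_fields(xml_strings):
    """Return the sorted tag names and 'tag@attribute' names that occur."""
    fields = set()
    for xml_string in xml_strings:
        root = ET.fromstring(xml_string)
        # root.iter() yields the root itself and every descendant element.
        for element in root.iter():
            fields.add(element.tag)
            for attribute in element.attrib:
                fields.add(f"{element.tag}@{attribute}")
    return sorted(fields)


# Hypothetical miniature document, not taken from the real dataset.
sample = """<newsitem itemid="1">
  <headline>Example headline</headline>
  <text><p>First paragraph.</p></text>
</newsitem>"""

print(list_xml_fields([sample]))
# ['headline', 'newsitem', 'newsitem@itemid', 'p', 'text']
```

For files on disk, `ET.parse(path).getroot()` gives the same kind of root element, so the loop body stays unchanged.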

3- Write a function that extracts a Pandas DataFrame containing: (1) headline, (2) text, (3) bip:topics, (4) [login to view URL], (5) itemid, (6) XMLfilename
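
A minimal sketch of step 3 follows. It assumes a layout that should be checked against the real files: a `newsitem` root carrying an `itemid` attribute, a `headline` child, paragraphs under `text/p`, and topic codes under a `codes` element whose `class` attribute starts with `bip:topics`. Field (4) is elided in this description, so the sketch leaves it out. The names `parse_news_item` and `build_dataframe` are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

import pandas as pd


def parse_news_item(xml_data, filename):
    """Turn one news item (assumed layout, see above) into a dict."""
    root = ET.fromstring(xml_data)
    paragraphs = [p.text or "" for p in root.iterfind("./text/p")]
    # Assumption: topic codes sit under <codes class="bip:topics...">.
    topics = [
        code.get("code")
        for codes in root.iter("codes")
        if codes.get("class", "").startswith("bip:topics")
        for code in codes.iter("code")
    ]
    return {
        "headline": root.findtext("headline", default=""),
        "text": " ".join(paragraphs),
        "bip:topics": topics,
        "itemid": root.get("itemid"),
        "XMLfilename": filename,
    }


def build_dataframe(folder):
    """Parse every *.xml file under `folder` into one DataFrame."""
    rows = [
        # Reading bytes lets the parser honour the file's own encoding.
        parse_news_item(path.read_bytes(), path.name)
        for path in sorted(Path(folder).rglob("*.xml"))
    ]
    return pd.DataFrame(rows)


# Hypothetical document for illustration only.
sample = """<newsitem itemid="42">
<headline>Hello</headline>
<text><p>One.</p><p>Two.</p></text>
<metadata>
<codes class="bip:topics:1.0"><code code="C15"/><code code="CCAT"/></codes>
<codes class="bip:countries:1.0"><code code="USA"/></codes>
</metadata>
</newsitem>"""

row = parse_news_item(sample, "42.xml")
print(row["bip:topics"])  # ['C15', 'CCAT']
```

Keeping the topics as a Python list per row makes the multi-label nature of the data explicit for the later steps.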

4- Write a Python function to find all the possible values for bip:topics. Note that each news item can belong to more than one topic.
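
Step 4 could be sketched as below, assuming the bip:topics column holds one list of codes per row. The function name `all_topics` and the demo codes are placeholders.

```python
import pandas as pd


def all_topics(df, column="bip:topics"):
    """Return every distinct topic code found in a column of lists."""
    # explode() gives each list element its own row; empty lists become NaN.
    exploded = df[column].explode().dropna()
    return sorted(exploded.unique())


# Hypothetical rows; the codes are not checked against the dataset.
demo = pd.DataFrame({"bip:topics": [["C15", "CCAT"], ["CCAT"], []]})
print(all_topics(demo))  # ['C15', 'CCAT']
```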

5- Write a function to prepare your text data using methods such as stop-word removal. You are allowed to use the NLTK library.
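
A possible shape for the step 5 function is shown below. To keep the sketch runnable without downloads it uses a tiny stand-in stop-word list; that list and the name `prepare_text` are assumptions made for illustration.

```python
import re

# Tiny stand-in list so the sketch runs offline. With NLTK installed,
# set(nltk.corpus.stopwords.words("english")) can be passed in instead
# (after running nltk.download("stopwords") once).
DEFAULT_STOP_WORDS = frozenset(
    {"a", "an", "and", "in", "is", "of", "on", "the", "to"}
)


def prepare_text(text, stop_words=DEFAULT_STOP_WORDS):
    """Lowercase, keep runs of letters as tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(token for token in tokens if token not in stop_words)


print(prepare_text("The price of oil is rising in the U.S."))
# price oil rising u s
```

The output shows a side effect of the crude tokenizer: "U.S." is split into single letters. A real solution may want a proper tokenizer or a minimum token length.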

6- Extract features from the text using any approach you like. Write a function that takes the DataFrame from step 3 as input and generates a new DataFrame of your features and labels.
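
Since any feature approach is allowed, the sketch below picks TF-IDF as one option and encodes the topics as one binary column per code. The column prefixes `tfidf_` and `label_`, the function name `build_features`, and the demo rows are all assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer


def build_features(df, text_column="text", label_column="bip:topics",
                   max_features=5000):
    """Return one DataFrame with TF-IDF columns and binary label columns."""
    vectorizer = TfidfVectorizer(max_features=max_features)
    tfidf = vectorizer.fit_transform(df[text_column])
    # vocabulary_ maps each term to its column index in the TF-IDF matrix.
    terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    features = pd.DataFrame(
        tfidf.toarray(), index=df.index, columns=[f"tfidf_{t}" for t in terms]
    )

    binarizer = MultiLabelBinarizer()
    labels = pd.DataFrame(
        binarizer.fit_transform(df[label_column]),
        index=df.index,
        columns=[f"label_{c}" for c in binarizer.classes_],
    )
    return pd.concat([features, labels], axis=1)


# Hypothetical rows for illustration.
demo = pd.DataFrame({
    "text": ["oil prices rise", "bank profits fall"],
    "bip:topics": [["C15"], ["C15", "E12"]],
})
out = build_features(demo)
print(out.shape)  # (2, 8)
```

Fitting the vectorizer on all rows is the simplest reading of this step, but it lets test documents influence the vocabulary and weights. A stricter pipeline would fit it on the training portion only.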

7- Divide your data into a training set and a test set. You can use any method, such as cross-validation. You need to justify your choice here.
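
The two most common options for step 7 can be compared on stand-in data as follows. The arrays are hypothetical placeholders for the feature and label matrices.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Hypothetical stand-in data: 10 samples, 3 features, 2 binary labels.
X = np.arange(30).reshape(10, 3)
Y = np.tile([[1, 0], [0, 1]], (5, 1))

# Option A: a single hold-out split (80% train, 20% test) with a fixed seed.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (8, 3) (2, 3)

# Option B: 5-fold cross-validation; every sample is tested exactly once.
folds = KFold(n_splits=5, shuffle=True, random_state=0)
test_sizes = [len(test_idx) for _, test_idx in folds.split(X)]
print(test_sizes)  # [2, 2, 2, 2, 2]
```

A hold-out split is cheap and easy to explain, while cross-validation gives a less noisy estimate on a small subset at the cost of training several times. Either can be defended as long as the reason is stated.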

8- Write a function that takes the DataFrame from step 6 and a set of parameters and returns a trained classifier that classifies all the labels you found in step 4.
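
One way to handle all labels at once in step 8 is a one-vs-rest wrapper that trains one binary model per label column. The sketch assumes the label columns are marked with a `label_` prefix, which is a made-up convention, and `train_classifier` is a hypothetical name.

```python
import pandas as pd
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC


def train_classifier(df, estimator_cls=LinearSVC, label_prefix="label_",
                     **params):
    """Fit one binary `estimator_cls(**params)` per label column of df."""
    label_columns = [c for c in df.columns if c.startswith(label_prefix)]
    X = df.drop(columns=label_columns).to_numpy()
    Y = df[label_columns].to_numpy()
    model = OneVsRestClassifier(estimator_cls(**params))
    return model.fit(X, Y)


# Hypothetical toy data: each label is simply a copy of one feature.
demo = pd.DataFrame({
    "f0": [1, 0, 1, 0] * 5,
    "f1": [0, 1, 1, 0] * 5,
    "label_a": [1, 0, 1, 0] * 5,
    "label_b": [0, 1, 1, 0] * 5,
})
clf = train_classifier(demo, C=1.0)
predictions = clf.predict(demo[["f0", "f1"]].to_numpy()[:4])
print(predictions.shape)  # (4, 2)
```

Passing the estimator class and its keyword arguments separately keeps the same function usable for every model family in the comparison.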

9- Write a function to evaluate the quality of your classifier (e.g. accuracy, F-score, AUC, ...). Explain why you think this function is the best choice.
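
For multi-label output, the candidate metrics behave quite differently, which a small worked case makes visible. The function name `evaluate` and the arrays are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def evaluate(y_true, y_pred):
    """Return subset accuracy plus micro- and macro-averaged F1."""
    return {
        # On multi-label input, accuracy_score counts exact row matches only.
        "subset_accuracy": accuracy_score(y_true, y_pred),
        "f1_micro": f1_score(y_true, y_pred, average="micro"),
        "f1_macro": f1_score(y_true, y_pred, average="macro"),
    }


# Hypothetical predictions: one label is missed in the second row.
y_true = np.array([[1, 0], [0, 1], [1, 1]])
y_pred = np.array([[1, 0], [0, 0], [1, 1]])
scores = evaluate(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
# {'subset_accuracy': 0.667, 'f1_micro': 0.857, 'f1_macro': 0.833}
```

Subset accuracy penalizes the whole row for a single missed label, micro F1 pools all label decisions, and macro F1 weights every label equally, so rare topics count as much as frequent ones.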

10- Generate five different classifiers (Random Forest, Decision Tree, Linear Regression, Neural Network, and SVM) using step 8. Tune each one to find its best parameters. Find the best classifier and explain why it is the best.
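
The tuning and comparison loop could look like the sketch below. It covers only two of the five model families, uses deliberately tiny parameter grids, and runs on synthetic data; `pick_best` and the grids are assumptions, not recommended settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


def pick_best(X, Y, candidates, cv=3, scoring="f1_micro"):
    """Grid-search each candidate; return (name, fitted search) of the best."""
    searches = {}
    for name, (estimator, grid) in candidates.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring=scoring)
        searches[name] = search.fit(X, Y)
    best_name = max(searches, key=lambda n: searches[n].best_score_)
    return best_name, searches[best_name]


# Both of these estimators accept a 2-D binary label matrix directly.
candidates = {
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [10, 30]},
    ),
}

# Hypothetical synthetic data: each label depends on one feature.
rng = np.random.default_rng(0)
X = rng.random((60, 4))
Y = (X[:, :2] > 0.5).astype(int)

best_name, best_search = pick_best(X, Y, candidates)
print(best_name, best_search.best_params_)
```

Models without native multi-label support, such as an SVM, can be added to `candidates` wrapped in `OneVsRestClassifier`, in which case the grid keys take an `estimator__` prefix (for example `estimator__C`).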

Skills: Python, Data Mining, Software Architecture, Data Processing, XML


About the Employer:
( 2 reviews ) Middle Sackville, Canada

Project ID: #21831994

Awarded to:


Hello Dear...! Alert: I will give you 20% discount on my bid rate also give on my All Services. So grabs this special offer is limited. Let’s get to the point. I came to know that your Looking a developer which...

$131 CAD in 3 days
(19 Reviews)

9 freelancers are bidding on average $186 for this job


Hi, I read your project description and I am interested in your job. As you can see my profile, I am a full-time developer and have just completed many projects. Specially, I have top skills for C/C++, C#, Java, Py...

$200 CAD in 2 days
(70 Reviews)

Hello? How are you? I am excited to work with you on this project. I have done a lot of jobs with python like Django admin, Flask, python scrap, pysql, python tkinter GUI etc Here is on of my scrap with python wor...

$155 CAD in 3 days
(138 Reviews)

Hi, I have gone through your requirement to scrape lots of websites. I am EXPERT in building scraping tools /scripts. Hence, I can SURELY work on your project. I am having 4 YEARS of EXPERIENCE in developing PHP-PYTHON...

$108 CAD in 3 days
(91 Reviews)

Hi, I have worked with NLP for sentiment analysis. I used Python for the development. I would like to work on your project. Let me know if you want to discuss further. Regards, Monir

$250 CAD in 14 days
(21 Reviews)

I am a professional data scientist from Scotland I have a vast amount of experience in data mining I am more than happy to go ahead and discuss your project with you please drop me a text here.

$277 CAD in 1 day
(7 Reviews)

Hi Sir, I have expertise in natural language processing using Python. I have also worked on different classification algorithms from machine learning and deep learning. Let's connect for further discussion. Thanks

$200 CAD in 2 days
(3 Reviews)

I can do it in a couple of days. I would use cross-validation because it is the one that I normally use.

$100 CAD in 10 days
(0 Reviews)

Certified in Java 1.2. I have been working with Java and JEE for 15 years. I have worked with several programming languages as: C, Python, Javascript, Visual Basic among others. I have experience doing compilers and in...

$250 CAD in 7 days
(0 Reviews)