Peyton Data Mining

Completed Posted 4 years ago Paid on delivery

You are going to read some text files and classify them according to their labels. The Reuters corpus is one of the most famous datasets for text categorization tasks. We provide a subset of this dataset on Brightspace. Use these files to build your classifier. More information about this dataset is available at [login to view URL]

1- Download the zip file and extract it. Note that this data is a subset of the full Reuters corpus, so you can process it without a powerful server.

2- Each file contains some XML files. Explore the XML files and list all the fields available in them.

3- Write a function to extract a Pandas DataFrame containing: (1) headline, (2) text, (3) bip:topics, (4) [login to view URL], (5) itemid, (6) XMLfilename

4- Write a Python function to find all the possible values for bip:topics. Keep in mind that each news item can belong to more than one topic.

5- Write a function to prepare your text data by methods such as removing stop words. You are allowed to use the NLTK library.

6- Extract features from the text using any approach you like. Write a function that takes the DataFrame from step 3 as input and generates a new DataFrame of your features and labels.

7- Divide your data into a training set and a test set. You can use any method, such as cross-validation. You need to justify your choice here.

8- Write a function that takes the DataFrame from step 6 and a set of parameters, and returns a trained classifier that classifies all the labels you found in step 4.

9- Write a function to evaluate the quality of your classifier (e.g. accuracy, F-score, AUC, ...). Explain why you think this function is the best choice.

10- Generate five different classifiers (Random Forest, Decision Tree, Linear Regression, Neural Network, and SVM) using step 8. Tune their parameters for the best performance. Find the best classifier and explain why it is the best.
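As a rough illustration of steps 3 and 4, here is a minimal sketch using only Python's standard library. It assumes each news item is an XML document with a root element carrying an itemid attribute, a headline element, paragraphs under a text element, and topic codes listed under a codes element whose class attribute starts with "bip:topics". That layout and the sample document are assumptions, not taken from the provided files, and the function names parse_newsitem and all_topics are hypothetical. Field (4) is elided in the posting, so it is left out.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document. The element and attribute names here are
# assumptions about the file layout, not copied from the real dataset.
SAMPLE_XML = """<newsitem itemid="2330">
  <headline>Example headline</headline>
  <text><p>First paragraph.</p><p>Second paragraph.</p></text>
  <metadata>
    <codes class="bip:countries:1.0"><code code="USA"/></codes>
    <codes class="bip:topics:1.0"><code code="C15"/><code code="CCAT"/></codes>
  </metadata>
</newsitem>"""


def parse_newsitem(xml_string, filename):
    """Return one row (a dict) with the fields named in step 3."""
    # For a file on disk, ET.parse(path).getroot() could be used instead.
    root = ET.fromstring(xml_string)
    paragraphs = [p.text or "" for p in root.findall("text/p")]
    # Keep only the codes blocks that hold topics; one item may have several.
    topics = [
        code.get("code")
        for codes in root.iter("codes")
        if (codes.get("class") or "").startswith("bip:topics")
        for code in codes.iter("code")
    ]
    return {
        "headline": root.findtext("headline", default=""),
        "text": " ".join(paragraphs),
        "bip:topics": topics,
        "itemid": root.get("itemid"),
        "XMLfilename": filename,
    }


def all_topics(rows):
    """Collect every distinct topic code across all rows (step 4)."""
    found = set()
    for row in rows:
        found.update(row["bip:topics"])
    return sorted(found)


rows = [parse_newsitem(SAMPLE_XML, "sample.xml")]
topics = all_topics(rows)
print(rows[0]["itemid"], topics)
```

The list of row dicts can be passed directly to pandas.DataFrame(rows) to obtain the DataFrame that step 3 asks for. Topics are kept as a list per row because each news item can carry more than one label.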

Python Data Mining Software Architecture Data Processing XML

Project ID: #21831994

About the project

9 proposals Remote project Active 4 years ago

Awarded to:

Zohaib748

Hello Dear...! Alert: I will give you 20% discount on my bid rate also give on my All Services. So grabs this special offer is limited. Let’s get to the point. I came to know that your Looking a developer which More

$131 CAD in 3 days
(19 Reviews)
4.9

9 freelancers are bidding on average $186 for this job

DevStar925

Hi, I read your project description and I am interested in your job. As you can see my profile, I am a full-time developer and have just completed many projects. Specially, I have top skills for C/C++, C#, Java, Py More

$200 CAD in 2 days
(70 Reviews)
7.4
smsaurabhv

Hi, I have gone through your requirement to scrape lots of websites. I am EXPERT in building scraping tools /scripts. Hence, I can SURELY work on your project. I am having 4 YEARS of EXPERIENCE in developing PHP-PYTHON More

$108 CAD in 3 days
(91 Reviews)
5.7
Arahan00

Hi, I have worked with NLP for sentiment analysis. I used Python for the development. I would like to work on your project. Let me know if you want to discuss further. Regards, Monir

$250 CAD in 14 days
(21 Reviews)
5.2
razajen

I am a professional data scientist from Scotland. I have a vast amount of experience in data mining. I am more than happy to go ahead and discuss your project with you. Please drop me a text here.

$277 CAD in 1 day
(7 Reviews)
4.1
agrepatil12345

Hi Sir, I have expertise in natural language processing using Python. I have also worked on different classification algorithms from machine learning and deep learning. Let's connect for further discussion. Thanks

$200 CAD in 2 days
(3 Reviews)
0.0
soooky92

I can do it in a couple of days. I would use cross-validation because it is the one that I normally use.

$100 CAD in 10 days
(0 Reviews)
0.0
luisnarvaez19

Certified in Java 1.2. I have been working with Java and JEE for 15 years. I have worked with several programming languages as: C, Python, Javascript, Visual Basic among others. I have experience doing compilers and in More

$250 CAD in 7 days
(0 Reviews)
0.0