Scrape website sitemaps and test all the internal links inside scraped pages for status 200 or status 302 response

Closed Posted Jul 9, 2015 Paid on delivery

Hi,

We are developing a deployment strategy, and part of it involves testing all the links inside our pre-production website for 404 errors and other failing responses.

We would like to run a scraper that would help us test the following sequence:

1. Given a website, e.g. test.mywebsite.com:

- Does it have a sitemap?

- Does it have multiple sitemaps included?

- Open all sitemap files and scrape each link

2. For all the links and images from the sitemaps

- Run an HTTP Test on all the links inside each page to test their return status.
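
A minimal sketch of step 1, assuming the sitemaps follow the sitemaps.org XML format and that the first sitemap URL is already known (how it is discovered is not specified in this posting). The names `parse_sitemap`, `collect_page_urls` and the `fetch` callable are hypothetical, introduced only for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (child_sitemaps, page_urls) found in one sitemap document."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    if root.tag == SITEMAP_NS + "sitemapindex":
        return locs, []  # an index file lists further sitemap files
    return [], locs      # a urlset file lists page URLs


def collect_page_urls(start_url, fetch):
    """Follow sitemap index files from start_url and gather every page URL.

    `fetch` is a hypothetical callable that takes a URL and returns the
    XML text, so the traversal can be tried without network access.
    """
    pending, seen, pages = [start_url], set(), []
    while pending:
        url = pending.pop()
        if url in seen:
            continue  # avoid fetching the same sitemap twice
        seen.add(url)
        children, urls = parse_sitemap(fetch(url))
        pending.extend(children)
        pages.extend(urls)
    return pages
```

In a real run, `fetch` could wrap `urllib.request.urlopen(url).read()`; passing it in as a parameter keeps the parsing logic separate from the network calls.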

E.g.

$ python [url removed, login to view] [url removed, login to view]

Starting run for: [url removed, login to view]

Sitemap: [url removed, login to view]

3 Sitemaps Found:

- [url removed, login to view]

- [url removed, login to view]

- [url removed, login to view]

Testing:

/ -> OK

/contact-us -> OK

/our-team -> OK

/logon -> OK

/newpage -> ERRORS 404

STATUS 404: IMG : [url removed, login to view]

STATUS 404: LINK: [url removed, login to view]

/otherpage -> OK
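
A rough sketch of step 2 that produces report lines in the style of the sample run above, using only the Python standard library. The names `LinkCollector`, `check_page` and the `get_status` callable are hypothetical. Treating 200 and 302 as passing comes from the project title; skipping links on other hosts and listing the distinct failing status codes after `ERRORS` are assumptions about the intended behaviour:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (kind, absolute_url) pairs for <a href> and <img src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.found.append(("LINK", urljoin(self.base_url, attrs["href"])))
        elif tag == "img" and attrs.get("src"):
            self.found.append(("IMG", urljoin(self.base_url, attrs["src"])))


def check_page(page_url, html, get_status, ok=(200, 302)):
    """Return report lines for one page.

    `get_status` is a hypothetical callable mapping a URL to its HTTP
    status code. Only links on the page's own host are tested (assumption).
    """
    collector = LinkCollector(page_url)
    collector.feed(html)
    host = urlparse(page_url).netloc
    failures = []
    for kind, url in collector.found:
        if urlparse(url).netloc != host:
            continue  # skip third-party and non-HTTP links such as mailto:
        status = get_status(url)
        if status not in ok:
            line = "STATUS %d: %-4s: %s" % (status, kind, url)
            failures.append((status, line))
    path = urlparse(page_url).path or "/"
    if not failures:
        return ["%s -> OK" % path]
    # Assumption: the header line lists each distinct failing status code.
    codes = ", ".join(str(c) for c in sorted({s for s, _ in failures}))
    return ["%s -> ERRORS %s" % (path, codes)] + [line for _, line in failures]
```

A real `get_status` would need some care: `urllib.request.urlopen` follows redirects by default and raises `urllib.error.HTTPError` for 4xx and 5xx responses, so observing a 302 or a 404 as a plain status code requires handling both cases.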

Can you please provide estimates?

Linux PHP Python Shell Script Software Development

Project ID: #8031449

About the project

7 proposals Remote project Active Aug 15, 2015

7 freelancers are bidding on average $57 for this job

mantislin

Hi sir, I am a scraping expert and have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi

$120 USD in 3 days
(148 Reviews)
6.8
ahmedbassiouny

I have a bachelor in Computer Science from the American University in Cairo and a minor in Mathematics, with 10+ years of experience with hands-on programming. I have worked for the past year in Microsoft's Advanced Te…

$111 USD in 3 days
(10 Reviews)
4.1
prog2u

Dear Sir/Madam, Kindly check my bid and project completion ratio before awarding. I'm really interested in working on this project; I can start the work now and can provide the best services from my end. Please come on…

$50 USD in 0 days
(14 Reviews)
3.5
orioncx

OK. I can implement this using the lxml and requests libraries. It may also be better to read multiple sitemaps from a file to check, and to log the results to a file. Contact me to start work.

$25 USD in 2 days
(0 Reviews)
0.0
Onlance

Thu, 09 Jul 2015 16:38:10 +0000 Hello, Can do a quick Perl script. May need to install a few modules, though. Will follow all links starting from main page. Will detect copies of links and avoid third party link…

$25 USD in 1 day
(0 Reviews)
0.0
frankmowen

We are a USA based firm that delivers high quality work the first time. There is no need to explain your requirements more than once.

$55 USD in 1 day
(0 Reviews)
0.0