Scrape website sitemaps and test all the internal links inside scraped pages for status 200 or status 302 response
$10-30 USD
Paid on delivery
Hi,
We are developing a deployment strategy, and part of it involves testing all the links inside our pre-production website for 404 errors and other error responses.
We would like to run a scraper that helps us test the following sequence:
1. Given a website, e.g. test.mywebsite.com:
- Does it have a sitemap?
- Does it have multiple sitemaps included?
- Open all sitemap files and scrape each link
2. For all the links and images from the sitemaps
- Run an HTTP test on all the links inside each page to check their return status.
E.g.
$ python [url removed, login to view] [url removed, login to view]
Starting run for: [url removed, login to view]
Sitemap: [url removed, login to view]
3 Sitemaps Found:
- [url removed, login to view]
- [url removed, login to view]
- [url removed, login to view]
Testing:
/ -> OK
/contact-us -> OK
/our-team -> OK
/logon -> OK
/newpage -> ERRORS 404
STATUS 404: IMG : [url removed, login to view]
STATUS 404: LINK: [url removed, login to view]
/otherpage -> OK
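To make the sequence above concrete, here is a minimal sketch in Python using only the standard library. It is one possible shape for such a script, not a finished implementation. The names `parse_sitemap`, `internal_links`, `report_page` and `OK_STATUSES` are hypothetical, the accepted statuses (200 and 302) are taken from the project title, and the `fetch_status` callable (URL in, HTTP status code out) is an assumption left to the implementer. Fetching the sitemap itself, for example from `/sitemap.xml`, is also left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Assumption: these are the "good" statuses, as named in the project title.
OK_STATUSES = (200, 302)


def parse_sitemap(xml_text):
    """Return (kind, locs). kind is 'index' when the file lists other
    sitemaps, 'urlset' when it lists pages."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


class LinkCollector(HTMLParser):
    """Collect <a href> and <img src> targets as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []  # list of (kind, absolute_url)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.found.append(("LINK", urljoin(self.base_url, attrs["href"])))
        elif tag == "img" and attrs.get("src"):
            self.found.append(("IMG", urljoin(self.base_url, attrs["src"])))


def internal_links(page_url, html_text):
    """Return the links and images on a page that stay on the page's host."""
    collector = LinkCollector(page_url)
    collector.feed(html_text)
    host = urlparse(page_url).netloc
    return [(k, u) for k, u in collector.found if urlparse(u).netloc == host]


def report_page(page_url, html_text, fetch_status):
    """Build report lines for one page. fetch_status is any callable
    that maps a URL to an integer HTTP status code."""
    path = urlparse(page_url).path or "/"
    failures = []
    for kind, url in internal_links(page_url, html_text):
        status = fetch_status(url)
        if status not in OK_STATUSES:
            failures.append((status, kind, url))
    if not failures:
        return [f"{path} -> OK"]
    codes = " ".join(str(c) for c in sorted({s for s, _, _ in failures}))
    lines = [f"{path} -> ERRORS {codes}"]
    lines += [f"STATUS {s}: {k:<4}: {u}" for s, k, u in failures]
    return lines
```

A real run would call `parse_sitemap` on each sitemap, following any entries of kind `index`, and then call `report_page` for every page URL found. `fetch_status` could be a thin wrapper around an HTTP client with redirects disabled, so that a 302 is reported as such instead of being followed.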
Can you please provide estimates?
Project ID: #8031449
About the project
7 freelancers are bidding an average of $57 for this job
Hi sir, I am a scraping expert and have done many similar projects; please check my feedback and you will see. Can you give me more details? Then I will provide demo data for you. Thanks, Kimi
I have a bachelor's degree in Computer Science from the American University in Cairo and a minor in Mathematics, with 10+ years of hands-on programming experience. I have worked for the past year in Microsoft's Advanced Te...
OK. I can implement this using the lxml + requests libraries. Also, reading multiple sitemaps to check from a file and logging the results to a file might work better. Contact me to start work.
We are a USA based firm that delivers high quality work the first time. There is no need to explain your requirements more than once.