We are a fintech company specializing in providing software solutions to financial brokers. We are looking to work long term.
Write a simple single-thread web crawler. Starting from URL <[login to view URL]>, download a page and then wait 5 seconds before downloading the next page. Your program should find other pages to crawl by parsing link tags found in previously crawled documents.
Show the URLs of the first 10 web pages that satisfy the following three conditions simultaneously: (1) your program crawls the page successfully; (2) the page is within the domain of [login to view URL]; and (3) the page contains some URLs that your program has not met yet.
A page may contain multiple URLs, so how does your program choose the next URL to crawl? Explain which factors/priorities are considered in your design.
Change your program so that it can harvest as many URLs as possible. List the URLs of the first 10 pages that your program crawls successfully within the domain of sfu.ca. In total how many URLs does your program retrieve? What heuristics does your program use to select the next URL to search?
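The task above can be sketched as a single-threaded breadth-first crawl with a 5-second delay and a domain filter. This is a minimal illustration, not a submission: the start URL is a placeholder (the real one is behind [login to view URL]), and the sfu.ca domain restriction comes from the task description.

```python
# Minimal single-thread crawler sketch for the task described above.
# Assumptions: start URL is a placeholder; domain restricted to sfu.ca.
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href targets from <a> tags, resolved against the page URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


def in_domain(url, domain="sfu.ca"):
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def crawl(start_url, limit=10, delay=5):
    """Breadth-first crawl. Returns the first `limit` successfully crawled
    in-domain pages that contained previously unseen URLs (the three
    conditions in the task)."""
    queue = deque([start_url])
    seen = {start_url}
    results = []
    while queue and len(results) < limit:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue  # condition (1) failed: skip pages that don't download
        new_links = [link for link in extract_links(html, url)
                     if link not in seen and in_domain(link)]
        if new_links:
            results.append(url)  # page satisfied all three conditions
        for link in new_links:
            seen.add(link)
            queue.append(link)
        time.sleep(delay)  # wait 5 seconds before the next download
    return results


if __name__ == "__main__":
    for page in crawl("https://www.sfu.ca/"):
        print(page)
```

The FIFO queue gives breadth-first order, so pages close to the start URL are crawled first; the `seen` set keeps the crawl from revisiting URLs.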
Hi
Hope you are doing well
I've gone through your posted job description for the single-thread web crawler.
I am a web developer and designer with 5+ years of website development and design experience. I have successfully delivered websites for more than 550 clients.
I am confident that my skills make me a strong candidate to fulfill the needs of your project.
Please initiate a small chat so that we can discuss the details of the project, and I will provide you an exact quote with a timeline.
Thanks
I will write a Python script that crawls the provided webpage, retrieves all links, and collects the page source, checking whether each page contains URLs that have not been met previously. The script chooses the next URL by filtering for URLs that have not been met before and crawling those to find the next set of URLs; this way it eventually reaches all the URLs on the website. Send a message for more details.
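The selection rule described in this bid (always pick an unvisited URL next) can be illustrated with a small sketch; the function name `next_url` is hypothetical, chosen here just to show the filtering idea.

```python
# Hypothetical illustration of the bid's selection rule: keep a FIFO
# queue of discovered URLs and a set of URLs already met, and pop until
# an unvisited URL is found.
from collections import deque


def next_url(queue, seen):
    """Return the oldest URL that has not been met before, or None."""
    while queue:
        url = queue.popleft()
        if url not in seen:
            seen.add(url)  # mark as met so it is never crawled twice
            return url
    return None
```

Because the queue is FIFO, this selects URLs in the order they were first discovered, which is the breadth-first behavior the bid implies.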