
Fast Webpage Crawler and Scraper

$30-250 USD

Closed
Posted over 4 years ago

Paid on delivery
I need a crawler that will crawl a list of domains that I will load from a CSV file. The crawler needs to crawl ONLY THE LANDING PAGE, not the entire site, capture the following for each URL, and write the results to a CSV file stored in Dropbox:
1) Does the URL have Google Analytics code? Yes or no. Search for "Google Analytics" in the source of the page.
2) Is there a link to a privacy policy on the page? Yes or no. Search for the word "Privacy" in the link text.
3) How many unique internal URL links are present on the page? Return the link count.
4) Is the URL secure (SSL)? Yes or no.
5) Is the URL mobile-friendly? Yes or no. Search for "meta name="viewport"" in the source of the page.
6) Is the domain parked? Yes or no. Look for keywords or phrases in the source code.
7) Is a phone number present on the page? Yes or no. Capture the phone number.
8) The URL being crawled.
The crawler must be capable of crawling 70,000 URLs per hour. To be considered successful, the script will be tested using 70,000 URLs in one hour.
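As a rough illustration of the per-page checks described above, here is a minimal Python sketch using only the standard library. The function name `analyze_landing_page`, the result keys, the parked-domain phrases, and the phone-number pattern are all assumptions made for the example; the posting does not specify them. It parses the HTML rather than doing a raw string search for the viewport tag, which is a small deviation from the wording of the brief.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical phrases: the posting does not say which keywords mark a parked domain.
PARKED_PHRASES = ("domain is for sale", "buy this domain", "parked free")
# Loose North American phone pattern: an assumption, not part of the posting.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}\b")


class _PageParser(HTMLParser):
    """Collects (href, link text) pairs and notes a viewport meta tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.has_viewport = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href, self._text = attrs["href"], []
        elif tag == "meta" and (attrs.get("name") or "").lower() == "viewport":
            self.has_viewport = True

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def analyze_landing_page(url, html):
    """Run the eight checks on one already-downloaded landing page."""
    parser = _PageParser()
    parser.feed(html)
    host = urlparse(url).netloc.lower()

    internal = set()
    for href, _ in parser.links:
        absolute = urljoin(url, href)
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc.lower() == host:
            internal.add(absolute.split("#")[0])  # ignore fragments when deduplicating

    lowered = html.lower()
    phone = PHONE_RE.search(html)
    return {
        "url": url,
        # "google-analytics" is an extra assumption beyond the phrase in the posting.
        "google_analytics": "google analytics" in lowered or "google-analytics" in lowered,
        "privacy_link": any("privacy" in text.lower() for _, text in parser.links),
        "internal_links": len(internal),
        # Assumes `url` is the final URL after any redirects.
        "ssl": urlparse(url).scheme == "https",
        "mobile_friendly": parser.has_viewport,
        "parked": any(phrase in lowered for phrase in PARKED_PHRASES),
        "phone": phone.group(0) if phone else "",
    }
```

Each returned dictionary maps onto one CSV row; `csv.DictWriter` could write the rows, with the booleans converted to "yes" or "no" on output.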
Project ID: 23485154

About the project

12 proposals
Remote project
Active 4 yrs ago

12 freelancers are bidding on average $191 USD for this job
Hi there, I am a scraping expert and have completed more than 500 scraping projects; please check my feedback to see for yourself. Can we discuss this project in more detail? I can then provide example data or a script for you. Thanks, Lin
$160 USD in 5 days
5.0 (410 reviews)
8.1
Hi. I read the project description and have a few questions. 1. Do you need the script as well, or the data only? 2. What is the format of the output data? Is CSV OK? We can do other formats as well. 3. Which fields do you want to extract from the website? 4. How many results/URLs are there? 5. Can you share the CSV with the URLs? Let's get in touch and we can provide a sample. Thanks; I look forward to these details and hope to collaborate.
$200 USD in 4 days
5.0 (69 reviews)
7.5
Hi, I can deliver a multi-threaded desktop tool that processes 70k URLs per hour. Thanks
$400 USD in 3 days
5.0 (99 reviews)
7.6
Hello, I have 13 years of experience developing such tools; you can check my profile reviews. I can build this script for you quickly. You don't need 15 parallel threads or a special VPS for it: it can be done with a single thread and will work on any machine with a good internet connection. Please give me a list of 1K URLs so I can run a benchmark to measure the scraping time. Let me know if you're interested. Thanks.
$180 USD in 3 days
5.0 (14 reviews)
6.3
Scraping 70,000 URLs in an hour depends entirely on the hosting this bot will run on. You'll need at least 15 parallel threads, which many VPS, VM, and dedicated hosting providers offer. I have experience with all of the above. Thanks! Relevant experience: Linux, PHP7+, cURL, proxies, Excel, data parsing, API integration. Scripts, solutions, frameworks. Over 120 crawler and parser jobs completed (from here and other sources). Proxies provided. Ask about hosting!
$230 USD in 2 days
4.9 (67 reviews)
5.4
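The thread count mentioned in this bid can be sanity-checked with simple arithmetic: 70,000 URLs per hour is about 19.4 pages per second, and the number of requests that must be in flight is roughly that rate multiplied by the average time per page. The sketch below works this out in Python; the function names and the sample latencies are hypothetical and only illustrate the calculation.

```python
import math


def required_rate(urls, seconds=3600):
    """Pages per second needed to finish `urls` within `seconds`."""
    return urls / seconds


def workers_needed(urls, avg_seconds_per_page, seconds=3600):
    """Concurrent workers needed if each page takes avg_seconds_per_page."""
    return math.ceil(required_rate(urls, seconds) * avg_seconds_per_page)


def max_latency(urls, workers, seconds=3600):
    """Largest average time per page that `workers` parallel workers can tolerate."""
    return workers / required_rate(urls, seconds)


print(round(required_rate(70_000), 2))    # about 19.44 pages per second
print(workers_needed(70_000, 0.75))       # hypothetical 0.75 s average per page
print(workers_needed(70_000, 2.0))        # hypothetical 2 s average per page
print(round(max_latency(70_000, 15), 3))  # time budget per page with 15 workers
```

Under these assumptions, 15 workers are enough only if the average page, including timeouts on dead sites, completes in roughly 0.77 seconds; slower sites push the required concurrency up proportionally.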
I can make a multi-threaded desktop application that downloads as many pages as possible. However, its speed depends entirely on your internet speed and the response times of the websites it downloads from. My average project completion time is 3-5 hours on the same day. My skills include PHP, HTML5, CSS3, JavaScript, jQuery, WordPress Themes & Plugins, Web Scrapers & Automation Bots, User Scripts, Macros, and much more. If you have any questions or concerns, feel free to message me via chat to clarify any details.
$150 USD in 1 day
4.8 (20 reviews)
5.0
Hi, I'm an expert in highly responsive websites built with optimal web technologies. I can do the job perfectly. I will build this project as a C# desktop application with buttons, a progress bar, and multithreading. Everything is clear and detailed. Can we discuss this project in more detail? I can then provide example data or a script for you. I'm at your disposal for any further information. Waiting for your response!
$200 USD in 7 days
5.0 (17 reviews)
4.8
Hi, I'm an expert at web scraping with Python. The task is clear: we need multi-threaded or asynchronous programming to achieve that speed. It also depends on your network bandwidth, but that should be fine. Contact me in chat to begin. Thanks, Pandelis
$100 USD in 3 days
4.9 (15 reviews)
4.8
Hey, I can provide such a scraper built in Python + Scrapy. If you have a faster solution than Scrapy in mind, I would like to hear what it is. I will integrate the Dropbox SDK for uploading the results, and I will wrap everything up with Docker. BUT 70k requests per hour really depends on two things: your internet connection and, more importantly, each website's bandwidth. If these two can deliver 70k requests per hour, the code will not be the limit. Note: Scrapy already does concurrent requests; it's the best tool I know for this. Vali
$100 USD in 1 day
5.0 (3 reviews)
2.6
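To illustrate the kind of concurrent requesting this bid refers to, here is a minimal sketch using Python's standard `asyncio` rather than Scrapy itself. The names `crawl` and `fetch_with_urllib` and the default limit of 50 are assumptions for the example, not anything taken from the bid or from Scrapy.

```python
import asyncio
import urllib.request


async def fetch_with_urllib(url, timeout=10):
    """Blocking stdlib download moved to a worker thread (Python 3.9+)."""
    def _get():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    return await asyncio.to_thread(_get)


async def crawl(urls, fetch=fetch_with_urllib, limit=50):
    """Fetch every URL with at most `limit` requests in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def one(url):
        async with sem:
            try:
                return url, await fetch(url)
            except Exception as exc:  # one dead site must not stop the batch
                return url, exc

    # gather returns the results in the same order as `urls`
    return await asyncio.gather(*(one(u) for u in urls))


# Usage (performs real network requests):
# results = asyncio.run(crawl(["https://example.com/"], limit=20))
```

Because `fetch_with_urllib` runs in the default thread pool, the pool size also caps real parallelism; a dedicated asynchronous HTTP client or Scrapy's own scheduler would likely replace it in practice. The `fetch` parameter is pluggable so the concurrency logic can be exercised without touching the network.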
I will develop a spider for you using the Python Scrapy framework. The framework supports asynchronous web requests, which will meet the 70,000/hr requirement. Text me to discuss further.
$120 USD in 3 days
0.0 (0 reviews)
0.0

About the client

Sheb, United States
4.7
21
Payment method verified
Member since Jun 16, 2009

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)