
Project for Dmitry A. -- phase 4 - (150 EUR)

€30-250 EUR

Completed
Posted about 6 years ago


Paid on delivery
a) Download PDF files from [login to view URL] and convert them to txt.

b) The program should iterate through a list of ID numbers and download, convert, and store each corresponding file as a separate file named OS+"ID number".

c) The files to be converted are "official statements" from municipal bond issues, accessible at [login to view URL]

d) Follow these steps to download the statements:
   Step 1: Append the ID number to the end of the following link: [login to view URL] Example: [login to view URL]
   Step 2: Go to the second tab, "official statement", located on the grey bar below the "Issue details" section.

e) Program requirements:
   i) The program needs to be fast. I need to convert hundreds of thousands of documents.
   ii) Storage space is very important. File sizes should be as small as possible. The files contain only American English characters. Images and maps are not important.
   iii) The program should have an option to set the maximum number of pages to store. By default, every page should be stored.
   iv) The program should handle the following cases well:

   CASE 1: There is no PDF to download. [login to view URL]

   CASE 2: The whole PDF is an image. This is the typical case for "old" issues. Do whatever you can here, but don't waste your time if it is not possible to store the text. [login to view URL]

   CASE 3: The PDF is not an image, and you can copy and paste its text (Ctrl+C/Ctrl+V) directly into any text program. [login to view URL]

   CASE 4: The PDF is not an image, but you CANNOT copy and paste its text directly into a text program. [login to view URL]

   CASE 5: There are TWO PDF files.
      Sub-case 5.1: One is the official statement and the other is the preliminary official statement. Convert and store ONLY the official statement and NOT the "preliminary" one. [login to view URL]
      Sub-case 5.2: Both are "official statement posted...", i.e. the first three words of the file names are the same. This is what you should do:
         1. Append the texts of the official statements if the size difference between the files is more than 10% relative to the larger file, AND the posted dates differ by no more than 1 year. Examples: [login to view URL] [login to view URL] [login to view URL] [login to view URL]
         2. OTHERWISE, keep only the most recently posted file. Examples: [login to view URL] [login to view URL]
      Sub-case 5.3: The second file is neither another "official statement posted..." nor a preliminary one. Always ignore files whose first three words are not "official statement posted...", unless the name contains a synonym of "amendment" or "supplement" (I didn't find an example), in which case you should proceed as in sub-case 5.2. [login to view URL]

   CASE 6: There are more than 2 files. In this case, proceed by combining sub-cases 5.1 to 5.3. I have provided some examples with 3 files in the sub-case 5.2 section.
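The file-selection rules in cases 5 and 6 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the delivered program. The `PdfFile` record, the `EXTRA_KEYWORDS` list, and the function names are hypothetical. "1 year" is taken as 365 days, and for more than two files the pairwise 10% rule is generalized by comparing the smallest and largest file and the earliest and latest posting date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PdfFile:
    """Hypothetical record for one PDF listed on the "official statement" tab."""
    name: str      # link text, e.g. "Official Statement posted 03/01/2010"
    size: int      # file size in bytes
    posted: date   # posting date


# Assumption: the brief only says "any synonym of amendment or supplement".
EXTRA_KEYWORDS = ("amend", "supplement")


def is_candidate(f: PdfFile) -> bool:
    """Sub-cases 5.1 and 5.3: decide whether a file is considered at all."""
    name = f.name.lower().strip()
    if "preliminary" in name:
        return False  # never store the preliminary statement
    if name.startswith("official statement posted"):
        return True
    return any(keyword in name for keyword in EXTRA_KEYWORDS)


def select_files(files: list[PdfFile]) -> list[PdfFile]:
    """Return the files whose text should be stored, oldest first."""
    candidates = sorted((f for f in files if is_candidate(f)),
                        key=lambda f: f.posted)
    if len(candidates) <= 1:
        return candidates  # nothing to combine (also covers "no PDF")

    newest = candidates[-1]
    largest = max(f.size for f in candidates)
    smallest = min(f.size for f in candidates)
    if largest == 0:
        return [newest]  # avoid dividing by zero on empty files

    size_gap = (largest - smallest) / largest        # relative to larger file
    span_days = (newest.posted - candidates[0].posted).days

    # Sub-case 5.2, rule 1: sizes differ by more than 10% and the files were
    # posted within one year of each other, so append all texts.
    if size_gap > 0.10 and span_days <= 365:
        return candidates
    # Sub-case 5.2, rule 2: otherwise keep only the most recent file.
    return [newest]


if __name__ == "__main__":
    listing = [
        PdfFile("Preliminary Official Statement posted 12/01/2009", 900, date(2009, 12, 1)),
        PdfFile("Official Statement posted 01/05/2010", 1000, date(2010, 1, 5)),
        PdfFile("Official Statement posted 03/01/2010", 800, date(2010, 3, 1)),
    ]
    for chosen in select_files(listing):
        print(chosen.name)
```

Keeping the selection separate from downloading and text extraction means the rule can be checked against the example issues before any of the hundreds of thousands of documents are converted.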
Project ID: 16161560

About the project

4 proposals
Remote project
Active 6 yrs ago

Awarded to:
A proposal has not yet been provided
€150 EUR in 4 days
5.0 (23 reviews)
5.3

About the client

Segovia, Spain
5.0
2
Payment method verified
Member since May 4, 2017

Client Verification

Other jobs from this client

Web Scraping
€30-250 EUR
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)