Write a script to parse image and unicode strings out of non-English (Hindi) PDF files

In Progress

Description

You will need to write a script that parses Unicode strings out of a Marathi/Hindi PDF.

Here is the example PDF (attached as well):

[url removed, login to view]

This PDF has multiple pages. Each page has a heading at the top, followed by cells arranged in a table.

For example, with this file (also attached):

[url removed, login to view]

I would like this file to be parsed to generate a CSV file with the following fields:

1) "Assembly No" : 197 (highlighted portion in [url removed, login to view])

2) "Part No": 152 (highlighted portion in [url removed, login to view])

3) "Section No": 1 (highlighted portion in [url removed, login to view])

4) "Section Name": "मदकर टयदडर पदळडपरळगनगर रदजगपरनगर तद. खखड जज. पपणख जपनककड 410505" (will be in unicode and it is the highlighted portion in [url removed, login to view])

5) "Epic Id": KXH1173293 (highlighted portion in [url removed, login to view])

6) "Serial No": 5

7) "House No": 69 (highlighted portion in [url removed, login to view])

8) "Age": 60 (highlighted portion in [url removed, login to view])

9) "Sex": पपरष (Will be in unicode and is the highlighted portion in [url removed, login to view])

10) "Name" : पवदर सकनन लकमण (Will be in unicode and is the highlighted portion in [url removed, login to view])

11) "Relative Name" : पवदर लकमण (Will be in unicode and is the highlighted portion in [url removed, login to view])
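
The field list above maps naturally onto one CSV row per cell. A minimal sketch of that row layout in Python follows. It assumes, from the single example given, that an Epic Id is three uppercase letters followed by seven digits, and that page-level values (such as those from the heading) are merged into every cell's row. The names FIELDNAMES, EPIC_RE, find_epic_ids and make_row are illustrative and not part of the posting.

```python
import re

# Column order taken from the field list in the posting.
FIELDNAMES = [
    "Assembly No", "Part No", "Section No", "Section Name", "Epic Id",
    "Serial No", "House No", "Age", "Sex", "Name", "Relative Name",
]

# Assumption: Epic Ids look like the example "KXH1173293"
# (three uppercase letters, then seven digits).
EPIC_RE = re.compile(r"\b[A-Z]{3}\d{7}\b")


def find_epic_ids(text):
    """Return every Epic-Id-like token found in a page's extracted text."""
    return EPIC_RE.findall(text)


def make_row(page_header, cell):
    """Merge page-level fields with one cell's fields into an ordered row dict."""
    row = {**page_header, **cell}
    missing = [name for name in FIELDNAMES if name not in row]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {name: row[name] for name in FIELDNAMES}
```

Raising on a missing field, rather than writing an empty column, makes extraction failures visible early. Whether that is the right behaviour depends on how clean the real PDFs turn out to be.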

The script should be invoked as follows:

python [url removed, login to view] -i [url removed, login to view] -o [url removed, login to view]
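
The invocation above could be handled with the standard argparse module. This is only a sketch: the posting specifies just the -i and -o flags, so the long option names, help texts and the parse_args helper are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the -i/-o options shown in the invocation above."""
    parser = argparse.ArgumentParser(
        description="Extract records from a PDF into a CSV file.")
    # The long forms --input/--output are hypothetical conveniences.
    parser.add_argument("-i", "--input", required=True,
                        help="path to the source PDF")
    parser.add_argument("-o", "--output", required=True,
                        help="path of the CSV file to write")
    return parser.parse_args(argv)
```

Calling parse_args() with no argument reads the real command line. Passing a list, such as ["-i", "in.pdf", "-o", "out.csv"], is convenient for testing.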

The generated CSV file should have all fields properly quoted and escaped. It should also contain a header line.
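
One way to meet the quoting, escaping and header requirements is Python's standard csv module. The sketch below assumes UTF-8 output and quotes every field. The write_rows helper and its parameters are illustrative names, not something the posting prescribes.

```python
import csv


def write_rows(path, fieldnames, rows):
    """Write dict rows to a UTF-8 CSV with a header line, quoting every field."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames,
                                quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(rows)
```

With csv.QUOTE_ALL, embedded double quotes are escaped by doubling them, so values containing commas or quotes survive a round trip through a standard CSV reader.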

Platform requirements (must)

1) Language: Any

2) OS: Any

Your script should work on the attached files, on files in a similar format, and on the files at the following links:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

Skills: ASP, C# Programming, Java, Python


Project ID: #12205330

Awarded to:

lucidprogrammer

I will be doing this in Elixir, provided to you as an executable script.

₹72222 INR in 7 days
(0 Reviews)
0.0

3 freelancers are bidding on average ₹54664 for this job

cracken

Hi, based on my 5 years of experience as a software engineer, knowledgeable in Unix and Linux administration and expert in command-line applications, I can take good care of your project. I would love to discuss further on your re…

₹41769 INR in 2 days
(9 Reviews)
4.4

mascotsoft4

Dear Client, greetings of the day! Thanks for giving us the opportunity to bid on the project and communicate with you. I am a serious bidder here and I have already worked on a similar project befor…

₹50000 INR in 6 days
(0 Reviews)
0.0