In Progress

Write a script to parse image and unicode strings out of non-English (Hindi) PDF files

You will have to write a script to parse Unicode strings out of a Marathi/Hindi PDF.

Here is the example PDF (attached as well):

[url removed, login to view]

This PDF has multiple pages. Each page has a heading at the top, followed by a number of cells arranged in a table.

For example, with this file (also attached):

[url removed, login to view]

I would like this file to be parsed into a CSV file with the following fields:

1) "Assembly No" : 197 (highlighted portion in [url removed, login to view])

2) "Part No": 152 (highlighted portion in [url removed, login to view])

3) "Section No": 1 (highlighted portion in [url removed, login to view])

4) "Section Name": "मदकर टयदडर पदळडपरळगनगर रदजगपरनगर तद. खखड जज. पपणख जपनककड 410505" (will be in unicode and it is the highlighted portion in [url removed, login to view])

5) "Epic Id": KXH1173293 (highlighted portion in [url removed, login to view])

6) "Serial No": 5

7) "House No": 69 (highlighted portion in [url removed, login to view])

8) "Age": 60 (highlighted portion in [url removed, login to view])

9) "Sex": पपरष (Will be in unicode and is the highlighted portion in [url removed, login to view])

10) "Name" : पवदर सकनन लकमण (Will be in unicode and is the highlighted portion in [url removed, login to view])

11) "Relative Name" : पवदर लकमण (Will be in unicode and is the highlighted portion in [url removed, login to view])
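
One way to pin down the eleven fields above is a small record type plus a matching header list. This is only a sketch: the class name `VoterRecord`, the helper `to_row`, and the choice of types (for example, keeping "House No" as text in case a house number contains letters) are assumptions for illustration, not part of the requirements. The sample values are the ones listed above.

```python
from dataclasses import dataclass, astuple

# Column labels in the order given in the posting.
HEADER = [
    "Assembly No", "Part No", "Section No", "Section Name", "Epic Id",
    "Serial No", "House No", "Age", "Sex", "Name", "Relative Name",
]


@dataclass
class VoterRecord:  # hypothetical name, for illustration only
    assembly_no: int
    part_no: int
    section_no: int
    section_name: str
    epic_id: str
    serial_no: int
    house_no: str  # assumption: kept as text, not converted to a number
    age: int
    sex: str
    name: str
    relative_name: str


def to_row(record):
    """Return the record as a list of strings, in HEADER order."""
    return [str(value) for value in astuple(record)]


# The example record from the posting.
example = VoterRecord(
    assembly_no=197,
    part_no=152,
    section_no=1,
    section_name="मदकर टयदडर पदळडपरळगनगर रदजगपरनगर तद. खखड जज. पपणख जपनककड 410505",
    epic_id="KXH1173293",
    serial_no=5,
    house_no="69",
    age=60,
    sex="पपरष",
    name="पवदर सकनन लकमण",
    relative_name="पवदर लकमण",
)
```

Keeping the header and the record fields in the same order means each parsed cell maps directly to one CSV row.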

So the script should be invoked like the following:

python [url removed, login to view] -i [url removed, login to view] -o [url removed, login to view]
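
If the script is written in Python, the `-i` and `-o` options shown above could be handled with the standard `argparse` module. This is a minimal sketch; the long option names `--input` and `--output` and the function name `parse_args` are assumptions added for readability, only the short flags come from the posting.

```python
import argparse


def parse_args(argv=None):
    """Parse the -i (input PDF) and -o (output CSV) options."""
    parser = argparse.ArgumentParser(
        description="Parse a Marathi/Hindi PDF into a CSV file."
    )
    # Long option names are hypothetical; only -i and -o are specified.
    parser.add_argument("-i", "--input", required=True,
                        help="path to the input PDF file")
    parser.add_argument("-o", "--output", required=True,
                        help="path to the CSV file to write")
    return parser.parse_args(argv)
```

Calling `parse_args()` with no argument reads the options from the command line; passing a list such as `["-i", "in.pdf", "-o", "out.csv"]` is convenient for testing.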

The generated CSV file should have all fields properly quoted and escaped. It should also contain a header line.
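
As a sketch of the quoting requirement, Python's standard `csv` module can quote every field and escape embedded quotes by doubling them. The function name `write_csv` and the decision to quote all fields (`csv.QUOTE_ALL`) are assumptions; the posting only asks that fields be "properly quoted and escaped" and that a header line be present.

```python
import csv
import io


def write_csv(header, rows, out):
    """Write the header line and then all rows, quoting every field."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(header)
    writer.writerows(rows)


# Demonstration with an in-memory buffer. For a real file, open it with
# open(path, "w", encoding="utf-8", newline="") so that Devanagari text
# is stored as UTF-8 and the csv module controls the line endings.
buffer = io.StringIO()
write_csv(
    ["Epic Id", "Name"],
    [["KXH1173293", 'a "quoted", comma-separated value']],
    buffer,
)
csv_text = buffer.getvalue()
```

Reading `csv_text` back with `csv.reader` returns the original header and row, which is a quick way to check that the escaping round-trips.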

Platform requirements (must)

1) Language: Any

2) OS: Any

Your script should work on the attached files, on files in a similar format, and on the files at the following links:

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

[url removed, login to view]

Skills: ASP, C# Programming, Java, Python

About the Employer:
( 0 reviews ) India

Project ID: #12205330

Awarded to:

lucidprogrammer

I will be doing this in Elixir, provided to you as an executable script.

₹72222 INR in 7 days
(0 Reviews)
0.0

3 freelancers are bidding on average ₹54664 for this job

cracken

Hi, based on my 5 years of experience as a software engineer, knowledgeable in Unix and Linux administration and expert in command-line applications, I can take good care of your project. I would love to discuss further on your re...

₹41769 INR in 2 days
(9 Reviews)
4.4

mascotsoft4

Dear Client, greetings of the day! Thanks for giving us the opportunity to bid on the project and communicate with you. I am a serious bidder here and I have already worked on a similar project befor...

₹50000 INR in 6 days
(0 Reviews)
0.0