Changing 'toUnicode' Mapping in PDF
₹37500-75000 INR
Paid on delivery
The project requires a method to redefine the 'toUnicode' mapping in a large number of PDF documents written in Hindi. The current mapping is wrong, which makes it impossible to extract the text correctly. The new mapping must ensure that the glyphs visible in the PDF are captured accurately when the text is extracted. The method should be easy to replicate and to scale across thousands of PDFs. Try copying and pasting text from the attached sample to see the problem that needs to be fixed.
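To illustrate what "redefining the mapping" involves, here is a minimal sketch that writes the source text of a replacement ToUnicode CMap. This is the PDF object the posting calls 'toUnicode'. The sketch rests on several assumptions. The fonts are assumed to use 2-byte glyph codes (as Type0/Identity-H fonts do). The correct glyph-code-to-Unicode table is assumed to have been worked out already, for example by inspecting the embedded fonts in FontForge. The function names and sample codes are hypothetical, not taken from the sample PDF. One glyph code can map to several code points, which is how a Devanagari conjunct drawn as a single glyph can extract as its full character sequence.

```python
"""Minimal sketch: build a ToUnicode CMap for a font with 2-byte glyph codes.

Assumes the correct glyph-code -> Unicode table is already known; the
sample entries in __main__ are hypothetical.
"""


def utf16be_hex(text: str) -> str:
    """Encode a Unicode string as the uppercase UTF-16BE hex used in CMaps."""
    return text.encode("utf-16-be").hex().upper()


def build_tounicode_cmap(mapping: dict[int, str]) -> str:
    """Return ToUnicode CMap source text mapping 2-byte codes to Unicode."""
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        # Assumption: 2-byte codes; a simple font would use <00> <FF>.
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # Each bfchar section may hold at most 100 entries, so split into chunks.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Source code as 4 hex digits, destination as UTF-16BE hex.
            lines.append(f"<{code:04X}> <{utf16be_hex(text)}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical glyph codes; the real values come from the embedded font.
    sample = {
        0x0003: "\u0915",              # KA
        0x0010: "\u0915\u094D\u0937",  # KA + VIRAMA + SSA, drawn as one glyph
        0x0021: "\u093F",              # vowel sign I
    }
    print(build_tounicode_cmap(sample))
```

The resulting text would still have to be stored as a stream and set as the /ToUnicode entry of each affected font dictionary, using a PDF library such as iText. Because the same fonts are likely reused across the document set, one table per font could be applied to thousands of files. A ToUnicode CMap only maps glyphs in the order they are drawn. A pre-base vowel sign such as U+093F, which is often drawn before its consonant, may therefore still extract in visual order and need separate post-processing.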
I have some idea of how to approach this using FontForge, iText and Xpdf, but I don't have the programming skills to complete the task. I'm happy to work with you to find a solution.
Project ID: #9457545
1 freelancer is bidding on average ₹55555 for this job
Do you want the PDFs converted to DOC, or software that does the conversion? Could you please tell me? I now have your sample in RTF, TXT and DOCX format.