For a publication library Win32 app, I am looking to extract data from its proprietary and undocumented file format.
* It is multilingual, predates Unicode, and is most likely compressed.
* The individual documents are search-indexed (the indices are in separate files).
* They are sorted into a hierarchy: Publication (file-level) > Chapter > Content.
* Some publications are magazines; their hierarchy is: Publication (file-level) > Year > Issue > Chapter > Content.
* The documents are interconnected with hyperlinks.
* Few publications contain images; most of the content is formatted text.
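
As a rough illustration of a first triage step for a format like this, the sketch below scans a file's bytes for embedded zlib streams and then checks which legacy code pages decode the result cleanly. Everything in it is an assumption: the files may use a different compressor entirely, and the function names and the candidate code page list are hypothetical, not taken from the app.

```python
import zlib

# Hypothetical candidates: legacy Windows code pages often seen in pre-Unicode apps.
CANDIDATE_CODEPAGES = ["cp1252", "cp1250", "cp1251"]


def find_zlib_streams(data: bytes, min_size: int = 16):
    """Yield (offset, decompressed_bytes) wherever a zlib stream appears to start.

    Assumes zlib/deflate with the common 32K window (first header byte 0x78).
    Other compressors or window sizes will not be detected.
    """
    for offset in range(len(data) - 1):
        header = (data[offset] << 8) | data[offset + 1]
        # A valid zlib header is a multiple of 31 when read as a big-endian 16-bit value.
        if data[offset] != 0x78 or header % 31 != 0:
            continue
        try:
            # decompressobj tolerates trailing bytes after the end of the stream.
            out = zlib.decompressobj().decompress(data[offset:])
        except zlib.error:
            continue
        if len(out) >= min_size:
            yield offset, out


def guess_codepages(raw: bytes, candidates=CANDIDATE_CODEPAGES):
    """Return the candidate code pages that decode raw without errors.

    This is a weak filter: single-byte code pages accept almost any input,
    so the survivors still need checking against known text.
    """
    ok = []
    for cp in candidates:
        try:
            raw.decode(cp)
        except UnicodeDecodeError:
            continue
        ok.append(cp)
    return ok
```

Any hits would only be a starting point: false positives are possible, and the surviving code pages would have to be confirmed by comparing decoded text with what the viewer app displays.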
Source:
I will provide you with the entire library viewer app including all of its publication files (1+GB).
Deliverables:
* The tool you develop to read the files and convert them to the format described below.
* I can work from a set of legible, interconnected HTML (and JPG) files with their TOC files, sorted into nested folders.
* All formatting, links and footnotes need to be retained.
* Indices should be in a separate file per publication, using HTML anchor tags <a id="uniqueID"> in the content files.
* I should be able to use the same tool on more files of the same specification.
* Delivering a command line tool for Win32, x64, Linux or macOS is fine.
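
To make the requested output layout concrete, here is a minimal sketch of one possible reading of it: one folder per chapter, a content file with an `<a id="...">` anchor per entry, and a single index file per publication that links to those anchors. The function name, folder names, and file names are illustrative assumptions, not part of the specification.

```python
import html
from pathlib import Path


def write_publication(out_dir, chapters):
    """Write one content file per chapter plus an index.html linking to every anchor.

    chapters: list of (chapter_title, entries), where entries is a list of
    (anchor_id, heading, body_html). body_html is written as-is, so it must
    already be valid HTML. All paths used here are hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index_lines = ["<ul>"]
    for n, (chapter_title, entries) in enumerate(chapters, start=1):
        chapter_dir = out / f"chapter_{n:03d}"
        chapter_dir.mkdir(parents=True, exist_ok=True)
        parts = [f"<h1>{html.escape(chapter_title)}</h1>"]
        for anchor_id, heading, body_html in entries:
            aid = html.escape(anchor_id, quote=True)
            # The anchor the index (and any hyperlink) can target.
            parts.append(f'<a id="{aid}"></a><h2>{html.escape(heading)}</h2>')
            parts.append(body_html)
            href = f"chapter_{n:03d}/content.html#{aid}"
            index_lines.append(f'<li><a href="{href}">{html.escape(heading)}</a></li>')
        (chapter_dir / "content.html").write_text("\n".join(parts), encoding="utf-8")
    index_lines.append("</ul>")
    index_path = out / "index.html"
    index_path.write_text("\n".join(index_lines), encoding="utf-8")
    return index_path
```

For magazines, the same idea would extend to Year and Issue folders above the chapter level.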
Hello,
I have a bachelor's degree in computer science and have been programming in C/C++ on both Linux and Windows
for more than 8 years. I have written various network applications, some of which work down to the IP packet level.
I have written a Linux-based network packet analyzer using nothing except the standard C library,
and have also written simple versions of the network diagnostic tools ping and traceroute.
Additionally, I have previous experience with Python, PHP, and Intel assembly.
Let me know if you're interested.