I registered on Freelancer specifically in response to this posting. I'm an entrepreneur in the last stages of a self-funded fintech project, and I need to generate income to push through the final few months of development.
I've been writing empirical algorithms for data collection, feature extraction, and analysis since 2010. My first major project was a smart OCR utility for screen-scraping online gaming clients, parsing the text and writing it to a database in real time. Currently I'm working on an end-to-end learner for futures market portfolio optimization by swarming recurrent neural networks.
My everyday work revolves around deep learning and statistical optimization. In most cases I prefer Keras, an ML library wrapping TensorFlow and Theano. It's an approachable, well-documented, and well-maintained library with lots of power, and its framework can be easily extended when circumstances require non-standard approaches.
Your pilot project needs this: convolutional neural nets for document classification, sequence-to-sequence (bidirectional) recurrent neural nets for parsing text files, and Bayesian hyper-parameter search with Mint to find the best general settings for heterogeneous data. Your obstacle is document standardization, workflow, and preprocessing. scikit-learn provides easy-to-use open-source facilities for this purpose.
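As a rough illustration of the standardization and preprocessing step, here is a minimal sketch using scikit-learn. It is not the project's actual pipeline: the `standardize` helper, the toy documents, and the labels are all hypothetical, and a simple logistic regression stands in for the convolutional classifier purely to keep the example short.

```python
import re
import unicodedata

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def standardize(text):
    """Normalize Unicode, collapse runs of whitespace, and lowercase a document."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


# Hypothetical toy corpus; real documents and labels would replace this.
docs = [
    "INVOICE  #1001\nAmount due: $250",
    "Invoice #1002   Amount due: $90",
    "This AGREEMENT is made between the parties",
    "Agreement\tbetween the undersigned parties",
]
labels = ["invoice", "invoice", "contract", "contract"]

# One pipeline object keeps standardization, feature extraction, and the
# classifier together, so new documents get exactly the same treatment.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=standardize)),
    ("clf", LogisticRegression()),  # placeholder for the CNN classifier
])
pipeline.fit(docs, labels)

predictions = pipeline.predict(["invoice amount due", "agreement of the parties"])
```

Keeping every step inside a single `Pipeline` means the same standardization runs at training and prediction time, which is where heterogeneous inputs usually cause trouble.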
I'd be happy to discuss further and provide code samples with tests upon request.
Best of luck on this project,
Josh W.