Implementing the procedures for creating and maintaining an index of scientific publications - 2
Budget: €750-1500
Paid on delivery
Employer:
Institute of Neuroinformatics
University of Zurich & ETH Zurich
Project Objective:
Create an ElasticSearch index of scientific publications (metadata and full text) by aggregating content from various data sources, i.e. scientific publication databases. Keep the index updated as new content is added to the data sources. Implement a flexible workflow so that additional data sources can be integrated in the future and changes to the data sources' APIs can be handled.
Duration: ~1 month
Technologies to use:
Node.js, Python, Elastic Stack, Docker. Open to other suggestions.
Data sources (example list):
1) Crossref: contains the metadata of all publications that have a digital object identifier (DOI). Content can be downloaded by querying the database through a rate-limited REST API.
2) MEDLINE/PubMed: contains metadata and abstracts of most publications related to the life sciences, including publications without DOIs. Content can be bulk downloaded.
3) CORE: an aggregate database of most open access publications, including the full text of some. It contains data from large databases such as arXiv and CiteSeerX. If the full text is not available (e.g. papers from arXiv), a link to the original source is provided, which should be crawled to fetch the full-text content. The database can be bulk downloaded.
The workflow should be flexible to include additional data sources as they become available.
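For a rate-limited source such as Crossref, the download step could be sketched as follows. This is a minimal sketch, not a definitive implementation: `RateLimiter`, `harvest`, and the page shape `{"items": [...], "next_cursor": ...}` are hypothetical names chosen for illustration, and `fetch_page` stands in for whatever HTTP call a given data source actually requires.

```python
import time
from typing import Callable, Dict, Iterator, Optional


class RateLimiter:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def harvest(fetch_page: Callable[[Optional[str]], Dict],
            limiter: RateLimiter) -> Iterator[Dict]:
    """Yield records page by page until the source has nothing more.

    Assumption: fetch_page(cursor) returns a dict of the form
    {"items": [...], "next_cursor": <str or None>}.
    """
    cursor: Optional[str] = None
    while True:
        limiter.wait()  # never call the source faster than allowed
        page = fetch_page(cursor)
        items = page.get("items", [])
        if not items:
            break
        yield from items
        cursor = page.get("next_cursor")
        if cursor is None:
            break
```

Because the clock, the sleep function, and the page fetcher are all injected, the same loop can be reused for another data source by swapping in a different `fetch_page`, and it can be tested without network access.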
Project Parts/Tasks:
Different tasks should be handled by individual Docker microservices
1) Downloading and parsing the entire content of the listed data sources and indexing it in individual ElasticSearch indices. The implementation for parsing the data sources needs to be template-based, i.e. the same functions can be used with a different template for a different data source.
2) Extracting the content of PDF files (in an unstructured format) if a data source only provides PDFs (e.g. CORE)
3) Aggregating the downloaded content from the data sources in a “meta” ElasticSearch index
4) Keeping the meta index updated as new publications appear
5) Maintaining the meta index: handling duplicates, handling different versions of a publication (e.g. arXiv preprints vs their final publication in a journal), adding new fields to the index, etc.
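One possible reading of the template-based parsing in task 1 is sketched below: each data source gets a template that maps index fields to paths inside that source's records, and a single function applies any template. The names `get_path`, `apply_template`, and both templates are hypothetical; the record shapes are illustrative only and would need to be checked against each source's actual schema.

```python
from typing import Any, Dict, List, Union

Path = List[Union[str, int]]

# Illustrative templates (assumed record shapes, not verified schemas).
CROSSREF_LIKE_TEMPLATE: Dict[str, Path] = {
    "title": ["title", 0],
    "doi": ["DOI"],
    "journal": ["container-title", 0],
}
PUBMED_LIKE_TEMPLATE: Dict[str, Path] = {
    "title": ["Article", "ArticleTitle"],
    "data_source_id": ["PMID"],
}


def get_path(record: Any, path: Path) -> Any:
    """Follow keys and indices into nested dicts/lists; None if anything is missing."""
    current = record
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return None
    return current


def apply_template(record: Dict[str, Any], template: Dict[str, Path],
                   source: str) -> Dict[str, Any]:
    """Build an index document from a raw record using a per-source template."""
    doc = {field: get_path(record, path) for field, path in template.items()}
    doc["data_source"] = source
    # Drop fields the record did not provide.
    return {key: value for key, value in doc.items() if value is not None}
```

Under this design, adding a data source or adapting to a changed API means editing a template rather than the parsing functions, for example `apply_template(raw_record, PUBMED_LIKE_TEMPLATE, "pubmed")`.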
We do not ask for the delivery of a database, but for the tools to populate it. The source code of your implementation needs to be delivered. Please do not submit code with potential license issues. Third-party software/libraries can be used if they are FOSS.
We do not ask for a GUI.
ElasticSearch index fields (not exhaustive):
Title, journal, page, publication date, authors, affiliations, abstract, full-text, references, figures, data source, data source ID, DOI
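Since the fields include both a DOI and a data source ID, duplicate handling in the meta index could start from a key derived from these fields. The sketch below is one assumed approach, not a prescribed one: it uses the normalized DOI when present and falls back to a normalized title, and all function names are hypothetical. It does not link a preprint to a journal version that carries a different DOI; that would need additional matching on title and authors.

```python
import re
import unicodedata
from typing import Dict, Iterable, Optional


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)
    return doi or None


def title_fingerprint(title: str) -> str:
    """Reduce a title to lowercase ASCII letters and digits separated by spaces."""
    text = unicodedata.normalize("NFKD", title).lower()
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text).strip()


def dedup_key(doc: Dict) -> str:
    """Prefer the DOI; fall back to the title for records without one."""
    doi = normalize_doi(doc.get("doi"))
    if doi:
        return "doi:" + doi
    return "title:" + title_fingerprint(doc.get("title") or "")


def merge_duplicates(docs: Iterable[Dict]) -> Dict[str, Dict]:
    """Group documents by key; the first non-empty value of each field wins."""
    merged: Dict[str, Dict] = {}
    for doc in docs:
        target = merged.setdefault(dedup_key(doc), {"sources": []})
        target["sources"].append(doc.get("data_source"))
        for field, value in doc.items():
            if field != "data_source" and value and field not in target:
                target[field] = value
    return merged
```

Keeping the list of contributing sources on each merged document makes it possible to trace a field back to where it came from when a record is later updated.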
Project ID: #18920857
About the project
25 freelancers are bidding on average €1385 for this job