Need to tune Elasticsearch cluster performance.
€250-750 EUR
Paid on delivery
ELASTIC SEARCH INTEGRATION
IN SHORT
We expanded our big data pipeline with a hot storage layer built on top of Elasticsearch. The goal was to query data with very fast response times and support fast analytical decisions, but performance is poor: data indexing is very slow and queries have long response times. We need an ELK expert who can help us fix that.
DATA AND FORMAT
Our data (mainly text) is currently stored in Parquet format (in S3) and in raw formats (TXT, CSV, XLSX, etc.). It is around 10 TB and will grow exponentially.
CURRENT ARCHITECTURE
In our current architecture, we have:
• A Spark cluster of 10 nodes (16 CPU, 64 GB RAM, 256 GB disk) to process raw files
• S3 storage for processed data in Parquet format
• A PostgreSQL database to store sessions, history and some metadata
• A web app built with Play Framework (Scala) from which all requests (Spark jobs included) are triggered
• A (non-optimized) Elasticsearch cluster of 5 nodes (16 CPU, 64 GB RAM, 700 GB disk). Indexing ~170 GB of data (~900 million rows) takes 5 hours.
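To put the indexing figure in perspective, the numbers quoted above can be turned into a throughput estimate. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1000 MB) and an even spread of work across the 5 Elasticsearch nodes, neither of which is stated in the posting.

```python
# Figures quoted in the posting (approximate).
DATA_GB = 170          # size of the indexed data set
ROWS = 900_000_000     # number of rows indexed
HOURS = 5              # time the indexing run takes
ES_NODES = 5           # size of the Elasticsearch cluster

seconds = HOURS * 3600

# Cluster-wide throughput (assumption: 1 GB = 1000 MB).
rows_per_sec = ROWS / seconds
mb_per_sec = DATA_GB * 1000 / seconds

# Average size of one row in the source data.
avg_row_bytes = DATA_GB * 1e9 / ROWS

print(f"cluster: {rows_per_sec:,.0f} rows/s, {mb_per_sec:.1f} MB/s")
# Assumption: the load is spread evenly over all nodes.
print(f"per ES node: {rows_per_sec / ES_NODES:,.0f} rows/s, "
      f"{mb_per_sec / ES_NODES:.1f} MB/s")
print(f"average row size: {avg_row_bytes:.0f} bytes")
# cluster: 50,000 rows/s, 9.4 MB/s
# per ES node: 10,000 rows/s, 1.9 MB/s
# average row size: 189 bytes
```

Roughly 2 MB/s per node is a small fraction of what 16-CPU machines would normally be expected to absorb, which suggests the limit lies in how the data is written rather than in the raw hardware.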
OUR APPROACH AND PROBLEMS
1. After data transformation, we save the resulting data in S3 (in Parquet format).
2. We then read these Parquet files with Spark into a DataFrame.
3. We then save this DataFrame to an Elasticsearch index (we tried many combinations of sharding and replication settings without any gain in performance).
4. We query/search data from ES and feed Kibana/Grafana, or display it in whatever format the business needs.
While the first two steps are relatively fast (~1 hour for 1 billion rows), the third step takes around 5 hours for a 170 GB file.
Data queries also have awful response times.
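For the slow third step, a common starting point is to relax index settings for the duration of the bulk load and to enlarge the write batches of the elasticsearch-hadoop (Spark) connector. The sketch below only builds the setting dictionaries; it does not talk to a cluster. The setting names (`index.refresh_interval`, `index.number_of_replicas`, `es.nodes`, `es.batch.size.entries`, `es.batch.size.bytes`, `es.batch.write.refresh`) are real Elasticsearch and connector options, but every value, the host names, and the helper function names are assumptions for illustration and would need to be tuned against the actual cluster.

```python
import json


def bulk_load_settings():
    """Index settings often relaxed while a one-off bulk load runs:
    no periodic refresh and no replicas to write to."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def serving_settings(replicas=1, refresh_interval="30s"):
    """Settings to restore once the load is done (values are assumptions)."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def connector_write_options(nodes, batch_entries=5000, batch_bytes="5mb"):
    """Write options for the elasticsearch-hadoop connector.
    Batch sizes here are illustrative starting points, not recommendations."""
    return {
        "es.nodes": nodes,
        "es.batch.size.entries": str(batch_entries),
        "es.batch.size.bytes": batch_bytes,
        # Skip the index refresh the connector triggers after each bulk write.
        "es.batch.write.refresh": "false",
    }


# Hypothetical host names; replace with the real ES nodes.
opts = connector_write_options("es-node-1:9200,es-node-2:9200")

# Body for PUT /<index>/_settings before the load starts:
print(json.dumps(bulk_load_settings()))
# Body for PUT /<index>/_settings after the load finishes:
print(json.dumps(serving_settings()))

# With PySpark and the connector on the classpath, the options would be
# passed roughly like this (not executed here, index name is hypothetical):
#   df.write.format("org.elasticsearch.spark.sql") \
#       .options(**opts).mode("append").save("my_index")
```

Whether these changes help depends on where the time is actually spent, so they are best tried one at a time while measuring the indexing rate.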
OUR REQUIREMENTS
• Set up a very cost-effective and efficient ELK (Elasticsearch, Logstash, Kibana) cluster (or optimize our existing one)
• Provide (code) an indexer that can migrate existing data from S3 to Elasticsearch
• Fast indexing of documents into Elasticsearch
• Very fast queries and data retrieval. This is very important for our business needs. A response time of 1-3 seconds is acceptable
• Improve communication between the Spark cluster and the ES cluster. Any bottleneck between Spark and the ES cluster should be detected and fixed
PROFILE NEEDED
You need to:
• Have strong experience with Elasticsearch (ELK) in a big data processing environment
• Be comfortable with Play Framework (or at least Scala)
• Have good experience working with Spark
IMPORTANT CONSIDERATIONS
• Data is about 10 TB and is quickly growing.
• Spark jobs are triggered from a web app built with Play Framework (Scala)
• The project needs to be done in a reasonably short time (no more than one week).
• You need to connect to our internal network in order to work. You will need very good internet bandwidth and the TeamViewer application installed.
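The 10 TB figure above can also be compared against the disk available in the current Elasticsearch cluster. The sketch below is a rough capacity check under loudly stated assumptions: that all 10 TB would be indexed, that the on-disk index is about the same size as the source data (in practice it can be larger or smaller), that one replica is kept, and that only 85% of each disk is usable (the default low disk watermark in Elasticsearch). The function name is hypothetical.

```python
import math

# Figures quoted in the posting.
TOTAL_DATA_TB = 10
NODES = 5
DISK_PER_NODE_GB = 700


def nodes_needed(data_tb, disk_per_node_gb, replicas=1,
                 index_to_source_ratio=1.0, usable_fraction=0.85):
    """Estimate how many data nodes are needed to hold the index.

    Assumptions (not from the posting): index size equals source size
    times index_to_source_ratio; usable_fraction mirrors the default
    85% low disk watermark; decimal units (1 TB = 1000 GB).
    """
    required_gb = data_tb * 1000 * index_to_source_ratio * (1 + replicas)
    usable_per_node_gb = disk_per_node_gb * usable_fraction
    return math.ceil(required_gb / usable_per_node_gb)


raw_capacity_gb = NODES * DISK_PER_NODE_GB
print(f"current raw disk: {raw_capacity_gb} GB")
print("nodes needed with 1 replica:",
      nodes_needed(TOTAL_DATA_TB, DISK_PER_NODE_GB, replicas=1))
print("nodes needed with no replica:",
      nodes_needed(TOTAL_DATA_TB, DISK_PER_NODE_GB, replicas=0))
# current raw disk: 3500 GB
# nodes needed with 1 replica: 34
# nodes needed with no replica: 17
```

Under these assumptions the existing 5 nodes (3.5 TB of raw disk) could not hold the full data set, so either the disks, the node count, or the share of data kept in hot storage would have to change.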
Project ID: #35094952
About the project
12 freelancers are bidding on average €600 for this job
Hello, I'm an Elasticsearch developer with 6 years of experience. I have development experience in Lucene as well. I'd encourage you to check out my profile reviews section to see my Elasticsearch project reviews.
Hey, good evening. Just finished reading the brief details and currently going through attached files. I see you have been looking for someone who has experience with these tech stacks Spark, Elasticsearch, Scala, Ki…
✔✔✔✔ Nice to see your posting ✔✔✔✔ Hi, Rafik G. I read your job posting and feel I can help you successfully complete your project now. I am good at Big Data, Elasticsearch, Scala, Kibana and Spark and I have complete…
Greetings Dear Client, I welcome you to my profile, where quality and client satisfaction is my top priority with 100% guarantee. I am Expert Boniface, CERTIFIED & VERIFIED freelancer. I'M AN EXPERT IN LISTED PROJECT…
Hi there, thanks for your job posting and hope you are doing well. As a senior software engineer, I can help you with this kind of task related to Elastic Stack. Especially, I offer you my 8 years of experience in the D…
Hello, I hope this finds you well. I have just seen your project requiring: Scala, Elasticsearch, Spark, Kibana, Big Data. I believe that my 10-year experience in this field is what you need right away. Avoid the headache…