Processing a 15 GB text file using GCP Dataproc with Spark (fixed-width files)

Closed · Posted 1 year ago · Paid on delivery

I have PySpark code that works for processing small fixed-width files on a GCP Dataproc cluster, but when I read a 15 GB gzip-compressed text file, it takes too long to save/load into a BigQuery table, and I have been unable to fix this. I need someone to identify the root cause of the issue and resolve it.
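For reference, a minimal sketch of the kind of pipeline being described, assuming hypothetical GCS paths, column offsets, and BigQuery table names (none of these come from the posting). The comment on the read step flags the most common cause of this kind of slowdown: gzip is not a splittable format.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import substring, trim

spark = SparkSession.builder.appName("fixed-width-to-bq").getOrCreate()

# Read the gzip-compressed file as one string column ("value") per line.
# NOTE: gzip is not splittable, so Spark reads the entire 15 GB file in a
# single task regardless of cluster size -- a likely cause of the slowness.
lines = spark.read.text("gs://my-bucket/input/data.txt.gz")  # hypothetical path

# Slice fixed-width fields by byte position (offsets here are illustrative).
df = lines.select(
    trim(substring("value", 1, 10)).alias("id"),
    trim(substring("value", 11, 25)).alias("name"),
    trim(substring("value", 36, 8)).alias("event_date"),
)

# Write to BigQuery via the spark-bigquery connector available on Dataproc.
(df.write.format("bigquery")
   .option("table", "my_project.my_dataset.my_table")  # hypothetical table
   .option("temporaryGcsBucket", "my-temp-bucket")     # staging bucket
   .mode("overwrite")
   .save())
```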

Google Cloud Platform · PySpark

Project ID: #35815330

About the project

1 proposal · Remote project · Active 1 year ago

1 freelancer is bidding an average of ₹3000 for this job

sriharivaila2000

I have posted the solution below, but please let me know if you want me to solve your problem. There could be several reasons why your PySpark code is taking a long time to process a 15 GB gzip file on a Dataproc cluster.
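A sketch of the fix that this reasoning points to, under the assumption that gzip's non-splittable format is the bottleneck (the path and partition count are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gzip-repartition").getOrCreate()

# gzip is not splittable: this read produces exactly one partition, so a
# single executor core decompresses the full 15 GB on its own.
lines = spark.read.text("gs://my-bucket/input/data.txt.gz")  # hypothetical path
print(lines.rdd.getNumPartitions())  # -> 1

# Redistribute rows immediately after the read so all downstream work
# (fixed-width parsing, the BigQuery write) runs in parallel on the cluster.
lines = lines.repartition(200)  # rule of thumb: ~2-4x total executor cores
```

The initial single-threaded decompression itself cannot be parallelized this way; if that step alone is too slow, the usual remedies are storing the input uncompressed, using a splittable codec such as bzip2, or pre-splitting the data into many smaller gzip files so Spark can read them concurrently.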

₹3000 INR in 3 days
0.0 (0 Reviews)