Parsing CSV files and ingesting them into MySQL - billion-row use case
Use case: I have a table that stores file information such as path and size. I need to read this table to get the file paths, then read those CSV files in parallel and ingest their contents into MySQL, also in parallel.
This is the design I came up with:
- A cron job reads the file information table every 30 minutes.
- I open those files in parallel and start a stream to read each one.
- Each message from a stream, i.e. a piece of file content, is pushed to a message queue such as RabbitMQ.
- Multiple listeners, say 4, are attached at the other end of the queue, each fetching 100 messages at a time, i.e. up to 400 messages at once in total.
- The listeners run the MySQL insert queries in parallel and update the tables accordingly.
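To make the steps above concrete, here is a minimal Python sketch of the same flow. It makes several assumptions that are not part of my actual setup: an in-memory `queue.Queue` stands in for RabbitMQ, `insert_batch` is a hypothetical callback where the real MySQL insert would go (for example a DB-API cursor's `executemany`), and each source is any iterable of CSV lines, such as an open file. The names `batched_rows`, `produce`, `consume` and `run_pipeline` are illustrative only.

```python
import csv
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

BATCH_SIZE = 100    # rows handed to one listener at a time (from the design above)
NUM_CONSUMERS = 4   # number of listeners (from the design above)
_STOP = object()    # sentinel that tells a listener to exit


def batched_rows(lines, batch_size=BATCH_SIZE):
    """Stream CSV rows from an iterable of lines, yielding lists of up to batch_size rows."""
    reader = csv.reader(lines)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def produce(lines, q):
    """Read one source and push its row batches onto the queue."""
    for batch in batched_rows(lines):
        q.put(batch)  # blocks when the queue is full, which throttles the readers


def consume(q, insert_batch):
    """Listener loop: take batches off the queue and insert them until told to stop."""
    while True:
        batch = q.get()
        if batch is _STOP:
            return
        # Assumption: in a real setup this would be something like
        # cursor.executemany("INSERT INTO ... VALUES (%s, %s)", batch)
        insert_batch(batch)


def run_pipeline(sources, insert_batch, num_consumers=NUM_CONSUMERS, num_readers=4):
    q = queue.Queue(maxsize=num_consumers * 2)  # bounded so memory use stays flat
    workers = [
        threading.Thread(target=consume, args=(q, insert_batch))
        for _ in range(num_consumers)
    ]
    for w in workers:
        w.start()
    # Read the sources in parallel; list() surfaces any reader exception.
    with ThreadPoolExecutor(max_workers=num_readers) as pool:
        list(pool.map(lambda src: produce(src, q), sources))
    for _ in workers:
        q.put(_STOP)  # one sentinel per listener
    for w in workers:
        w.join()
```

With real files, each source would be something like `open(path, newline="")`, and `insert_batch` would wrap one multi-row insert per batch so that each listener issues one statement per 100 rows rather than one per row. The bounded queue is a deliberate choice in this sketch: it keeps the readers from running far ahead of the inserts.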
I would appreciate your suggestions and input, and please correct me if I am doing this wrong!
Thanks in advance.