How can I process thousands of DB records in as little time as possible?
Posted on 2012-04-07
Currently I have thousands of records in a MySQL database. Each record takes a few minutes to process, and even 3 minutes multiplied by 10,000 records works out to roughly three weeks (3 minutes × 10,000 records = 30,000 minutes ≈ 500 hours ≈ 21 days), and more records are coming.
In an Amazon EC2 LAMP environment, how can I accomplish this task at a much faster rate?
I could throw more instances at it, but that seems inefficient and would drive up cost. I could run more scripts (or browser tabs?) against the database, but with a single database as the source, won't the processes overlap and grab the same records? Even if I split the data across multiple databases, wouldn't I eventually run into overlapping processes on each of those databases too? So maybe I need software to queue up records for each script (I've tried to sketch what I mean below)? Anyway, what do you experts think?
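To make the overlap concern concrete, here's roughly what I'm imagining each worker script would need to do if the table itself acted as the queue: atomically claim one row, process it, mark it done. This is only a minimal sketch in Python with the PyMySQL driver (the same pattern would work in PHP), and the `jobs` table with its `status`, `worker_id`, and `payload` columns is a hypothetical schema, not my real one:

```python
# Sketch of a DB-backed work queue: many workers share one table, and each
# worker atomically claims a row before processing it, so no two workers
# ever process the same record. Table and column names are hypothetical.
import os
import time
import uuid

import pymysql  # third-party MySQL driver: pip install pymysql

# Unique id per worker process, so claimed rows can be traced to an owner.
WORKER_ID = "%s-%s" % (os.getpid(), uuid.uuid4().hex[:8])

conn = pymysql.connect(host="localhost", user="app", password="secret",
                       database="app", autocommit=True)


def process(payload):
    """Placeholder for the real per-record work (a few minutes each)."""
    pass


def claim_one():
    """Atomically mark one pending row as ours; return it, or None if empty."""
    with conn.cursor() as cur:
        # A single UPDATE statement is atomic in MySQL, so two workers can
        # never claim the same row, even without explicit table locking.
        cur.execute(
            "UPDATE jobs SET status = 'claimed', worker_id = %s "
            "WHERE status = 'pending' ORDER BY id LIMIT 1",
            (WORKER_ID,))
        if cur.rowcount == 0:
            return None  # nothing left to claim
        cur.execute(
            "SELECT id, payload FROM jobs "
            "WHERE worker_id = %s AND status = 'claimed' LIMIT 1",
            (WORKER_ID,))
        return cur.fetchone()


while True:
    job = claim_one()
    if job is None:
        time.sleep(5)  # back off while the queue is empty
        continue
    job_id, payload = job
    process(payload)
    with conn.cursor() as cur:
        cur.execute("UPDATE jobs SET status = 'done' WHERE id = %s",
                    (job_id,))
```

If I understand it right, the key point is that the single-statement UPDATE is the atomic claim, so in principle I could point any number of scripts (or EC2 instances) at the one database without them stepping on each other. Is that sound, or is dedicated queue software the better route?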