Solved

How to efficiently manage the processing of thousands of db records in as little time as possible?

Posted on 2012-04-07
12
580 Views
Last Modified: 2014-11-12
Currently I have thousands of records in a mysql database. Each record takes a few minutes to process but even 3 minutes multiplied by 10,000 records could take almost a month to process (i.e. 3 minutes x 10,000 records = 30,000 minutes / 60 minutes = 500 hours / 24 hours = ~21 days) and more records are coming.

In an Amazon EC2 LAMP environment, how can I accomplish this task at a much faster rate?

I could throw more instances at it but that seems inefficient and would drive up cost. I could have more scripts (or browser tabs?) processing the database but with a single database as the source I'm sure to overlap processes? Even if I split the data across databases, I'd still eventually run in to overlap the processes on that database? So maybe I need software to queue up database for each script? Anyway, what do you experts think?
0
Comment
Question by:aristanoble
  • 3
  • 2
  • 2
  • +3
12 Comments
 
LVL 17

Expert Comment

by:ramrom
ID: 37820183
Please describe the processing. Perhaps there is a way to make if more efficient.
0
 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 37820202
I've never had a process on a row take even a second.  What kind of processing are you doing?
0
 
LVL 1

Author Comment

by:aristanoble
ID: 37820206
Well the length of the process is because of the API call related to that record. The call is for a large amount of date that when returned gets processed also and then on to the next record.
0
Simplifying Server Workload Migrations

This use case outlines the migration challenges that organizations face and how the Acronis AnyData Engine supports physical-to-physical (P2P), physical-to-virtual (P2V), virtual to physical (V2P), and cross-virtual (V2V) migration scenarios to address these challenges.

 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 37820261
Details are important in analyzing a problem.  But you're not giving us any so i don't know how we can help you.  We're usually pretty good with code but terrible at guessing.
0
 
LVL 7

Expert Comment

by:designatedinitializer
ID: 37820280
You have to analyse the process.
You have to break it down in steps and assess the duration of each step, and its dependencies.
Only then you will be able to know what you can do about it.
There will be things you can speed up, others you can do simultaneously, and others you can't.

If a woman takes 9 months to give birth to 1 child, that doesn't mean 2 women can do it in 4.5 months...
0
 
LVL 1

Author Comment

by:aristanoble
ID: 37820281
Sorry I'm just holding back from being too talkative but here's how it works: each user currently reside in the users table.  As each record is selected, the user id is used in the API call and returns hundreds of records. Each record is then saved in a separate table and then on to the next record.
0
 
LVL 7

Expert Comment

by:designatedinitializer
ID: 37820287
API call to what?
How much time does it take, that API call?
How is the data from the API call formated, and how are you parsing it?
0
 
LVL 1

Author Comment

by:aristanoble
ID: 37820445
I got a chance to work on all of your suggestions and I was reminded of some things I knew all along; as goes the case at times. I was inserting each record rather than inserting batches of records. Things are speeding up now but I need more.

To answer your questions, I'm pulling date from the Facebook API; by iteration we obtain about 5,000 records in a php array per user and I "was" iterating over the data inserting each record but now I iterate over the data concatenating it into a variable and then I'd insert the data in chunks.
0
 
LVL 33

Accepted Solution

by:
shalomc earned 300 total points
ID: 37820737
To truly scale your operation and to be limited by budget (and not by technology) you should change your computing method to that of parallel/grid computing.

I see that you are already familiar with AWS, take a look at EMR, which is Amazon's implementation of Hadoop.

Look here http://aws.amazon.com/elasticmapreduce/  and return with questions :)
0
 
LVL 109

Assisted Solution

by:Ray Paseur
Ray Paseur earned 200 total points
ID: 37821419
It would sure help us help you if you could post the CREATE TABLE statements for all of the DB tables in question, post the code that interacts with the API and the data base, and post some of the data that your scripts fetch from the API.  Armed with some specific information we may be able to provide some immediately useful suggestions.

One suggestion may be to make the fetch-and-update into an asynchronous process.  I have used this technique when I need to provide rapid response, but also need to get information from an API.  To do this, your script will need to make a POST method request (using CURL or FSockOpen) to the asynchronous script.  You can start the process and disconnect without waiting for it to complete.  If the async tasks takes three minutes, who cares?  You can just start up a lot of them and let them run in the background.
0
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 38186533
Going forward, please consider the EE grading guidelines.  If you're not going to interact with the experts who are trying to help you, please don't give a marked-down grade.  We need some dialog in order to be able to understand what you're dealing with -- this back-and-forth dialog requires interaction between the participants.  

Thanks, ~Ray

http://www.experts-exchange.com/help/viewHelpPage.jsp?helpPageID=26
0

Featured Post

ScreenConnect 6.0 Free Trial

Check out the updates in one game-changing release, ScreenConnect 6.0, based on partner feedback. New features include a redesigned UI that improves session organization and overall user experience. See the enhancements for yourself!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
Or at least that’s the word according to a new blog from Tech Target on AWS’s new Managed Services (MS) offering. According to the blog, AWS is launching their AWS MS program to expedite the adoption of cloud by Fortune 1000 and Global 2000 companie…
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to look for a specific file type in a local or remote server directory using PHP.

815 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now