Java Design Question: Dealing with very large database table efficiently

Hello.

I need to implment a new application that reads and publishes the contents of 1 very large database table on start up. The application will be a standalone app, and when it starts, it must query a database, and publish 20 million rows to a downstream application. The application I am writing is not required to cache the rows, it will just take each database record, convert it to a new format, and then publish it to the network. It needs to get through the 20 million rows as quick as possible.

I would be particularly interested in any open source solutions that may exist.

Thanks in advance,

Ronan
RonanLAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

qasitouchCommented:
NO, you don't need to send 20 milion records at once.

Use the idea of pagination and fetch fewer records in one requst like 100, 50 or 10.


For quering, u can use sql-query and  fetch the only limited data from server.

sample query:

SELECT * FROM `student` WHERE mark BETWEEN 60 and 75
0
RonanLAuthor Commented:
Thanks for the reply. I think the idea of fetching a smaller number of records at a time went without saying. What I'm looking for is an elegant, preferably multi threaded, solution, that will help me get through the 20 million rows as quickly as possible. If I can't get something off the shelf, I'll code it myself.
0
rbrindlCommented:
You can use the jdbc fetch size feature for getting the data. This effectively bulks the data transfer from the database.
In your java code you then always read chunks of this bulksize in a separate SQL reader thread.
This creates simple objects from the ResultSet and gives back a List to the main thread, which distributes the list to worker threads that produce the output.
The SQL reader threads fetches the next bulk of data as soon as it gave the current one back to the main thread (it could read more, depending on the complexity of producing the output).
This way, you should never have to wait for the database very long (again this depends on the complexity of the output), at least you can fetch data concurrently to producing output, and you can control memory consumption with the prefetch (bulk) size.
Please tell me, if you need a short sample pseudo code.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

RonanLAuthor Commented:
Hey rbrindl.

What you describe sounds like it might be well suited to what we are trying to achieve. If you could provide anything more to help get me started that would be great. A link to a good example, or even some documentation I could read would do the trick!

Thanks,

Ronan
0
rbrindlCommented:
I dont know if there are examples for this use case. Basically all this is standard java and documented by Sun. If you give me some time, i can strip down a testcase i once wrote, which basically does what you need.
0
RonanLAuthor Commented:
I've done some digging around on the sun site, and I just can't see the api's you're referring to, and how they can be used in the manner you describe. I don't want you to go to too much effort, but if you can point me in the right direction, that would be great.

Thanks,

Ronan
0
rbrindlCommented:
I will get back to you this evening (Central European Time)
0
RonanLAuthor Commented:
Great! Thanks rbrindl
0
sarthy2008Commented:
Since you no need to cache the data and it is a stand alone application and you want to do it in the start-up of the application, following logic may help you to make quick solution.
Based on your scenario, you need to do the exception handling

1) Execute the SQL statement
2) While ResultSet has more records
   do
      2a) Read the record
      2b) Convert to the new format
     2c) Write the data to the network  
   end do

3) Close the resultset
4) close the statement
5) Close the connection
0
rbrindlCommented:
Sorry i did not get back, i was quite busy! but as i see, you accepted the answer, do you need further assistence?
0
RonanLAuthor Commented:
No problem rbrindl. I don't need any further assistance at the moment.

Thanks for you help.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Programming

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.