Solved

java performance loading huge data

Posted on 2013-06-03
11
187 Views
Last Modified: 2013-06-11
hi guys

Currently in our application we have a remote database and our application database
The remote database has 800,000 rows and gets updated with more data every now and then.

When our java process runs we need to copy the data from the remote database
into our applicaiton database.
Since our java process runs 10 times a day we are seeing a performance issue.
Each time our java process runs we delete the data (800,000 + rows)  from our table and insert it again from the remote database to ensure we have current data.
This is affecting our performance since it takes 20 mins to insert 800,000 rows into our table.
We want to come up with a cache to avoid this problem

maybe have a java List or a Map which contains the 800,000 + objects
and whenever the remote database has some new data, we constantly listen to it and update our List/Map.

Any ideas?

thanks.
0
Comment
Question by:royjayd
  • 5
  • 3
  • 3
11 Comments
 
LVL 86

Expert Comment

by:CEHJ
Comment Utility
You should probably be looking at replication at the database level, not the application level
0
 

Author Comment

by:royjayd
Comment Utility
>>You should probably be looking at replication at the database level, not the application level

can you elaborate a little ?

thx.
0
 
LVL 86

Expert Comment

by:CEHJ
Comment Utility
Difficult to do so, since you don't even say what db you're using. Most RDBMS have their own tools and utilities that can implement replication and syncronization across server instances
0
 

Author Comment

by:royjayd
Comment Utility
>>Most RDBMS have their own tools and utilities that can implement replication and syncronization across server instances

That is not the issue. how does my java application have up to date knowledge of the database table as and when it is updated?
Database is SQL server 2008.
0
 
LVL 26

Accepted Solution

by:
dpearson earned 500 total points
Comment Utility
If you want to do it in the Java application that should be possible.  You could use a Map.  First read the current local database and store an entry in the map for each row.  Let's assume your table has a bigint (long) as the primary key.

The map could be like this:
   Map<Long, String> m_Rows = new HashMap<Long, String>() ;

You'd read a row of data from the database, create a Hash (e.g. MD5) from all of the fields and then store this in the map, using the primary key as the index.  Something like:

   String hash = md5hash(name + "|" + address + "|" + phone) ; // Name, address, phone are fields in the db

   m_Rows.put(key, hash) ;

Now read each row in turn from the remote database.  As you read the row, create the same MD5 hash from the fields in the remote row.  Now look in the m_Rows map for the primary key.
   String existingHash = m_Rows.get(remoteKey) ;

  - If there is no entry in the map you should INSERT the row into your local database.
  - If there is an entry in the map and it has the same hash, you can ignore that row, it has not changed.
  - If there is an entry in the map and it has a different hash, then you should DELETE the existing row from the local database and then INSERT the new row.

You will end up only inserting rows that have been added or changed in the remote database - which will be a lot faster.  If you can keep the Java application loaded continuously (so it doesn't need to keep re-reading the local database each time it's making an update, it'll be faster still).

(If rows may also have been deleted in the remote db, then when you read a row from the remote database, you should check the local map and after doing the steps above, you should then remove the entry from the map.  This way at the end, any rows that remain in the map have been deleted from the remote database and so can then be deleted from the local database).

Hope that helps,

Doug
0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 
LVL 86

Expert Comment

by:CEHJ
Comment Utility
That is not the issue.
That very much is the issue, i'm afraid. Something like Doug's approach should work, but of course it's reinventing the techniques i was referring to at the application level.
0
 

Author Comment

by:royjayd
Comment Utility
>>but of course it's reinventing the techniques i was referring to at the application level
yes that is what i need :)
0
 

Author Comment

by:royjayd
Comment Utility
unfortunately i am running into performance issues with dpearson's logic
here is what i have

public List<MYVo> readSourceDatabase(){
		
		List<MYVo> paramList = jdbcTemplate.query(sql, rowMapper); //this returns 800,000 rows pretty fast < one minute
		Map<String, String> sourceMap = new HashMap<>();		
		for(MYVo vo : paramList ){
		String key = getKey(vo); //get the key since there is no primary key in table					 
		key = key.trim();
		String hash = DigestUtils.md5Hex(key);
		sourceMap.put(key,hash); //taking a long time iterating through 800,000 rows and ends with out of memory, my xmx is 1000 mb
			
		}
		return paramList;
	}
	
		private String getKey(MYVo vo){
		String key = vo.StaffID()+"|"+vo.getFirstName()+"|"+vo.getLastName()
		+"|"+vo.getArea()+"|"+vo.getDescription()+"|"+vo.getProduct()
		+"|"+vo.getProductMapSource()+"|"+vo.getProductAttributes()+"|"+vo.getMap()
		+"|"+vo.getComment()+"|"+vo.getCode()+"|"+vo.getSId()+"|"+vo.getCurrency()
		+"|"+vo.getUser()+"|"+vo.gettimestamp()+"|"+vo.getCreatedtimestamp()
		+"|"+vo.getpLastUpdatedUser()+"|"+vo.getPtimestamp()+"|"+vo.getPCreatedtimestamp();
		key = key.trim();
		return key;
	}

Open in new window


thx.
0
 
LVL 26

Assisted Solution

by:dpearson
dpearson earned 500 total points
Comment Utility
a) It sounds like you need to increase the memory for the Java process which you do from the command line:

E.g. java -jar my.jar -Xmx1500m

will set the Java heap to 1.5 GB

b) If there's no actual primary key in the database, you should choose another value as the key if possible.  E.g. I see
vo.StaffID() and vo.getSId()

in the list of fields.  If either of these is a unique value then you should use that as the key in the map and the whole process will run a lot faster.  Also you need this value to not change when the remote database changes.  What you have now won't do that since it's using everything in the key (so if I modified the name of "Sammy" to "Sam" for a record, the key won't match between the local db and the remote db).

c) Also the lack of a primary key in the database table is a concern.  That means when you do things like DELETE a specific row, each SQL "DELETE" will involve a full table scan of your database - since there's no key.  You may find that performance is poor as a result.

Anyway, you can explore that once you have the tool working.

Doug
0
 

Author Comment

by:royjayd
Comment Utility
>> It sounds like you need to increase the memory for the Java process
yeah already did that
 -Xmx1200m is the max i can go on my machine

>>If either of these is a unique value then you should use that as the key in the map and the whole process will run a lot faster

the problem is they are not unique. spoke to my dba, he says hard to assign a unique primary key due to the type of data (key will be a combination of all the columns in the table)

can we somehow do

for(MYVo vo : paramList ){
		//String key = getKey(vo); 	 
		 String hash = DigestUtils.md5Hex(vo); //use object here directly instead of key?
		sourceMap.put(key,hash); 			
		}

Open in new window

thanks
0
 
LVL 26

Expert Comment

by:dpearson
Comment Utility
the problem is they are not unique. spoke to my dba, he says hard to assign a unique primary key due to the type of data (key will be a combination of all the columns in the table)

Forget the software for a moment - if there's no unique key that you can establish then this idea of "only transferring the changes" cannot work.  Because there's no way to know what a "change" is.  It's like if I gave you a phone book for one year and then the next year and asked you what changed - but also told you that people can change their phone numbers AND their names.  Given that there is no way to know what changed.  It's just not a solvable problem.

So this means you really can't speed up the process - you must do a complete dump and rebuild.

The only other solution is the one Charles suggested - hook up database replication between the two databases - which means transferring the binary changes made to the remote database and then re-applying them to the local database.  Databases all support this so that you can do "on the fly" backups and replication.

But at the application level you're stuck.  Unless you can figure out some sort of unique key.

Doug
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Suggested Solutions

By the end of 1980s, object oriented programming using languages like C++, Simula69 and ObjectPascal gained momentum. It looked like programmers finally found the perfect language. C++ successfully combined the object oriented principles of Simula w…
Java functions are among the best things for programmers to work with as Java sites can be very easy to read and prepare. Java especially simplifies many processes in the coding industry as it helps integrate many forms of technology and different d…
Viewers will learn about the different types of variables in Java and how to declare them. Decide the type of variable desired: Put the keyword corresponding to the type of variable in front of the variable name: Use the equal sign to assign a v…
Viewers will learn about the regular for loop in Java and how to use it. Definition: Break the for loop down into 3 parts: Syntax when using for loops: Example using a for loop:

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now