Solved

optimizing text file read in Java

Posted on 2006-06-14
306 Views
Last Modified: 2012-08-13
hi all--

Using JProfiler, I've figured out that a major bottleneck in a program I've written is how it reads files. I've tried two different versions of my read code without much difference in the results (see code below).

I'm wondering if the resources used to read each file are never being released, thus causing bad memory leaks. Perhaps I'm missing the trick to 'finalizing' a file read? Can anyone advise me on how to optimize these file reads?

Here are the two versions of my file read.
(Note: there's some other functionality included here; namely, I need to build a list of 'dogIDs' from the files.)

1 (pretty standard)---

                   BufferedReader in = new BufferedReader(_fileReader);
                   StringBuffer sbAllOfLog = new StringBuffer();
                   String line;
                   String sNowDogID;
                   int intDogIDCnt = 0;

                   try {
                       while ((line = in.readLine()) != null) {
                           // basically, i'm trying to kill 2 birds with one stone here:
                           // while i have each line of the text file handy, i check it for a dogID
                           if (line.indexOf("DogID:") != -1) {
                               int stPos = line.indexOf("DogID:") + 6;
                               // note: no 'String' here -- redeclaring sNowDogID
                               // inside the loop would not compile
                               sNowDogID = line.substring(stPos);
                               lstDogIDs.add(sNowDogID);
                               intDogIDCnt++;
                           }
                           sbAllOfLog.append(line + '\n');
                       }
                   } finally {
                       // closing the BufferedReader also closes the wrapped _fileReader;
                       // the finally block makes sure it happens even if readLine() throws
                       in.close();
                   }


version 2---

                   File file = new File(sNowFile);
                   InputStream stream = new BufferedInputStream(new FileInputStream(file));
                   byte[] data = new byte[(int) file.length()];
                   try {
                       // read() may return before filling the whole array,
                       // so loop until every byte has been read
                       int off = 0;
                       while (off < data.length) {
                           int n = stream.read(data, off, data.length - off);
                           if (n < 0) break; // unexpected end of file
                           off += n;
                       }
                   } finally {
                       stream.close(); // release the file handle
                   }
                   sAllOfLog = new String(data);

                   boolean allDogsFnd = false;
                   int stPos = 0;
                   int fndPos;
                   String sNowDogID;
                   while (!allDogsFnd) {
                       fndPos = sAllOfLog.indexOf("DogID:", stPos);
                       if (fndPos > -1) {
                           int eol = sAllOfLog.indexOf("\n", fndPos);
                           // guard against a last line with no trailing newline
                           if (eol == -1) eol = sAllOfLog.length();
                           sNowDogID = sAllOfLog.substring(fndPos + 6, eol).trim();
                           lstDogIDs.add(sNowDogID);
                           stPos = fndPos + 6;
                       } else {
                           allDogsFnd = true;
                       }
                   }
Question by:jacobbdrew

12 Comments
 
LVL 16

Expert Comment

by:suprapto45
ID: 16908277
Well,

You can try using a regex to do it. The java.util.regex engine is well suited to this kind of repeated scanning.
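For instance, here is a minimal, self-contained sketch (the sample line and the exact pattern are my assumptions; adjust them to your actual log format):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DogIdRegexDemo {
        // Compile the pattern once and reuse it; compiling inside a
        // per-line loop would throw away most of the benefit.
        private static final Pattern DOG_ID = Pattern.compile("DogID:\\s*(\\S+)");

        public static void main(String[] args) {
            List lstDogIDs = new ArrayList();
            String line = "2006-06-14 12:00:01 DogID: 4711 checked in"; // made-up sample
            Matcher m = DOG_ID.matcher(line);
            if (m.find()) {
                lstDogIDs.add(m.group(1));
            }
            System.out.println(lstDogIDs); // prints [4711]
        }
    }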

David
 
LVL 1

Author Comment

by:jacobbdrew
ID: 16908586
okay, I'll give it a whirl. Can you point me to any docs/tutorials with info on what you're talking about?
 
LVL 4

Expert Comment

by:v_karthik
ID: 16908682
How big is your file? Is the memory hog staying around for a while, or does it peak during a certain period?

Regular expressions should give you more speed and flexibility in string parsing, but I'm not sure about the memory savings. I see that you require a very simple string operation, so using the String class looks alright to me.

One quick optimization:

int ind = -1;
if ((ind = line.indexOf("DogID:")) != -1) {
    int stPos = ind + 6;
    String sNowDogID = line.substring(stPos);
    lstDogIDs.add(sNowDogID);
    intDogIDCnt++;
}

Basically, I've assigned the result of your first indexOf call to a variable so it can be reused instead of searching the line a second time.

Is lstDogIDs a Vector? One thing about using a Vector: it is thread-safe, which means it has built-in code to synchronize access to its elements. Synchronization slows down your code drastically. If you are not worried about multiple threads accessing the list, using ArrayList should give you better performance.
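For instance, if it is declared as a Vector today, the swap is a one-liner (a sketch that assumes only one thread ever touches the list):

    // Vector lstDogIDs = new Vector();                    // locks on every add
    java.util.List lstDogIDs = new java.util.ArrayList();  // no synchronization overhead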

LVL 16

Expert Comment

by:suprapto45
ID: 16908704
I think that v_karthik has a point.

>>"I've figured out that a major bottleneck in a program I've written is how it reads files"
As you said, the bottleneck is in how it reads the file, not in the parsing of the string.

>>"Regular expressions should give you more speed and flexibility in string parsing, but I'm not sure about the memory savings."
You are right, v_karthik.

Yes, how big is your file?

David
 
LVL 16

Expert Comment

by:suprapto45
ID: 16908752
http://www.precisejava.com/javaperf/j2se/IO.htm

If your file is really big, try to use buffering. Honestly, I have not really tried the suggestions at the above URL ;)

David
 
LVL 4

Expert Comment

by:v_karthik
ID: 16908778
>>"If your file is really big, try to use buffering."

The link suggests the use of the BufferedXXX classes. I see that the author is already using BufferedReader, so he's on the right track. To my knowledge, BufferedReader is better than BufferedInputStream if you are sure your data is character data and not a mix (ints, strings, etc.).
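For reference, the buffer size can also be passed explicitly when wrapping the reader (a sketch; 64 KB is just a value to experiment with, not a recommendation):

    BufferedReader in = new BufferedReader(new FileReader(sNowFile), 64 * 1024);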
 
LVL 1

Author Comment

by:jacobbdrew
ID: 16909055
The files are between about 50k and 200k, but there are lots of them: up to 2000. The app runs a number of threads which parse the logs, but after a while I simply run out of memory. (In fact, I can't parse much more than a few hundred files in one go.)

At this point, I'm testing ways to limit the number of files each instance parses, letting the app 'finish' so that all the resources are released, then restarting it until all the files are parsed.

If there's nothing I'm missing with respect to 'closing'/'finalizing' how I read the text file, then maybe my memory leaks are coming from somewhere else.

I guess I'm thinking that a perfectly coded app would be able to keep reading files forever, so long as the data in memory is released when no longer in use. Is this a bogus assumption?

Hmm. Maybe I'll try reading all the files without actually parsing them and see how many I can get through.
 
LVL 4

Accepted Solution

by:v_karthik (earned 500 total points)
ID: 16909069
The size is not much, apparently. You should probably try a thread pool model. You can Google a standard thread pool implementation in Java and use that; otherwise, write your own. A thread pool works like this: a thread manager has control over, say, n threads. As "jobs" come in, the manager picks one of the free threads and assigns the job to it. The thread finishes its job and returns to the free pool. This way the number of threads spawned doesn't get out of control.
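Since Java 5 there is a standard implementation in java.util.concurrent (Executors / ExecutorService), so you may not need to write your own. A minimal sketch; the pool size of 25, the "logs" directory, and parseLog are all placeholders for your own setup:

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class LogParserPool {
        public static void main(String[] args) {
            // At most 25 parse jobs run at once; the rest wait in the
            // executor's queue, so the thread count never gets out of control.
            ExecutorService pool = Executors.newFixedThreadPool(25);

            File[] logs = new File("logs").listFiles(); // placeholder directory
            for (int i = 0; logs != null && i < logs.length; i++) {
                final File log = logs[i];
                pool.execute(new Runnable() {
                    public void run() {
                        parseLog(log); // placeholder for your per-file parsing
                    }
                });
            }
            pool.shutdown(); // accept no new jobs; queued ones still finish
        }

        private static void parseLog(File log) {
            // your existing read/parse code goes here
        }
    }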
 
LVL 1

Author Comment

by:jacobbdrew
ID: 16913123
Yeah, I'm running a thread pool, but things are still spinning out of control. The 'fix' I've implemented (more a hack) is to only run the app on a limited number of files at a time, let it close, then restart it. Not the most elegant solution, but time's a-wastin'. Thanks for your help; more points for ya.

But just to check one more time: based on the code above (version 1), there's nothing more I need to do to 'finalize' how I'm reading the files?
 
LVL 4

Expert Comment

by:v_karthik
ID: 16913264
It looks OK to me... just add my indexOf fix to see if it helps. And just to get it right: does that StringBuffer hold the COMPLETE data contained in all your 2000 files, or is each StringBuffer (and each Vector) for a single file?

And, an aside... it's not that useful now, but it's better you know:

sbAllOfLog.append(line + '\n');

is faster if used this way:

sbAllOfLog.append(line).append("\n");

(The first form builds a temporary string for every line before appending it; the chained form appends both pieces directly into the buffer.)

 
LVL 1

Author Comment

by:jacobbdrew
ID: 16913680
Cool. No, the StringBuffer does not contain all the data from all 2000 files. This read is within a spawned thread, and there's a thread for each file (limited to 25 by the pool).

i will add the indexOf fix.

 
LVL 4

Expert Comment

by:v_karthik
ID: 16913733
OK... will post again if I have a brainwave later.