• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 312

Optimizing text file reads in Java

Hi all--

Using JProfiler, I've figured out that a major bottleneck in a program I've written is how it reads files. I've tried two different versions of my read code without much difference in the results (see code below).

I'm wondering if the resources used to read each file are never being released, causing bad memory leaks. Perhaps I'm missing the trick to 'finalizing' a file read? Can anyone advise me on how to optimize these file reads?

Here are the two versions of my file read.
(Note: there's some other functionality included here; namely, I need to build a list of 'dogIDs' from the files.)

Version 1 (pretty standard)---

BufferedReader in = new BufferedReader(_fileReader);
StringBuffer sbAllOfLog = new StringBuffer();
String line;
int intDogIDCnt = 0;

while ((line = in.readLine()) != null) {
    // kill two birds with one stone: while each line is handy, check it for a dogID
    if (line.indexOf("DogID:") != -1) {
        int stPos = line.indexOf("DogID:") + 6;   // 6 = "DogID:".length()
        String sNowDogID = line.substring(stPos);
        lstDogIDs.add(sNowDogID);
        intDogIDCnt++;
    }
    sbAllOfLog.append(line + '\n');
}
in.close();
_fileReader.close();


Version 2---

File file = new File(sNowFile);
DataInputStream stream = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file)));
byte[] data = new byte[(int) file.length()];
stream.readFully(data);               // a plain read() may stop before the end of the file
stream.close();
String sAllOfLog = new String(data);  // decodes with the platform default charset

boolean allDogsFnd = false;
int stPos = 0;
int fndPos;
String sNowDogID;
while (!allDogsFnd) {
    fndPos = sAllOfLog.indexOf("DogID:", stPos);
    if (fndPos > -1) {
        // assumes the DogID line ends with '\n'
        sNowDogID = sAllOfLog.substring(fndPos + 6, sAllOfLog.indexOf("\n", fndPos)).trim();
        lstDogIDs.add(sNowDogID);
        stPos = fndPos + 6;           // resume the search just past this match
    } else {
        allDogsFnd = true;
    }
}
Asked by: jacobbdrew
1 Solution
 
suprapto45Commented:
Well,

You can try using a regular expression for this. java.util.regex is well suited to this kind of scan-and-extract work.
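
For example, a minimal sketch (assuming the whole file text is already in a String, as in version 2 of your code, and that each DogID runs to the end of its line):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// compile once, reuse for every file; (.*) captures from "DogID:" to end of line
Pattern dogIdPattern = Pattern.compile("DogID:(.*)");

Matcher m = dogIdPattern.matcher(sAllOfLog);   // sAllOfLog = the full file text
while (m.find()) {
    lstDogIDs.add(m.group(1).trim());          // group(1) = the text after "DogID:"
}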

David
 
jacobbdrewAuthor Commented:
Okay, I'll give it a whirl. Can you point me to any docs/tutorials with info on what you're talking about?
 
v_karthikCommented:
How big is your file? Is this memory hog staying around for a while, or does it peak during a certain period?

Regular expression should give you more speed and flexibility in string parsing, but I'm not sure about its memory savings. I see that you require a very simple string operation, so using the string class looks alright to me.

One quick optimization:

int ind = -1;
if ((ind = line.indexOf("DogID:")) != -1) {
    int stPos = ind + 6;
    String sNowDogID = line.substring(stPos);
    lstDogIDs.add(sNowDogID);
    intDogIDCnt++;
}

Basically, I've assigned the result of your first indexOf call to a variable so the line isn't scanned a second time.

Is lstDogIDs a Vector? One thing about using a Vector: it is thread-safe, which means it has built-in code to synchronize access to its elements. Synchronization slows down your code drastically. If you are not worried about multiple threads accessing the list, using ArrayList should give you better performance.
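
For example, a minimal sketch of the swap (assuming lstDogIDs holds Strings and you're on Java 5+ for the generics):

import java.util.ArrayList;
import java.util.List;

// ArrayList does no locking, so single-threaded adds and reads are cheaper
List<String> lstDogIDs = new ArrayList<String>();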

 
suprapto45Commented:
I think that v_karthik has the points.

>>"I've figured out a major bottleneck in a program i've written is how it reads files"
As you said that the bottleneck is on how it reads file and not in the parsing of string

>>"Regular expression should give you more speed and flexibility in string parsing, but I'm not sure about its memory savings."
You are right v_karthik.

Yes, how big is your file?

David
 
suprapto45Commented:
http://www.precisejava.com/javaperf/j2se/IO.htm

If your file is really big, try to use buffering. Honestly, I have not really tried the suggestions at that URL myself ;)

David
 
v_karthikCommented:
>>If your file is really big, try to use buffering.

The link suggests using the BufferedXXX classes. I see that the author is already using BufferedReader, so he's on the right track. To my knowledge, BufferedReader is the better choice over BufferedInputStream if you are sure your data is character data and not a mix (ints, strings, etc.).
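
For what it's worth, a minimal sketch of the Reader route with an explicit charset (sNowFile as in version 2; the UTF-8 choice is an assumption about the logs):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

// naming the charset makes decoding predictable across platforms;
// a bare FileReader silently uses the platform default
BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(sNowFile), "UTF-8"));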
 
jacobbdrewAuthor Commented:
The files are between around 50K and 200K, but there are lots of them, up to 2000. The app runs a number of threads which parse the logs, but after a while I simply run out of memory. (In fact, I can't parse much more than a few hundred files in one go.)

At this point, I'm testing out ways to limit the number of files each instance parses, letting the app 'finish' so that all the resources are released, then restarting until all the files are parsed.

If there's nothing I'm missing with respect to 'closing'/'finalizing' how I read the text file, then maybe my memory leaks are coming from somewhere else.

I guess I'm thinking that a perfectly coded app would be able to keep reading files forever, so long as the data in memory is released when no longer in use. Is this a bogus assumption?

Hmm. Maybe I'll try reading all the files without actually parsing them and see how many I can get through.
 
v_karthikCommented:
The size is not much, apparently. You should probably try a thread pool model. You can Google for a standard thread pool implementation in Java and use that, or else write your own. A thread pool works like this: a thread manager has control over, say, n threads. As "jobs" come in, the manager picks one of the free threads and assigns the job to it. The thread finishes its job and returns to the free pool. This way the number of threads spawned doesn't get out of control.
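
If you're on Java 5 or later, java.util.concurrent ships a ready-made pool. A minimal sketch (logFiles and parseLog are placeholders for your own file list and parsing code):

import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService pool = Executors.newFixedThreadPool(25);   // at most 25 workers alive
for (final File logFile : logFiles) {                      // logFiles: a List<File> of your logs
    pool.execute(new Runnable() {
        public void run() {
            parseLog(logFile);                             // your per-file parsing
        }
    });
}
pool.shutdown();   // accept no new jobs; queued ones still finish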
 
jacobbdrewAuthor Commented:
Yeah, I'm running a thread pool, but things are still spinning out of control. The 'fix' I've implemented (more a hack) is to only run the app on a limited number of files at a time, let it close, then restart it. Not the most elegant solution, but time's a-wastin'. Thanks for your help; more points for ya.

But just to check one more time: based on the code above (version 1), there's nothing more I need to do to 'finalize' how I'm reading the files?
 
v_karthikCommented:
It looks OK to me... just add my indexOf fix to see if it helps. And just to get it right: does that StringBuffer of yours hold the COMPLETE data contained in all your 2000 files, or is each StringBuffer (and each Vector) for a single file?
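
One defensive tweak, though: in version 1, if readLine() throws partway through, the close() calls are skipped and the file handle can leak, which adds up when you open thousands of files. A minimal sketch of the usual guard:

BufferedReader in = new BufferedReader(_fileReader);
try {
    String line;
    while ((line = in.readLine()) != null) {
        // ... check for DogID and append to the buffer, as before ...
    }
} finally {
    in.close();   // closing the BufferedReader also closes the wrapped _fileReader
}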

And, an aside... it's not that useful now, but it's better you know:

sbAllOfLog.append(line + '\n');

is faster written this way:

sbAllOfLog.append(line).append('\n');

(The first form builds a throwaway String for every line; the chained appends write straight into the buffer.)

 
jacobbdrewAuthor Commented:
Cool. No, the StringBuffer does not contain all the data from all 2000 files. The read is within a spawned thread, and there's a thread for each file (limited to 25 by the pool).

I will add the indexOf fix.

 
v_karthikCommented:
OK... will post again if I have a brainwave later.
