Solved

Optimizing text file read in Java

Posted on 2006-06-14
Medium Priority
309 Views
Last Modified: 2012-08-13
hi all--

Using JProfiler, I've identified a major bottleneck in a program I've written: how it reads files. I've tried two different versions of my read code without much difference in the results (see code below).

I'm wondering if the resources used to read each file are never being released, causing bad memory leaks. Perhaps I'm missing the trick to 'finalizing' a file read? Can anyone advise me on how to optimize these file reads?

Here are the two versions of my file read.
(Note: there's some other functionality included here; namely, I need to create a list of 'DogIDs' from the files.)

Version 1 (pretty standard)---

                   BufferedReader in = new BufferedReader(_fileReader);
                   StringBuffer sbAllOfLog = new StringBuffer();
                   String line;
                   String sNowDogID;
                   int intDogIDCnt = 0;

                   while ((line = in.readLine()) != null) {
                       // Kill two birds with one stone: while each line of the
                       // text file is handy, check it for a DogID.
                       if (line.indexOf("DogID:") != -1) {
                           int stPos = line.indexOf("DogID:") + 6;
                           sNowDogID = line.substring(stPos); // was "String sNowDogID = ...", which re-declares the variable above and won't compile
                           lstDogIDs.add(sNowDogID);
                           intDogIDCnt++;
                       }
                       sbAllOfLog.append(line + '\n');
                   }
                   in.close();
                   _fileReader.close(); // redundant: closing the BufferedReader also closes the underlying reader


Version 2---

                   File file = new File(sNowFile);
                   InputStream stream = new BufferedInputStream(new FileInputStream(file));
                   byte[] data = new byte[(int) file.length()];
                   int off = 0;
                   // read() may return fewer bytes than requested, so loop until the buffer is full
                   while (off < data.length) {
                       int n = stream.read(data, off, data.length - off);
                       if (n == -1) break;
                       off += n;
                   }
                   stream.close(); // the original never closed the stream, leaking a file handle per file
                   sAllOfLog = new String(data);

                   boolean allDogsFnd = false;
                   int stPos = 0;
                   int fndPos;
                   String sNowDogID;
                   while (!allDogsFnd) {
                       fndPos = sAllOfLog.indexOf("DogID:", stPos);
                       if (fndPos > -1) {
                           int eol = sAllOfLog.indexOf('\n', fndPos);
                           if (eol == -1) eol = sAllOfLog.length(); // the last line may lack a trailing newline
                           sNowDogID = sAllOfLog.substring(fndPos + 6, eol).trim();
                           lstDogIDs.add(sNowDogID);
                       } else {
                           allDogsFnd = true;
                       }
                       stPos = fndPos + 6;
                   }
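
Is something like this the kind of 'finalizing' I should be doing? (A rough sketch; readLog is just a made-up helper name. It closes the reader in a finally block so the handle gets released even if a read throws. Needs java.io.* and java.util.List.)

                   String readLog(File file, List lstDogIDs) throws IOException {
                       BufferedReader in = null;
                       StringBuffer sbAllOfLog = new StringBuffer();
                       try {
                           in = new BufferedReader(new FileReader(file));
                           String line;
                           while ((line = in.readLine()) != null) {
                               int pos = line.indexOf("DogID:");
                               if (pos != -1) {
                                   lstDogIDs.add(line.substring(pos + 6).trim());
                               }
                               sbAllOfLog.append(line + '\n');
                           }
                       } finally {
                           if (in != null) {
                               in.close(); // runs even if readLine() throws
                           }
                       }
                       return sbAllOfLog.toString();
                   }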
Question by:jacobbdrew
12 Comments
 
LVL 16

Expert Comment

by:suprapto45
ID: 16908277
Well,

You could try using a regular expression (java.util.regex) for this. I think regex is well suited to this kind of pattern extraction.
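
Something like this, perhaps (just a sketch; sAllOfLog and lstDogIDs are the variables from your version 2):

import java.util.regex.*;

// Capture everything after "DogID:" up to the end of the line
// (by default, . does not match line terminators).
Pattern p = Pattern.compile("DogID:(.*)");
Matcher m = p.matcher(sAllOfLog);
while (m.find()) {
    lstDogIDs.add(m.group(1).trim());
}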

David
 
LVL 1

Author Comment

by:jacobbdrew
ID: 16908586
Okay, I'll give it a whirl. Can you point me to any docs/tutorials with info on what you're talking about?
 
LVL 4

Expert Comment

by:v_karthik
ID: 16908682
How big is your file? Is this memory hog staying around for a while, or does it peak during a certain period?

Regular expressions should give you more speed and flexibility in string parsing, but I'm not sure about the memory savings. I see that you require a very simple string operation, so using the String class looks alright to me.

One quick optimization:

int ind = -1;
if ((ind = line.indexOf("DogID:")) != -1) {
    int stPos = ind + 6;
    sNowDogID = line.substring(stPos);
    lstDogIDs.add(sNowDogID);
    intDogIDCnt++;
}

Basically, I've assigned your first indexOf call to a variable so it can be reused instead of calling indexOf a second time.

Is lstDogIDs a Vector? One thing about using a Vector: it is thread-safe, which means it has built-in code to synchronize access to its elements. Synchronization slows down your code drastically. If you are not worried about multiple threads accessing the list, using ArrayList should give you better performance.
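
For example, just declare it as (assuming only one thread touches the list):

import java.util.ArrayList;
import java.util.List;

// Unsynchronized replacement for Vector; same add()/get() usage.
List lstDogIDs = new ArrayList();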

 
LVL 16

Expert Comment

by:suprapto45
ID: 16908704
I think that v_karthik has a point.

>>"I've figured out a major bottleneck in a program i've written is how it reads files"
As you said, the bottleneck is in how the program reads files and not in the string parsing.

>>"Regular expression should give you more speed and flexibility in string parsing, but I'm not sure about its memory savings."
You are right v_karthik.

Yes, how big is your file?

David
 
LVL 16

Expert Comment

by:suprapto45
ID: 16908752
http://www.precisejava.com/javaperf/j2se/IO.htm

If your file is really big, try to use buffering. Honestly, I have not really tried the above URL ;)

David
 
LVL 4

Expert Comment

by:v_karthik
ID: 16908778
>>If your file is really big, try to use buffering.

The link suggests using the BufferedXXX classes. I see that the author is already using BufferedReader, so he's on the right track. To my knowledge, BufferedReader is the better choice over BufferedInputStream when you are sure your data is character data and not mixed binary (ints, strings, etc.).
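
For example, something like this (the UTF-8 charset here is just an assumption for illustration; plain FileReader silently uses the platform default encoding):

import java.io.*;

// Character-stream setup with an explicit encoding.
BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(sNowFile), "UTF-8"));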
 
LVL 1

Author Comment

by:jacobbdrew
ID: 16909055
The files are between around 50k and 200k, but there are lots of them (up to 2000). The app runs a number of threads which parse the logs, but after a while I simply run out of memory. (In fact, I can't parse much more than a few hundred files in one go.)

At this point, I'm testing ways to limit the number of files each instance parses, letting the app 'finish' so that all the resources are released, then restarting until all the files are parsed.

If there's nothing I'm missing with respect to 'closing'/'finalizing' how I read the text file, then maybe my memory leaks are coming from somewhere else.

I guess I'm thinking that a perfectly coded app would be able to keep reading files forever, so long as the data in memory is released when no longer in use. Is this a bogus assumption?

Hmm. Maybe I'll try reading all the files without actually parsing them and see how many I can get through.
 
LVL 4

Accepted Solution

by:
v_karthik earned 2000 total points
ID: 16909069
The size is not much, apparently. You should probably try a thread pool model. You can google for a standard thread pool implementation in Java and use that; otherwise, write your own. A thread pool works like this: a thread manager has control over, say, n threads. As "jobs" come in, the manager picks one of the free threads and assigns the job to it. The thread finishes its job and returns to the free pool. This way the number of threads spawned doesn't get out of control.
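
If you're on Java 5 or later, java.util.concurrent already ships one. A rough sketch (here "files" is assumed to be your File[] of logs, and parseLog stands in for your per-file parsing code):

import java.io.File;
import java.util.concurrent.*;

// Fixed pool of 25 workers; submitted jobs queue up until a thread is free.
ExecutorService pool = Executors.newFixedThreadPool(25);
for (int i = 0; i < files.length; i++) {
    final File f = files[i];
    pool.execute(new Runnable() {
        public void run() {
            parseLog(f); // hypothetical: your per-file read/parse code
        }
    });
}
pool.shutdown(); // accept no new jobs; queued ones still run to completion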
 
LVL 1

Author Comment

by:jacobbdrew
ID: 16913123
Yeah, I'm running a thread pool, but things are still spinning out of control. The 'fix' I've implemented (more a hack) is to only run the app on a limited number of files at a time, let it close, then restart it. Not the most elegant solution, but time's a-wastin'. Thanks for your help; more points for ya.

But just to check one more time: based on the code above (version 1), is there anything more I need to do to 'finalize' how I'm reading the files?
 
LVL 4

Expert Comment

by:v_karthik
ID: 16913264
It looks OK to me... just add my indexOf fix to see if it helps. Just to get it right: does that StringBuffer hold the COMPLETE data contained in all your 2000 files, or is there one StringBuffer (and one Vector) per file?

And, an aside... it's not that useful now, but it's better you know:

sbAllOfLog.append(line + '\n');

is faster written this way:

sbAllOfLog.append(line).append("\n");

(The first form builds a temporary String for every line before appending it.)

 
LVL 1

Author Comment

by:jacobbdrew
ID: 16913680
Cool. No, the StringBuffer does not contain all the data from all 2000 files. This read is within a spawned thread, and there's a thread for each file (limited to 25 by the pool).

I will add the indexOf fix.

 
LVL 4

Expert Comment

by:v_karthik
ID: 16913733
OK... will post again if I have a brainwave later.
