Solved

Optimizing text file reads in Java

Posted on 2006-06-14
Last Modified: 2012-08-13
hi all--

Using JProfiler, I've figured out a major bottleneck in a program I've written is how it reads files. I've tried 2 different versions of my read code without much difference in the results (see code below).

 I'm wondering if the resources used to read each file are never being released, thus causing bad memory leaks. Perhaps I'm missing the trick to 'finalizing' a file read? Can anyone advise me on how to optimize these file reads?

Here are the 2 versions of my file read
(note: there's some other functionality included here; namely, I need to create a list of 'dogIDs' from the files)

1 (pretty standard)---

                   BufferedReader in = new BufferedReader(_fileReader);
                   StringBuffer sbAllOfLog = new StringBuffer("");
                   String line;
                   int intDogIDCnt = 0;

                   while ((line = in.readLine()) != null) {
                        // kill 2 birds with one stone: while each line of the text file is handy, check it for a dogID
                        if ( line.indexOf("DogID:") != -1){
                                int stPos = line.indexOf("DogID:") + 6;
                                String sNowDogID = line.substring(stPos);
                                lstDogIDs.add(sNowDogID);
                                intDogIDCnt++;
                        }
                        sbAllOfLog.append(line + '\n');
                    }
                   in.close();            // also closes the wrapped _fileReader
                   _fileReader.close();   // redundant but harmless: closing twice is a no-op


version 2---

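                   File file = new File(sNowFile);
                   byte[] data = new byte[(int) file.length()];
                   // a single read() may return fewer bytes than requested;
                   // DataInputStream.readFully keeps reading until the array is full
                   DataInputStream stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
                   try {
                       stream.readFully(data);
                   } finally {
                       stream.close();   // release the file handle even if the read fails
                   }
                   sAllOfLog = new String(data);   // decodes with the platform default charset

                   int stPos = 0;
                   int fndPos;
                   String sNowDogID;
                   while ((fndPos = sAllOfLog.indexOf("DogID:", stPos)) > -1) {
                       int endPos = sAllOfLog.indexOf("\n", fndPos);
                       if (endPos == -1) {
                           endPos = sAllOfLog.length();   // last line may have no trailing newline
                       }
                       sNowDogID = sAllOfLog.substring(fndPos + 6, endPos).trim();
                       lstDogIDs.add(sNowDogID);
                       stPos = fndPos + 6;
                   }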
Question by:jacobbdrew
12 Comments
 
LVL 16

Expert Comment

by:suprapto45
Well,

You could try a regular expression (regex) for this. I think the regex engine is well optimized for this kind of thing.
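For reference, a minimal sketch of what the regex approach might look like, using the standard java.util.regex classes. The class and method names (DogIdRegexSketch, extractDogIds) are made up for illustration, and the pattern assumes each dogID runs to the end of its line. Whether this is actually faster than indexOf for a fixed prefix like "DogID:" is something to measure rather than assume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
class DogIdRegexSketch {
    // Compile once and reuse: Pattern objects are immutable and safe to share between threads.
    // '.' does not match line terminators by default, so the group stops at the end of the line.
    private static final Pattern DOG_ID = Pattern.compile("DogID:(.*)");

    static List<String> extractDogIds(String logText) {
        List<String> ids = new ArrayList<String>();
        Matcher m = DOG_ID.matcher(logText);
        while (m.find()) {
            ids.add(m.group(1).trim());
        }
        return ids;
    }

    public static void main(String[] args) {
        String sample = "start\nDogID: 17\nnoise\nDogID:42\n";
        System.out.println(extractDogIds(sample)); // prints [17, 42]
    }
}
```

The official java.util.regex.Pattern documentation covers the syntax used here.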

David
 
LVL 1

Author Comment

by:jacobbdrew
Okay, I'll give it a whirl. Can you point me to any docs/tutorials with info on what you're talking about?
 
LVL 4

Expert Comment

by:v_karthik
How big is your file? Does the memory usage stay high for a while, or does it peak during a certain period?

Regular expression should give you more speed and flexibility in string parsing, but I'm not sure about its memory savings. I see that you require a very simple string operation, so using the string class looks alright to me.

One quick optimization:

int ind = -1;
if ( (ind = line.indexOf("DogID:")) != -1){
        int stPos = ind + 6;
        String sNowDogID = line.substring(stPos);
        lstDogIDs.add(sNowDogID);
        intDogIDCnt++;
}

Basically, I've assigned your first indexOf call to a variable to reuse it later if required.

Is lstDogIDs a Vector? One thing about Vector: it is thread-safe, which means every access to its elements is synchronized. That locking adds overhead on each call. If multiple threads never access the same list, using ArrayList should give you better performance.
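A small sketch of the ArrayList variant, assuming each file gets its own list so no locking is needed. The class and method names are hypothetical, and the trim() call is an addition that the original version 1 does not have. If several threads really do have to append to one shared list, Collections.synchronizedList makes that choice explicit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the original program.
class DogIdListSketch {
    // One list per file: only one thread touches it, so an unsynchronized ArrayList is enough.
    static List<String> collectIds(String[] lines) {
        List<String> ids = new ArrayList<String>();
        for (String line : lines) {
            int ind = line.indexOf("DogID:");   // single indexOf call, reused below
            if (ind != -1) {
                ids.add(line.substring(ind + 6).trim());
            }
        }
        return ids;
    }

    // Only needed when multiple threads append to the same list.
    static List<String> newSharedList() {
        return Collections.synchronizedList(new ArrayList<String>());
    }

    public static void main(String[] args) {
        System.out.println(collectIds(new String[] {"header", "DogID: 9"})); // prints [9]
    }
}
```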

 
LVL 16

Expert Comment

by:suprapto45
I think v_karthik has a point.

>>"I've figured out a major bottleneck in a program i've written is how it reads files"
As you said, the bottleneck is in how it reads the file, not in the string parsing.

>>"Regular expression should give you more speed and flexibility in string parsing, but I'm not sure about its memory savings."
You are right v_karthik.

Yes, how big is your file?

David
 
LVL 16

Expert Comment

by:suprapto45
http://www.precisejava.com/javaperf/j2se/IO.htm

If your file is really big, try to use buffering. Honestly, I have not really tried the above URL ;)

David
 
LVL 4

Expert Comment

by:v_karthik
>>If your file is really big, try to use buffering.

The link suggests using the BufferedXXX classes. I see that the author is already using BufferedReader, so he's on the right track. To my knowledge, BufferedReader is a better fit than BufferedInputStream if you are sure your data is character data and not a mix of types (int, String, etc.).
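As a rough sketch of a buffered character read that always releases the file handle, assuming Java 7 or later for try-with-resources (on older JVMs the same effect needs a finally block that calls close()). The names LogReadSketch and readDogIds are hypothetical, and FileReader decodes with the platform default charset.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class LogReadSketch {
    // Reads one log file, collects its dogIDs, and appends the full text to allOfLog.
    static List<String> readDogIds(File logFile, StringBuilder allOfLog) throws IOException {
        List<String> ids = new ArrayList<String>();
        // The reader is closed automatically, even if readLine() throws.
        try (BufferedReader in = new BufferedReader(new FileReader(logFile))) {
            String line;
            while ((line = in.readLine()) != null) {
                int ind = line.indexOf("DogID:");
                if (ind != -1) {
                    ids.add(line.substring(ind + 6).trim());
                }
                allOfLog.append(line).append('\n');
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("dogs", ".log");
        try (FileWriter w = new FileWriter(tmp)) {
            w.write("start\nDogID: 7\nend\n");
        }
        System.out.println(readDogIds(tmp, new StringBuilder())); // prints [7]
        tmp.delete();
    }
}
```

Closing the outermost reader is enough: BufferedReader.close() also closes the reader it wraps.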
 
LVL 1

Author Comment

by:jacobbdrew
The files are between about 50k and 200k each, but there are lots of them, up to 2000. The app runs a number of threads which parse the logs, but after a while I simply run out of memory. (In fact, I can't parse much more than a few hundred files in one go.)

At this point, I'm testing out ways to limit the number of files each instance parses, letting the app 'finish' so that all the resources are released, then restarting until all the files are parsed.

If there's nothing I'm missing with respect to 'closing'/'finalizing' how I read the text file, then maybe my memory leaks are coming from somewhere else.

I guess I'm thinking that a perfectly coded app would be able to keep reading files forever so long as the data in memory is released when no longer in use. Is this a bogus assumption?

Hmm. Maybe I'll try reading all the files without actually parsing them and see how many I can get through.
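One thing that might be worth ruling out, as a guess rather than a diagnosis: on JVMs older than Java 7 update 6, String.substring returned a string that shared the parent string's character array. With version 2 of the read code, every dogID kept in lstDogIDs could then keep the whole file's text reachable for as long as the list lives. The sketch below (hypothetical class and method names) copies each ID into its own small string before storing it. On newer JVMs substring already copies, so the extra new String is harmless but unnecessary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class DogIdCopySketch {
    static List<String> extractDetached(String allOfLog) {
        List<String> ids = new ArrayList<String>();
        int pos = allOfLog.indexOf("DogID:");
        while (pos != -1) {
            int end = allOfLog.indexOf('\n', pos);
            if (end == -1) {
                end = allOfLog.length();   // last line without a trailing newline
            }
            // new String(...) forces a private copy holding only the ID's characters,
            // so the stored ID does not pin the full file text in memory on old JVMs.
            ids.add(new String(allOfLog.substring(pos + 6, end).trim()));
            pos = allOfLog.indexOf("DogID:", end);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(extractDetached("x\nDogID: 5\nDogID:6")); // prints [5, 6]
    }
}
```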
 
LVL 4

Accepted Solution

by:
v_karthik earned 500 total points
The size is not much, apparently. You should probably try a thread pool model. You can google for a standard thread pool implementation in Java and use that, or write your own. A thread pool works like this: a thread manager has control over, say, n threads. As "jobs" come in, the manager picks one of the free threads and assigns the job to it. The thread finishes its job and returns to the free pool. This way the number of threads spawned doesn't get out of control.
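The JDK has shipped a ready-made pool in java.util.concurrent since Java 5, so writing one by hand is usually not needed. Here is a minimal sketch of the model described above, with a fixed pool and one job per log. The class name, the countDogIds job, and the in-memory strings standing in for files are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not part of the original program.
class LogPoolSketch {
    // Stand-in for the real per-file work: counts "DogID:" markers in one log's text.
    static int countDogIds(String logText) {
        int count = 0;
        int pos = logText.indexOf("DogID:");
        while (pos != -1) {
            count++;
            pos = logText.indexOf("DogID:", pos + 6);
        }
        return count;
    }

    static int processAll(List<String> logs, int poolSize)
            throws InterruptedException, ExecutionException {
        // At most poolSize threads exist, no matter how many jobs are queued.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> results = new ArrayList<Future<Integer>>();
            for (final String log : logs) {
                results.add(pool.submit(new Callable<Integer>() {
                    public Integer call() {
                        return countDogIds(log);
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();   // blocks until that job has finished
            }
            return total;
        } finally {
            pool.shutdown();        // lets the worker threads exit once the queue is drained
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> logs = Arrays.asList("DogID:1\nDogID:2", "none", "x DogID:3");
        System.out.println(processAll(logs, 2)); // prints 3
    }
}
```

In a real run each job would take a file name rather than the file's text, and return only a small result, so the per-file buffers become garbage as soon as the job ends.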
 
LVL 1

Author Comment

by:jacobbdrew
Yeah, I'm running a thread pool, but things are still spinning out of control. The 'fix' I've implemented (more a hack) is to only run the app on a limited number of files at a time, let it close, then restart it. Not the most elegant solution but time's awastin'. Thanks for your help. More points for ya.

But just to check one more time, based on the code above (version 1), there's nothing more I need to do to 'finalize' how I'm reading the files?
 
LVL 4

Expert Comment

by:v_karthik
It looks OK to me ... just add my indexOf fix to see if it helps. Just to get it right: does that string buffer hold the COMPLETE data from all your 2000 files, or is each StringBuffer (and each Vector) for a single file?

And an aside: it's not that useful right now, but it's better you know.

sbAllOfLog.append(line + '\n');

is faster if written this way, because it skips building a temporary string for every line:

sbAllOfLog.append(line).append("\n");
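A short sketch of the chained form in context. It swaps in StringBuilder (available since Java 5), which has the same API as StringBuffer without the synchronization; that is safe here only on the assumption that each buffer is used by a single thread. The class name, method name, and the idea of presizing the buffer from an expected length are illustrative additions.

```java
// Hypothetical helper, not part of the original program.
class AppendSketch {
    // expectedLength is a rough size hint (for example the file length);
    // presizing avoids repeated internal array growth for 50k-200k logs.
    static String joinLines(String[] lines, int expectedLength) {
        StringBuilder sb = new StringBuilder(expectedLength);
        for (String line : lines) {
            sb.append(line).append('\n');   // no temporary "line + '\n'" string
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = joinLines(new String[] {"a", "b"}, 16);
        System.out.println(text.length()); // prints 4
    }
}
```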

 
LVL 1

Author Comment

by:jacobbdrew
Cool. No, the string buffer does not contain all the data from all 2000 files. This read is within a spawned thread, and there's a thread for each file (limited to 25 by the pool).

I will add the indexOf fix.

 
LVL 4

Expert Comment

by:v_karthik
ok .. will post again if i have a brainwave later.