Asked by rstaveley

BufferedReader with a restricted buffer size

I'm occasionally getting exceptionally long text files with no line terminators. This is causing problems in my application in the following code, which is designed to strip UUEncoded blocks from plain text messages:

--------8<--------
            File temp;                  // Temporary file for collecting stripped text
            boolean skip_uue = false;
            boolean doneFirst = false;

            try {


                  // Create temp file.
                  temp = File.createTempFile("PlainTextHandler",".txt");

                  // Delete temp file when program exits.
                  temp.deleteOnExit();

                  // Write to temp file
                  BufferedWriter writer = new BufferedWriter(new FileWriter(temp));

//System.out.println(new TimeStamp().toString()+getClass().getName()+": Opening BufferedReader");

                  // Use a BufferedReader for the input stream
                  BufferedReader reader = new BufferedReader(
                        new InputStreamReader(is)
                        );

                  String line = null;
                  int line_number = 0;
                  int uue_line_number = 0;
                  while ((line = reader.readLine()) != null) {
                        ++line_number;
                        if (skip_uue) {
                              ++uue_line_number;
                              if (line.length() > 2 && "end".equals(line.substring(0,3))) {

                                    // Show how many UUEncoded lines we've skipped
System.out.println(new TimeStamp().toString()+getClass().getName()+": Skipped "+uue_line_number+" lines of UUEncoded text");
                                    skip_uue = false;

                              }
                              continue;
                        }
                        else if (line.length() > 5 && "begin".equals(line.substring(0,5))) {

                              // Look for a UUEncoded block
                              if (line.matches("^begin\\s\\d{3}\\s.+$")) {
                                    skip_uue = true;
                                    uue_line_number = 1;
                                    continue;
                              }

                        }

                        // Subsequent lines need white space
                        if (doneFirst)
                              writer.newLine();      // Give Lucene some white space to separate the tokens
                        else
                              doneFirst = true;      // We have at least one line

                        writer.write(line);            // Write the non-UUE data to the temporary file
                  }

                  reader.close();
                  writer.close();

                  // Show how many UUEncoded lines we've skipped
                  if (skip_uue)
System.out.println(new TimeStamp().toString()+getClass().getName()+": Skipped "+uue_line_number+" lines of UUEncoded text");

            }
            catch (IOException e) {
//System.out.println(new TimeStamp().toString()+getClass().getName()+": IOException "+e.toString());
                  throw new StandardDocumentHandlerException("Cannot read the text document",e);
            }
            catch (Exception e) {
//System.out.println(new TimeStamp().toString()+getClass().getName()+": Exception "+e.toString());
                  throw new StandardDocumentHandlerException("Exception caught in PlainTextHandler",e);
            }
            // ... the plain text in the temp file is then passed to Lucene, before being deleted
--------8<--------

The trouble with the code above is that readLine() may load an unacceptably large string into line. That makes this thread a bad citizen in my multithreaded application: it uses up too much of the heap, and another thread that temporarily needs heap space then barfs with an OutOfMemoryError.

My question is this:

Can I use the BufferedReader constructor that specifies a buffer size to limit the maximum length of the string returned by readLine() - i.e.  http://java.sun.com/j2se/1.5.0/docs/api/java/io/BufferedReader.html#BufferedReader%28java.io.Reader,%20int%29 ? If so, is the stream still readable after reading a partial line? [I can live with the long line being split in a way that breaks tokens apart, because it is a special case.]
Accepted solution by CEHJ:

Just do

if (line.length() > MAX_LINE_LENGTH) {
    line = line.substring(0, MAX_LINE_LENGTH);
}

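That only truncates the line after readLine() has already read all of it into memory, though. To cap what gets read in the first place, you'd have to do your own line reading or override BufferedReader.readLine.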
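
On the buffer-size constructor asked about in the question: the size argument sets only the reader's internal character buffer, and readLine() keeps accumulating characters until it sees a line terminator or the end of the stream. A quick check, as a minimal sketch (the class and method names here are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class BufferSizeDemo {

    // Reads the first line of text through a BufferedReader created with the
    // given internal buffer size and returns that line's length (-1 if empty input).
    static int firstLineLength(String text, int bufferSize) {
        BufferedReader reader = new BufferedReader(new StringReader(text), bufferSize);
        try {
            String line = reader.readLine();
            return line == null ? -1 : line.length();
        } catch (IOException e) {
            // Not expected for an in-memory StringReader
            throw new RuntimeException(e);
        } finally {
            try { reader.close(); } catch (IOException ignored) { }
        }
    }

    public static void main(String[] args) {
        // Build one very long "line" with no terminator
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append('x');
        }
        // The buffer holds only 16 chars, yet readLine() returns the whole line
        System.out.println(firstLineLength(sb.toString(), 16)); // prints 100000
    }
}
```

So a small buffer changes how often the underlying reader is hit, not how long the returned string can get.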
rstaveley (ASKER):

I guess I need to implement my own line reader, then. Thanks for the quick response, CEHJ.
:-)
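
For anyone landing here later, a line reader along those lines might look roughly like the sketch below. It is not code from this thread: BoundedLineReader, maxChars and chunks are hypothetical names, and it assumes that breaking an over-long line into chunks of at most maxChars characters is acceptable, as the question says it is. It uses only java.io classes and avoids newer syntax so that it should compile on Java 5.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class BoundedLineReader {
    private final PushbackReader in;
    private final int maxChars;

    BoundedLineReader(Reader source, int maxChars) {
        if (maxChars < 1) {
            throw new IllegalArgumentException("maxChars must be >= 1");
        }
        // BufferedReader keeps single-char reads cheap; one char of pushback
        // is enough to peek past a '\r' or past the end of a full chunk.
        this.in = new PushbackReader(new BufferedReader(source), 1);
        this.maxChars = maxChars;
    }

    // Returns the next line without its terminator, or the next chunk of at
    // most maxChars characters if the line is longer; null at end of stream.
    String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n' || c == '\r') {
                if (c == '\r') {
                    skipLineFeed();
                }
                return sb.toString();
            }
            sb.append((char) c);
            if (sb.length() >= maxChars) {
                // Cap reached. If a terminator follows immediately, consume it
                // so a line of exactly maxChars is not followed by an empty one.
                int next = in.read();
                if (next == '\r') {
                    skipLineFeed();
                } else if (next != '\n' && next != -1) {
                    in.unread(next);
                }
                return sb.toString();
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }

    // Consumes a '\n' that directly follows a '\r'; pushes anything else back.
    private void skipLineFeed() throws IOException {
        int next = in.read();
        if (next != '\n' && next != -1) {
            in.unread(next);
        }
    }

    void close() throws IOException {
        in.close();
    }

    // Convenience for trying it out on an in-memory string.
    static List<String> chunks(String text, int maxChars) {
        List<String> result = new ArrayList<String>();
        try {
            BoundedLineReader reader = new BoundedLineReader(new StringReader(text), maxChars);
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
            reader.close();
        } catch (IOException e) {
            throw new RuntimeException(e); // not expected for a StringReader
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(chunks("abcdefghij", 4)); // prints [abcd, efgh, ij]
    }
}
```

In the loop from the question, this would stand in for the BufferedReader, e.g. new BoundedLineReader(new InputStreamReader(is), MAX_LINE_LENGTH). The stream stays readable after a partial line, since the next call simply continues from where the previous chunk stopped. One caveat: a "begin" or "end" marker that falls in the middle of an over-long line will not be seen at the start of a chunk.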