Solved

StringBuffer, StringWriter and Performance Problems

Posted on 2004-03-23
Last Modified: 2013-12-29
Hi, I have a situation where performance is a major issue.

I am using the Jakarta POI package to read a Word document and write the parsed text into an output buffer. By default, Jakarta provides writeAllText(java.io.Writer) as the method for writing the parsed text into the output buffer.

But my client needs a method like getDocumentText() that returns a String. The returned String will be used for other purposes. So I passed a StringWriter to writeAllText() and returned the parsed text as a String using StringWriter().getBuffer().toString().

But this is becoming a problem. Writing the parsed text to a Writer and then converting it to a String slows down the process.

So I wrote another method in the Jakarta class that returns a String but internally uses a StringBuffer.

(I don't know whether it is legal to write my own method in the Jakarta package. If anybody knows, please let me know.)

OK, coming to my technical problem: even this is not a very good idea, because StringBuffer.toString() reduces performance.

Also, as you know, using String for internal processing is not a good idea (as it is immutable). Hence I used StringBuffer for the internal processing.

So what is the best approach to make my return method perform well?



Thanks,
Sudhakar

P.S.: The return method must return a String only.
Question by:sudhakar_koundinya
22 Comments
 
LVL 9

Expert Comment

by:mmuruganandam
ID: 10664868
FYI, StringWriter also uses a StringBuffer as its underlying data structure.

StringBuffer's toString() does new String(this).

There is no other way to improve the performance in this case.

As for changing the Jakarta code, that is not a good idea, since you then can't simply move to a new version in the future.
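
To illustrate the point about StringWriter, here is a minimal sketch of returning a String without modifying the library. TextSource is a hypothetical stand-in for the POI class; only the writeAllText(Writer) method name comes from the question. The StringWriter(int) constructor and getBuffer() are standard java.io API, and the initial size of 64 is just an illustrative value.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class StringWriterSketch {

    // Hypothetical stand-in for the POI document class; only the
    // writeAllText(Writer) method name is taken from the question.
    interface TextSource {
        void writeAllText(Writer out) throws IOException;
    }

    // Returns the whole document text as a String, leaving the library untouched.
    static String getDocumentText(TextSource doc, int expectedChars) throws IOException {
        // StringWriter(int) sets the initial size of the internal StringBuffer.
        StringWriter out = new StringWriter(expectedChars);
        doc.writeAllText(out);
        // Same result as out.getBuffer().toString().
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        TextSource fake = new TextSource() {
            public void writeAllText(Writer out) throws IOException {
                out.write("Hello, ");
                out.write("Word document");
            }
        };
        System.out.println(getDocumentText(fake, 64));
    }
}
```

If the typical document size is known, passing it as expectedChars should reduce how often the internal buffer has to grow, while the Jakarta source stays unmodified.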


0
 
LVL 9

Expert Comment

by:mmuruganandam
ID: 10665003
Depending on what you do with the text next, you can optimize a bit. But there is no way to improve this directly within POI.
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10666058
It's legal as it's open source although i'd take a look at the license anyway to make sure you're fully compliant.

StringBuffer operations can be optimised by allocating a buffer size of the correct length beforehand. If you don't know the length, and memory permits, over-allocate.
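
A rough sketch of why preallocation helps, assuming nothing beyond the standard StringBuffer API. The helper name countGrowths and the size 7000 are made up for illustration; the sketch simply counts how often capacity() changes while characters are appended.

```java
class PresizedBufferSketch {

    // Appends n characters and reports how many times the capacity changed.
    // Each change means the buffer allocated a larger array and copied into it.
    static int countGrowths(StringBuffer sb, int n) {
        int growths = 0;
        int capacity = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != capacity) {
                growths++;
                capacity = sb.capacity();
            }
        }
        return growths;
    }

    public static void main(String[] args) {
        int n = 7000; // illustrative document size in characters
        System.out.println("default:  " + countGrowths(new StringBuffer(), n) + " growths");
        System.out.println("presized: " + countGrowths(new StringBuffer(n), n) + " growths");
    }
}
```

The presized buffer never reallocates as long as the final length stays within the capacity passed to the constructor.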
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666087
The following is the license:

/* ====================================================================
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2003 The Apache Software Foundation.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation" and
 *    "Apache POI" must not be used to endorse or promote products
 *    derived from this software without prior written permission. For
 *    written permission, please contact apache@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache",
 *    "Apache POI", nor may "Apache" appear in their name, without
 *    prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation.  For more
 * information on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 */
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666112
And here is the code:



  public void read() throws IOException {
    int textStart = Utils.convertBytesToInt(_header, 24);
    int textEnd = Utils.convertBytesToInt(_header, 28);
    ArrayList textPieces = findProperties(textStart, textEnd, _text.root);
    int size = textPieces.size();
    for (int x = 0; x < size; x++) {
      TextPiece nextPiece = (TextPiece) textPieces.get(x);
      boolean ctrlFound = false;
      int start = nextPiece.getStart();
      int end = nextPiece.getEnd();
      boolean unicode = nextPiece.usesUnicode();
      int add = 1;
      if (unicode) {
        add = 2;

      }
      {
        for (int y = start; y < end; y += add) {
          char ch = '\0';
          char prev = '\0';
          if (unicode) {
            ch = (char) Utils.convertBytesToShort(_header, y);
          }
          else {
            ch = (char) _header[y];
          }
          if (ch == '') {
            prev = ch;
            continue;
          }
          else if (ch == '\007') {
            y = arrangetabledata(y, add, unicode);
            continue;
          }
          else if (ch == '\023') {
            ctrlFound = true;
            y = adjustTOC(start, end, y, add, unicode);
          }
          else {
            if (ch == '\r' && prev == '\r') {
              y += add;
              if (unicode) {
                ch = (char) Utils.convertBytesToShort(_header, y);
              }
              else {
                ch = (char) _header[y];
              }
              prev = ch;
              continue;
            }
            if (!ctrlFound) {
              if (ch == '\r' && prev == '\r') {
                _word.append("\r\n");
              }
              else
              if (ch == '\n' && prev == '\n') {
                _word.append("\r\n");
              }
              else
              if ( (ch == '\n' || ch == '\r') && ch != '\t') {
                _word.append("\r\n");
              }
              else {
                _word.append( (ch));
              }
            }
          }

          ctrlFound = false;
          prev = ch;
        }
      }

    }
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666175
And the return method was something like this:

public byte[] getBytes() {
    return _word.toString().getBytes();

  }
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10666200
Not quite sure why you posted that... The license is simple. Just make sure you comply with it. The posted code doesn't throw much light on the StringBuffer issue...
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10666221
>>AND RETURN METHOD WAS SOME THING ...

I posted my last comment before seeing that. So _word is a StringBuffer?
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666246
Yes

>> Not quite sure why you posted that... The license is simple.

I didn't get you here
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10666305
Don't worry. Just comply with it - don't post it ;-)
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10666329
As for the StringBuffer thing, you don't show where it's allocated, which is of some importance.
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666343
OK,

Then how can I optimize this code?

Wherever you now see StringBuffer-related code, there was previously Writer-related code. The rest is the same as in POI.
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666360
It is created at the constructor level:

StringBuffer _word =new StringBuffer();
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10666414
Well if that's what is there, you can see that it's already going against what i said earlier. That's a non-optimal way, in terms of later processing,  of creating a StringBuffer. Next thing to do is an analysis of the average final size of that buffer, then allocate space equal to the next Kb-aligned (1024) size above that or something.
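
The sizing rule above could be sketched roughly as follows. The class and method names are hypothetical, and the sample lengths in main are invented for illustration, not measurements from this thread.

```java
class BufferSizing {

    // Rounds n up to the next multiple of 1024 (n itself if already aligned).
    static int roundUpToKb(int n) {
        return ((n + 1023) / 1024) * 1024;
    }

    // Averages the observed final buffer lengths and rounds the result up
    // to a 1024-aligned capacity. Falls back to 1024 if nothing was observed.
    static int suggestCapacity(int[] observedLengths) {
        if (observedLengths.length == 0) {
            return 1024;
        }
        long sum = 0;
        for (int i = 0; i < observedLengths.length; i++) {
            sum += observedLengths[i];
        }
        int average = (int) (sum / observedLengths.length);
        return roundUpToKb(average);
    }

    public static void main(String[] args) {
        int[] sampleLengths = {6000, 6800, 7500}; // made-up sample values
        int capacity = suggestCapacity(sampleLengths);
        StringBuffer sb = new StringBuffer(capacity);
        System.out.println("suggested capacity: " + sb.capacity());
    }
}
```

Using the average means roughly half of the documents will still trigger a growth step; if memory permits, sizing from the largest observed length instead would avoid that.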
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666472
When I tested the code just now, the size of the StringBuffer and its capacity were something like this:
Size   Length
6794 12234
6795 12234
6796 12234
6797 12234
6798 12234
6799 12234
6800 12234
6801 12234
6802 12234
6804 12234
6805 12234
6806 12234
6807 12234
6808 12234
6809 12234
6810 12234
6812 12234
6814 12234
6816 12234
6818 12234

This means the capacity is almost double the actual size.
So what should I do next?
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666485
Oops, sorry. It is:
Length  Capacity
6794 12234
6795 12234
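
For anyone reproducing numbers like these, a small sketch of how length() and capacity() can be read off a StringBuffer. The helper name fill is hypothetical. The exact capacities printed depend on the JDK's growth policy and on how the text is appended, so they will not necessarily match the figures above.

```java
class LengthVsCapacity {

    // Appends n characters to sb and returns {length, capacity}.
    static int[] fill(StringBuffer sb, int n) {
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return new int[] { sb.length(), sb.capacity() };
    }

    public static void main(String[] args) {
        // A StringBuffer created with no arguments starts with capacity 16
        // and enlarges its internal array in jumps as text is appended.
        int[] result = fill(new StringBuffer(), 6794);
        System.out.println("length=" + result[0] + " capacity=" + result[1]);
    }
}
```

The gap between the two values is allocated but unused space, and every jump in capacity is a reallocation plus a copy of the text collected so far.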
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10666569
Please pay close attention to what i'm saying. What you've just shown is precisely why i'm giving the advice i posted. All you need to do is follow it ;-)
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10666729
To be honest, I do not understand your point. Sorry for my poor understanding of English. Can you explain in simple terms what you are trying to tell me? :(
0
 
LVL 86

Accepted Solution

by:
CEHJ earned 250 total points
ID: 10666757
final int BUFFER_SIZE = 1 << 20; // 1M chars (about 2 MB of memory, as a Java char is 2 bytes)

StringBuffer sb = new StringBuffer(BUFFER_SIZE);

Will solve the growing problem. Now find the *real* value of BUFFER_SIZE ;-)
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10711945
FYI,
I contacted the POI developer who worked on the Word API.

He said that the class I am using is going to be deprecated and suggested using HWPFDocument instead. He also gave me the URL

www.textmining.org, where I downloaded the WordParser API, which is quite a bit better than the one I was using :-)

Thanks,
Sudhakar
0
 
LVL 14

Author Comment

by:sudhakar_koundinya
ID: 10711957
The latest one is efficient in terms of parsing and handling documents. TextMining can also handle Word 6.0 documents, which is quite interesting.

:)
0
 
LVL 86

Expert Comment

by:CEHJ
ID: 10756320
:-)
0
