StringBuffer, StringWriter and Performance Problems

sudhakar_koundinya
Last Modified: 2013-12-29
I have a situation where performance is a major issue.

I am using the Jakarta POI package to read a Word document and write the parsed text into an output buffer. By default, Jakarta provides writeAllText(java.io.Writer) as the method for writing the parsed text into the output buffer.

But my client needs a method like getDocumentText() that returns a String. The returned String will be used for other purposes. So I passed a StringWriter to writeAllText(), and I return the parsed text as a String using StringWriter.getBuffer().toString().
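For reference, the wrapper described above might look roughly like the sketch below. The `TextSource` interface is a hypothetical stand-in for the POI class; the only thing taken from the post is that it offers a `writeAllText(java.io.Writer)` method. The class and variable names are illustrative, not part of POI.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class DocumentTextSketch {

    // Hypothetical stand-in for the POI class. The only assumption is a
    // writeAllText(java.io.Writer) method, as described in the question.
    interface TextSource {
        void writeAllText(Writer out) throws IOException;
    }

    // Collects everything the source writes and returns it as a String.
    static String getDocumentText(TextSource source) throws IOException {
        StringWriter writer = new StringWriter();
        source.writeAllText(writer);
        // Same result as writer.getBuffer().toString()
        return writer.toString();
    }

    public static void main(String[] args) throws IOException {
        // A dummy source that writes a fixed string, for demonstration only.
        TextSource demo = out -> out.write("parsed text");
        System.out.println(getDocumentText(demo));
    }
}
```

This keeps the library untouched: the String-returning method lives in the caller's own code rather than inside the Jakarta class.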

But this is becoming a problem. Writing the parsed text to a Writer and then converting it to a String slows down the process.

So I wrote another method in the Jakarta class that returns a String but internally uses a StringBuffer.

(I don't know whether it is legal to write my own method in the Jakarta package. If anybody knows, please let me know.)

OK, coming to my technical problem: even this is not a good idea, because StringBuffer.toString() reduces the performance.

Also, as you know, using String for internal processing is not a good idea (as it is immutable). Hence I have used StringBuffer for internal processing.

So what is the best approach to make my return method perform well?



Thanks,
Sudhakar

P.S.: The method must return a String.

FYI, StringWriter also uses a StringBuffer as its underlying data structure.

StringBuffer's toString() results in new String(this).

There is no other way to improve the performance in this case.
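A small sketch of the point above: `StringWriter.getBuffer()` hands back the writer's own `StringBuffer`, so going through the writer or through the buffer produces the same String. The class and method names are made up for illustration.

```java
import java.io.StringWriter;

class StringWriterBufferSketch {

    // Builds the String by asking the writer directly.
    static String viaWriter(String text) {
        StringWriter writer = new StringWriter();
        writer.write(text);
        return writer.toString();
    }

    // Builds the String by going through the writer's StringBuffer.
    static String viaBuffer(String text) {
        StringWriter writer = new StringWriter();
        writer.write(text);
        StringBuffer buffer = writer.getBuffer();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaWriter("parsed text").equals(viaBuffer("parsed text")));
    }
}
```

Either way a String has to be created from the buffer at the end, so swapping StringWriter for a hand-rolled StringBuffer is unlikely to change much by itself.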

Changing the (Jakarta) code is not a good idea, since you won't be able to apply any new version in the future.


Based on your next operation, you can optimize a bit. But there is no way to improve it directly from POI.
CERTIFIED EXPERT
Top Expert 2016

Commented:
It's legal as it's open source, although I'd take a look at the license anyway to make sure you're fully compliant.

StringBuffer operations can be optimised by allocating a buffer size of the correct length beforehand. If you don't know the length, and memory permits, over-allocate.
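A minimal sketch of pre-allocating, using the `StringBuffer(int capacity)` constructor. The numbers 8192 and 6000 are illustrative assumptions, not measurements from this thread, and the class and method names are hypothetical.

```java
class PresizedBufferSketch {

    // Appends `count` characters to a buffer created with the given capacity
    // and returns the buffer so its length and capacity can be inspected.
    static StringBuffer fill(int initialCapacity, int count) {
        StringBuffer buffer = new StringBuffer(initialCapacity);
        for (int i = 0; i < count; i++) {
            buffer.append('x');
        }
        return buffer;
    }

    public static void main(String[] args) {
        // Illustrative sizes only: 6000 characters fit in the 8192 reserved,
        // so the buffer never has to grow while it is being filled.
        StringBuffer presized = fill(8192, 6000);
        System.out.println(presized.length() + " / " + presized.capacity());
    }
}
```

As long as the content stays within the reserved capacity, the internal array is not reallocated or copied during the appends.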

Author

Commented:
Following is the license:

/* ====================================================================
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2003 The Apache Software Foundation.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation" and
 *    "Apache POI" must not be used to endorse or promote products
 *    derived from this software without prior written permission. For
 *    written permission, please contact apache@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache",
 *    "Apache POI", nor may "Apache" appear in their name, without
 *    prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation.  For more
 * information on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 */

Author

Commented:
And here is the code:



  public void read() throws IOException {
    int textStart = Utils.convertBytesToInt(_header, 24);
    int textEnd = Utils.convertBytesToInt(_header, 28);
    ArrayList textPieces = findProperties(textStart, textEnd, _text.root);
    int size = textPieces.size();
    for (int x = 0; x < size; x++) {
      TextPiece nextPiece = (TextPiece) textPieces.get(x);
      boolean ctrlFound = false;
      int start = nextPiece.getStart();
      int end = nextPiece.getEnd();
      boolean unicode = nextPiece.usesUnicode();
      int add = 1;
      if (unicode) {
        add = 2;

      }
      {
        for (int y = start; y < end; y += add) {
          char ch = '\0';
          char prev = '\0';
          if (unicode) {
            ch = (char) Utils.convertBytesToShort(_header, y);
          }
          else {
            ch = (char) _header[y];
          }
          if (ch == '') {
            prev = ch;
            continue;
          }
          else if (ch == '\007') {
            y = arrangetabledata(y, add, unicode);
            continue;
          }
          else if (ch == '\023') {
            ctrlFound = true;
            y = adjustTOC(start, end, y, add, unicode);
          }
          else {
            if (ch == '\r' && prev == '\r') {
              y += add;
              if (unicode) {
                ch = (char) Utils.convertBytesToShort(_header, y);
              }
              else {
                ch = (char) _header[y];
              }
              prev = ch;
              continue;
            }
            if (!ctrlFound) {
              if (ch == '\r' && prev == '\r') {
                _word.append("\r\n");
              }
              else
              if (ch == '\n' && prev == '\n') {
                _word.append("\r\n");
              }
              else
              if ( (ch == '\n' || ch == '\r') && ch != '\t') {
                _word.append("\r\n");
              }
              else {
                _word.append( (ch));
              }
            }
          }

          ctrlFound = false;
          prev = ch;
        }
      }

    }
  }

Author

Commented:
And the return method was something like this:

public byte[] getBytes() {
    return _word.toString().getBytes();

  }
CERTIFIED EXPERT
Top Expert 2016

Commented:
Not quite sure why you posted that... The license is simple. Just make sure you comply with it. The posted code doesn't throw much light on the StringBuffer issue...
CERTIFIED EXPERT
Top Expert 2016

Commented:
>>AND RETURN METHOD WAS SOME THING ...

I posted my last comment before seeing that. So _word is a StringBuffer?

Author

Commented:
Yes

>> Not quite sure why you posted that... The license is simple.

I didn't understand you here.
CERTIFIED EXPERT
Top Expert 2016

Commented:
Don't worry. Just comply with it - don't post it ;-)
CERTIFIED EXPERT
Top Expert 2016

Commented:
As for the StringBuffer thing, you don't show where it's allocated, which is of some importance.

Author

Commented:
OK,

Then how can I optimize this code?

Previously, wherever you now see StringBuffer-related code, there was Writer-related code. The rest is the same as in POI.

Author

Commented:
It is created at the constructor level:

StringBuffer _word = new StringBuffer();
CERTIFIED EXPERT
Top Expert 2016

Commented:
Well, if that's what is there, you can see that it's already going against what I said earlier. That's a non-optimal way, in terms of later processing, of creating a StringBuffer. The next thing to do is analyse the average final size of that buffer, then allocate space equal to the next KB-aligned (1024) size above that, or something similar.
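One way to read the "next KB-aligned size" suggestion is to round the measured average up to a multiple of 1024. The sketch below assumes that interpretation; the helper name `alignedCapacity` and the sample average of 7000 are hypothetical.

```java
class AlignedCapacitySketch {

    static final int BLOCK = 1024;

    // Rounds a measured average length up to the next multiple of 1024.
    // A value that is already a multiple of 1024 is returned unchanged.
    static int alignedCapacity(int averageLength) {
        return ((averageLength + BLOCK - 1) / BLOCK) * BLOCK;
    }

    public static void main(String[] args) {
        // 7000 is an illustrative average, not a figure from the thread.
        int capacity = alignedCapacity(7000);
        StringBuffer word = new StringBuffer(capacity);
        System.out.println(capacity + " / " + word.capacity());
    }
}
```

The resulting value would then be passed to the StringBuffer constructor in place of the no-argument `new StringBuffer()`.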

Author

Commented:
When I tested the code just now,

the size of the StringBuffer and its capacity were something like this:
Size   Length
6794 12234
6795 12234
6796 12234
6797 12234
6798 12234
6799 12234
6800 12234
6801 12234
6802 12234
6804 12234
6805 12234
6806 12234
6807 12234
6808 12234
6809 12234
6810 12234
6812 12234
6814 12234
6816 12234
6818 12234

This means the capacity is almost double the actual size.
So what should I do next?

Author

Commented:
Oops, sorry,
it is
Length  Capacity
6794 12234
6795 12234
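The gap between length and capacity in these figures is what a buffer that grows on demand tends to look like. As a rough sketch, the code below counts how often `capacity()` changes while 6818 characters (the largest length reported above) are appended one at a time, once with the default constructor and once with a pre-sized buffer. The 7168 capacity and all names are assumptions for illustration; the exact growth steps depend on the JDK and on how the text is appended, so they will not necessarily match the 12234 seen here.

```java
class CapacityGrowthSketch {

    // Appends `count` characters one at a time and counts how many times
    // the buffer's capacity changed, i.e. how often it had to grow.
    static int countCapacityChanges(StringBuffer buffer, int count) {
        int changes = 0;
        int lastCapacity = buffer.capacity();
        for (int i = 0; i < count; i++) {
            buffer.append('x');
            if (buffer.capacity() != lastCapacity) {
                changes++;
                lastCapacity = buffer.capacity();
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        int length = 6818; // largest length reported in the test output above

        // Default constructor: starts small and grows as text is appended.
        System.out.println("default:  " + countCapacityChanges(new StringBuffer(), length));

        // Pre-sized (7168 is an assumed value): no growth is needed.
        System.out.println("presized: " + countCapacityChanges(new StringBuffer(7168), length));
    }
}
```

Each capacity change means a larger internal array was allocated and the existing characters copied into it, which is the work pre-allocation avoids.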
CERTIFIED EXPERT
Top Expert 2016

Commented:
Please pay close attention to what I'm saying. What you've just shown is precisely why I'm giving the advice I posted. All you need to do is follow it ;-)

Author

Commented:
Truly speaking, I don't understand your point. Sorry for my poor English. Can you explain in simple terms what you are trying to tell me? :(

Author

Commented:
FYI,
I contacted the POI developer who worked on the Word API.

He said that the class I am using is going to be deprecated and suggested using HWPFDocument. He also gave the URL

www.textmining.org, where I downloaded the WordParser API, and it is much better than the one I was using :-)

Thanks,
Sudhakar

Author

Commented:
The latest one is efficient in terms of parsing and handling documents. TextMining can also handle Word 6.0 documents,
which is quite interesting.

:)
CERTIFIED EXPERT
Top Expert 2016

Commented:
:-)