nikhilbansal

Java I/O Performance

Hi All,

I am stuck on a problem regarding Arabic text. My task is to convert mainframe files containing English and Arabic data from EBCDIC to UTF-8. I am using a utility that converts the EBCDIC data to UTF-8. However, the Arabic data in the mainframe file is stored in the wrong orientation (left to right), so the output of the utility is also wrong.

I've built my own program which reverses the Arabic text, but I am having a problem with I/O performance. The program reads a line from a file, processes it, and writes it to the output file.
Output is being written at 16 KB/sec, which is too slow.

I want to improve the performance.

Please help.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;

/*
 * Created on Jun 17, 2006
 *
 * TODO To change the template for this generated file go to
 * Window - Preferences - Java - Code Style - Code Templates
 */

/**
 * @author shaikhat
 *
 * TODO To change the template for this generated type comment go to
 * Window - Preferences - Java - Code Style - Code Templates
 */
public class ReverseArabicNum {
      
      static boolean      ascFlg      =      false;
      static boolean      comFlg      =      false;
      static boolean      arabicFlg      =      false;
      
      static boolean      tempAscFlg      =      false;
      static boolean      tempComFlg      =      false;
      static boolean      tempArabicFlg      =      false;
      
      
      static boolean      treatNumArabic      =      false;
      
      static char[]      arabicArray1      =      new char[10000];
      static char[]      commonArray1      =      new char[10000];
      static char[]       tempArabicArray      =      new char[10000];
      static char[]       tempCommonArray      =      new char[10000];
      
      static  int[]      numArray1            =      new int[10];
      
      static      char      prevChar            =      'E';
      static String       finalStr      =      "";
      
      static int arrayLength;
      
      static int posArb      =      0;
      static int posCom      =      0;
      static int posNum      =      0;
      
      static int tempArbLength      =      0;
      static int tempComLength      =      0;
      
      static String inFilePath      ;
      static String outFilePath      ;      
      static String       strToFile;
      
      static BufferedWriter      bw      =      null;
      
      static{
            try{
//                  bw      =      new BufferedWriter(new FileWriter("D://SourceFiles//test.txt"));
            }
                  catch(Exception e){
                        System.out.println("Exception in main "+e.toString());

                  }
            }

      

      public static void main(String[] args) {
            
            String       str;

            char      charFrmLine;
            
            int lineLength;
            int asciiVal      =      0;
            int lineNumber      =      1;
            
            inFilePath      =      args[0];
            outFilePath      =      args[1];
            
//            inFilePath      =      "D:\\pcMain1ENDV.NATSIS.AFI.CPYBK.VER0102(AFIDOW)";
//            outFilePath      =      "D:\\test123.txt";

            try{
                  BufferedReader      br      =      new BufferedReader(new FileReader(inFilePath));
                  bw      =      new BufferedWriter(new FileWriter(outFilePath));                  

                        while((str      =      br.readLine()) != null){
//                              System.out.println(str);
                              lineLength      =      str.length();
                              System.out.println(lineLength);
                              for(int i =0;i<lineLength;i++){
                                    
                                    if(treatNumArabic)
                                    {
                                          comFlg      =      true;
                                          ascFlg      =      false;
                                          arabicFlg      =      false;                                          
                                    }
                                    if(ascFlg){
                                          tempAscFlg      =      ascFlg;      
                                          prevChar      =      'E';
                                    }if(comFlg)
                                    {
                                          tempComFlg      =      comFlg;                                          
                                          prevChar      =      'C';                                          
                                    }if(arabicFlg)
                                    {
                                          tempArabicFlg      =      arabicFlg;                                          
                                          prevChar      =      'A';                                          
                                    }
                                    
                                    ascFlg      =      false;
                                    arabicFlg      =      false;
                                    comFlg      =      false;      
                                    
                                    charFrmLine            =      str.charAt(i);
//                                    System.out.println("charFrmLine "+charFrmLine);
                                    asciiVal            =      (int)charFrmLine;

/*                                    if((i==(lineLength-1)) &&  (asciiVal==36)){
                                          arabicFlg      =      tempArabicFlg;
                                          comFlg      =      tempComFlg;
                                          ascFlg      =      tempAscFlg;
                                          
                                          break;
                                    }*/

                                    chkValBelongsToWhichArray(asciiVal);
                                    
                                    if(ascFlg && prevChar=='E'){
                                          formFinalStr(asciiVal);
                                    }else if(ascFlg && prevChar=='A'){
                                          // reverse Arabic Array
                                          flipArray(posArb);                                          
                                          // write arabic to Final Str
                                          formFinalStr(tempArabicArray,tempArbLength);
                                          // write English to final str
                                          formFinalStr(asciiVal);
                                    }else if(ascFlg && prevChar=='C'){

/*
 *       code being added to handle english
 *  numbers embedded in Arabic text
 */
                                          for(int iNum=0;iNum<numArray.length;iNum++){
                                                if(asciiVal==numArray[iNum]){
                                                      treatNumArabic      =      true;
                                                      comFlg      =      false;                                                      
                                                      break;
                                                }else{
                                                      treatNumArabic      =      false;
                                                }
                                          }
                                          if(treatNumArabic && (posArb >0)){
                                                formCommonArray(asciiVal);
                                          }else{
                                          
                                          if(posArb>0){
                                                flipArray(posArb);
                                                formFinalStr(tempArabicArray,tempArbLength);
                                                formFinalStr(tempCommonArray,tempComLength);
                                                posArb      =      0;
                                                posCom      =      0;
                                          }if(posNum>0){
                                                for(int num=(posNum-1);num>=0;num--){
//                                                      formCommonArray(numArray1[num]);
                                                      formFinalStr((char)commonArray1[num]);
                                                }
                                                posNum      =      0;
                                          }

                                          formFinalStr(asciiVal);
                                          }

                                    }else if(arabicFlg && prevChar=='E'){
                                          // form Arabic array
                                          formArabicArray(asciiVal);
                                    }else if(arabicFlg && prevChar=='A'){
                                          // append Arabic array
                                          formArabicArray(asciiVal);
                                    }else if(arabicFlg && prevChar=='C'){
                                          // if arabicArray started then
                                          // unload commonArray in arabicArray
                                          // and append current arabic char in
                                          // Arabic array
                                          
                                          if(posArb>0)
                                          {
                                                for(int com=(posCom-1);com>=0;com--){
                                                      formArabicArray(tempCommonArray[com]);
                                                }
                                                posCom      =      0;
                                          }
                                          formArabicArray(asciiVal);
                                          
                                    }else if(comFlg && prevChar=='E'){
                                          // write to File
                                          formFinalStr(asciiVal);
                                    }else if(comFlg && prevChar=='A'){
//                                          System.out.println("calling formCommonArray");
                                          // form commonArray
                                          formCommonArray(asciiVal);
                                    }else if(comFlg && prevChar=='C'){
                                          // if arabicArray already started then
                                          // put this char in commonArray
                                          //else send to file
                                          if(posNum>0){
                                                for(int num=(posNum-1);num>=0;num--){
                                                      formCommonArray(numArray1[num]);
                                                }
                                                posNum      =      0;
                                          }
                                          if(posArb>0){
                                                formCommonArray(asciiVal);
                                          }else{
                                                formFinalStr(asciiVal);
                                          }
                                    }
                              
                              } // End of  For
                              
                              //Write the Common and Arabic Array to file if exist
                              // Check the condition whether to write common or arabic first
//                              System.out.println("line over");
                              if (posArb>0 && posCom ==0){
//                                    System.out.println("line over 1");
                                    // reverse Arabic Array
                                    flipArray(posArb);                                          
                                    // write arabic to Final Str
                                    formFinalStr(tempArabicArray,tempArbLength);
                                    // write English to final str
                              }else if (posArb==0 && posCom >0){
//                                    System.out.println("line over 2");
                                    formFinalStr(tempCommonArray,tempComLength);
                              }else if (posArb>0 && posCom >0){
//                                    System.out.println("line over 3");                                    
                                    if (arabicFlg){
//                                          System.out.println("line over 3a");
                                          flipArray(posArb);                                          
                                          // write arabic to Final Str
                                          formFinalStr(tempArabicArray,tempArbLength);
                                          // write English to final str
                                    }
                                    if(comFlg){
//                                          System.out.println("line over 3b");                                          
                                          flipArray(posArb);                                          
                                          // write arabic to Final Str
                                          formFinalStr(tempArabicArray,tempArbLength);
                                          // write English to final str
                                          
                                          formFinalStr(tempCommonArray,tempComLength);
                                          
                                    }
                              }
                              
                              posCom      =      0;
                              posArb      =      0;                                    
                                    
                              writeToFile(finalStr);
                              finalStr      =      "";
                        } // End Of While
                        }catch(Exception e){
                              System.out.println("Exception in main "+e.toString());

                        }
      }
      
      public static void formNumberArray(int asciiVal){

//            System.out.println("number array"+ (char)asciiVal);
            numArray1[posNum]      =      (char)asciiVal;
            posNum++;
      }
      
      
      public static void formArabicArray(int asciiVal){

//            System.out.println("arabic array"+ (char)asciiVal);
            arabicArray1[posArb]      =      (char)asciiVal;
            posArb++;
      }
      
      public static void formCommonArray(int asciiVal){
//            System.out.println("posCom"+posCom+" asciiVal "+asciiVal);
            commonArray1[posCom]      =      (char)asciiVal;
            tempCommonArray[posCom]      =      (char)asciiVal;      
            posCom++;
            tempComLength      =      posCom;
      }
      
      public static void flipArray(int posArb){
            tempArbLength      =      posArb;
//            System.out.println("flipArray::posArb"+posArb);
            for(int i=(posArb-1),j=0;i>=0;i--,j++)
            {
                  tempArabicArray[j]      =      arabicArray1[i];
//                  System.out.println("tempArabicArray "+tempArabicArray[j]);
            }
            // re - initialized posArb to zero.
            posArb      =      0;
            
      }

      
      public static void writeToFile(String str)
      {
            try{
                  System.out.println("str "+str);
                  strToFile      =      str.replace('$',' ');

            bw.write(strToFile);
            bw.write("\r\n");
            bw.flush();
            }
            catch(Exception e){

                  System.out.println("Exception in writeToFile "+e.toString());

            }
      }
      
      public static void formFinalStr(int asciiVal)
      {
//            System.out.println("Writing to file"+(char)asciiVal);
            finalStr      =      finalStr      + (char)asciiVal;      
      }
      
      public static void formFinalStr(char []array,int arrLen)
      {
//            System.out.println("formFinalStr");
//            System.out.println("arr length"+array.length);
            for(int arrLength =      0;arrLength<arrLen;arrLength++)
            {
//                  System.out.println("inside for");
//                  System.out.println("arr val"+(char)array[arrLength]);
                  finalStr      =      finalStr      + (char)array[arrLength];                  
            }
      
      }
      
      public static void chkValBelongsToWhichArray(int asciiVal){
            
            for(int i=0;i<EngArray.length;i++){
                  if(asciiVal==EngArray[i])
                  {
                        ascFlg      =      true;
                        comFlg      =      false;                        
                        arabicFlg      =      false;                        
                        break;
                  }
            }
            for(int j=0;j<CommonArray.length;j++){
                  if(asciiVal==CommonArray[j])
                  {
                        comFlg      =      true;
                        ascFlg      =      false;
                        arabicFlg      =      false;                        
                        break;
                  }
            }
            if(!ascFlg && !comFlg)
            {
//                  System.out.println("Setting arabic flg true");
                  arabicFlg      =      true;
                  comFlg      =      false;
                  ascFlg      =      false;
                  
            }
            
            
      }
      
      
      static int[] EngArray      =      {65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,      //A-Z
                                                 80,81,82,83,84,85,86,87,88,89,90,
                                                97,98,99,100,101,102,103,104,105,106,107,108,      //a-z
                                                109,110,111,112,113,114,115,116,117,118,119,120,121,122,
                                                48,49,50,51,52,53,54,55,56,57      // 0-9
                                                };

      static int[] CommonArray      =      {
                                                       45,            // hyphen
                                                       32,            // white space
                                                       58,                  // colon :
                                                       42,                  // asterisk
                                                       34,                  // double quotes
                                                       46,                  // decimal point
                                                       40,                  // opening bracket (
                                                       41,                  // closing bracket )
                                                       33,                  // Exclamation mark !
                                                       35,                  // Hash #
                                                       36,                  // Dollar sign $
                                                       37,                  // Percentage %
                                                       38,                  // Ampersand &
                                                       39,                  // Single quotes '
                                                       43,                  // + sign
                                                       44,                  // comma ,
                                                       45,                  // minus sign
                                                       47,                  // forward slash /       
                                                       91,                  // square opening bracket [
                                                       93,                  // square closing bracket ]
                                                       92,                  // back slash      \
                                                       94,                  // ^
                                                       95,                  // underscore _
                                                       96,                  // `
                                                       123,                  // opening curly brace {
                                                       125,                  // closing curly brace }
                                                       124,                  // pipe |
                                                       126,                   //      ~
                                                       63,                  //  question mark
                                                       60,                  //      less than <
                                                       62,                  //      greater than >
                                                       61,                  //      = equal to
                                                       166,                  
                                                       220,                  //      underscore or dash - unsure
                                                       64                        //      @
                                                };
      
      static int[] numArray            =      {48,49,50,51,52,53,54,55,56,57}      ;      //      0-9
      
      static int[] ArabicArray      ={};

}



 
ASKER CERTIFIED SOLUTION
Mayank S
Hi Nikhil,

2 things.

1. Initialize your BufferedWriter with an appropriate buffer size [ public BufferedWriter(Writer out, int size) ]. The second parameter sets how many characters are buffered before they are written out. Try 64*1024.

2. Even with a 64 KB buffer, throughput still depends on how fast data can physically be written to your disk. I agree that 16 KB/sec is too slow. How did you arrive at this figure? That would give more insight into the problem.


Since it's single-threaded, changing from String to StringBuilder etc. won't have much impact, provided you are not running short of CPU.

Also, can you measure the time taken by your writeToFile method alone? That would be helpful.

~Rajesh.B
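
As a rough illustration of the two suggestions above (an explicit buffer size and timing the write on its own), here is a minimal sketch. The class name `BufferedWriteTiming`, the helper `timeWrite`, and the sample line are hypothetical and not part of the posted program. Note that the posted writeToFile also calls bw.flush() after every line, which empties the buffer each time regardless of its size; the sketch leaves that call out and lets close() do the final flush.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class BufferedWriteTiming {

    // Writes lineCount copies of line (each followed by CRLF) through a
    // BufferedWriter with the given buffer size, and returns the elapsed
    // time in nanoseconds. All names here are illustrative only.
    static long timeWrite(File target, String line, int lineCount, int bufferSize)
            throws IOException {
        long start = System.nanoTime();
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(target), bufferSize)) {
            for (int i = 0; i < lineCount; i++) {
                bw.write(line);
                bw.write("\r\n");
                // No flush() per line: the buffer drains when full and on close().
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("write-timing", ".txt");
        target.deleteOnExit();
        long nanos = timeWrite(target, "sample line of output", 100000, 64 * 1024);
        System.out.println("bytes written: " + target.length());
        System.out.println("elapsed ms: " + (nanos / 1000000));
    }
}
```

Comparing this figure with the total run time gives a first hint of whether the write itself or the per-line processing dominates.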
>> Since its single thread, changing from String to StringBuilder etc wont have much impact provided you are not running short of CPU

What makes you say that? Single-threaded or not, it doesn't matter: StringBuffer or StringBuilder will always give much better performance than Strings when concatenating.
Mayankeagle,

I do agree that switching from String to StringBuilder will boost performance. Reason: String concatenation creates more objects and StringBuilder won't.

However, is it related to the problem specified in the question? By just fixing String to StringBuilder in this program, you can't improve the IO performance in this program. That's why I made that comment.

We need realistic data points from Nikhil to nail down whether IO is the real bottleneck. Unless he provides the time taken for his write operation to complete, it's hard to say.

One quick thing is that he hasn't specified the buffer size in the writer, which by default is 8k in size.
>> I would like to ask you both to calm down and stop spamming here. Thanks for starting to behave as professionalists

I did not post any personal comments or any comments against his comments until I was attacked first.

>> By just fixing String to StringBuilder in this program, you can't improve the IO performance in this program

I had posted a comment related to this as a topic-related technical comment and that was deleted so I will post it again.

nikhilbansal, you cannot improve I/O simply by changing from String to StringBuilder, but it's still good practice, so there is no harm in learning it. Plus, if one is out of ideas, one has to try several options, so try it (or something else) instead of doing nothing. writeToFile () does the I/O and is called in a loop, so the delay between successive writeToFile () calls is the time taken by the rest of the loop to execute. If the rest of the loop executes faster thanks to StringBuffer/StringBuilder, the calls to the I/O method writeToFile () are made sooner, hence there could be some performance advantage. The performance of the method which does string concatenation in a loop will be improved by using StringBuffer/StringBuilder, so I see no disadvantage in doing it, though it might not solve the actual problem (it's only a suggestion; it need not be the nail-on-the-head answer).
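
To make the suggestion concrete, here is a minimal sketch of accumulating a line in a StringBuilder instead of using `finalStr = finalStr + (char)...` once per character, as the posted formFinalStr methods do. The class and method names (`LineBuilderSketch`, `buildLine`, `concatLine`) are hypothetical; the concatenating version is kept only to show that both produce the same text.

```java
public class LineBuilderSketch {

    // Appends the first `length` chars to one presized StringBuilder.
    // Only a single String is created, at the end.
    static String buildLine(char[] chars, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(chars[i]);
        }
        return sb.toString();
    }

    // Same result via repeated concatenation: every iteration creates a
    // new String that copies everything accumulated so far.
    static String concatLine(char[] chars, int length) {
        String s = "";
        for (int i = 0; i < length; i++) {
            s = s + chars[i];
        }
        return s;
    }

    public static void main(String[] args) {
        char[] data = "abc123".toCharArray();
        String built = buildLine(data, data.length);
        System.out.println(built.equals(concatLine(data, data.length)));
    }
}
```

In the posted program this would amount to replacing the static String finalStr with one builder that is cleared (for example with setLength(0)) after each line is written; that mapping is an assumption about how the change would be applied, not something shown in the thread.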
Avatar of nikhilbansal
nikhilbansal

ASKER

Hi Mayank and Rajesh,

Thx for posting suggestions. I tried using StringBuffer and it has greatly improved the performance. I also increased the buffer size to 1024 * 64. This however did not have much effect on the performance.

Regards

Nikhil Bansal
>> I also increased the buffer size to 1024 * 64. This however did not have much effect on the performance

I knew it won't ;-)

>> I tried using StringBuffer and it has greatly improved the performance

Thanks for accepting and proving my point - I knew it would. Hopefully this will also be a lesson for rajesh_bala to stop commenting on others' correct answers and confusing the questioner by leading him in the wrong direction.
String concatenation is not necessarily *always* optimized to use StringBuffer, and when it is, the StringBuffer may not be used optimally. To remove all doubt, decompile the bytecode and see if it

a. is used
b. if so, is sized optimally (default buffer reallocations are time- and memory-expensive)
>> String concatenation is not necessarily *always* optimized to use StringBuffer

Correct, thanks for confirming CEHJ.

>> if so, is sized optimally (default buffer reallocations are time- and memory-expensive)

Exactly, which is what I mentioned in my very first comment - >> Initialize it with the estimated size so that the number of expansions is less.

>> Let's move the discussion from this thread to the appropriate place to discuss this issue :)

Thanks.
mayank, a few more notes on StringBuffer usage for you to file away:

Basic StringBuffer v. String concatenation rules. In the following, "Static" refers to a String that is constant at compile time while "Dynamic" refers to a String or other append-accepted argument type whose content cannot be known until run time.

1) StaticA + StaticB + StaticC
Don't bother. The compiler optimizes this into a single String constant. If you change it to use StringBuffer you just defer the cost to run time. And perhaps multiply it if this is done repetitively.

2) stringvar += DynamicA
Rarely a good reason for this. Almost always seen in loops. Each iteration results in at least the allocation of a temporary StringBuffer with internal char[] to perform the concatenation and then the construction of a String object to assign back to stringvar as the result. That's the best case.

3) StaticA + DynamicB or DynamicA + DynamicB
Harder case. If you know the result of the concatenation is definitely <= 16 characters then go ahead and do concatenation. It's more succinct and easier to read. However, if the result might be > 16 characters then intelligent StringBuffer use will optimize this case. The reason is that the compiler uses the no-arg constructor of StringBuffer which creates the object with an internal char[] of 16 characters (known as the capacity). When the current capacity is exceeded, the append method determines the minimum capacity necessary and a new capacity is calculated as the larger of the minimum necessary (current length + append argument length) or twice the current capacity + 2. That means the capacity will always at least double. Of course, when a new char[] is allocated, the first thing that happens is the JVM initializes it to zeroes and then the old char[] contents must be copied into it, after which the value being appended is finally copied in. This happens each time the capacity is exceeded. But it can all be avoided if you can reasonably guess the necessary capacity of the StringBuffer when you create it and use the StringBuffer(int capacity) constructor. Of course, too large isn't good either as that requires overhead initializing parts of the char[] that are never used. Finally, don't forget that naive StringBuffer usage (always using the no-arg constructor) usually results in a lot of temporary objects that have to be garbage collected.

4) stringbuffer.append(StaticA + DynamicB + StaticC)
Please don't do this. It makes me want to vomit whenever I see it. Always use just one argument to append:
stringbuffer.append(StaticA).append(DynamicB).append(StaticC) which, BTW, raises another interesting point. The StringBuffer append method permits method-chaining since it returns the StringBuffer ref. Method chaining is another optimization that avoids a small amount of run time overhead. Reason is, if you write each append call as a separate statement then the compiler has to issue an instruction to load the StringBuffer ref into a register so it can invoke a method on it. When you stack the calls with method-chaining the compiler knows the return object on which the next method will be called is already in the right register so it can just push the args and call the method. Could save a little time if you do /lots/ of appending. Besides which I like the way it looks better. :-)
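
The capacity behaviour described in rule 3 can be observed directly. The following is a small sketch, with the hypothetical names `CapacityGrowth` and `countGrowths`, that counts how often capacity() changes while appending one character at a time. It only illustrates the reallocation pattern and is not a benchmark; the exact number of growth steps for the default buffer depends on the JDK in use.

```java
public class CapacityGrowth {

    // Appends n characters one at a time and counts how many times the
    // reported capacity changes, i.e. how often the internal array had to
    // be reallocated and its contents copied.
    static int countGrowths(StringBuffer sb, int n) {
        int growths = 0;
        int capacity = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != capacity) {
                growths++;
                capacity = sb.capacity();
            }
        }
        return growths;
    }

    public static void main(String[] args) {
        System.out.println("default capacity: " + new StringBuffer().capacity());
        System.out.println("growths, no-arg constructor: "
                + countGrowths(new StringBuffer(), 1000));
        System.out.println("growths, presized to 1000: "
                + countGrowths(new StringBuffer(1000), 1000));
    }
}
```

A buffer presized to the final length never reallocates, which is the saving the rule is pointing at.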

Regards to all,
Jim
>> StaticA + StaticB + StaticC

Of course, for static ones it doesn't make sense. But in this case they were dynamic.

>> stringbuffer.append(StaticA).append(DynamicB).append(StaticC)

That's one of my favourite ones too.

>> if you write each append call as a separate statement then the compiler has to issue an instruction to load the StringBuffer ref into a register

Good point.
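
For reference, the two styles being compared look like this. The names `AppendChaining`, `separate` and `chained` are made up for the example. Both methods build the same string; whether chaining is measurably faster on a given JVM is something to measure rather than assume.

```java
public class AppendChaining {

    // Each append written as its own statement.
    static String separate(String a, String b, String c) {
        StringBuffer sb = new StringBuffer(a.length() + b.length() + c.length());
        sb.append(a);
        sb.append(b);
        sb.append(c);
        return sb.toString();
    }

    // The same appends chained, relying on append returning the buffer itself.
    static String chained(String a, String b, String c) {
        return new StringBuffer(a.length() + b.length() + c.length())
                .append(a)
                .append(b)
                .append(c)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(separate("key", "=", "value"));
        System.out.println(chained("key", "=", "value"));
    }
}
```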
Not enough method chaining in the API IMHO ;-)
I agree, CEHJ. Way too many methods returning void when they could return at least the object ref. Same kind of thing happening in jakarta commons. API designers (self included) need to seriously reevaluate any decision to have a method with a void return.