nikhilbansal
asked on
Java I/O Performance
Hi All,
I was stuck on a problem involving Arabic text. My task was to transfer files containing English and Arabic data (in EBCDIC) from the mainframe to UTF-8 format. I am using a utility that converts this EBCDIC data to UTF-8. However, the orientation of the Arabic data in the mainframe file was wrong (left-to-right), so the utility's output was also wrong.
I've built my own program that reverses the Arabic text, but I am having a problem with I/O performance. My program reads a line from a file, processes it, and writes it to the output file.
Output is being written at 16 KB/sec, which is far too slow.
I want to improve the performance.
Please help.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
/*
* Created on Jun 17, 2006
*
* TODO To change the template for this generated file go to
* Window - Preferences - Java - Code Style - Code Templates
*/
/**
* @author shaikhat
*
* TODO To change the template for this generated type comment go to
* Window - Preferences - Java - Code Style - Code Templates
*/
public class ReverseArabicNum {
static boolean ascFlg = false;
static boolean comFlg = false;
static boolean arabicFlg = false;
static boolean tempAscFlg = false;
static boolean tempComFlg = false;
static boolean tempArabicFlg = false;
static boolean treatNumArabic = false;
static char[] arabicArray1 = new char[10000];
static char[] commonArray1 = new char[10000];
static char[] tempArabicArray = new char[10000];
static char[] tempCommonArray = new char[10000];
static int[] numArray1 = new int[10];
static char prevChar = 'E';
static String finalStr = "";
static int arrayLength;
static int posArb = 0;
static int posCom = 0;
static int posNum = 0;
static int tempArbLength = 0;
static int tempComLength = 0;
static String inFilePath ;
static String outFilePath ;
static String strToFile;
static BufferedWriter bw = null;
static{
try{
// bw = new BufferedWriter(new FileWriter("D://SourceFiles//test.txt"));
}
catch(Exception e){
System.out.println("Exception in main "+e.toString());
}
}
public static void main(String[] args) {
String str;
char charFrmLine;
int lineLength;
int asciiVal = 0;
int lineNumber = 1;
inFilePath = args[0];
outFilePath = args[1];
// inFilePath = "D:\\pcMain1ENDV.NATSIS.AFI.CPYBK.VER0102(AFIDOW)";
// outFilePath = "D:\\test123.txt";
try{
BufferedReader br = new BufferedReader(new FileReader(inFilePath));
bw = new BufferedWriter(new FileWriter(outFilePath));
while((str = br.readLine()) != null){
// System.out.println(str);
lineLength = str.length();
System.out.println(lineLength);
for(int i =0;i<lineLength;i++){
if(treatNumArabic)
{
comFlg = true;
ascFlg = false;
arabicFlg = false;
}
if(ascFlg){
tempAscFlg = ascFlg;
prevChar = 'E';
}if(comFlg)
{
tempComFlg = comFlg;
prevChar = 'C';
}if(arabicFlg)
{
tempArabicFlg = arabicFlg;
prevChar = 'A';
}
ascFlg = false;
arabicFlg = false;
comFlg = false;
charFrmLine = str.charAt(i);
// System.out.println("charFrmLine "+charFrmLine);
asciiVal = (int)charFrmLine;
/* if((i==(lineLength-1)) && (asciiVal==36)){
arabicFlg = tempArabicFlg;
comFlg = tempComFlg;
ascFlg = tempAscFlg;
break;
}*/
chkValBelongsToWhichArray(asciiVal);
if(ascFlg && prevChar=='E'){
formFinalStr(asciiVal);
}else if(ascFlg && prevChar=='A'){
// reverse Arabic Array
flipArray(posArb);
// write arabic to Final Str
formFinalStr(tempArabicArray,tempArbLength);
// write English to final str
formFinalStr(asciiVal);
}else if(ascFlg && prevChar=='C'){
/*
* code being added to handle english
* numbers embedded in Arabic text
*/
for(int iNum=0;iNum<numArray.length;iNum++){
if(asciiVal==numArray[iNum]){
treatNumArabic = true;
comFlg = false;
break;
}else{
treatNumArabic = false;
}
}
if(treatNumArabic && (posArb >0)){
formCommonArray(asciiVal);
}else{
if(posArb>0){
flipArray(posArb);
formFinalStr(tempArabicArray,tempArbLength);
formFinalStr(tempCommonArray,tempComLength);
posArb = 0;
posCom = 0;
}if(posNum>0){
for(int num=(posNum-1);num>=0;num--){
// formCommonArray(numArray1[num]);
formFinalStr((char)commonArray1[num]);
}
posNum = 0;
}
formFinalStr(asciiVal);
}
}else if(arabicFlg && prevChar=='E'){
// form Arabic array
formArabicArray(asciiVal);
}else if(arabicFlg && prevChar=='A'){
// append Arabic array
formArabicArray(asciiVal);
}else if(arabicFlg && prevChar=='C'){
// if arabicArray started then
// unload commonArray in arabicArray
// and append current arabic char in
// Arabic array
if(posArb>0)
{
for(int com=(posCom-1);com>=0;com--){
formArabicArray(tempCommonArray[com]);
}
posCom = 0;
}
formArabicArray(asciiVal);
}else if(comFlg && prevChar=='E'){
// write to File
formFinalStr(asciiVal);
}else if(comFlg && prevChar=='A'){
// System.out.println("calling formCommonArray");
// form commonArray
formCommonArray(asciiVal);
}else if(comFlg && prevChar=='C'){
// if arabicArray already started then
// put this char in commonArray
//else send to file
if(posNum>0){
for(int num=(posNum-1);num>=0;num--){
formCommonArray(numArray1[num]);
}
posNum = 0;
}
if(posArb>0){
formCommonArray(asciiVal);
}else{
formFinalStr(asciiVal);
}
}
} // End of For
//Write the Common and Arabic Array to file if exist
// Check the condition whether to write common or arabic first
// System.out.println("line over");
if (posArb>0 && posCom ==0){
// System.out.println("line over 1");
// reverse Arabic Array
flipArray(posArb);
// write arabic to Final Str
formFinalStr(tempArabicArray,tempArbLength);
// write English to final str
}else if (posArb==0 && posCom >0){
// System.out.println("line over 2");
formFinalStr(tempCommonArray,tempComLength);
}else if (posArb>0 && posCom >0){
// System.out.println("line over 3");
if (arabicFlg){
// System.out.println("line over 3a");
flipArray(posArb);
// write arabic to Final Str
formFinalStr(tempArabicArray,tempArbLength);
// write English to final str
}
if(comFlg){
// System.out.println("line over 3b");
flipArray(posArb);
// write arabic to Final Str
formFinalStr(tempArabicArray,tempArbLength);
// write English to final str
formFinalStr(tempCommonArray,tempComLength);
}
}
posCom = 0;
posArb = 0;
writeToFile(finalStr);
finalStr = "";
} // End Of While
}catch(Exception e){
System.out.println("Exception in main "+e.toString());
}
}
public static void formNumberArray(int asciiVal){
// System.out.println("number array"+ (char)asciiVal);
numArray1[posNum] = (char)asciiVal;
posNum++;
}
public static void formArabicArray(int asciiVal){
// System.out.println("arabic array"+ (char)asciiVal);
arabicArray1[posArb] = (char)asciiVal;
posArb++;
}
public static void formCommonArray(int asciiVal){
// System.out.println("posCom "+posCom+" asciiVal "+asciiVal);
commonArray1[posCom] = (char)asciiVal;
tempCommonArray[posCom] = (char)asciiVal;
posCom++;
tempComLength = posCom;
}
public static void flipArray(int posArb){
tempArbLength = posArb;
// System.out.println("flipArray::posArb"+posArb);
for(int i=(posArb-1),j=0;i>=0;i--, j++)
{
tempArabicArray[j] = arabicArray1[i];
// System.out.println("tempArabicArray "+tempArabicArray[j]);
}
// NOTE: this resets only the local parameter posArb, not the static field of the same name.
posArb = 0;
}
public static void writeToFile(String str)
{
try{
System.out.println("str "+str);
strToFile = str.replace('$',' ');
bw.write(strToFile);
bw.write("\r\n");
bw.flush();
}
catch(Exception e){
System.out.println("Exception in writeToFile "+e.toString());
}
}
public static void formFinalStr(int asciiVal)
{
// System.out.println("Writing to file"+(char)asciiVal);
finalStr = finalStr + (char)asciiVal;
}
public static void formFinalStr(char []array,int arrLen)
{
// System.out.println("formFinalStr");
// System.out.println("arr length"+array.length);
for(int arrLength = 0;arrLength<arrLen;arrLength++)
{
// System.out.println("inside for");
// System.out.println("arr val"+(char)array[arrLength]);
finalStr = finalStr + (char)array[arrLength];
}
}
public static void chkValBelongsToWhichArray(int asciiVal){
for(int i=0;i<EngArray.length;i++){
if(asciiVal==EngArray[i])
{
ascFlg = true;
comFlg = false;
arabicFlg = false;
break;
}
}
for(int j=0;j<CommonArray.length;j++){
if(asciiVal==CommonArray[j])
{
comFlg = true;
ascFlg = false;
arabicFlg = false;
break;
}
}
if(!ascFlg && !comFlg)
{
// System.out.println("Setting arabic flg true");
arabicFlg = true;
comFlg = false;
ascFlg = false;
}
}
static int[] EngArray = {65,66,67,68,69,70,71,72,73,74,75,76,77,78,79, //A-Z
80,81,82,83,84,85,86,87,88,89,90,
97,98,99,100,101,102,103,104,105,106,107,108, //a-z
109,110,111,112,113,114,115,116,117,118,119,120,121,122,
48,49,50,51,52,53,54,55,56,57 // 0-9
};
static int[] CommonArray = {
45, // hyphen
32, // white space
58, // colon :
42, // asterisk *
34, // double quotes
46, // decimal point
40, // opening bracket (
41, // closing bracket )
33, // Exclamation mark !
35, // Hash #
36, // Dollar sign $
37, // Percentage %
38, // Ampersand &
39, // Single quotes '
43, // + sign
44, // comma ,
45, // minus sign
47, // forward slash /
91, // square opening bracket [
93, // square closing bracket ]
92, // back slash \
94, // ^
95, // underscore _
96, // `
123, // opening curly brace {
125, // closing curly brace }
124, // pipe |
126, // ~
63, // question mark
60, // less than <
62, // greater than >
61, // = equal to
166,
220, // underscore or dash - unsure
64 // @
};
static int[] numArray = {48,49,50,51,52,53,54,55,56,57}; // 0-9
static int[] ArabicArray ={};
}
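For reference, here is a minimal sketch of the read-process-write loop the question describes, with the per-line String concatenation replaced by a single reused StringBuilder. It assumes the converted input file is UTF-8 (the original uses FileReader, i.e. the platform default encoding), and `processLine` is a hypothetical placeholder for the Arabic-reversal logic, not the asker's method.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LineCopySketch {

    // Hypothetical placeholder for the reversal logic: copies the line and
    // maps '$' to ' ', as writeToFile() does in the original program.
    static void processLine(String line, StringBuilder out) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            out.append(c == '$' ? ' ' : c);
        }
    }

    static void convert(Path in, Path out) throws IOException {
        StringBuilder sb = new StringBuilder(256); // one builder reused for every line
        try (BufferedReader br = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.setLength(0);          // clear the builder but keep its capacity
                processLine(line, sb);
                bw.append(sb);            // Writer.append(CharSequence)
                bw.write("\r\n");
                // No flush() per line: closing the writer flushes once at the end.
            }
        }
    }

    public static void main(String[] args) throws IOException {
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The design choice here is to keep allocation per line constant: the builder's internal array is reused, so no intermediate String is created for each character.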
I was stuck up in a problem regarding Arabic text. My task was to transfer files from MainFrame containing English and Arabic data (in EBCIDC) to UTF-8 format. I am using a utility which converts this EDBCIDC data to UTF-8. However the orientation of Arabic data in Mainframe file was wrong (L-R). Hence the o/p of the utlitity was also wrong.
I've built my own program which reverses the Arabic text. But I am facing some problem with the I/O performace. My program reads a line from a file processes it and writes it to the o/p file.
O/p is being written at 16 kb/sec which is too less.
I want to improve the performance.
Plz help
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
/*
* Created on Jun 17, 2006
*
* TODO To change the template for this generated file go to
* Window - Preferences - Java - Code Style - Code Templates
*/
/**
* @author shaikhat
*
* TODO To change the template for this generated type comment go to
* Window - Preferences - Java - Code Style - Code Templates
*/
public class ReverseArabicNum {
static boolean ascFlg = false;
static boolean comFlg = false;
static boolean arabicFlg = false;
static boolean tempAscFlg = false;
static boolean tempComFlg = false;
static boolean tempArabicFlg = false;
static boolean treatNumArabic = false;
static char[] arabicArray1 = new char[10000];
static char[] commonArray1 = new char[10000];
static char[] tempArabicArray = new char[10000];
static char[] tempCommonArray = new char[10000];
static int[] numArray1 = new int[10];
static char prevChar = 'E';
static String finalStr = "";
static int arrayLength;
static int posArb = 0;
static int posCom = 0;
static int posNum = 0;
static int tempArbLength = 0;
static int tempComLength = 0;
static String inFilePath ;
static String outFilePath ;
static String strToFile;
static BufferedWriter bw = null;
static{
try{
// bw = new BufferedWriter(new FileWriter("D://SourceFile
}
catch(Exception e){
System.out.println("Except
}
}
public static void main(String[] args) {
String str;
char charFrmLine;
int lineLength;
int asciiVal = 0;
int lineNumber = 1;
inFilePath = args[0];
outFilePath = args[1];
// inFilePath = "D:\\pcMain1ENDV.NATSIS.AF
// outFilePath = "D:\\test123.txt";
try{
BufferedReader br = new BufferedReader(new FileReader(inFilePath));
bw = new BufferedWriter(new FileWriter(outFilePath));
while((str = br.readLine()) != null){
// System.out.println(str);
lineLength = str.length();
System.out.println(lineLen
for(int i =0;i<lineLength;i++){
if(treatNumArabic)
{
comFlg = true;
ascFlg = false;
arabicFlg = false;
}
if(ascFlg){
tempAscFlg = ascFlg;
prevChar = 'E';
}if(comFlg)
{
tempComFlg = comFlg;
prevChar = 'C';
}if(arabicFlg)
{
tempArabicFlg = arabicFlg;
prevChar = 'A';
}
ascFlg = false;
arabicFlg = false;
comFlg = false;
charFrmLine = str.charAt(i);
// System.out.println("charFr
asciiVal = (int)charFrmLine;
/* if((i==(lineLength-1)) && (asciiVal==36)){
arabicFlg = tempArabicFlg;
comFlg = tempComFlg;
ascFlg = tempAscFlg;
break;
}*/
chkValBelongsToWhichArray(
if(ascFlg && prevChar=='E'){
formFinalStr(asciiVal);
}else if(ascFlg && prevChar=='A'){
// reverse Arabic Array
flipArray(posArb);
// write arabic to Final Str
formFinalStr(tempArabicArr
// write English to final str
formFinalStr(asciiVal);
}else if(ascFlg && prevChar=='C'){
/*
* code being added to handle english
* numbers embedded in Arabic text
*/
for(int iNum=0;iNum<numArray.lengt
if(asciiVal==numArray[iNum
treatNumArabic = true;
comFlg = false;
break;
}else{
treatNumArabic = false;
}
}
if(treatNumArabic && (posArb >0)){
formCommonArray(asciiVal);
}else{
if(posArb>0){
flipArray(posArb);
formFinalStr(tempArabicArr
formFinalStr(tempCommonArr
posArb = 0;
posCom = 0;
}if(posNum>0){
for(int num=(posNum-1);num>=0;num-
// formCommonArray(numArray1[
formFinalStr((char)commonA
}
posNum = 0;
}
formFinalStr(asciiVal);
}
}else if(arabicFlg && prevChar=='E'){
// form Arabic array
formArabicArray(asciiVal);
}else if(arabicFlg && prevChar=='A'){
// append Arabic array
formArabicArray(asciiVal);
}else if(arabicFlg && prevChar=='C'){
// if arabicArray started then
// unload commonArray in arabicArray
// and append current arabic char in
// Arabic array
if(posArb>0)
{
for(int com=(posCom-1);com>=0;com-
formArabicArray(tempCommon
}
posCom = 0;
}
formArabicArray(asciiVal);
}else if(comFlg && prevChar=='E'){
// write to File
formFinalStr(asciiVal);
}else if(comFlg && prevChar=='A'){
// System.out.println("callin
// form commonArray
formCommonArray(asciiVal);
}else if(comFlg && prevChar=='C'){
// if arabicArray already started then
// put this char in commonArray
//else send to file
if(posNum>0){
for(int num=(posNum-1);num>=0;num-
formCommonArray(numArray1[
}
posNum = 0;
}
if(posArb>0){
formCommonArray(asciiVal);
}else{
formFinalStr(asciiVal);
}
}
} // End of For
//Write the Common and Arabic Array to file if exist
// Check the condition whether to write common or arabic first
// System.out.println("line over");
if (posArb>0 && posCom ==0){
// System.out.println("line over 1");
// reverse Arabic Array
flipArray(posArb);
// write arabic to Final Str
formFinalStr(tempArabicArr
// write English to final str
}else if (posArb==0 && posCom >0){
// System.out.println("line over 2");
formFinalStr(tempCommonArr
}else if (posArb>0 && posCom >0){
// System.out.println("line over 3");
if (arabicFlg){
// System.out.println("line over 3a");
flipArray(posArb);
// write arabic to Final Str
formFinalStr(tempArabicArr
// write English to final str
}
if(comFlg){
// System.out.println("line over 3b");
flipArray(posArb);
// write arabic to Final Str
formFinalStr(tempArabicArr
// write English to final str
formFinalStr(tempCommonArr
}
}
posCom = 0;
posArb = 0;
writeToFile(finalStr);
finalStr = "";
} // End Of While
}catch(Exception e){
System.out.println("Except
}
}
public static void formNumberArray(int asciiVal){
// System.out.println("number
numArray1[posNum] = (char)asciiVal;
posNum++;
}
public static void formArabicArray(int asciiVal){
// System.out.println("arabic
arabicArray1[posArb] = (char)asciiVal;
posArb++;
}
public static void formCommonArray(int asciiVal){
// System.out.println("posCom
commonArray1[posCom] = (char)asciiVal;
tempCommonArray[posCom] = (char)asciiVal;
posCom++;
tempComLength = posCom;
}
public static void flipArray(int posArb){
tempArbLength = posArb;
// System.out.println("flipAr
for(int i=(posArb-1),j=0;i>=0;i--,
{
tempArabicArray[j] = arabicArray1[i];
// System.out.println("tempAr
}
// re - initialized posArb to zero.
posArb = 0;
}
public static void writeToFile(String str)
{
try{
System.out.println("str "+str);
strToFile = str.replace('$',' ');
bw.write(strToFile);
bw.write("\r\n");
bw.flush();
}
catch(Exception e){
System.out.println("Except
}
}
public static void formFinalStr(int asciiVal)
{
// System.out.println("Writin
finalStr = finalStr + (char)asciiVal;
}
public static void formFinalStr(char []array,int arrLen)
{
// System.out.println("formFi
// System.out.println("arr length"+array.length);
for(int arrLength = 0;arrLength<arrLen;arrLeng
{
// System.out.println("inside
// System.out.println("arr val"+(char)array[arrLength
finalStr = finalStr + (char)array[arrLength];
}
}
public static void chkValBelongsToWhichArray(
for(int i=0;i<EngArray.length;i++)
if(asciiVal==EngArray[i])
{
ascFlg = true;
comFlg = false;
arabicFlg = false;
break;
}
}
for(int j=0;j<CommonArray.length;j
if(asciiVal==CommonArray[j
{
comFlg = true;
ascFlg = false;
arabicFlg = false;
break;
}
}
if(!ascFlg && !comFlg)
{
// System.out.println("Settin
arabicFlg = true;
comFlg = false;
ascFlg = false;
}
}
static int[] EngArray = {65,66,67,68,69,70,71,72,7
80,81,82,83,84,85,86,87,88
97,98,99,100,101,102,103,1
109,110,111,112,113,114,11
48,49,50,51,52,53,54,55,56
};
static int[] CommonArray = {
45, // hyphen
32, // white space
58, // colon :
42, //asterik
34, // double quotes
46, // decimal point
40, // opening bracket (
41, // closing bracket )
33, // Exclaimation mark !
35, // Hash #
36, // Dollar sign $
37, // Percentage %
38, // Ampersand &
39, // Single quotes '
43, // + sign
44, // comma ,
45, // minus sign
47, // forward slash /
91, // sqaure opening brackets [
93, // sqaure closing brackets ]
92, // back slash \
94, // ^
95, // underscore _
96, // `
123, // opening curly brace {
125, // closing curly brace }
124, // pipe |
126, // ~
63, // question mark
60, // less than <
62, // greater than >
61, // = equal to
166,
220, // underscore or dash - unsure
64 // @
};
static int[] numArray = {48,49,50,51,52,53,54,55,5
static int[] ArabicArray ={};
}
ASKER CERTIFIED SOLUTION
>> Since it's single-threaded, changing from String to StringBuilder etc. won't have much impact provided you are not running short of CPU
What makes you say that? Whether it is single-threaded or not doesn't matter: StringBuffer or StringBuilder will always give much better performance than String for concatenation.
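To make the suggestion concrete, here is a sketch of the two `formFinalStr` overloads from the question rewritten to append to one StringBuilder. The field and method names mirror the original code; `takeLine` is a hypothetical helper standing in for the `writeToFile(finalStr); finalStr = "";` pair at the end of each line.

```java
class FinalStrBuilder {
    // Presized like the original 10000-char arrays, so it rarely needs to grow.
    static final StringBuilder finalStr = new StringBuilder(10000);

    // Replaces: finalStr = finalStr + (char)asciiVal;
    static void formFinalStr(int asciiVal) {
        finalStr.append((char) asciiVal);
    }

    // Replaces the per-character loop: copies the first arrLen chars in one call.
    static void formFinalStr(char[] array, int arrLen) {
        finalStr.append(array, 0, arrLen);
    }

    // Hypothetical helper: returns the finished line and clears the builder,
    // keeping its capacity for the next line.
    static String takeLine() {
        String line = finalStr.toString();
        finalStr.setLength(0);
        return line;
    }
}
```

The original rebuilds the whole `finalStr` for every character appended, so the cost of a line grows roughly with the square of its length; appending to a builder is amortized constant time per character.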
Mayankeagle,
I do agree that switching from String to StringBuilder will boost performance, because String concatenation creates more objects and StringBuilder doesn't.
However, is that related to the problem specified in the question? Just changing String to StringBuilder in this program won't improve its I/O performance. That's why I made that comment.
We need realistic data points from Nikhil to determine whether I/O is the real bottleneck. Unless he provides the time taken for his write operation to complete, it's hard to say.
One quick thing: he hasn't specified the buffer size in the writer, which by default is 8K (8192 characters).
>> I would like to ask you both to calm down and stop spamming here. Thanks for starting to behave as professionals
I did not post any personal comments or any comments against his comments until I was attacked first.
>> By just fixing String to StringBuilder in this program, you can't improve the IO performance in this program
I had posted a topic-related technical comment on this, and it was deleted, so I will post it again.
nikhilbansal, you cannot improve the I/O itself simply by changing from String to StringBuilder, but it is still good practice, so there is no harm in learning it. Also, when you are out of ideas, you have to try several options, so try this (or something else) instead of doing nothing. writeToFile() does the I/O and is called in a loop, so the delay between successive writeToFile() calls is determined by how long the rest of the loop takes to execute. If the rest of the loop executes faster thanks to StringBuffer/StringBuilder, writeToFile() will be called sooner each time, so there could be a performance advantage. Using StringBuffer/StringBuilder will improve the performance of the method that concatenates strings in a loop, so I see no disadvantage in doing it, though it might not solve the actual problem (it's only a suggestion, not necessarily the definitive answer).
ASKER
Hi Mayank and Rajesh,
Thanks for posting suggestions. I tried using StringBuffer and it has greatly improved the performance. I also increased the buffer size to 1024 * 64. This, however, did not have much effect on the performance.
Regards
Nikhil Bansal
>> I also increased the buffer size to 1024 * 64. This however did not have much effect on the performance
I knew it wouldn't ;-)
>> I tried using StringBuffer and it has greatly improved the performance
Thanks for accepting and proving my point; I knew it would. Hopefully this will also be a lesson for rajesh_bala to stop commenting on others' correct answers and confusing the questioner by leading him in the wrong direction.
String concatenation is not necessarily *always* optimized to use StringBuffer, and when it is, the StringBuffer may not be used optimally. To remove all doubt, decompile the bytecode and see if it
a. is used
b. if so, is sized optimally (default buffer reallocations are time- and memory-expensive)
>> String concatenation is not necessarily *always* optimized to use StringBuffer
Correct, thanks for confirming, CEHJ.
>> if so, is sized optimally (default buffer reallocations are time- and memory-expensive)
Exactly, which is what I mentioned in my very first comment - >> Initialize it with the estimated size so that the number of expansions is less.
>> Let's move the discussion from this thread to the appropriate place to discuss this issue :)
Thanks.
mayank, a few more notes on StringBuffer usage for you to file away:
Basic StringBuffer v. String concatenation rules. In the following, "Static" refers to a String that is constant at compile time while "Dynamic" refers to a String or other append-accepted argument type whose content cannot be known until run time.
1) StaticA + StaticB + StaticC
Don't bother. The compiler optimizes this into a single String constant. If you change it to use StringBuffer you just defer the cost to run time. And perhaps multiply it if this is done repetitively.
2) stringvar += DynamicA
Rarely a good reason for this. Almost always seen in loops. Each iteration results in at least the allocation of a temporary StringBuffer with an internal char[] to perform the concatenation, and then the construction of a String object to assign back to stringvar as the result. That's the best case.
3) StaticA + DynamicB or DynamicA + DynamicB
Harder case. If you know the result of the concatenation is definitely <= 16 characters, then go ahead and use concatenation; it's more succinct and easier to read. However, if the result might be > 16 characters, then intelligent StringBuffer use will optimize this case. The reason is that the compiler uses the no-arg constructor of StringBuffer, which creates the object with an internal char[] of 16 characters (known as the capacity). When the current capacity is exceeded, the append method determines the minimum capacity necessary, and the new capacity is calculated as the larger of the minimum necessary (current length + append argument length) or twice the current capacity + 2. That means the capacity will always at least double. Of course, when a new char[] is allocated, the JVM first initializes it to zeroes, then the old char[] contents must be copied into it, and only then is the value being appended copied in. This happens each time the capacity is exceeded. But it can all be avoided if you can reasonably guess the necessary capacity of the StringBuffer when you create it and use the StringBuffer(int capacity) constructor. Of course, too large isn't good either, as that requires overhead initializing parts of the char[] that are never used. Finally, don't forget that naive StringBuffer usage (always using the no-arg constructor) usually results in a lot of temporary objects that have to be garbage collected.
4) stringbuffer.append(StaticA + DynamicB + StaticC)
Please don't do this. It makes me want to vomit whenever I see it. Always use just one argument to append:
stringbuffer.append(StaticA).append(DynamicB).append(StaticC) which, BTW, raises another interesting point. The StringBuffer append method permits method chaining since it returns the StringBuffer ref. Method chaining is another optimization that avoids a small amount of run-time overhead. The reason is that if you write each append call as a separate statement, the compiler has to issue an instruction to load the StringBuffer ref into a register so it can invoke a method on it. When you stack the calls with method chaining, the compiler knows the return object on which the next method will be called is already in the right register, so it can just push the args and call the method. Could save a little time if you do /lots/ of appending. Besides which, I like the way it looks better. :-)
Regards to all,
Jim
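The four rules above can be illustrated with a short sketch. The class and method names are hypothetical. The capacities printed by `main` (16, then 34 after appending 17 characters) follow the growth policy described in rule 3. The rule 2 method only approximates what javac 5 to 8 generated for `s += piece`; javac 1.4 used StringBuffer, and javac 9 and later compile String concatenation to an invokedynamic call instead.

```java
class ConcatRules {

    // Rule 1: "Static" + "A" + "B" is a constant expression, folded by javac
    // into one interned literal, so it is the same object as "StaticAB".
    static boolean constantIsFolded() {
        String folded = "Static" + "A" + "B";
        return folded == "StaticAB";
    }

    // Rule 2: roughly what `s += piece` inside a loop compiled to with javac 5 to 8:
    // a new builder, a new char[] and a new String on every iteration.
    static String plusEqualsInLoop(String[] pieces) {
        String s = "";
        for (String piece : pieces) {
            s = new StringBuilder().append(s).append(piece).toString();
        }
        return s;
    }

    // Rule 3: presize the buffer when the result length can be estimated,
    // so append() never has to reallocate and copy the internal array.
    static String presized(String a, String b) {
        StringBuffer sb = new StringBuffer(a.length() + b.length());
        sb.append(a).append(b);
        return sb.toString();
    }

    // Rule 4: one argument per append(), chained, instead of append(a + b + c).
    static String chained(String staticA, String dynamicB, String staticC) {
        return new StringBuffer(64).append(staticA).append(dynamicB).append(staticC).toString();
    }

    public static void main(String[] args) {
        StringBuffer defaultSized = new StringBuffer();
        System.out.println(defaultSized.capacity()); // 16
        defaultSized.append("12345678901234567");     // 17 chars: grows to 16 * 2 + 2
        System.out.println(defaultSized.capacity()); // 34
    }
}
```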
>> StaticA + StaticB + StaticC
Of course, for static ones it doesn't make sense. But in this case they were dynamic.
>> stringbuffer.append(StaticA).append(DynamicB).append(StaticC)
That's one of my favourite ones too.
>> if you write each append call as a separate statement then the compiler has to issue an instruction to load the StringBuffer ref into a register
Good point.
Not enough method chaining in the API IMHO ;-)
I agree, CEHJ. Way too many methods return void when they could return at least the object ref. The same thing happens in Jakarta Commons. API designers (self included) need to seriously reevaluate any decision to have a method with a void return.
Two things.
1. Initialize your BufferedWriter with an appropriate buffer size [ public BufferedWriter(Writer out, int size) ]. The second parameter defines the amount of data to buffer. Try using 64*1024.
2. Even with a 64K buffer, it all depends on how much data can be written to your disk at the physical level. I do understand 16 KB/sec is too slow. How did you come up with this figure? That will give some more insight into the problem.
Since it's single-threaded, changing from String to StringBuilder etc. won't have much impact, provided you are not running short of CPU, etc.
Also, can you try to measure the time your write-to-file method alone takes? That would be helpful.
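A minimal sketch of suggestion 1, assuming UTF-8 output as described in the question. The `size` argument of `BufferedWriter` counts characters, not bytes, and the default is 8192 characters. `SizedWriter` and `open` are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class SizedWriter {
    // 64 * 1024 characters, as suggested in point 1.
    static final int BUFFER_SIZE = 64 * 1024;

    // Opens a UTF-8 writer with an explicit buffer size. Naming the encoding
    // avoids relying on the platform default that FileWriter uses.
    static BufferedWriter open(String path) throws IOException {
        return new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8),
                BUFFER_SIZE);
    }
}
```

The asker later reported that raising the buffer to 1024 * 64 alone had little effect, while switching to StringBuffer helped greatly.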
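One way to get that measurement is to time only the write path with `System.nanoTime()`. `WriteTimer` is a hypothetical wrapper, not part of the original program; it mirrors the body of `writeToFile()` (replace `$`, write the line and CRLF, flush) without the console `println`, and accumulates the time spent there.

```java
import java.io.IOException;
import java.io.Writer;

class WriteTimer {
    private final Writer out;
    private long writeNanos = 0; // accumulated time spent inside writeLine()

    WriteTimer(Writer out) {
        this.out = out;
    }

    // Same steps as writeToFile(), minus the println, with timing around them.
    void writeLine(String str) throws IOException {
        long start = System.nanoTime();
        out.write(str.replace('$', ' '));
        out.write("\r\n");
        out.flush();
        writeNanos += System.nanoTime() - start;
    }

    long writeMillis() {
        return writeNanos / 1_000_000L;
    }
}
```

Comparing `writeMillis()` with the total elapsed time of the run (also taken with `System.nanoTime()`) shows whether the writes or the per-character processing dominate.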
~Rajesh.B