MehtaJasmin
asked on
CSV file parsing thru Java
I have a text file (CSV) with about 1000 lines. Each row looks like this:
08172013,8040,520193200,0001,05702,1 10331,6019 4944007617 92,041552, 0920,00000 1751,,CHA, D
I have to grab the 7th token, which is a 16-digit number, mask a few of its digits, put it back at the same position, and write the updated file to disk.
I am exploring 2 options:
1) StringTokenizer
2) the String class's split(arg0) method
I guess choice 2) is better. If I take that choice, then in order to put the updated 7th token back, do I have to concatenate all the tokens to rebuild the row? Is that the only choice, or is there a better-performing solution?
Something like this:
String[] tokens = null;
// scan the file line by line
while (fileScanner.hasNextLine())
{
    // read the current record
    String currentLine = fileScanner.nextLine();
    if (StringUtils.isNotBlank(currentLine))
    {
        tokens = currentLine.split(",");
        // update tokens[6] with the needed masking
        // String updatedRecord = add up all tokens[i] using a loop
    }
}
Personally I always use a proper CSV library like OpenCSV.
ASKER
I understand, but I don't need much programming here.
Only the above is my requirement. Do you have a suggestion?
Introducing another open source library requires a lot of paperwork in my organization.
Well, fortunately in this case, it looks like your CSV might be simple. If so, you can use String.split.
ASKER
OK, great. What about the second part of my question?
If I take that choice, then in order to put the updated 7th token back, do I have to concatenate all the tokens to rebuild the row? Is that the only choice, or is there a better-performing solution?
Well, you can try to optimise it. Don't forget the line will initially be held in a String anyway. You could actually do a replace, but you'd have to be careful not to replace at the wrong position.
The class shown here http://technojeeves.com/index.php/70-java-list-to-string has a join method if you need to join it back together again
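To make the "careful replace" idea concrete, here is a minimal sketch that finds the target field by counting commas and splices the replacement in, without splitting the line at all. The class name FieldReplacer and the method replaceField are hypothetical names used only for illustration, and the sketch assumes a simple CSV with no quoted fields or embedded commas.

```java
class FieldReplacer {
    /**
     * Returns the line with the field at fieldIndex (0-based) replaced.
     * If the line has fewer fields than expected, it is returned unchanged.
     * Assumption: fields never contain quoted or escaped commas.
     */
    static String replaceField(String line, int fieldIndex, String replacement) {
        int start = 0;
        // Skip fieldIndex commas to find where the target field begins.
        for (int i = 0; i < fieldIndex; i++) {
            int comma = line.indexOf(',', start);
            if (comma < 0) {
                return line; // fewer fields than expected
            }
            start = comma + 1;
        }
        // The field ends at the next comma, or at the end of the line.
        int end = line.indexOf(',', start);
        if (end < 0) {
            end = line.length();
        }
        return line.substring(0, start) + replacement + line.substring(end);
    }

    public static void main(String[] args) {
        String line = "a,b,c,d,e,f,1234567890123456,h,,j";
        // prints a,b,c,d,e,f,XXXXXXXXXXXX3456,h,,j
        System.out.println(replaceField(line, 6, "XXXXXXXXXXXX3456"));
    }
}
```

Because the replacement is positioned by comma count rather than by searching for the field's text, it cannot hit an identical value that happens to appear in an earlier field.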
After you have created a string array with split, you might consider creating a StringBuilder and looping through the array by index, appending each value (and the comma separators between values) to the StringBuilder. The exception is index 6 (array indexes start at 0): there, modify the value as needed and append the new value instead. You can then convert the StringBuilder to a String with toString(). Let us know if you want help coding that.
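The StringBuilder approach described above could look roughly like the sketch below. The masking rule (keep the last four characters, replace the rest with X) is only an assumption for illustration, since the thread does not say which digits must be masked; the class and method names are hypothetical too. Note the -1 limit passed to split: without it, String.split drops trailing empty fields, so a row ending in ",," would lose columns when rebuilt.

```java
class MaskWithStringBuilder {
    // Hypothetical masking rule: keep the last 4 characters, X out the rest.
    static String mask(String field) {
        int keep = Math.min(4, field.length());
        StringBuilder sb = new StringBuilder(field.length());
        for (int i = 0; i < field.length() - keep; i++) {
            sb.append('X');
        }
        sb.append(field.substring(field.length() - keep));
        return sb.toString();
    }

    // Splits the line, masks the 7th field (index 6), and rebuilds the row.
    static String maskSeventhField(String line) {
        // limit -1 keeps trailing empty fields such as "...,h,,"
        String[] tokens = line.split(",", -1);
        if (tokens.length < 7) {
            return line; // not enough fields; leave the row untouched
        }
        StringBuilder sb = new StringBuilder(line.length());
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i == 6 ? mask(tokens[i]) : tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints a,b,c,d,e,f,XXXXXXXXXXXX3456,h,,
        System.out.println(maskSeventhField("a,b,c,d,e,f,1234567890123456,h,,"));
    }
}
```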
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ASKER
This solution worked best in combination with the Pattern.compile() method and some other String replace methods. Thanks a lot.
But do bear in mind that what that gives you in convenience is traded off against robustness and (probably) overhead.
ASKER
But I thought you said,
Keeping the regex as a compiled java.util.regex.Pattern would probably optimize that.
Thus I used Pattern. This is what I ended up doing:
InputStream fileInputStream = new FileInputStream("C:/temp/Test.txt");
Scanner fileScanner = new Scanner(new ByteArrayInputStream(IOUtils.toByteArray(fileInputStream)), "UTF-8");
while (fileScanner.hasNextLine())
{
    String currentLine = fileScanner.nextLine();
    Pattern p = Pattern.compile("(?:[\\w]*,){6}(\\w*),.*");
    Matcher m = p.matcher(currentLine);
    String MSC = m.replaceAll("$1");
    String seventhField = MSC.substring(6, 12);
    String updatedMC = StringUtils.replace(MSC, seventhField, "111111");
    String updatedline = StringUtils.replace(currentLine, MSC, updatedMC);
    System.out.println(currentLine);
    System.out.println(updatedline);
}
"But I thought you said, 'Keeping the regex as a compiled java.util.regex.Pattern would probably optimize that'"
I did. That doesn't negate the downside; it only mitigates it. And it certainly doesn't mitigate it the way you've used it: you're compiling the Pattern inside the loop. Why?
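For what it's worth, compiling once might look like the sketch below: the Pattern becomes a static constant, and the Matcher's group offsets are used to splice the masked value back in at the exact position instead of doing a text search-and-replace over the whole line. This is only an illustration under some assumptions. The class name SeventhFieldMasker is hypothetical, the regex uses [^,]* rather than the \w* from the code above so that fields containing spaces still match, and the masking rule (characters 6 to 11 of the field become "111111") is taken from the asker's substring(6, 12) call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeventhFieldMasker {
    // Compiled once and reused for every line.
    // Group 1: the first six fields with their commas. Group 2: the seventh field.
    private static final Pattern SEVENTH_FIELD =
            Pattern.compile("^((?:[^,]*,){6})([^,]*)");

    static String maskLine(String line) {
        Matcher m = SEVENTH_FIELD.matcher(line);
        if (!m.find() || m.group(2).length() < 12) {
            return line; // too few fields, or field too short to mask
        }
        String field = m.group(2);
        // Replace characters 6..11 of the field, as in substring(6, 12).
        String masked = field.substring(0, 6) + "111111" + field.substring(12);
        // Splice by offset so an identical value elsewhere is never touched.
        return line.substring(0, m.start(2)) + masked + line.substring(m.end(2));
    }

    public static void main(String[] args) {
        String line = "08172013,8040,520193200,0001,05702,110331,6019494400761792,041552";
        // prints 08172013,8040,520193200,0001,05702,110331,6019491111111792,041552
        System.out.println(maskLine(line));
    }
}
```

Splicing by m.start(2) and m.end(2) also sidesteps the case where the six characters being masked happen to occur earlier in the same line.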
ASKER
Good eye catching the Pattern being compiled inside the loop. I moved it outside. It was a mistake, not on purpose. So is there no way to optimize this further beyond how I've coded it now?
There are other iffy parts too, but I'm afraid I'm a bit busy right now.