Solved

Java parsing, better way to use tokenizer?

Posted on 2007-11-23
Last Modified: 2008-09-20
I'm trying to parse a CSV file (probably made in Excel or Access).

Each field is separated by a "," (comma) but some fields are empty, so I'm not getting the data correctly.

Below is what I'm doing now. Is there a better way to parse?


This line works fine:
23, something, aTitle, something, Smith

This line gives me wrong values:

21,,anotherTitle,,Jones

Both come from the same table with 4 fields.

Thanks.
//...			
String line = in.readLine();
while(line!=null)
{
	StringTokenizer st = new StringTokenizer(line,",");
	int fieldCount=0;
			
	while(st.hasMoreTokens())
	{
		String nextFieldData = st.nextToken();
		++fieldCount;
				
					
		switch(fieldCount)
		{
			case 1: 
			String ID = (nextFieldData);
			break;
						
			case 3: 
			String Title  = (nextFieldData);
			break;
 
			case 5: 
			String Name  = (nextFieldData);
			break;
		}
//...

Question by:polkadot

Expert Comment

by:ksivananth
ID: 20339104
There are a lot of issues you have to take care of if you do it that way.

Instead, try one of the readily available parsers:

http://opencsv.sourceforge.net/
http://www.csvreader.com/

Author Comment

by:polkadot
ID: 20339344
csvreader works great, but it's a bit bulky. I just wanted some ideas for parsing it simply. Any other ideas?

Expert Comment

by:mrcoffee365
ID: 20339554
Then you can't use StringTokenizer; you have to write your own parser. Read the line character by character, check whether each character is a comma, and parse accordingly.

Expert Comment

by:CEHJ
ID: 20339614
http://ostermiller.org/utils/CSV.html

I would not use home-brewed parsing. If it were that simple, there would be no need for classes such as these.

Expert Comment

by:ksivananth
ID: 20339620
>>StringTokenizer st = new StringTokenizer(line,",");

Change this to

StringTokenizer st = new StringTokenizer( line,",", true );

so that the commas are returned as tokens too, and discard the comma tokens while traversing. Two comma tokens in a row then mark an empty field.
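
A minimal sketch of that idea, assuming plain comma-separated data with no quoted fields. The class name `DelimTokenizerDemo` and the helper `fields` are made up for illustration. Because `returnDelims` is `true`, each comma comes back as its own token, so a comma that directly follows another comma (or starts or ends the line) can be recorded as an empty field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimTokenizerDemo {
    // Hypothetical helper: splits a line on commas with StringTokenizer,
    // emitting an empty string for every missing field.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ",", true);
        // The start of the line counts as "just after a delimiter".
        boolean lastWasDelim = true;
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals(",")) {
                if (lastWasDelim) {
                    out.add(""); // two delimiters in a row: empty field
                }
                lastWasDelim = true;
            } else {
                out.add(tok);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            out.add(""); // line ended with a comma, or was empty
        }
        return out;
    }

    public static void main(String[] args) {
        // Five fields, two of them empty
        System.out.println(fields("21,,anotherTitle,,Jones"));
    }
}
```

With this bookkeeping the field positions stay stable, so field 3 is the title and field 5 the name even when fields 2 and 4 are blank.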

Expert Comment

by:objects
ID: 20340802
you can find a lightweight csv parser here:

http://mindprod.com/zips/csv24.zip

Accepted Solution

by:
gnoon earned 500 total points
ID: 20344564
I've looked through the source code of StringTokenizer (ST). If delimiters are successive (that is, there are blank fields), they are grouped together as a single delimiter. For example:

data: 1,,3,   is treated as   1,3,   (ST treats ,, as ,)
return: {1,3}

ST is not appropriate for parsing a CSV file.
If you are on JRE 1.4+, use String.split() instead:    String[] fields = line.split(","). Note that this drops trailing empty fields; line.split(",", -1) keeps them.
If you are on JRE 1.3 or earlier, write your own function that parses the line (never use StringTokenizer) and returns an array of fields.
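
The difference can be checked with a short sketch like the one below, assuming simple data with no quoted fields or embedded commas. The class name `SplitDemo` and its two helpers are invented for illustration. It compares how many tokens `StringTokenizer` reports with what `String.split` returns, and uses the two-argument `split(",", -1)` form so that a trailing empty field survives.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class SplitDemo {
    // Hypothetical helper: number of tokens StringTokenizer reports for a line.
    static int tokenizerCount(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    // Hypothetical helper: split on commas, keeping empty fields,
    // including trailing ones (the -1 limit disables trimming).
    static String[] splitKeepEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "21,,anotherTitle,,Jones";
        System.out.println(tokenizerCount(line));                        // 3: empty fields vanish
        System.out.println(Arrays.toString(line.split(",")));            // 5 fields, empties kept
        System.out.println("1,,3,".split(",").length);                   // 3: trailing empty dropped
        System.out.println(Arrays.toString(splitKeepEmpty("1,,3,")));    // 4 fields
    }
}
```

Note that `split` takes a regular expression, so a delimiter such as `|` or `.` would need escaping; a plain comma does not.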

Expert Comment

by:gnoon
ID: 20344587
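// Requires: import java.util.ArrayList;
public String[] parse(String s, char delim)
{
    StringBuffer b = new StringBuffer();
    ArrayList a = new ArrayList();
    char c;
    for(int i=0; i<s.length(); i++)
    {
        c = s.charAt(i);
        if(c == delim)
        {
            a.add(b.toString());   // store the field as a String, not a StringBuffer
            b = new StringBuffer();
        }
        else b.append(c);
    }
    a.add(b.toString());           // add the last field, which has no trailing delimiter
    String[] r = new String[a.size()];
    for(int i=0; i<r.length; i++)
        r[i] = (String) a.get(i);
    return r;
}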