Solved

Java parsing, better way to use tokenizer?

Posted on 2007-11-23
Last Modified: 2008-09-20
I'm trying to parse a CSV file (probably made in Excel or Access).

Each field is separated by a "," (comma) but some fields are empty, so I'm not getting the data correctly.

Below is what I'm doing now. Is there a better way to parse?


This line works fine:
23, something, aTitle, something, Smith

This line gives me wrong values:

21,,anotherTitle,,Jones

Both come from the same table with 4 fields.

Thanks.
//...
String line = in.readLine();
while(line!=null)
{
	StringTokenizer st = new StringTokenizer(line,",");
	int fieldCount=0;

	while(st.hasMoreTokens())
	{
		String nextFieldData = st.nextToken();
		++fieldCount;

		switch(fieldCount)
		{
			case 1:
			String ID = (nextFieldData);
			break;

			case 3:
			String Title  = (nextFieldData);
			break;

			case 5:
			String Name  = (nextFieldData);
			break;
		}
//...

Question by:polkadot
10 Comments
 
LVL 26

Expert Comment

by:ksivananth
There are a lot of issues you have to take care of if you do it that way...

Instead, try one of the readily available parsers:

http://opencsv.sourceforge.net/
http://www.csvreader.com/
 

Author Comment

by:polkadot
csvreader works great, but it's a bit bulky. I just wanted some ideas for parsing it simply... any other ideas?
 
LVL 26

Expert Comment

by:mrcoffee365
Then you can't use StringTokenizer; you have to write your own parser. Read the line character by character, and each time you reach a comma, close off the current field (even if it is empty) and start the next one.
 
LVL 86

Expert Comment

by:CEHJ
http://ostermiller.org/utils/CSV.html

I would not use home-brewed parsing. If it were that simple, there would be no need for classes such as these.
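
One concrete reason hand-rolled parsing is harder than it looks: spreadsheet exports commonly wrap a field in double quotes when the field itself contains a comma, and a plain comma split cuts such a field in two. The sketch below only demonstrates the problem; the class name, method name, and sample line are made up for illustration and do not come from the question's data.

```java
class QuotedFieldDemo {

    // Counts fields by splitting on every comma, ignoring quotes entirely.
    static int naiveFieldCount(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        // Three logical fields; the middle one is quoted and contains a comma.
        String line = "23,\"Smith, John\",aTitle";

        // The naive split reports 4 fields because it also cuts inside the quotes.
        System.out.println(naiveFieldCount(line));
    }
}
```

If the file is known never to contain quoted fields, a simple split may be enough; otherwise a library parser is probably the safer choice.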
 
LVL 26

Expert Comment

by:ksivananth
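>>StringTokenizer st = new StringTokenizer(line,",");

Change this to

StringTokenizer st = new StringTokenizer( line,",", true );

so that the commas are returned as tokens too, and discard the comma tokens while traversing. Two comma tokens in a row mean there was an empty field between them, so the commas will not always sit at every second position.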
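
A minimal sketch of this approach, assuming the file has no quoted fields. The class and method names (ReturnDelimsDemo, splitFields) are hypothetical. Because empty fields produce back-to-back comma tokens, the sketch tracks whether the previous token was a delimiter instead of relying on token position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ReturnDelimsDemo {

    // Splits one line on commas and keeps empty fields.
    // Assumption: no quoted fields and no embedded commas.
    static List<String> splitFields(String line) {
        // The third argument (returnDelims = true) makes each comma a token.
        StringTokenizer st = new StringTokenizer(line, ",", true);
        List<String> fields = new ArrayList<String>();
        boolean lastWasDelim = true; // start of line behaves like "just after a comma"

        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (token.equals(",")) {
                if (lastWasDelim) {
                    fields.add(""); // leading comma or two commas in a row: empty field
                }
                lastWasDelim = true;
            } else {
                fields.add(token);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            fields.add(""); // line ended with a comma (or was empty)
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = splitFields("21,,anotherTitle,,Jones");
        System.out.println(fields.size()); // 5
        System.out.println(fields);
    }
}
```

With this, the ID, Title and Name from the question would be at indexes 0, 2 and 4 for both sample lines.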
 
LVL 92

Expert Comment

by:objects
you can find a lightweight csv parser here:

http://mindprod.com/zips/csv24.zip
 
LVL 16

Accepted Solution

by:
gnoon earned 500 total points
I've looked through the source code of StringTokenizer (ST). When delimiters are consecutive (that is, when there are blank fields), they are grouped and treated as a single delimiter. For example

data: 1,,3,   is treated as   1,3,   (ST handles ,, as ,)
return: {1,3}

ST is not appropriate for parsing a CSV file.
If you are on JRE 1.4+, use String.split() instead:    String[] fields = line.split(",", -1);   (the -1 limit keeps trailing empty fields, which a plain split(",") drops).
If you are on JRE 1.3 or earlier, write your own function that parses the line (never use StringTokenizer) and returns an array of fields.
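
A short sketch of the String.split() route, using the sample line from the question. The class and method names (SplitDemo, splitKeepEmpty) are hypothetical, and the sketch assumes no quoted fields. One detail worth checking in your own data: the one-argument split(",") discards trailing empty strings, while a negative limit keeps them.

```java
import java.util.Arrays;

class SplitDemo {

    // Splits on commas and keeps every field, including trailing empty ones.
    static String[] splitKeepEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String[] fields = splitKeepEmpty("21,,anotherTitle,,Jones");
        System.out.println(fields.length);            // 5
        System.out.println(Arrays.toString(fields));

        // Trailing empty fields: dropped by default, kept with a negative limit.
        System.out.println("1,,3,".split(",").length);      // 3
        System.out.println("1,,3,".split(",", -1).length);  // 4
    }
}
```

Fields that were padded with spaces in the file (as in "23, something, aTitle") keep their leading space, so calling trim() on each field may be needed.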
 
LVL 16

Expert Comment

by:gnoon
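import java.util.ArrayList;

public String[] parse(String s, char delim)
{
    StringBuffer b = new StringBuffer();   // characters of the current field
    ArrayList a = new ArrayList();         // completed fields (raw type, for pre-1.5 JREs)
    char c;
    for(int i=0; i<s.length(); i++)
    {
        c = s.charAt(i);
        if(c == delim)
        {
            a.add(b.toString());           // store a String, not the StringBuffer itself
            b = new StringBuffer();
        }
        else b.append(c);
    }
    a.add(b.toString());                   // the last field has no delimiter after it
    String[] r = new String[a.size()];
    for(int i=0; i<r.length; i++)
        r[i] = (String) a.get(i);
    return r;
}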