
Missing field and StringTokenizer

Koka asked:
Well, I need to split a comma-delimited line like:
1,2,,4
and get '1' '2' '' '4', i.e. I need an empty string returned when there are two consecutive delimiters, but the nextToken method just skips over the consecutive delimiters, producing '1' '2' '4'.

Any ideas? Or should I forget about StringTokenizer and write my own (which I guess would be slower than the built-in tokenizer, so I'd prefer to stay with the built-in methods)?
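To illustrate, here's a minimal sketch of what I'm seeing (class and variable names are arbitrary):

    import java.util.StringTokenizer;

    public class TokenizerDemo {
        public static void main(String[] args) {
            StringTokenizer st = new StringTokenizer("1,2,,4", ",");
            while (st.hasMoreTokens()) {
                System.out.println("'" + st.nextToken() + "'");
            }
            // prints '1' '2' '4' -- the empty field between the two commas is lost
        }
    }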


Kanthonym
Commented:
What about doing a substring search for the pattern ',,', breaking the string into two pieces there, running the tokenizer on each piece, and then concatenating the two results?

imladris
Commented:
Or how about using the constructor:

public StringTokenizer(String str, String delim, boolean returnDelims)

with the third argument (returnDelims) set to true? Then you could notice the commas going by.
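A quick sketch of what that gives you (variable name is just an example; assumes java.util.StringTokenizer is imported):

    StringTokenizer st = new StringTokenizer("1,2,,4", ",", true);
    while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
    }
    // prints: 1 , 2 , , 4 -- each comma comes back as a token of its own,
    // so two commas in a row tell you there was an empty field between them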
Jim Cakalic, Senior Engineer

Commented:
Exactly what I was thinking, imladris. Although I was late to the party because I was writing a code fragment to demonstrate:

    // requires java.util.ArrayList and java.util.StringTokenizer
    private String[] tokenize(String str, String delim) {
        StringTokenizer tokenizer = new StringTokenizer(str, delim, true);
        ArrayList list = new ArrayList();
        boolean lastWasDelim = false;
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (delim.indexOf(token) >= 0) {
                // found a delimiter
                if (lastWasDelim) {
                    // two or more consecutive delimiters means an empty token
                    list.add("");
                }
                lastWasDelim = true;
            } else {
                list.add(token);
                lastWasDelim = false;
            }
        }
        return (String[])list.toArray(new String[list.size()]);
    }
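
Called like this, for example (assuming the method above is in scope):

    String[] fields = tokenize("1,2,,4", ",");
    for (int i = 0; i < fields.length; i++) {
        System.out.println("'" + fields[i] + "'");
    }
    // prints '1' '2' '' '4'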

Jim

Author

Commented:
Well, yes, returnDelims will do the trick with minimal effort. So I'm giving the points to imladris, as he was the first to suggest it.
I suspect it will only be slower than the 'plain' tokenizer by a factor of two at most (returning the delimiters roughly doubles the number of tokens to process); anyway, my files are not so large that it matters, and there seems to be no better approach.
Thanks to Jim and Kanthonym too.
