Koka asked:

Missing field and StringTokenizer

Well, I need to split a comma-delimited line like:
1,2,,4
and get '1' '2' '' '4', i.e. I need an empty string returned where two delimiters are adjacent, but the nextToken method just skips over adjacent delimiters, producing '1' '2' '4'.

Any ideas? Or should I forget about StringTokenizer and write my own (which I guess will be slower than the built-in tokenizer, so I'd rather stick with the built-in methods)?
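For reference, a minimal sketch of the behavior being described, using plain java.util.StringTokenizer with the usual two-argument constructor (the class name is only for the demo):

    import java.util.StringTokenizer;

    public class SkipDemo {
        public static void main(String[] args) {
            // Adjacent delimiters are collapsed by default, so the empty
            // field between the two commas is silently dropped.
            StringTokenizer t = new StringTokenizer("1,2,,4", ",");
            while (t.hasMoreTokens()) {
                System.out.print("[" + t.nextToken() + "] ");
            }
            // prints: [1] [2] [4]
        }
    }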

kanthonym

What about doing a substring search for the pattern ',,', breaking the line up into two strings, running the tokenizer on each, and then concatenating your two results?
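One literal reading of that suggestion, sketched as a hedged example (the helper name and the iteration are assumptions, not from this thread): scan for ',,', tokenize everything before it, record an empty field, and continue with the remainder. Like the other approaches discussed here, it does not account for leading or trailing delimiters.

    private static java.util.List splitKeepingEmpties(String str) {
        java.util.List result = new java.util.ArrayList();
        int pos;
        while ((pos = str.indexOf(",,")) >= 0) {
            // tokenize the part before the double comma
            java.util.StringTokenizer left =
                new java.util.StringTokenizer(str.substring(0, pos), ",");
            while (left.hasMoreTokens()) {
                result.add(left.nextToken());
            }
            result.add("");                   // the empty field between the commas
            str = str.substring(pos + 1);     // keep the second comma for the next pass
        }
        // tokenize whatever is left after the last double comma
        java.util.StringTokenizer rest = new java.util.StringTokenizer(str, ",");
        while (rest.hasMoreTokens()) {
            result.add(rest.nextToken());
        }
        return result;
    }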
ASKER CERTIFIED SOLUTION
imladris
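Judging from the follow-up comments below, the accepted suggestion was to use StringTokenizer's three-argument constructor, whose third argument (returnDelims) makes the tokenizer hand back the delimiters themselves as tokens. A minimal sketch of that idea (the class name is only for the demo):

    import java.util.StringTokenizer;

    public class ReturnDelimsDemo {
        public static void main(String[] args) {
            // With returnDelims set to true the commas come back as tokens,
            // so two adjacent commas appear as two "," tokens with nothing
            // in between -- the cue that a field is empty.
            StringTokenizer t = new StringTokenizer("1,2,,4", ",", true);
            while (t.hasMoreTokens()) {
                System.out.print("[" + t.nextToken() + "] ");
            }
            // prints: [1] [,] [2] [,] [,] [4]
        }
    }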
Jim Cakalic
Exactly what I was thinking, imladris. I was late to the party because I was writing a code fragment to demonstrate:

    // requires java.util.StringTokenizer and java.util.ArrayList
    private String[] tokenize(String str, String delim) {
        StringTokenizer tokenizer = new StringTokenizer(str, delim, true);
        ArrayList list = new ArrayList();
        boolean lastWasDelim = false;
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (delim.indexOf(token) >= 0) {
                // found a delimiter
                if (lastWasDelim) {
                    // two or more consecutive delimiters means an empty token
                    list.add("");
                }
                lastWasDelim = true;
            } else {
                list.add(token);
                lastWasDelim = false;
            }
        }
        return (String[])list.toArray(new String[list.size()]);
    }
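
For the asker's example, a call like tokenize("1,2,,4", ",") should return {"1", "2", "", "4"}.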

Jim
Koka (asker)

Well, yes, returnDelims will do the trick with minimal effort. So I'm giving the points to imladris, as he was the first to suggest it.
I suspect it will only be slower than the 'pure' tokenizer by a factor of about two (returning the delimiters roughly doubles the number of tokens to process); in any case my files are not so large that it matters, and there seems to be no better approach.
Thanks to Jim and kanthonym too.