
Java StringTokenizer question

JeroldYoc asked:
     public static void main(String[] args) {
            // TODO Auto-generated method stub
            BufferedReader in;
            String input;
            StringTokenizer st;
            
            try {
                  String filter;
                  in = new BufferedReader(new FileReader("c:\\testtoken.txt"));
                  input = in.readLine();
                  st = new StringTokenizer(input," ",true);
                  while(st.hasMoreTokens()){
                        filter = st.nextToken();
                        if(filter.length()== 1) filter = st.nextToken();
                              System.out.println(filter + " : " + filter.length());                  
                  }
                  
            } catch (FileNotFoundException e) {
                  // TODO Auto-generated catch block
                  e.printStackTrace();
            } catch (IOException e) {
                  // TODO Auto-generated catch block
                  e.printStackTrace();
            }

      }

The above was just a simple test of the StringTokenizer class.  I am working on some efficiency tests to see for myself which method is the fastest for splitting a string.  My problem is that I don't understand why the code above behaves the way it does and produces the following results:

This : 4
is : 2
a : 1
test : 4
of : 2
the : 3
Tokenizer : 9
class. : 6
  : 1
I : 1
hope : 4
I : 1
am : 2
wrong : 5
and : 3
all : 3
these : 5
words : 5
should : 6
appear : 6


Why does the if statement catch the strings that contain only the token, but not skip the tokens with only one character (i.e. "a", "I")?  The String.length() function appears to return 1 in both cases (shown by the one delimiter token I allowed to be displayed), so why doesn't the if statement skip all single-character lines?

I admit this is kind of a remedial question but I seem to be missing something simple and that makes me kind of annoyed...

Expert commented:

What happens if you use trim()?

st.nextToken().trim();

Do you need the separators returned? Maybe you don't want to pass true
in the constructor - it makes thinking about the tokens more difficult.
Amitkumar P, Sr. Consultant, commented:
JeroldYoc,

st = new StringTokenizer(input," ",true);
The statement above is responsible for the output you are seeing.

Try the following instead:
st = new StringTokenizer(input," ",false);


Refer to the StringTokenizer API docs:
1. http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
2. http://download.oracle.com/javase/1.5.0/docs/api/java/util/StringTokenizer.html
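
As a hedged illustration of what the returnDelims flag changes, here is a minimal sketch comparing the two settings; the class name and the sample input string are invented for the demo and are not from this thread.

import java.util.StringTokenizer;

// Minimal sketch comparing the two returnDelims settings.
// The class name and the sample input are invented for the demo.
public class ReturnDelimsDemo {
    public static void main(String[] args) {
        String input = "This is a  test";   // note the double space before "test"

        // returnDelims = true: every space comes back as its own one-character token
        StringTokenizer withDelims = new StringTokenizer(input, " ", true);
        while (withDelims.hasMoreTokens()) {
            String t = withDelims.nextToken();
            System.out.println("[" + t + "] length=" + t.length());
        }

        // returnDelims = false: spaces act only as separators and are never returned,
        // and consecutive spaces collapse, so only the four words come back
        StringTokenizer wordsOnly = new StringTokenizer(input, " ", false);
        while (wordsOnly.hasMoreTokens()) {
            System.out.println(wordsOnly.nextToken());
        }
    }
}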
Expert commented:
Your code works for me.

Post your input.
Expert commented:

I don't see any one-character words:

My input:
i erter
a dfgdfg
sad  fdgdf
asa fdgd



My output:

  : 1
erter : 5


Expert commented:

This one skips one-character items for sure.
(I didn't realize before that you are only reading one line.)


import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class StrTokenTest {


    public static void main(String[] args) {
               // TODO Auto-generated method stub
               BufferedReader in;
               String input;
               StringTokenizer st;

               try {
                     String filter;
                     in = new BufferedReader(new FileReader("c:\\temp\\test\\testtoken.txt"));
                     input = in.readLine();
                     st = new StringTokenizer(input);
                     while(st.hasMoreTokens()){
                           filter = st.nextToken();
                           if(filter.length()== 1) filter = st.nextToken();
                                 System.out.println(filter + " : " + filter.length());
                     }

               } catch (FileNotFoundException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
               } catch (IOException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
               }

         }


}

Input:
i 3343


Output:
3343 : 4

Expert commented:
In most normal cases I just use

new StringTokenizer(input)

and it usually serves very well,
skipping all whitespace and returning only the real items.
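
A minimal sketch of that default behaviour; the class name and sample string are invented for the example, and the point is only that the no-delimiter constructor splits on the default whitespace set " \t\n\r\f".

import java.util.StringTokenizer;

// Minimal sketch of the no-delimiter-argument behaviour: the default delimiter
// set is " \t\n\r\f", so any run of whitespace separates tokens and only the
// words themselves are ever returned.
public class DefaultDelimitersDemo {
    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("a\tshort\t  example line");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());   // prints a, short, example, line
        }
    }
}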



Author commented:
I used
 st = new StringTokenizer(input," ",true);
so I could distinguish multiple tokens in a row.  If I set it to false, the tokenizer ignores all but the first.

I just don't understand why the if statement isn't catching the single-letter words, but it is catching the strings that contain only the token.  This whole thing could probably be avoided by using a different token that isn't whitespace, but that wouldn't explain why this is happening.

Input from the file is as below.  No empty rows, just a simple .txt file:

This is a test of the Tokenizer class.   I hope I am wrong and all these words should appear
on separate lines.
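
For what it's worth, here is a minimal sketch of one way consecutive delimiters could be detected with returnDelims set to true; this is only a guess at the "distinguish multiple tokens in a row" intent, and the class name and sample input are invented for the example.

import java.util.StringTokenizer;

// A sketch of counting runs of consecutive separators, which is the
// information that is lost when returnDelims is false.
public class ConsecutiveDelimitersDemo {
    public static void main(String[] args) {
        String input = "class.   I hope";            // three spaces after "class."
        StringTokenizer st = new StringTokenizer(input, " ", true);
        int run = 0;                                  // length of the current run of space tokens
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (t.equals(" ")) {
                run++;                                // another separator in the same run
            } else {
                if (run > 1) {
                    System.out.println(run + " consecutive spaces before \"" + t + "\"");
                }
                run = 0;
                System.out.println("word: " + t);
            }
        }
    }
}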
Expert commented:
Could you please explain what you mean by
"so I could distinguish multiple tokens in a row"?

If you just use StringTokenizer(input) it will distinguish
all tokens separated by spaces, tabs, etc.

Do you need to have these spaces returned to you as separate tokens - just space or tab?

Do you care if two tokens are separated by two spaces rather than one?

Expert commented:
By "in a row" - do you mean "in one line"?
It definitely distinguishes as many tokens on the line as you want.
Expert commented:
Please post the input line which causes your doubts.
I'd like to try to tokenize it with the constructor which
you use, and to understand what it is that puzzles you.
To my mind the tokenizer works as expected.

Author commented:
Eventually I will add new cases for multiple tokens in succession.  At the moment I just needed to make sure that I could tell, and if I change the true to false all consecutive tokens are treated as a single token.

I'm not even sure the StringTokenizer class is the issue here.  Is it possible the problem is with the way the String.length() function is returning the length?  Or the way the String.length().toString() call is handling the length, so it is comparing against a different value than it is displaying, since the value of the string is a string of all whitespace?

Expert commented:

There is output in your posting, but not what your input was.
Please post the input line which, to your mind, the tokenizer handles improperly - I want to
investigate.

Author commented:
Input at the bottom.

During parsing, the first line contains three tokens that are one character long: "a", "I", and "I".

The file contains only the two lines of text below:
This is a test of the Tokenizer class.   I hope I am wrong and all these words should appear
on separate lines.


Expert commented:

When I feed it the line "a I I",
it all happens as expected.

This is the code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class StrTokenTest {


    public static void main(String[] args) {
               // TODO Auto-generated method stub
               BufferedReader in;
               String input;
               StringTokenizer st;

               try {
                     String filter;
                     in = new BufferedReader(new FileReader("c:\\temp\\test\\testtoken.txt"));
                     input = in.readLine();
                     st = new StringTokenizer(input," ",true);
                     while(st.hasMoreTokens()){

                           filter = st.nextToken();
                         System.out.println("token: " + filter);
                           if(filter.length()== 1) {filter = st.nextToken();
                                 System.out.println("token: " + filter);
                           }
                                // System.out.println(filter + " : " + filter.length());
                     }

               } catch (FileNotFoundException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
               } catch (IOException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
               }

         }


}





This is the output:
token: a
token:  
token: I
token:  
token: I
        Exception in thread "main" java.util.NoSuchElementException
	at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)

	at StrTokenTest.main(StrTokenTest.java:25)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:110)



So each item and each space in between is a token.
The exception happens because we call nextToken() twice
per one hasMoreTokens() check.

If I use the constructor which I normally use
  st = new StringTokenizer(input);

then I just get this output:

token: a
token: I
token: I
   Exception in thread "main" java.util.NoSuchElementException
          at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)

	at StrTokenTest.main(StrTokenTest.java:26)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:110)



because in this case the delimiting spaces are not considered items.
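
A minimal sketch of one way to avoid that exception, assuming the goal is simply to never call nextToken() without a matching hasMoreTokens() check; the class name is invented for the demo and the input line "a I I" is the one used above.

import java.util.StringTokenizer;

// Minimal sketch: re-check hasMoreTokens() before the second nextToken() call,
// so the loop can never ask for a token that does not exist.
public class GuardedNextTokenDemo {
    public static void main(String[] args) {
        StringTokenizer st = new StringTokenizer("a I I", " ", true);
        while (st.hasMoreTokens()) {
            String filter = st.nextToken();
            if (filter.length() == 1 && st.hasMoreTokens()) {
                filter = st.nextToken();              // safe: another token is known to exist
            }
            System.out.println(filter + " : " + filter.length());
        }
    }
}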




Author commented:
That's correct, and it appears to be working the same way for you as it is for me.

I realize that I will need to adjust for a file that ends in a space.

That doesn't really explain why the "if" statement in the code displays the single-letter tokens when it should skip them...

Author commented:
A simple mistake caused by a combination of a minor logic flaw and a poor set of test data that did not catch the underlying flaw.

Author commented:
A simple logic flaw due to getting in a hurry when setting up the simplest test case...
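
For completeness, here is a minimal sketch of how the loop could be restructured so that every token's length is checked exactly once; it is based on the flaw described above, not on the accepted solution from the thread, and the class name is invented while the input is the first line of the test file.

import java.util.StringTokenizer;

// Minimal sketch: skip every one-character token directly, so nothing is ever
// fetched "on behalf of" a delimiter and printed without a length check.
public class SkipShortTokensDemo {
    public static void main(String[] args) {
        String input = "This is a test of the Tokenizer class.   I hope I am wrong";
        StringTokenizer st = new StringTokenizer(input, " ", true);
        while (st.hasMoreTokens()) {
            String filter = st.nextToken();
            if (filter.length() == 1) {
                continue;                             // skips spaces and single-letter words alike
            }
            System.out.println(filter + " : " + filter.length());
        }
    }
}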