• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 585

Java StringTokenizer question

     public static void main(String[] args) {
            // TODO Auto-generated method stub
            BufferedReader in;
            String input;
            StringTokenizer st;
            
            try {
                  String filter;
                  in = new BufferedReader(new FileReader("c:\\testtoken.txt"));
                  input = in.readLine();
                  st = new StringTokenizer(input," ",true);
                  while(st.hasMoreTokens()){
                        filter = st.nextToken();
                        if (filter.length() == 1) filter = st.nextToken();
                        System.out.println(filter + " : " + filter.length());
                  }
                  
            } catch (FileNotFoundException e) {
                  // TODO Auto-generated catch block
                  e.printStackTrace();
            } catch (IOException e) {
                  // TODO Auto-generated catch block
                  e.printStackTrace();
            }

      }

The above was just a simple test of the StringTokenizer class.  I am working on some efficiency tests to see for myself which method is the fastest for splitting a string.  My problem is that I don't know why the code above works and produces the following results:

This : 4
is : 2
a : 1
test : 4
of : 2
the : 3
Tokenizer : 9
class. : 6
  : 1
I : 1
hope : 4
I : 1
am : 2
wrong : 5
and : 3
all : 3
these : 5
words : 5
should : 6
appear : 6


Why does the if statement catch the strings that contain only the delimiter, but not skip the tokens that are only 1 character long (i.e. "a", "I")?  The String.length() method appears to return 1 in both cases (shown by the one delimiter token I allowed to be displayed), so why doesn't the if statement skip all single-character lines?

I admit this is kind of a remedial question but I seem to be missing something simple and that makes me kind of annoyed...
Asked by: JeroldYoc
2 Solutions
 
for_yan commented:

What happens if you use trim()?

st.nextToken().trim();

Do you need the separators returned? Maybe you don't want to have true
in the constructor - it makes thinking about the loop more difficult.
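
(A minimal sketch, not code from the thread, of what for_yan's trim() suggestion does: trimming a returned delimiter token leaves an empty string of length 0, so it can no longer be mistaken for a one-letter word. The class name and the hard-coded sample line are assumptions for illustration.)

import java.util.StringTokenizer;

public class TrimSketch {
    public static void main(String[] args) {
        String input = "This is a test"; // assumed sample line instead of the file input

        StringTokenizer st = new StringTokenizer(input, " ", true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken().trim();
            // A returned delimiter (" ") trims down to "" (length 0),
            // while a real one-letter word such as "a" keeps length 1.
            if (token.isEmpty()) {
                continue; // skip what was only a delimiter
            }
            System.out.println(token + " : " + token.length());
        }
    }
}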
 
Amitkumar P, Sr. Consultant, commented:
JeroldYoc,

st = new StringTokenizer(input," ",true);
The statement above is responsible for the output you describe.

Try the following instead:
st = new StringTokenizer(input," ",false);


Refer to the StringTokenizer API docs:
1. http://download.oracle.com/javase/1.4.2/docs/api/java/util/StringTokenizer.html
2. http://download.oracle.com/javase/1.5.0/docs/api/java/util/StringTokenizer.html
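
(A minimal sketch, not from the thread, showing the difference the third constructor argument makes; the class name and sample line are assumptions.)

import java.util.StringTokenizer;

public class DelimFlagSketch {
    public static void main(String[] args) {
        String input = "This is a test"; // assumed sample line

        // returnDelims = true: every space comes back as its own one-character token.
        StringTokenizer withDelims = new StringTokenizer(input, " ", true);
        while (withDelims.hasMoreTokens()) {
            System.out.println("[" + withDelims.nextToken() + "]");
        }
        // Prints: [This], [ ], [is], [ ], [a], [ ], [test]

        // returnDelims = false: only the words come back, the spaces are skipped.
        StringTokenizer wordsOnly = new StringTokenizer(input, " ", false);
        while (wordsOnly.hasMoreTokens()) {
            System.out.println("[" + wordsOnly.nextToken() + "]");
        }
        // Prints: [This], [is], [a], [test]
    }
}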
 
for_yan commented:
Your code works for me.

Post your input.
 
for_yan commented:

I don't see any one-character words.

My input:
i erter
a dfgdfg
sad  fdgdf
asa fdgd

My output:

  : 1
erter : 5
 
for_yan commented:

This one skips one-character items for sure
(I didn't realize before that you are reading only one line).


import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class StrTokenTest {


    public static void main(String[] args) {
               // TODO Auto-generated method stub
               BufferedReader in;
               String input;
               StringTokenizer st;

               try {
                     String filter;
                     in = new BufferedReader(new FileReader("c:\\temp\\test\\testtoken.txt"));
                     input = in.readLine();
                     st = new StringTokenizer(input);
                     while(st.hasMoreTokens()){
                           filter = st.nextToken();
                           if (filter.length() == 1) filter = st.nextToken();
                           System.out.println(filter + " : " + filter.length());
                     }

               } catch (FileNotFoundException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
               } catch (IOException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
               }

         }


}

Input:
i 3343

Output:
3343 : 4

 
for_yan commented:
In most normal cases I just use

new StringTokenizer(input)

and it usually serves very well,
skipping all whitespace and returning only the real items.
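
(A minimal sketch, not from the thread, of the one-argument constructor for_yan prefers; the class name and the sample line with mixed spaces and a tab are assumptions.)

import java.util.StringTokenizer;

public class DefaultDelimSketch {
    public static void main(String[] args) {
        // Assumed sample line: words separated by runs of spaces and a tab.
        String input = "This  is\ta   test";

        // The one-argument constructor uses " \t\n\r\f" as its delimiter set and
        // never returns the delimiters, so runs of whitespace are collapsed.
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
        // Prints: This, is, a, test (one word per line)
    }
}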
 
JeroldYoc (Author) commented:
I used
 st = new StringTokenizer(input," ",true);
so I could distinguish multiple tokens in a row.  If I set it to false, the tokenizer ignores all but the first.

I just don't understand why the if statement isn't catching the single-letter words, but it is catching the strings that contain only the delimiter.  This whole thing could probably be avoided by using a different delimiter that isn't whitespace, but that wouldn't explain why this is happening.

Input from the file is below.  No empty rows, just a simple .txt file:

This is a test of the Tokenizer class.   I hope I am wrong and all these words should appear
on separate lines.
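
(An aside, not from the thread: since the goal is to compare ways of splitting a string, one alternative that still distinguishes consecutive delimiters is String.split with a negative limit, which keeps the empty fields between adjacent delimiters. A minimal sketch; the class name and sample line are assumptions.)

public class SplitSketch {
    public static void main(String[] args) {
        // Two spaces between "a" and "test" (assumed sample line).
        String input = "This is a  test";

        // A negative limit keeps empty strings, so consecutive delimiters
        // show up as "" entries instead of being collapsed.
        String[] parts = input.split(" ", -1);
        for (String part : parts) {
            System.out.println("[" + part + "] length=" + part.length());
        }
        // Prints: [This] length=4, [is] length=2, [a] length=1, [] length=0, [test] length=4
    }
}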
 
for_yan commented:
Could you please explain what you mean by
"so I could distinguish multiple tokens in a row"?

If you just use StringTokenizer(input) it will distinguish
all tokens separated by spaces, tabs, etc.

Do you need those spaces returned to you as separate tokens - just space or tab?

Do you care if two tokens are separated by two spaces rather than one?
 
for_yan commented:
By "in a row" - do you mean "on one line"?
It definitely distinguishes as many tokens on the line as you want.
 
for_yan commented:
Please post the input line which causes your doubts.
I'd like to try to tokenize it with the constructor you use
and understand what it is that puzzles you.
To my mind the tokenizer works as expected.
 
JeroldYoc (Author) commented:
Eventually I will add new cases for multiple tonens in succession.  At the moment I just needed to make sure that I could tell them apart, and if I change the true to false all consecutive tokens are treated as a single token.

I'm not even sure the StringTokenizer class is the issue here.  Is it possible the problem is with the way String.length() is returning the length, or that the comparison is using a different value than the one being displayed, since the string is all whitespace?
 
JeroldYoc (Author) commented:
In the first line, "tonens" should have been "tokens".
 
for_yan commented:

There is output in your posting, but not your input.
Please post the input line which, to your mind, the tokenizer handles improperly - I want to
investigate.
 
JeroldYoc (Author) commented:
Input at the bottom:

During parsing, the first line contains 3 tokens that are 1 character long: "a", "I", and "I".

The file contains only the 2 lines of text below:
This is a test of the Tokenizer class.   I hope I am wrong and all these words should appear
on separate lines.
 
for_yan commented:

When I feed the line "a I I",
it all happens as expected.

This is the code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class StrTokenTest {


    public static void main(String[] args) {
               // TODO Auto-generated method stub
               BufferedReader in;
               String input;
               StringTokenizer st;

               try {
                     String filter;
                     in = new BufferedReader(new FileReader("c:\\temp\\test\\testtoken.txt"));
                     input = in.readLine();
                     st = new StringTokenizer(input," ",true);
               while (st.hasMoreTokens()) {

                     filter = st.nextToken();
                     System.out.println("token: " + filter);
                     if (filter.length() == 1) {
                           filter = st.nextToken();
                           System.out.println("token: " + filter);
                     }
                     // System.out.println(filter + " : " + filter.length());
               }

               } catch (FileNotFoundException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
               } catch (IOException e) {
                     // TODO Auto-generated catch block
                     e.printStackTrace();
               }

         }


}

This is the output:
token: a
token:  
token: I
token:  
token: I
Exception in thread "main" java.util.NoSuchElementException
	at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)
	at StrTokenTest.main(StrTokenTest.java:25)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:110)

So each item and each space in between is a token.
The exception happens because we call nextToken() twice
per one hasMoreTokens() check.

If I use the constructor which I normally use
  st = new StringTokenizer(input);

then I just get this output:

token: a
token: I
token: I
Exception in thread "main" java.util.NoSuchElementException
	at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)
	at StrTokenTest.main(StrTokenTest.java:26)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:110)

because in this case the delimiting spaces are not considered items.
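
(A minimal sketch, not from the thread, of the guarded pattern for_yan alludes to: each nextToken() call is preceded by its own hasMoreTokens() check, so the second call can never throw NoSuchElementException even when the line ends right after a one-character token. The class name and sample line are assumptions.)

import java.util.StringTokenizer;

public class GuardedTokenSketch {
    public static void main(String[] args) {
        String input = "a I I"; // assumed sample line ending in a one-character token

        StringTokenizer st = new StringTokenizer(input, " ", true);
        while (st.hasMoreTokens()) {
            String filter = st.nextToken();
            if (filter.length() == 1) {
                // Guard the second nextToken() with its own check instead of
                // assuming another token is always available.
                if (!st.hasMoreTokens()) {
                    break;
                }
                filter = st.nextToken();
            }
            System.out.println("token: " + filter);
        }
    }
}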
 
JeroldYoc (Author) commented:
That's correct, and it appears to be working the same way for you as it is for me.

I realize that I will need to adjust for a file that ends in a space.

That doesn't really explain why the "if" statement in the code displays the single-letter tokens when it should skip them...
 
for_yan commented:
I'm sure that if you go through the code step by step you will find the answer - all the ifs
are working correctly - guaranteed -
but it's probably not worth it - just use the tokenizer according to your now clear understanding.

I usually avoid calling the nextToken() method without first checking hasMoreTokens() -
no matter how well you think you know all the cases, in the end you end up
with an exception.
 
JeroldYoc (Author) commented:
OK, it did turn out to be a remedial question.

The reason I was getting the 1-letter tokens is that they are always preceded by a delimiter token.  Since I was already inside the if statement because of the preceding delimiter, I called nextToken() and then immediately displayed the result, so I always saw the 1-character tokens.

Thank you for your help.
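
(A minimal sketch, not the asker's final code, of one way to get the behavior originally intended: skip the returned delimiter tokens and the genuine one-character words with continue, and print only what remains. The class name and sample line are assumptions.)

import java.util.StringTokenizer;

public class SkipShortWordsSketch {
    public static void main(String[] args) {
        // Assumed sample line matching the test file's first line.
        String input = "This is a test of the Tokenizer class.   I hope I am wrong";

        StringTokenizer st = new StringTokenizer(input, " ", true);
        while (st.hasMoreTokens()) {
            String filter = st.nextToken();
            if (filter.equals(" ")) {
                continue; // a returned delimiter, not a word
            }
            if (filter.length() == 1) {
                continue; // a real one-character word such as "a" or "I"
            }
            System.out.println(filter + " : " + filter.length());
        }
    }
}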
 
JeroldYoc (Author) commented:
A simple problem caused by a combination of a minor logic flaw and a poor set of test data that did not catch it.
 
JeroldYoc (Author) commented:
A simple logic flaw due to getting in a hurry when setting up the simplest test case...
