StringTokenizer Vs String.split

Posted on 2004-08-26
Last Modified: 2007-11-27
Hello All,

Until now, my understanding has been that StringTokenizer is slower than String.split().

Is that correct?

If so, consider the sequence of method calls that String.split has to go through to split a String:

In the String class:
String.split(regex) calls String.split(regex, limit)
String.split(regex, limit) calls Pattern.compile(regex).split(this, limit)

In the Pattern class:
Pattern.compile(regex) calls new Pattern(regex, flags)
This constructor in turn calls the compile() method to finish building the Pattern instance.

Then, in the split method of the Pattern class, a Matcher instance is created, and for as long as Matcher.find() succeeds the normal split logic is carried out using String.substring().



So even for a simple split, it creates Pattern and Matcher instances, and it has to pass through all of the above methods to produce the final result.

StringTokenizer, by contrast, makes just 3 calls internally, besides of course hasMoreTokens() and nextToken():

scanToken(), String.substring() and setMaxDelimChar(). It also does not create instances of any other classes, the way String.split() does.

The constructor calls setMaxDelimChar()
nextToken() calls scanToken() and String.substring()
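
Since String.split(regex) goes through Pattern.compile(regex) on each call as described above, one option when the same delimiter is used many times is to compile the Pattern once and reuse it. The following is only a minimal sketch of that idea; the class and method names (PrecompiledSplit, splitCsv) are made up for illustration and are not from the original post.

```java
import java.util.regex.Pattern;

class PrecompiledSplit {
    // Compiled once; Pattern instances are immutable and can be shared.
    private static final Pattern COMMA = Pattern.compile(",");

    // Gives the same result as line.split(","), without compiling the regex per call.
    static String[] splitCsv(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        for (String part : splitCsv("a,b,,d")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Whether this closes the gap to StringTokenizer depends on the JVM and the input, so it would need to be measured rather than assumed.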

So I would like to know which one you think is better.

Thanks
Sudhakar
Question by:sudhakar_koundinya
Assisted Solution

by:tomboshell
tomboshell earned 20 total points
ID: 11910220
I use both. The difference is usually pretty minor, so I have not noticed any difference in performance. I tend to use split when I am picturing the solution as an array; namely, when I need the first and third elements and the rest don't matter to me. One interesting aspect of split is how it might combine with the variable arguments in Java 5. I will have to play around with that some time, but it kinda gives me some ideas.

StringTokenizer is great when you want to work with multiple delimiters, and it allows control similar to the collections package. I kinda feel that using it requires that you have more fine-grained control over what you do/need with the string.
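
As an illustration of the multiple-delimiter point: every character in the delimiter string passed to StringTokenizer acts as a separator. A small sketch, with a made-up class name (MultiDelimDemo) and sample input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class MultiDelimDemo {
    // Collects all tokens; each character of 'delims' is treated as a separator.
    static List<String> tokens(String input, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Comma, semicolon and space all separate tokens here.
        System.out.println(tokens("a,b;c d", ",; ")); // prints [a, b, c, d]
    }
}
```

With split, the closest equivalent would be a character class such as "[,; ]" as the regex.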
 

Assisted Solution

by:girionis
girionis earned 20 total points
ID: 11910252
Some info:

StringTokenizer vs. Splitting Strings

This feature actually appeared in J2SE 1.4, but is noteworthy anyway as a new method for splitting strings. The legacy class StringTokenizer (in the java.util package) breaks a given string into distinct String elements, using whitespace or some other delimiter specified explicitly. For example,

    StringTokenizer st = new StringTokenizer("This is a string");
    while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
    }

results in the output

    This
    is
    a
    string

However, one problem is that it discards tokens that are empty strings, and is unreliable when meant to be used as a string iterator. For example,

    // note two consecutive commas separating an empty string
    StringTokenizer st = new StringTokenizer("This,is,,string", ","); // specify comma delimiter
    while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
    }

results in the output

    This
    is
    string

This introduces a problem parsing comma-separated values where empty strings are valid and should be preserved. With StringTokenizer, the only method for detecting them is to use the variant that also returns the delimiters, making the code more complex.
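
To show what that extra complexity looks like, here is a rough sketch that uses the three-argument constructor (returnDelims set to true) and tracks whether a value was seen between delimiters. The helper name splitKeepingEmpty and the single-character delimiter restriction are assumptions made for this example, not part of the quoted article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class EmptyTokenDemo {
    // Splits on a single delimiter character and keeps empty fields.
    static List<String> splitKeepingEmpty(String input, char delim) {
        String d = String.valueOf(delim);
        List<String> out = new ArrayList<>();
        // Third argument true: delimiters are returned as tokens too.
        StringTokenizer st = new StringTokenizer(input, d, true);
        boolean expectingValue = true; // true at the start and after each delimiter
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals(d)) {
                if (expectingValue) {
                    out.add(""); // delimiter with no value before it
                }
                expectingValue = true;
            } else {
                out.add(tok);
                expectingValue = false;
            }
        }
        if (expectingValue) {
            out.add(""); // trailing delimiter (or empty input) leaves one empty field
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingEmpty("This,is,,string", ','));
    }
}
```

Note that this sketch also keeps a trailing empty field (for example after "a,"), which split(",") would drop.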

A much better solution comes with the split() method provided in the String class, which is supported by the regular expression (regex) facilities introduced in J2SE 1.4:

    // split() is an instance method: call it on the string, passing the delimiter regex
    String[] tokens = "This,is,,string".split(",");
    for (int i = 0; i < tokens.length; ++i) {
        System.out.println(tokens[i]);
    }

results in output preserving the empty string

    This
    is
         
    string

The split() method also has the added advantage that it takes a regular expression for the delimiter argument. For example, if one has comma-separated values with whitespace following commas that should be discarded, it's simple to accomplish.

    // note space after commas, regex for delimiter arg
    String[] tokens = " This , is ,  , csv string ".trim().split("\\s*,\\s*");
    for (int i = 0; i < tokens.length; ++i) {
        System.out.println(tokens[i]);
    }

This technique trims all the elements of leading and trailing whitespace while preserving internal word spaces with the output:

    This
    is
         
    csv string

The J2SE API Specification now documents StringTokenizer as a legacy class and discourages its use, but doesn't go so far as to deprecate it. Personally, I would also like to see a counterpart for joining an array of String objects into a delimited string, much like Perl provides.

    // note: this is not in the J2SE API
    /** Joins the specified tokens into a delimited string. */
    public static String join(String delimiter, Object[] tokens);

from: http://www.ociweb.com/jnb/jnbAug2004.html
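
For what it's worth, the join counterpart wished for in the quoted article is easy to sketch by hand. The version below follows the signature given there; the class name JoinDemo is made up. Later Java releases (Java 8 onward) added String.join, which covers the same need for CharSequence elements.

```java
class JoinDemo {
    /** Joins the specified tokens into a delimited string. */
    static String join(String delimiter, Object[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(delimiter); // delimiter goes between elements only
            }
            sb.append(tokens[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = "This,is,,string".split(",");
        // Round trip: splitting and joining gives back the original string.
        System.out.println(join(",", tokens));
    }
}
```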

Accepted Solution

by:armoghan
armoghan earned 30 total points
ID: 11910302

Author Comment

by:sudhakar_koundinya
ID: 11910469
Hello All,

Thanks for your inputs. I know StringTokenizer and String.split each have their own advantages. But for simple splits that occur many times, I believe String.split is much slower.

After seeing armoghan's first link I created the following test case, and the results show that my belief is correct.

The results (elapsed time in milliseconds) are:
My StringTokenizer :90 (maybe this can be improved with a more efficient process)
java.util.StringTokenizer :10
String Split :141



import java.util.*;
class StringTokener implements Enumeration
{
      private Vector tokens=null;
      private int count=0;
      public StringTokener(String main,String sep)
      {
            String token="";
            tokens=new Vector();
            int posi=0;
            while(true)
            {
                  posi=main.indexOf(sep);
                  if(posi<0) break;
                  token=main.substring(0,posi);
                  tokens.add(token);
                  main=main.substring(posi+sep.length()); // skip the whole separator, not just one character

            }
            tokens.add(main);      
      }
      public boolean hasMoreElements()
      {
            return count<tokens.size();
      }
      public Object nextElement()
      {
            count++;
            if(count>tokens.size() || count<=0)
             throw new NoSuchElementException();
            return tokens.get(count-1);
      }
      public boolean hasMoreTokens()
      {
                  return hasMoreElements();
      }
      public String nextToken()
      {
            return (String)nextElement();
      }
      


      public static void main(String[] args)
      {
            
            String aBigString="";
            for(int i=0;i<10000;i++)
            {
                  aBigString=aBigString+",";
            }

            Date start=new Date();
            StringTokener st=new StringTokener(aBigString,",");
            while(st.hasMoreElements())
            {
                  String str=st.nextToken();
            }
            Date end=new Date();
            System.err.println("My StringTokenizer :"+(end.getTime()-start.getTime()));
             st=null;

            start=new Date();
            StringTokenizer st1=new StringTokenizer(aBigString,",");
            while(st1.hasMoreElements())
            {
                  String str=st1.nextToken();
            }
            end=new Date();
            st1=null;
            System.err.println("java.util.StringTokenizer :"+(end.getTime()-start.getTime()));

            start=new Date();
            String array[]=aBigString.split(",");
            for(int i=0;i<array.length;++i)
            {

                  String str=array[i];
            }
            end=new Date();
            array=null;
            aBigString=null;
            System.err.println("String Split :"+(end.getTime()-start.getTime()));




      }
}
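
One thing to keep in mind about the test above: aBigString consists only of commas, so java.util.StringTokenizer returns no tokens at all and split(",") returns an empty array (trailing empty strings are dropped), while the hand-written StringTokener builds 10001 empty tokens. A variation with non-empty fields may therefore give a fairer picture. The following is only a sketch of such a variation; the class name, helper names and field contents are made up, and the timings it prints will vary by JVM and machine.

```java
import java.util.StringTokenizer;

class SplitTimingSketch {
    // Builds "field0,field1,...": every token is non-empty.
    static String buildInput(int fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("field").append(i);
        }
        return sb.toString();
    }

    static int countWithTokenizer(String s) {
        int n = 0;
        StringTokenizer st = new StringTokenizer(s, ",");
        while (st.hasMoreTokens()) {
            st.nextToken();
            n++;
        }
        return n;
    }

    static int countWithSplit(String s) {
        return s.split(",").length;
    }

    public static void main(String[] args) {
        String input = buildInput(10000);
        // Warm-up runs so that both code paths are loaded and compiled before timing.
        for (int i = 0; i < 200; i++) {
            countWithTokenizer(input);
            countWithSplit(input);
        }
        long t0 = System.nanoTime();
        int a = countWithTokenizer(input);
        long t1 = System.nanoTime();
        int b = countWithSplit(input);
        long t2 = System.nanoTime();
        System.err.println("java.util.StringTokenizer: " + a + " tokens, " + (t1 - t0) / 1000 + " us");
        System.err.println("String.split: " + b + " tokens, " + (t2 - t1) / 1000 + " us");
    }
}
```

A single timed run like this is still a crude measurement; repeating it many times and averaging would be more reliable.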



Expert Comment

by:girionis
ID: 11910550
So are you only interested in the speed?

Author Comment

by:sudhakar_koundinya
ID: 11910553
yes

Expert Comment

by:girionis
ID: 11910567
I am not sure which one is faster, but if your results show StringTokenizer is faster then I'd go with it.

Expert Comment

by:tomboshell
ID: 11910778
If you do not mind losing empty tokens, then go with StringTokenizer for the speed. If a token is empty (two separators with nothing in between), it won't be returned by default. You can have StringTokenizer return the delimiters as tokens and then handle such situations, but then you will lose the speed gains. Suppose (in an actual program, not just a test for speed) that you have a program that parses the data and *needs* the first two tokens. It may happen that one time the second token does not have an entry, but the third does. StringTokenizer then gives you the third token, and unless you add some extra handling (like a counter, or looking at the actual delimiters), the method that takes the first two tokens as parameters receives a wrong value for the second. Hopefully the receiving method would have some validation routines, but by then it is a bit late. The problem is best caught where it occurs.

My point here is not only that StringTokenizer may be faster, but also that it does have some limitations. And at times you may decide to use split just for its simplicity.
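
The field-shift scenario described above can be reproduced in a few lines. This is just an illustrative sketch; the record "john,,42" and the class and method names are invented for the example.

```java
import java.util.StringTokenizer;

class FieldShiftDemo {
    // Returns what StringTokenizer hands back as the "second" token.
    static String secondFieldTokenizer(String record) {
        StringTokenizer st = new StringTokenizer(record, ",");
        st.nextToken();        // first field
        return st.nextToken(); // next non-empty token, whatever its real position
    }

    // Returns the second field by position, empty or not.
    static String secondFieldSplit(String record) {
        return record.split(",")[1];
    }

    public static void main(String[] args) {
        String record = "john,,42"; // second field is empty
        System.out.println("StringTokenizer: [" + secondFieldTokenizer(record) + "]"); // prints [42]
        System.out.println("split:           [" + secondFieldSplit(record) + "]");     // prints []
    }
}
```

Here StringTokenizer silently returns the third field in place of the empty second one, which is exactly the kind of error that only shows up later in validation.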

Author Comment

by:sudhakar_koundinya
ID: 11910844
>>My point here is not only that StringTokenizer may be faster, but also that it does have some limitations.

Yes, I do agree. But for my test scenarios, I will not face any problem if I go with StringTokenizer.

Actually, I am looking to optimize my code for fast execution.

In my current project, String.split will be called at least 2 million times, so at the higher end this will no doubt affect my application's performance. For my current application, depending on the situation, sometimes I need to go for StringTokenizer and sometimes for String.split.

Similar to this, I have some more problems that affect the application's performance. Some of them I have already posted, and I am still identifying the others.

thanks,
sudhakar

Expert Comment

by:CEHJ
ID: 11912535
In my test, StringTokenizer, for splitting URLs, comes out about 6 times faster.