
StringTokenizer Vs String.split

Posted on 2004-08-26
Last Modified: 2007-11-27
Hello All,

Until now, my assumption has been that StringTokenizer is slower than String.split().

Is that correct?



If so, consider the sequence of methods that has to run, in order, just to split a string:

In the String class:
String.split(regex) calls String.split(regex,limit)
String.split(regex,limit)  calls Pattern.compile(regex).split(this, limit)

In the Pattern class:
Pattern.compile(regex) calls new Pattern(regex, flag)
which in turn calls the compile() method to build the Pattern instance.

Then Pattern.split() creates a Matcher instance and, as long as Matcher.find() succeeds, does ordinary split logic using String.substring().



So even a simple split creates a Pattern and a Matcher, and has to go through all of the methods above to produce the final result.
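A minimal sketch of the delegation described above: per the J2SE 1.4 source the author traced, `s.split(regex)` ends up in `Pattern.compile(regex).split(s, 0)`, so both calls below should yield the same tokens. The class name `SplitChain` is hypothetical, and later JDKs may take a shortcut for simple one-character delimiters, so treat this as an illustration of the 1.4 call chain rather than of every JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitChain {
    public static void main(String[] args) {
        String s = "a,b,,c";

        // What most code writes; in J2SE 1.4 this compiles a new Pattern on every call.
        String[] viaString = s.split(",");

        // The same work written out explicitly: compile, then split with limit 0.
        String[] viaPattern = Pattern.compile(",").split(s, 0);

        System.out.println(Arrays.toString(viaString));           // [a, b, , c]
        System.out.println(Arrays.equals(viaString, viaPattern)); // true
    }
}
```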

StringTokenizer, by contrast, makes just three internal calls (apart from hasMoreTokens() and nextToken() themselves): scanToken(), String.substring() and setMaxDelimChar(). It also does not create any other objects the way String.split() does.

The constructor calls setMaxDelimChar()
nextToken() calls scanToken() and String.substring()

So I would like to know which one you think is better.

Thanks
Sudhakar
Question by:sudhakar_koundinya

Assisted Solution

by:tomboshell
tomboshell earned 20 total points
ID: 11910220
I use both. The difference is usually minor, so I have not noticed any performance gap. I tend to use split when I picture the solution as an array: for example, I need the first and third fields and the rest don't matter to me. One interesting aspect of split is how it might combine with variable arguments in Java 5. I will have to play around with that some time, but it gives me some ideas.
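A short sketch of the "array in my head" style described above: split once and pick the fields you need by position. The record layout, the class name `PickFields` and the method `firstAndThird` are hypothetical.

```java
public class PickFields {
    // Returns the first and third comma-separated fields; the rest are ignored.
    static String[] firstAndThird(String record) {
        String[] fields = record.split(",");
        if (fields.length < 3) {
            throw new IllegalArgumentException("expected at least 3 fields: " + record);
        }
        return new String[] { fields[0], fields[2] };
    }

    public static void main(String[] args) {
        String[] picked = firstAndThird("id42,ignored,alice,more,stuff");
        System.out.println(picked[0] + " " + picked[1]); // id42 alice
    }
}
```

Because split keeps empty fields in the middle, `fields[2]` is still the third field even when the second one is blank.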

StringTokenizer is great when you want to work with multiple delimiters, and it gives you control similar to the collections package (it implements Enumeration). I feel that using it calls for finer-grained control over what you do with, and need from, the string.
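A minimal sketch of the multiple-delimiter case: every character in StringTokenizer's delimiter string is a delimiter on its own, and runs of delimiters are collapsed. The class name `MultiDelim` and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class MultiDelim {
    // Space, comma and semicolon each act as a separate delimiter.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(text, " ,;");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("red, green;blue  yellow")); // [red, green, blue, yellow]
    }
}
```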
 

Assisted Solution

by:girionis
girionis earned 20 total points
ID: 11910252
Some info:

StringTokenizer vs. Splitting Strings

This feature actually appeared in J2SE 1.4, but is noteworthy anyway as a new method for splitting strings. The legacy class StringTokenizer (in the java.util package) breaks a given string into distinct String elements, using whitespace or some other delimiter specified explicitly. For example,

    StringTokenizer st = new StringTokenizer("This is a string");
    while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
    }

results in the output

    This
    is
    a
    string

However, one problem is that it discards tokens that are empty strings, and is unreliable when meant to be used as a string iterator. For example,

    // note two consecutive commas separating an empty string
    StringTokenizer st = new StringTokenizer("This,is,,string", ","); // specify comma delimiter
    while (st.hasMoreTokens()) {
        System.out.println(st.nextToken());
    }

results in the output

    This
    is
    string

This introduces a problem parsing comma-separated values where empty strings are valid and should be preserved. With StringTokenizer, the only method for detecting them is to use the variant that also returns the delimiters, making the code more complex.
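The delimiter-returning variant the article mentions can be sketched as follows, assuming a single-character delimiter: ask StringTokenizer to hand back the delimiters as tokens and emit an empty field whenever two delimiters arrive back to back. The class name `KeepEmpty` and the method `fields` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class KeepEmpty {
    // Splits on a single-character delimiter and keeps empty fields.
    static List<String> fields(String text, char delim) {
        List<String> out = new ArrayList<String>();
        String d = String.valueOf(delim);
        StringTokenizer st = new StringTokenizer(text, d, true); // returnDelims = true
        boolean lastWasDelim = true; // a leading delimiter means an empty first field
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals(d)) {
                if (lastWasDelim) {
                    out.add(""); // two delimiters in a row: empty field between them
                }
                lastWasDelim = true;
            } else {
                out.add(tok);
                lastWasDelim = false;
            }
        }
        if (lastWasDelim) {
            out.add(""); // trailing delimiter (or empty input): final empty field
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("This,is,,string", ',')); // [This, is, , string]
    }
}
```

Unlike `String.split(",")`, this sketch also keeps trailing empty fields, so `"a,"` gives `[a, ]`.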

A much better solution comes with the split() method provided in the String class, which is supported by the regular expression (regex) facilities introduced in J2SE 1.4:

    String[] tokens = "This,is,,string".split(",");
    for (int i = 0; i < tokens.length; ++i) {
        System.out.println(tokens[i]);
    }

results in output preserving the empty string

    This
    is

    string

The split() method also has the added advantage that it takes a regular expression for the delimiter argument. For example, if one has comma-separated values with whitespace following commas that should be discarded, it's simple to accomplish.

    // note space after commas, regex for delimiter arg
    String[] tokens = " This , is ,  , csv string ".trim().split("\\s*,\\s*");
    for (int i = 0; i < tokens.length; ++i) {
        System.out.println(tokens[i]);
    }

This technique trims all the elements of leading and trailing whitespace while preserving internal word spaces with the output:

    This
    is
         
    csv string

The J2SE API Specification now documents StringTokenizer as a legacy class and discourages its use, but doesn't go so far as to deprecate it. Personally, I would also like to see a counterpart for joining an array of String objects into a delimited string, much like Perl provides.

    // note: this is not in the J2SE API
    /** Joins the specified tokens into a delimited string. */
    public static String join(String delimiter, Object[] tokens);

from: http://www.ociweb.com/jnb/jnbAug2004.html
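The join counterpart wished for in the article is easy to sketch using the signature it proposes; the class name `Joiner` is hypothetical. (Java 8 later added `String.join(CharSequence, CharSequence...)` for this purpose.)

```java
public class Joiner {
    /** Joins the specified tokens into a delimited string. */
    public static String join(String delimiter, Object[] tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(delimiter); // separator only between elements
            }
            sb.append(tokens[i]); // uses String.valueOf, so null becomes "null"
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = "This,is,,string".split(",");
        System.out.println(join(",", tokens)); // This,is,,string
    }
}
```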

Accepted Solution

by:armoghan
armoghan earned 30 total points
ID: 11910302

Author Comment

by:sudhakar_koundinya
ID: 11910469
Hello All,

Thanks for your inputs. I know StringTokenizer and String.split each have their own advantages. But for simple splits that happen many times, I believe String.split is much slower.

After seeing armoghan's first link, I created the following test case, and the results show that my belief is correct.

The results (elapsed time in milliseconds) were:
My StringTokenizer :90 - (maybe this can be improved with a more efficient implementation)
java.util.StringTokenizer :10
String Split :141



import java.util.*;

// Hand-rolled tokenizer: unlike java.util.StringTokenizer it keeps the
// empty tokens between consecutive separators.
class StringTokener implements Enumeration
{
      private Vector tokens=null;
      private int count=0;
      public StringTokener(String main,String sep)
      {
            tokens=new Vector();
            int posi=0;
            while(true)
            {
                  posi=main.indexOf(sep);
                  if(posi<0) break;
                  tokens.add(main.substring(0,posi));
                  // skip the whole separator, not just one character
                  main=main.substring(posi+sep.length());
            }
            tokens.add(main);      // text after the last separator
      }
      public boolean hasMoreElements()
      {
            return count<tokens.size();
      }
      public Object nextElement()
      {
            if(count>=tokens.size())
             throw new NoSuchElementException();
            return tokens.get(count++);
      }
      public boolean hasMoreTokens()
      {
                  return hasMoreElements();
      }
      public String nextToken()
      {
            return (String)nextElement();
      }

      public static void main(String[] args)
      {
            // test input: 10,000 commas and nothing else
            String aBigString="";
            for(int i=0;i<10000;i++)
            {
                  aBigString=aBigString+",";
            }

            // each section below reports wall-clock time in milliseconds
            Date start=new Date();
            StringTokener st=new StringTokener(aBigString,",");
            while(st.hasMoreElements())
            {
                  String str=st.nextToken();
            }
            Date end=new Date();
            System.err.println("My StringTokenizer :"+(end.getTime()-start.getTime()));
            st=null;

            start=new Date();
            StringTokenizer st1=new StringTokenizer(aBigString,",");
            while(st1.hasMoreElements())
            {
                  String str=st1.nextToken();
            }
            end=new Date();
            st1=null;
            System.err.println("java.util.StringTokenizer :"+(end.getTime()-start.getTime()));

            start=new Date();
            String array[]=aBigString.split(",");
            for(int i=0;i<array.length;++i)
            {
                  String str=array[i];
            }
            end=new Date();
            array=null;
            aBigString=null;
            System.err.println("String Split :"+(end.getTime()-start.getTime()));
      }
}
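One caveat about the test above: aBigString contains nothing but commas, so java.util.StringTokenizer finds no tokens at all, and String.split(",") returns an empty array because trailing empty strings are removed, while StringTokener builds 10,001 empty tokens. The three loops therefore do quite different amounts of work. Below is a sketch of a fairer comparison, assuming an input with real fields and a hypothetical indexOf-based splitter (`indexOfSplit`) that advances a start index instead of re-substringing the remainder. It needs Java 5 or later for System.nanoTime(); timings vary by JVM, so no numbers are claimed here, and a dedicated harness such as JMH would give more trustworthy results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class FairerSplitTest {
    // Splits on a literal separator without regex, keeping all empty fields
    // (including trailing ones, which String.split would drop).
    static List<String> indexOfSplit(String s, String sep) {
        List<String> out = new ArrayList<String>();
        int start = 0;
        int pos;
        while ((pos = s.indexOf(sep, start)) >= 0) {
            out.add(s.substring(start, pos));
            start = pos + sep.length();
        }
        out.add(s.substring(start)); // text after the last separator
        return out;
    }

    public static void main(String[] args) {
        // 10,000 non-empty fields, so every approach has real tokens to return.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            if (i > 0) sb.append(',');
            sb.append("field").append(i);
        }
        String input = sb.toString();

        // Several rounds, so JIT warm-up does not dominate a single measurement.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            int n1 = 0;
            StringTokenizer st = new StringTokenizer(input, ",");
            while (st.hasMoreTokens()) { st.nextToken(); n1++; }

            long t1 = System.nanoTime();
            int n2 = input.split(",").length;

            long t2 = System.nanoTime();
            int n3 = indexOfSplit(input, ",").size();
            long t3 = System.nanoTime();

            System.out.println("round " + round
                + ": StringTokenizer " + (t1 - t0) / 1000 + " us (" + n1 + " tokens)"
                + ", split " + (t2 - t1) / 1000 + " us (" + n2 + ")"
                + ", indexOf " + (t3 - t2) / 1000 + " us (" + n3 + ")");
        }
    }
}
```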



Expert Comment

by:girionis
ID: 11910550
So are you only interested in the speed?

Author Comment

by:sudhakar_koundinya
ID: 11910553
yes

Expert Comment

by:girionis
ID: 11910567
I am not sure which one is faster but if your results show StringTokenizer faster then I'd go with it.

Expert Comment

by:tomboshell
ID: 11910778
If you don't mind losing empty tokens, go with StringTokenizer for the speed. If a token is empty (two separators with nothing in between), StringTokenizer won't return it by default. You can have StringTokenizer return the delimiters as tokens and handle such cases yourself, but then you will lose the speed gains. Suppose (in an actual program, not just a speed test) that you parse the data and *need* the first two tokens. It may happen that one time the second field is empty but the third is not. StringTokenizer then hands you the third field as the second token, and unless you add extra handling (such as a counter, or checking the actual delimiters), the method that needs the first two parameters receives the wrong value for the second. Hopefully the receiving method has some validation, but by then it is a bit late; the problem is best caught where it occurs.

My point is not only that StringTokenizer may be faster, but also that it has some limitations. At times you may choose split just for its simplicity.
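A small sketch of the pitfall described above, assuming a record whose second field is sometimes empty: StringTokenizer silently shifts the third value into the second slot, while split keeps the positions. The record text, the class name `ShiftedFields` and both helper methods are hypothetical.

```java
import java.util.StringTokenizer;

public class ShiftedFields {
    // First two tokens as StringTokenizer sees them (empty fields skipped).
    static String[] firstTwoViaTokenizer(String record) {
        StringTokenizer st = new StringTokenizer(record, ",");
        return new String[] { st.nextToken(), st.nextToken() };
    }

    // First two fields by position, keeping empty ones.
    static String[] firstTwoViaSplit(String record) {
        String[] f = record.split(",");
        return new String[] { f[0], f[1] };
    }

    public static void main(String[] args) {
        String record = "smith,,london"; // second field missing
        String[] a = firstTwoViaTokenizer(record);
        String[] b = firstTwoViaSplit(record);
        System.out.println("tokenizer: [" + a[0] + "] [" + a[1] + "]"); // [smith] [london]
        System.out.println("split:     [" + b[0] + "] [" + b[1] + "]"); // [smith] []
    }
}
```

Note that split still drops trailing empty fields, so `"smith,,"` would yield a one-element array; `split(",", -1)` keeps them.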

Author Comment

by:sudhakar_koundinya
ID: 11910844
>>My point here is to not only to know that the StringTokenizer may be faster, but to also know that it does have some limitations.

Yes, I agree. But in my test scenarios, I will not face any problem if I go with StringTokenizer.

What I am actually after is optimizing my code for fast execution.

In my current project, String.split will be called at least 2 million times, so at that scale it will certainly affect my application's performance. Depending on the situation, sometimes I need StringTokenizer and sometimes String.split.
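Where split has to stay (for example where empty fields matter), one commonly suggested mitigation for millions of calls is to compile the delimiter Pattern once and reuse it; the Pattern documentation states that instances are immutable and safe for concurrent use. A sketch, where the class name `ReusePattern`, the constant `LINE_SPLITTER` and the sample line are hypothetical; whether this closes the gap with StringTokenizer on a given JVM has to be measured.

```java
import java.util.regex.Pattern;

public class ReusePattern {
    // Compiled once and reused for every line.
    private static final Pattern LINE_SPLITTER = Pattern.compile(",");

    static String[] fields(String line) {
        return LINE_SPLITTER.split(line); // same result as line.split(",")
    }

    public static void main(String[] args) {
        String line = "a,b,,c";
        long total = 0;
        // Stand-in for the roughly 2 million calls mentioned above.
        for (int i = 0; i < 2000000; i++) {
            total += fields(line).length;
        }
        System.out.println(total); // 8000000
    }
}
```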

Similarly, I have some other problems that affect the application's performance. I have already posted some of them and am still identifying the others.

thanks,
sudhakar

Expert Comment

by:CEHJ
ID: 11912535
In my test splitting URLs, StringTokenizer comes out about 6 times faster.