Solved

How do I know if part of one string and part of another string are the same?

Posted on 2005-03-16
24
Medium Priority
242 Views
Last Modified: 2010-03-31
Hi all,

Here is an example. I have two strings:

1) String str1 = "abc1, def*12*ghi";

The second string is:

2) String str2 = "zxy,def, mno,qrs,tuv,str";

If I compare these two strings I should get true, because both strings contain "def". How can I achieve this?

Thanks in advance.
Question by:SriVelagapudi
24 Comments
 
LVL 29

Expert Comment

by:bloodredsun
ID: 13560605
Use String.contains()
0
 
LVL 29

Expert Comment

by:bloodredsun
ID: 13560613
or String.matches() and a regular expression.
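
A minimal sketch of both suggestions, assuming the shared fragment ("def" in the question) is already known. The class and method names are made up for illustration. Note that String.matches() has to match the entire string, so the fragment is wrapped in .* on both sides, and Pattern.quote() keeps characters such as * from being read as regex syntax.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class SubstringCheck {

    // True when both strings contain the fragment (String.contains needs Java 5+).
    static boolean bothContain(String a, String b, String fragment) {
        return a.contains(fragment) && b.contains(fragment);
    }

    // Same check with String.matches (Java 1.4+). matches() must cover the
    // whole string, hence the leading and trailing ".*".
    static boolean bothMatch(String a, String b, String fragment) {
        String regex = ".*" + Pattern.quote(fragment) + ".*";
        return a.matches(regex) && b.matches(regex);
    }
}
```

With the strings from the question, both methods report a match for "def". Neither one discovers the common fragment on its own; that still requires splitting one of the strings into candidates.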
0
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 13561456
Also the pre-5.0 way, use String.indexOf:

    boolean matched = str1.indexOf("def") >=0 && str2.indexOf("def") >= 0;

Regards
Jim Cakalic
0
 

Expert Comment

by:implementIT
ID: 13562589
You can use the StringTokenizer class with "," as the delimiter and check each token.

You could also use indexOf() on the other string.

http://www.tanul.info
0
 
LVL 2

Expert Comment

by:mdebiasi
ID: 13564937
Hi,
If both strings are comma-separated tokens, use StringTokenizer together with indexOf():

// requires: import java.util.StringTokenizer;
String s1 = "ab,cd,de";
String s2 = "xx,yy,cd";
// add these if you want case insensitive comparison:
// s1 = s1.toLowerCase();
// s2 = s2.toLowerCase();
StringTokenizer st = new StringTokenizer(s1, ", ");  // commas and spaces are used as delimiters
while (st.hasMoreTokens()) {
  String r = st.nextToken();
  if (s2.indexOf(r) >= 0) {
    // WE CAUGHT THE BEAST!
    System.out.println(r);
  }
}

0
 
LVL 29

Expert Comment

by:bloodredsun
ID: 13565062
>> Also the pre-5.0 way, use String.indexOf:

String.contains() is 5.0, whereas String.matches() has been available since 1.4. You're absolutely correct, though, that indexOf will do the job very well and may be easier in some respects for someone new to Java :-)
0
 

Author Comment

by:SriVelagapudi
ID: 13565289
My problem is that the first string I mentioned in my last post is dynamic. It keeps changing, and the second string is a big ","-delimited string.

What I would like to know is how to tell whether any part of the first string matches any part of the second string.

Right now I am already using StringTokenizer as mdebiasi suggested, but since the second string is very big, the processing time is high. I need a shortcut for this.
0
 
LVL 2

Expert Comment

by:mdebiasi
ID: 13565423
Uhm,
Pre-tokenize the second string and put the tokens in a String array:

String tokens[] = new String[ 1024 ];  // up to 1024 tokens, use Vector if you don't know the MAX number of tokens of the second string
int tokencount = 0;
StringTokenizer st = new StringTokenizer( s2 , ", " );
while ( st.hasMoreTokens() ) {
  tokens[ tokencount++ ] = st.nextToken();
}

// now use indexOf on the "mutant" s1 (DON'T TOKENIZE s2 AGAIN)
for (int i = 0; i < tokencount; i++ ) {
  if ( s1.indexOf( tokens[i] ) >= 0 ) {
    // FOUND IT ...
  }
}

If this solution is still too slow, then try regular expressions: build a regular expression from s2 and test whether the "mutant" s1 matches it.

0
 

Author Comment

by:SriVelagapudi
ID: 13565687
Right now my code looks like the following.

"formulaCells" below is the ","-delimited string, which holds around 300 strings.

for (int i = 0; i < 2000; i++) {
    StringTokenizer st = new StringTokenizer(formulaCells, ",");
    boolean formulaFlag = false;
    while (!formulaCells.equals("") && st.hasMoreTokens()) {
        String cellName = st.nextToken();
        String formulaLower = formula.toLowerCase();
        if (formulaLower.indexOf(cellName) != -1) {
            formulaFlag = true;
            break;
        }
    }
}

The process has to run 2000 times, each time with a different formula and different formulaCells. How can I make this faster?
0
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 13566023
How about:

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(cells);
    }
    private boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(cells, formula) >= 0;
    }

I would anticipate your using the class this way:

    CellMatcher matcher = new CellMatcher("some,comma,delim,string");
    loop across formulas
        if (matcher.matches(formula)) {
            ...
        }
    ...
    // oops, formulaCells changed!
    matcher.setFormulaCells(formulaCells);

Hope that makes sense.
           
FYI, upon the introduction of regex in JDK 1.4, use of StringTokenizer was officially "discouraged" in favor of String.split. It is retained for compatibility purposes.
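
As a rough illustration of the String.split route, here is a sketch that tokenizes the sample string from the question. The class and method names are hypothetical. Because split takes a regular expression, the delimiter can absorb the stray spaces around the commas, which is what the ", " delimiter set did in the StringTokenizer versions above.

```java
// Hypothetical helper showing String.split (Java 1.4+) in place of StringTokenizer.
final class SplitDemo {

    // Splits on commas and swallows any whitespace around them.
    // Trailing empty tokens are dropped by split itself.
    static String[] tokens(String csv) {
        return csv.trim().split("\\s*,\\s*");
    }
}
```

One behavioral difference to keep in mind: StringTokenizer silently skips empty tokens, while split keeps empty strings that appear at the start or in the middle of the input.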

Regards,
Jim
0
 

Author Comment

by:SriVelagapudi
ID: 13566104
Thanks for your reply mdebiasi.


I tried pre-tokenizing as you suggested, but it still does not make much of a difference.


0
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 13566490
Oops. That should have been:

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
    }
    public boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(_cells, formula) >= 0;
    }
}
0
 

Author Comment

by:SriVelagapudi
ID: 13566596
Thanks for your response Jim.

I tried your solution, but I never get true, even though the formula contains a matching string.

In "formulaCells" I have "grid_device_adds__5__1,grid_device_adds__5__2,grid_device_adds__5__3,grid_device_adds__5__4,grid_device_adds__5__5", though this is only part of the whole string.

In "formula" I have "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar"

But I still do not get true from the matches() method, and I don't know why. Any ideas?
0
 

Author Comment

by:SriVelagapudi
ID: 13566879
I think binarySearch is not working properly because of the numbers in the strings.

Any other ideas?
0
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 13567167
Hang on. I'm putting a more complete (working) example together.
0
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 13567340
OK. Here's what I was thinking. Let's start from the beginning with the CellMatcher class:

import java.util.Arrays;

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
    }
    // this method is for when you want to see if the formula
    // exactly matches one of the formulaCell values
    public boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(_cells, formula) >= 0;
    }
    // this method is for when you want to see if any of
    // the defined formulaCells is contained in the formula
    public boolean contains(String formula) {
        for (int i = 0; i < _cells.length; ++i) {
            if (formula.indexOf(_cells[i]) >= 0)
                return true;
        }
        return false;
    }
}

And here's a test class I wrote:

public class CellMatcherTest {

      public static void main(String[] args) {
            String formulaCells = generateFormulaCells(100);
            String formula = "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar";
            CellMatcher matcher = new CellMatcher(formulaCells);
            System.out.println("matches: " + matcher.matches(formula));
            System.out.println("contains: " + matcher.contains(formula));
      }
      
      private static String generateFormulaCells(int count) {
            StringBuffer buf = new StringBuffer(25 * count);
            for (int i = 0; i < count; ++i) {
                  buf.append("grid_device_adds__5__").append(i + 1).append(",");
            }
            return buf.toString();
      }
      
}

The CellMatcher.matches method was looking for an exact match. So I wrote the contains method to invert this and check whether a formulaCell string appeared anywhere in the formula string. That's at least closer. But you may run into problems with this. For example, what if you had the two formulaCells values grid_device_adds__5__1 and grid_device_adds__5__11? Then if a formula was "=grid_device_adds__5__111", and grid_device_adds__5__111 had not been defined as a formulaCell, contains would return true anyway because it would be doing a simple string match. If that isn't a problem then I think you're good to go. If it is a problem we'll need to talk more.
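
If the prefix problem does matter, one possible way around it is to require that the cell name is not glued to further identifier characters. The sketch below is only an illustration: the class name is hypothetical, and it assumes that cell identifiers consist solely of letters, digits and underscores, which fits the samples posted so far but has not been confirmed.

```java
import java.util.regex.Pattern;

// Hypothetical helper: whole-identifier match using regex lookarounds.
final class WholeCellMatch {

    // Assumption: identifier characters are [A-Za-z0-9_].
    // The lookbehind/lookahead reject a hit that is part of a longer identifier.
    static boolean containsWholeCell(String formula, String cell) {
        String regex = "(?<![A-Za-z0-9_])" + Pattern.quote(cell) + "(?![A-Za-z0-9_])";
        return Pattern.compile(regex).matcher(formula).find();
    }
}
```

With this check, grid_device_adds__5__1 is no longer reported for the formula "=grid_device_adds__5__111", while a plain indexOf still finds it there. Compiling a pattern per call is slow, so in a real loop the compiled patterns would need to be cached.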

Jim
0
 

Author Comment

by:SriVelagapudi
ID: 13567427
I was doing the same thing, Jim. I thought we could do it with binarySearch as well.

Because of the for loops it takes a lot of time.

If you look at the code I posted earlier, I was doing almost the same thing. Is there any way to avoid the for loop when searching the string?

I also have the outer for loop, which runs nearly 2000 times, and if I add another for loop for the search on top of that, it takes a lot of time. Do you see what I mean?
0
 
LVL 19

Expert Comment

by:Jim Cakalic
ID: 13567871
Well, we could try using regular expressions. The brute force way of doing this is to take the formulaCells and do something like this:
    formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";

That would yield a regular expression which would match any of the formula cell identifiers anywhere in the formula string. In some testing I did, it was about twice as fast as a brute force search iterating across all the formula cell identifiers, but only when I manipulated the formula string in a way that forced the search through at least half of the array of possible values. The regex method was very predictable, though. :-)

If your cell identifiers are very regular in form then we can be a lot smarter about the pattern. For example, if the cell identifier is always grid_device_adds__5__ followed by 1 to 4 digits then we could use a pattern like this:
    .*grid_device_adds__5__\d{1,4}.*

Now we're really using the power of regular expressions. In my test, which I'll post shortly, this pattern yielded a constant search time that was substantially lower than the brute force search.

Here's the code -- two classes this time.

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellMatcher {
    private String[] _cells;
    private Pattern _pattern;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
        //formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";
        formulaCells = ".*grid_device_adds__5__\\d{1,4}.*";
        _pattern = Pattern.compile(formulaCells, Pattern.CASE_INSENSITIVE);
    }
    // regex force :-)
    public boolean matches(String formula) {
        Matcher matcher = _pattern.matcher(formula);
        return matcher.matches();
    }
    // brute force :-(
    public boolean contains(String formula) {
        formula = formula.toLowerCase();
        for (int i = 0; i < _cells.length; ++i) {
            if (formula.indexOf(_cells[i]) >= 0)
                return true;
        }
        return false;
    }
}

public class CellMatcherTest {
      private static final int LoopCount = 10000;
      public static void main(String[] args) {
            String formulaCells = generateFormulaCells(3000);
            String formula = "=grid_hidden_supt__1__9*grid_device_adds__5__2100*ongoing_IITOA_scalar";
            CellMatcher matcher = new CellMatcher(formulaCells);
            System.out.println("matches: " + matcher.matches(formula));
            System.out.println("contains: " + matcher.contains(formula));
            System.out.println("matches: " + testMatches(matcher, formula) + "ms");
            System.out.println("contains: " + testContains(matcher, formula) + "ms");
      }
      
      private static String generateFormulaCells(int count) {
            StringBuffer buf = new StringBuffer(25 * count);
            for (int i = 0; i < count; ++i) {
                  buf.append("grid_device_adds__5__").append(i + 1).append(",");
            }
            return buf.toString();
      }
      
      private static long testMatches(CellMatcher matcher, String formula) {
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < LoopCount; ++i) {
                  matcher.matches(formula);
            }
            long t2 = System.currentTimeMillis();
            return t2 - t1;
      }
      
      private static long testContains(CellMatcher matcher, String formula) {
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < LoopCount; ++i) {
                  matcher.contains(formula);
            }
            long t2 = System.currentTimeMillis();
            return t2 - t1;
      }

}
0
 

Author Comment

by:SriVelagapudi
ID: 13569996
Thanks for the code you sent me, I will try this code and let you know about it.
0
 

Author Comment

by:SriVelagapudi
ID: 13574895
Thanks for the reply, Jim.

I get different formulas with different strings. It won't always be the same as "=grid_hidden_supt__1__9*grid_device_adds__5__2100*ongoing_IITOA_scalar".

Also, in the above example, what happens after the setFormulaCells() call?

What does this line do?   //formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";

Thanks once again.
0
 
LVL 19

Accepted Solution

by:
Jim Cakalic earned 80 total points
ID: 13579771
I was experimenting with two different ways of building the expression. The way that is commented out:
    formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";

creates a regular expression from the comma-delimited cell identifiers by simply putting them in a group (with parens), replacing the commas with the vertical bar (which is OR in regular expressions) and then allowing for any character to precede and succeed the group.
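
For reference, here is a somewhat more defensive sketch of that alternation idea. It is not the code used in the timing test; the class and method names are invented for illustration. It differs from the one-liner in three ways: each identifier goes through Pattern.quote() so regex metacharacters are taken literally, empty tokens are skipped, and Matcher.find() replaces the surrounding .* with matches(). Skipping empty tokens matters because a list that ends in a comma (as the output of generateFormulaCells does) turns into an alternation with an empty branch under plain replaceAll, and an empty branch matches every formula.

```java
import java.util.regex.Pattern;

// Hypothetical helper: builds "(?:cell1|cell2|...)" from a comma-delimited list.
final class AlternationPattern {

    static Pattern fromCells(String formulaCells) {
        StringBuilder alternatives = new StringBuilder();
        for (String cell : formulaCells.split(",")) {
            String trimmed = cell.trim();
            if (trimmed.length() == 0) {
                continue; // an empty branch would match everything
            }
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            alternatives.append(Pattern.quote(trimmed));
        }
        if (alternatives.length() == 0) {
            throw new IllegalArgumentException("no cell identifiers given");
        }
        return Pattern.compile("(?:" + alternatives + ")", Pattern.CASE_INSENSITIVE);
    }

    // find() looks for the pattern anywhere in the formula, so no ".*" is needed.
    static boolean containsAny(Pattern cells, String formula) {
        return cells.matcher(formula).find();
    }
}
```

The pattern only needs rebuilding when formulaCells changes; the compiled Pattern can then be reused across all formulas checked against that list.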

The line after it:
     formulaCells = ".*grid_device_adds__5__\\d{1,4}.*";

is building a different expression that assumes the cell identifiers follow a pattern. According to your last post, that isn't true so that option is out.

I was pondering whether we could parse the formula itself and extract candidate cell identifiers ...
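
One way that idea might look, offered purely as an untested-in-your-environment sketch: pull identifier-like runs out of the formula and look each one up in a HashSet built from formulaCells. The class name and the identifier pattern are assumptions; in particular it assumes cell identifiers start with a letter or underscore and contain only letters, digits and underscores, as in the samples posted in this thread.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: set lookup of identifiers extracted from a formula.
final class CellLookup {

    // Assumption about what a cell identifier looks like.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private final Set<String> cells = new HashSet<String>();

    CellLookup(String formulaCells) {
        for (String cell : formulaCells.split(",")) {
            String normalized = cell.trim().toLowerCase(Locale.ROOT);
            if (normalized.length() > 0) {
                cells.add(normalized);
            }
        }
    }

    // Work per formula depends on the formula's length,
    // not on how many cell names were defined.
    boolean containsAnyCell(String formula) {
        Matcher m = IDENTIFIER.matcher(formula);
        while (m.find()) {
            if (cells.contains(m.group().toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

Because whole identifiers are compared, a defined cell grid_device_adds__5__1 does not match a formula that only mentions grid_device_adds__5__111. If a cell name can legitimately appear as part of a longer token, this approach would miss it and the substring search is still needed.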
0
 

Author Comment

by:SriVelagapudi
ID: 13595211
Thanks for your response jim.
0
