SriVelagapudi

asked on

How do I know if part of one string and part of another string are the same?

Hi all,

Here is an example. I have two strings:

1) String str1 = "abc1, def*12*ghi";

The second string is:

2) String str2 = "zxy,def, mno,qrs,tuv,str";

If I compare these two strings, I should get true because both strings contain "def". How can I achieve this?

Thanks in advance.
bloodredsun

Use String.contains()
or String.matches() and a regular expression.
Also the pre-5.0 way, use String.indexOf:

    boolean matched = str1.indexOf("def") >=0 && str2.indexOf("def") >= 0;

Regards
Jim Cakalic
implementIT

You can use the StringTokenizer class with "," as the delimiter and check each token.

You could also use indexOf() on the other string.

http://www.tanul.info
Hi,
If both strings are comma-separated tokens, use StringTokenizer plus indexOf():

import java.util.StringTokenizer;

String s1 = "ab,cd,de";
String s2 = "xx,yy,cd";
// Uncomment these for a case-insensitive comparison:
// s1 = s1.toLowerCase();
// s2 = s2.toLowerCase();
StringTokenizer st = new StringTokenizer(s1, ", ");  // commas and spaces are used as delimiters
while (st.hasMoreTokens()) {
  String r = st.nextToken();
  if (s2.indexOf(r) >= 0) {
    // Found a token of s1 that occurs somewhere in s2
    System.out.println(r);
  }
}
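
Note that indexOf() does a substring test, so a token such as "de" would also be reported as found inside "def". If whole tokens are what matters, one alternative is to tokenize both strings and intersect the two sets. The following is only a sketch: the class name SharedTokens is made up for illustration, it assumes Java 7 or later, and it assumes a token is any run of letters, digits or underscores (so '*', ',' and spaces all act as separators).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not from the thread): finds whole tokens common to two strings.
class SharedTokens {
    // Assumption: anything that is not a letter, digit or underscore separates tokens.
    private static final String SEPARATORS = "[^A-Za-z0-9_]+";

    static Set<String> tokensOf(String s) {
        Set<String> tokens = new LinkedHashSet<>(Arrays.asList(s.split(SEPARATORS)));
        tokens.remove(""); // split() can yield a leading empty string
        return tokens;
    }

    static Set<String> shared(String a, String b) {
        Set<String> common = tokensOf(a);
        common.retainAll(tokensOf(b)); // keep only tokens that also occur in b
        return common;
    }

    public static void main(String[] args) {
        String str1 = "abc1, def*12*ghi";
        String str2 = "zxy,def, mno,qrs,tuv,str";
        System.out.println(shared(str1, str2)); // prints [def]
    }
}
```

The two strings "match" in the sense of the question whenever shared(str1, str2) is not empty.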

>> Also the pre-5.0 way, use String.indexOf:

String.contains() is 5.0 but String.matches() is 1.4, but you're absolutely correct in that indexOf will do the job very well and may be easier in some respects for someone new to Java :-)
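
For comparison, here is a small sketch of the three calls side by side. The class and method names are made up for illustration. The one thing to watch is that String.matches() has to match the whole string, so the search text needs ".*" on both sides, and Pattern.quote() (available since 5.0) keeps characters such as '*' from being read as regex syntax.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: three ways to test whether s contains part.
class SubstringChecks {
    static boolean viaIndexOf(String s, String part) {
        return s.indexOf(part) >= 0; // works on any Java version
    }

    static boolean viaContains(String s, String part) {
        return s.contains(part); // Java 5.0 and later
    }

    static boolean viaMatches(String s, String part) {
        // matches() must cover the entire string, hence the surrounding ".*";
        // (?s) lets '.' also match line breaks.
        return s.matches("(?s).*" + Pattern.quote(part) + ".*");
    }

    public static void main(String[] args) {
        String str1 = "abc1, def*12*ghi";
        System.out.println(viaIndexOf(str1, "def"));  // true
        System.out.println(viaContains(str1, "def")); // true
        System.out.println(viaMatches(str1, "def"));  // true
    }
}
```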

ASKER

My problem is that the first string I mentioned in my last post is dynamic: it keeps changing. The second string is a big ","-delimited string.

What I would like to know is how to tell whether any part of the first string matches any part of the second string.

Right now I am already using StringTokenizer as mdebiasi suggested, but since the second string is very big, processing takes a long time. I need a shortcut.
Uhm,
Pre-tokenize the second string and put the tokens in a String array:

String tokens[] = new String[ 1024 ];  // up to 1024 tokens, use Vector if you don't know the MAX number of tokens of the second string
int tokencount = 0;
StringTokenizer st = new StringTokenizer( s2 , ", " );
while ( st.hasMoreTokens() ) {
  tokens[ tokencount++ ] = st.nextToken();
}

// now use indexOf on the "mutant" s1 (DON'T TOKENIZE s2 AGAIN)
for (int i = 0; i < tokencount; i++ ) {
  if ( s1.indexOf( tokens[i] ) >= 0 ) {
    // FOUND IT ...
  }
}

If this solution is still too slow then you must use regular expressions: build a regular expression from s2 and test if "mutant" s1 matches it.

Right now my code looks like this.

"formulaCells" below is the ","-delimited string, which holds around 300 strings.

for (int i = 0; i < 2000; i++) {
    StringTokenizer st = new StringTokenizer(formulaCells, ",");
    boolean formulaFlag = false;
    while (formulaCells.equals("") == false && st.hasMoreTokens()) {
        String cellName = st.nextToken().toString();
        String formulaLower = formula.toLowerCase();
        if (formulaLower.indexOf(cellName) != -1) {
            formulaFlag = true;
            break;
        }
    }
}

The process has to run 2000 times, with a different formula and different formulaCells each time. How can I make it faster?
How about:

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(cells);
    }
    private boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(cells, formula) >= 0;
    }

I would anticipate your using the class this way:

    CellMatcher matcher = new CellMatcher("some,comma,delim,string");
    loop across formulas
        if (matcher.matches(formula)) {
            ...
        }
    ...
    // oops, formulaCells changed!
    matcher.setFormulaCells(formulaCells);

Hope that makes sense.
           
FYI, upon the introduction of regex in JDK 1.4, use of StringTokenizer was officially "discouraged" in favor of String.split. It is retained for compatibility purposes.
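
As a rough illustration of the String.split route, the sketch below splits on a comma with optional surrounding whitespace. The class name SplitDemo and the decision to return an empty array for blank input are assumptions of mine. The guard matters because "".split(",") returns a one-element array containing the empty string, which is rarely what you want.

```java
import java.util.Arrays;

// Hypothetical demo class: String.split as a replacement for StringTokenizer.
class SplitDemo {
    static String[] cells(String formulaCells) {
        String trimmed = formulaCells.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // avoid the single "" element split() would give
        }
        // A comma with optional whitespace on either side is the delimiter.
        return trimmed.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(cells("zxy,def, mno,qrs,tuv,str")));
        // prints [zxy, def, mno, qrs, tuv, str]
    }
}
```

Trailing empty entries are dropped by split(), so a trailing comma does no harm here.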

Regards,
Jim
Thanks for your reply mdebiasi.


I tried pre-tokenizing as you suggested, but it is still not making much of a difference.


Oops. That should have been:

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
    }
    public boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(_cells, formula) >= 0;
    }
}
Thanks for your response Jim.

I tried your solution, but I never get true, even though the formula contains a matching string.

In "formulaCells" I have "grid_device_adds__5__1,grid_device_adds__5__2,grid_device_adds__5__3,grid_device_adds__5__4,grid_device_adds__5__5" (this is only part of the whole string).

In "formula" I have "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar".

But I still do not get true from the matches() method, and I don't know why. Any ideas?
I think binarySearch is not working properly because of the numbers in the string.

Any other ideas?
Hang on. I'm putting a more complete (working) example together.
OK. Here's what I was thinking. Let's start from the beginning with the CellMatcher class:

import java.util.Arrays;

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
    }
    // this method is for when you want to see if the formula
    // exactly matches one of the formulaCell values
    public boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(_cells, formula) >= 0;
    }
    // this method is for when you want to see if the any of
    // the defined formulaCells is contained in the formula
    public boolean contains(String formula) {
        for (int i = 0; i < _cells.length; ++i) {
            if (formula.indexOf(_cells[i]) >= 0)
                return true;
        }
        return false;
    }
}

And here's a test class I wrote:

public class CellMatcherTest {

      public static void main(String[] args) {
            String formulaCells = generateFormulaCells(100);
            String formula = "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar";
            CellMatcher matcher = new CellMatcher(formulaCells);
            System.out.println("matches: " + matcher.matches(formula));
            System.out.println("contains: " + matcher.contains(formula));
      }
      
      private static String generateFormulaCells(int count) {
            StringBuffer buf = new StringBuffer(25 * count);
            for (int i = 0; i < count; ++i) {
                  buf.append("grid_device_adds__5__").append(i + 1).append(",");
            }
            return buf.toString();
      }
      
}

The CellMatcher.matches method was looking for an exact match. So I wrote the contains method to invert this and check whether a formulaCell string appears anywhere in the formula string. That's at least closer. But you may run into problems with this. For example, what if you had the two formulaCells values grid_device_adds__5__1 and grid_device_adds__5__11? Then if a formula was "=grid_device_adds__5__111", and grid_device_adds__5__111 had not been defined as a formulaCell, contains would return true anyway because it does a simple substring match. If that isn't a problem then I think you're good to go. If it is a problem we'll need to talk more.
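
If the false positive does turn out to be a problem, one possible way around it (a different technique from the substring search above) is to turn the lookup around: split the formula into identifiers and look each one up in a HashSet of cell names. This is only a sketch. The class name ExactCellMatcher is made up, it assumes Java 7 or later, it assumes identifiers contain only letters, digits and underscores, and it lowercases the cell names as well as the formula. A set lookup also avoids looping over all 300 cells for every formula.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical variant of CellMatcher: matches whole identifiers only.
class ExactCellMatcher {
    private final Set<String> cells = new HashSet<>();

    ExactCellMatcher(String formulaCells) {
        for (String cell : formulaCells.split(",")) {
            String name = cell.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                cells.add(name);
            }
        }
    }

    // Assumption: identifiers consist only of letters, digits and underscores,
    // so every other character ('=', '*', '+', spaces, ...) separates them.
    boolean containsCell(String formula) {
        String lower = formula.toLowerCase(Locale.ROOT);
        for (String id : lower.split("[^a-z0-9_]+")) {
            if (cells.contains(id)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ExactCellMatcher m =
            new ExactCellMatcher("grid_device_adds__5__1,grid_device_adds__5__11");
        System.out.println(m.containsCell(
            "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar")); // true
        System.out.println(m.containsCell("=grid_device_adds__5__111")); // false
    }
}
```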

Jim
I was doing the same thing, Jim. I thought we could do the same thing using binarySearch as well.

The for loops are what take so much time.

If you look at the code I posted earlier, I was doing almost the same thing. Is there any way to avoid the for loop when searching for the string?

I also have the outer for loop, which runs nearly 2000 times, and if I add another for loop for the search on top of that, it takes a lot of time. Do you see what I mean?
Well, we could try using regular expressions. The brute force way of doing this is to take the formulaCells and do something like this:
    formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";

That would yield a regular expression which would match any of the formula cell identifiers anywhere in the formula string. In some testing I did it was about twice as fast as a brute force search iterating across all the formula cell identifiers. But only when I manipulated the formula string in such a way that forced it to search at least half of the array of possible values. But the regex method was very predictable. :-)
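
Two details are worth guarding against when building the alternation this way. First, a trailing comma in formulaCells leaves an empty alternative at the end of the group, and an empty alternative matches any formula. Second, any regex metacharacter in a cell name would be interpreted as syntax. A hedged sketch that handles both follows. The class name CellPatternBuilder is mine, and it uses Matcher.find() so the pattern does not need the surrounding ".*".

```java
import java.util.regex.Pattern;

// Hypothetical helper: builds one alternation pattern from the cell list.
class CellPatternBuilder {
    static Pattern build(String formulaCells) {
        StringBuilder alternation = new StringBuilder();
        for (String cell : formulaCells.split(",")) {
            String name = cell.trim();
            if (name.isEmpty()) {
                continue; // an empty alternative would match every formula
            }
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(name)); // treat the name literally
        }
        if (alternation.length() == 0) {
            return Pattern.compile("(?!)"); // no cells: a pattern that never matches
        }
        return Pattern.compile("(?:" + alternation + ")", Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = build("grid_device_adds__5__1,grid_device_adds__5__2,");
        System.out.println(p.matcher(
            "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar").find()); // true
        System.out.println(p.matcher("=ongoing_IITOA_scalar").find()); // false
    }
}
```

Like the brute force search, this is still a substring match, so grid_device_adds__5__1 would also be found inside grid_device_adds__5__111.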

If your cell identifiers are very regular in form then we can be a lot smarter about the pattern. For example, if the cell identifier is always grid_device_adds__5__ followed by 1 to 4 digits then we could use a pattern like this:
    .*grid_device_adds__5__\d{1,4}.*

Now we're really using the power of regular expressions. In my test, which I'll post shortly, this pattern yielded a constant search time that was substantially lower than the brute force search.

Here's the code -- two classes this time.

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellMatcher {
    private String[] _cells;
    private Pattern _pattern;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
        //formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";
        formulaCells = ".*grid_device_adds__5__\\d{1,4}.*";
        _pattern = Pattern.compile(formulaCells, Pattern.CASE_INSENSITIVE);
    }
    // regex force :-)
    public boolean matches(String formula) {
        Matcher matcher = _pattern.matcher(formula);
        return matcher.matches();
    }
    // brute force :-(
    public boolean contains(String formula) {
        formula = formula.toLowerCase();
        for (int i = 0; i < _cells.length; ++i) {
            if (formula.indexOf(_cells[i]) >= 0)
                return true;
        }
        return false;
    }
}

public class CellMatcherTest {
      private static final int LoopCount = 10000;
      public static void main(String[] args) {
            String formulaCells = generateFormulaCells(3000);
            String formula = "=grid_hidden_supt__1__9*grid_device_adds__5__2100*ongoing_IITOA_scalar";
            CellMatcher matcher = new CellMatcher(formulaCells);
            System.out.println("matches: " + matcher.matches(formula));
            System.out.println("contains: " + matcher.contains(formula));
            System.out.println("matches: " + testMatches(matcher, formula) + "ms");
            System.out.println("contains: " + testContains(matcher, formula) + "ms");
      }
      
      private static String generateFormulaCells(int count) {
            StringBuffer buf = new StringBuffer(25 * count);
            for (int i = 0; i < count; ++i) {
                  buf.append("grid_device_adds__5__").append(i + 1).append(",");
            }
            return buf.toString();
      }
      
      private static long testMatches(CellMatcher matcher, String formula) {
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < LoopCount; ++i) {
                  matcher.matches(formula);
            }
            long t2 = System.currentTimeMillis();
            return t2 - t1;
      }
      
      private static long testContains(CellMatcher matcher, String formula) {
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < LoopCount; ++i) {
                  matcher.contains(formula);
            }
            long t2 = System.currentTimeMillis();
            return t2 - t1;
      }

}
Thanks for the code you sent me, I will try this code and let you know about it.
Thanks for the reply jim.

I get different formulas with different strings. It will not always be "=grid_hidden_supt__1__9*grid_device_adds__5__2100*ongoing_IITOA_scalar".

Also, in the above example, what happens after the setFormulaCells() call?

What does this line do?   //formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";

Thanks once again.
ASKER CERTIFIED SOLUTION
Jim Cakalic
Thanks for your response jim.