SriVelagapudi
asked on
How do I know if part of one string and part of another string are the same?
Hi all,
Here is an example. I have two strings:
1) String str1 = "abc1, def*12*ghi";
2) String str2 = "zxy,def, mno,qrs,tuv,str";
If I compare these two strings I should get true, because both strings contain "def". How can I achieve this?
Thanks in advance.
Use String.contains()
or String.matches() and a regular expression.
Also the pre-5.0 way, use String.indexOf:
boolean matched = str1.indexOf("def") >=0 && str2.indexOf("def") >= 0;
Regards
Jim Cakalic
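For a substring that is known in advance, the suggestions above come down to the same check. A minimal sketch using the strings from the question follows; the class and method names are made up for illustration, and it does not help when the shared part is not known beforehand.

```java
// Hypothetical helper: checks whether a known token occurs in both strings.
final class KnownTokenCheck {
    // Java 5 and later: String.contains
    static boolean inBoth(String a, String b, String token) {
        return a.contains(token) && b.contains(token);
    }

    // Equivalent check with indexOf, which also works before Java 5
    static boolean inBothIndexOf(String a, String b, String token) {
        return a.indexOf(token) >= 0 && b.indexOf(token) >= 0;
    }

    public static void main(String[] args) {
        String str1 = "abc1, def*12*ghi";
        String str2 = "zxy,def, mno,qrs,tuv,str";
        System.out.println(inBoth(str1, str2, "def"));        // prints true
        System.out.println(inBothIndexOf(str1, str2, "xyz")); // prints false
    }
}
```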
You can use the StringTokenizer class with "," as the delimiter and check each token.
You could also use indexOf() on the other string.
http://www.tanul.info
Hi,
If both strings are comma-separated tokens, use StringTokenizer plus indexOf():
import java.util.StringTokenizer;
String s1 = "ab,cd,de";
String s2 = "xx,yy,cd";
// add these if you want a case-insensitive comparison:
// s1 = s1.toLowerCase();
// s2 = s2.toLowerCase();
StringTokenizer st = new StringTokenizer(s1, ", "); // commas and spaces are used as delimiters
while (st.hasMoreTokens()) {
    String r = st.nextToken();
    if (s2.indexOf(r) >= 0) {
        // WE CAUGHT THE BEAST!
        System.out.println(r);
    }
}
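If whole-token equality is what is wanted, rather than substring containment, one alternative is to split both lists and intersect the token sets. This sketch is not what the post above does: it assumes tokens are separated by commas with optional surrounding spaces, the class name SharedTokens is hypothetical, and it uses generics and the diamond operator, so it needs a newer Java than the versions discussed in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: finds tokens that appear in both comma-separated lists.
final class SharedTokens {
    static Set<String> common(String s1, String s2) {
        // Split on commas, swallowing any spaces around them
        Set<String> result = new HashSet<>(Arrays.asList(s1.trim().split("\\s*,\\s*")));
        // Keep only the tokens that the second list also contains
        result.retainAll(new HashSet<>(Arrays.asList(s2.trim().split("\\s*,\\s*"))));
        // An empty input or a doubled comma can produce an empty token; ignore it
        result.remove("");
        return result;
    }
}
```

With the strings from the post, SharedTokens.common("ab,cd,de", "xx,yy,cd") returns a set containing only "cd". Note that it would not find "def" inside "def*12*ghi" from the original question, because that token is not equal to "def".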
>> Also the pre-5.0 way, use String.indexOf:
String.contains() is 5.0 but String.matches() is 1.4. You're absolutely correct, though, that indexOf will do the job very well and may be easier in some respects for someone new to Java :-)
ASKER
My problem is that the first string I mentioned in my last post is dynamic: it keeps changing. The second string is a big comma-delimited string.
What I would like to know is how to tell whether any part of the first string matches any part of the second string.
Right now I am already using StringTokenizer as mdebiasi suggested, but since the second string is very big, the processing time is high. I need a shortcut way to do this.
Uhm,
Pre-tokenize the second string and put the tokens in a String array:
String tokens[] = new String[1024]; // up to 1024 tokens; use a Vector if you don't know the maximum number of tokens in the second string
int tokencount = 0;
StringTokenizer st = new StringTokenizer(s2, ", ");
while (st.hasMoreTokens()) {
    tokens[tokencount++] = st.nextToken();
}
// now use indexOf on the "mutant" s1 (DON'T TOKENIZE s2 AGAIN)
for (int i = 0; i < tokencount; i++) {
    if (s1.indexOf(tokens[i]) >= 0) {
        // FOUND IT ...
    }
}
If this solution is still too slow then you must use regular expressions: build a regular expression from s2 and test if "mutant" s1 matches it.
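A rough sketch of that regular-expression idea: join the tokens of the second string into one alternation, compile it once, and search the changing first string with find(). The class and method names are hypothetical. Quoting each token with Pattern.quote and treating an empty list as "match nothing" are assumptions added here, not something stated in the post.

```java
import java.util.regex.Pattern;

// Hypothetical helper: one compiled pattern that matches any token of a list.
final class TokenRegex {
    static Pattern fromTokens(String commaSeparated) {
        StringBuilder alternation = new StringBuilder();
        for (String token : commaSeparated.split(",")) {
            token = token.trim();
            if (token.isEmpty()) continue;             // skip blanks such as in "a,,b"
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(token));  // treat the token as literal text
        }
        // Assumption: an empty token list should never match anything
        if (alternation.length() == 0) return Pattern.compile("(?!)");
        return Pattern.compile(alternation.toString());
    }

    // True if any token occurs somewhere in the text
    static boolean containsAny(Pattern tokens, String text) {
        return tokens.matcher(text).find();
    }
}
```

The pattern is meant to be built once per token list and reused for every candidate string, for example containsAny(TokenRegex.fromTokens(s2), s1). Whether this is actually faster than the indexOf loop depends on the data, so it is worth measuring.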
ASKER
Right now my code looks like the code below.
"formulaCells" is the ","-delimited string, which will have around 300 strings.
for (int i = 0; i < 2000; i++) {
    StringTokenizer st = new StringTokenizer(formulaCells, ",");
    boolean formulaFlag = false;
    while (formulaCells.equals("") == false && st.hasMoreTokens()) {
        String cellName = st.nextToken().toString();
        String formulaLower = formula.toLowerCase();
        if (formulaLower.indexOf(cellName) != -1) {
            formulaFlag = true;
            break;
        }
    }
}
The process has to run 2000 times with a different formula and different formulaCells. How can I make it faster?
How about:
public class CellMatcher {
    private String[] _cells;
    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(cells);
    }
    private boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(cells, formula) >= 0;
    }
I would anticipate your using the class this way:
CellMatcher matcher = new CellMatcher("some,comma,delim,string");
loop across formulas
    if (matcher.matches(formula)) {
        ...
    }
    ...
// oops, formulaCells changed!
matcher.setFormulaCells(formulaCells);
Hope that makes sense.
FYI, upon the introduction of regex in JDK 1.4, use of StringTokenizer was officially "discouraged" in favor of String.split. It is retained for compatibility purposes.
Regards,
Jim
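As a small illustration of the String.split route, the sketch below splits the list from the original question. The class name is made up, and the regular expression used as the delimiter (a comma with optional spaces on either side) is an assumption about how the data is laid out.

```java
// Hypothetical helper showing String.split in place of StringTokenizer.
final class SplitDemo {
    static String[] tokens(String list) {
        // The delimiter is a regex: a comma with optional spaces on either side
        return list.trim().split("\\s*,\\s*");
    }
}
```

One behavioural difference to keep in mind: split keeps empty tokens between adjacent delimiters, so "a,,b" yields three elements, while StringTokenizer silently skips them.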
ASKER
Thanks for your reply mdebiasi.
I tried pre-tokenizing as you suggested, but it still does not make much difference.
Oops. That should have been:
import java.util.Arrays;
public class CellMatcher {
    private String[] _cells;
    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
    }
    public boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(_cells, formula) >= 0;
    }
}
ASKER
Thanks for your response Jim.
I tried your solution, but I never get true, even though the formula contains a matching string.
In "formulaCells" I have "grid_device_adds__5__1,grid_device_adds__5__2,grid_device_adds__5__3,grid_device_adds__5__4,grid_device_adds__5__5", though this is only part of the whole string.
In "formula" I have "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar".
But I still do not get true from the matches() method. I don't know why. Any ideas?
ASKER
I think binarySearch is not working properly because of the numbers in the string.
Any other ideas?
Hang on. I'm putting a more complete (working) example together.
OK. Here's what I was thinking. Let's start from the beginning with the CellMatcher class:
import java.util.Arrays;
public class CellMatcher {
    private String[] _cells;
    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
    }
    // this method is for when you want to see if the formula
    // exactly matches one of the formulaCell values
    public boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(_cells, formula) >= 0;
    }
    // this method is for when you want to see if any of
    // the defined formulaCells is contained in the formula
    public boolean contains(String formula) {
        for (int i = 0; i < _cells.length; ++i) {
            if (formula.indexOf(_cells[i]) >= 0)
                return true;
        }
        return false;
    }
}
And here's a test class I wrote:
public class CellMatcherTest {
    public static void main(String[] args) {
        String formulaCells = generateFormulaCells(100);
        String formula = "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar";
        CellMatcher matcher = new CellMatcher(formulaCells);
        System.out.println("matches: " + matcher.matches(formula));
        System.out.println("contains: " + matcher.contains(formula));
    }
    private static String generateFormulaCells(int count) {
        StringBuffer buf = new StringBuffer(25 * count);
        for (int i = 0; i < count; ++i) {
            buf.append("grid_device_adds__5__").append(i + 1).append(",");
        }
        return buf.toString();
    }
}
The CellMatcher.matches method was looking for an exact match, so I wrote the contains method to invert this and check whether a formulaCell string appears anywhere in the formula string. That's at least closer, but you may run into problems with it. For example, what if you had the two formulaCells values grid_device_adds__5__1 and grid_device_adds__5__11? Then if a formula was "=grid_device_adds__5__111", and grid_device_adds__5__111 had not been defined as a formulaCell, contains would return true anyway because it does a simple substring match. If that isn't a problem then I think you're good to go. If it is a problem we'll need to talk more.
Jim
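One possible way around the prefix problem described above is to pull whole identifiers out of the formula and look each one up in a set of known cell names. The sketch below is not from the thread. It assumes a cell identifier consists only of letters, digits, and underscores, the class name CellSetMatcher is made up, and it uses generics, so it needs Java 5 or later (the diamond operator needs Java 7).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative to CellMatcher: whole-identifier lookup in a hash set.
final class CellSetMatcher {
    // Assumption: an identifier is a run of letters, digits and underscores
    private static final Pattern IDENTIFIER = Pattern.compile("\\w+");
    private final Set<String> cells = new HashSet<>();

    CellSetMatcher(String formulaCells) {
        for (String cell : formulaCells.split(",")) {
            String name = cell.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) cells.add(name);   // ignore blanks from stray commas
        }
    }

    // True if any whole identifier in the formula is a known cell name.
    boolean containsCell(String formula) {
        Matcher m = IDENTIFIER.matcher(formula.toLowerCase(Locale.ROOT));
        while (m.find()) {
            if (cells.contains(m.group())) return true;
        }
        return false;
    }
}
```

Because each identifier is compared as a whole, "=grid_device_adds__5__111" no longer matches when only grid_device_adds__5__1 and grid_device_adds__5__11 are defined. The work per formula also depends on the length of the formula rather than on the number of cells, which may help with the 2000-formula loop, though that would need to be measured.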
ASKER
I was doing the same thing, Jim. I thought we could do the same thing using binarySearch as well.
Because of the for loops it takes a lot of time.
If you look at the code I posted earlier, I was doing almost the same thing. Is there any way to avoid the for loop when searching the string?
I also have the outer for loop, which runs nearly 2000 times, and if I add another for loop for searching on top of that, it takes a lot of time. Do you see what I mean?
Well, we could try using regular expressions. The brute force way of doing this is to take the formulaCells and do something like this:
formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";
That would yield a regular expression which would match any of the formula cell identifiers anywhere in the formula string. In some testing I did it was about twice as fast as a brute force search iterating across all the formula cell identifiers. But only when I manipulated the formula string in such a way that forced it to search at least half of the array of possible values. But the regex method was very predictable. :-)
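To make that construction concrete, here is a small sketch of what the replaceAll line produces for a two-element list. The class name and the sample cell names cell_a and cell_b are made up for illustration. It also shows one pitfall that follows from how regex alternation works: a list ending in a comma leaves an empty alternative, and that alternative matches any input.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the comma-to-alternation construction.
final class AlternationDemo {
    // Same expression as in the post: commas become regex alternation
    static String toRegex(String formulaCells) {
        return ".*(" + formulaCells.replaceAll(",", "|") + ").*";
    }

    public static void main(String[] args) {
        String regex = toRegex("cell_a,cell_b");
        System.out.println(regex);                                     // prints .*(cell_a|cell_b).*
        System.out.println(Pattern.matches(regex, "=x*cell_b*y"));     // prints true
        System.out.println(Pattern.matches(regex, "=x*y"));            // prints false
        // A trailing comma leaves an empty alternative, which matches anything
        System.out.println(Pattern.matches(toRegex("cell_a,cell_b,"), "=x*y")); // prints true
    }
}
```

If the cell list can end with a comma, trimming it before building the pattern avoids the always-true result.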
If your cell identifiers are very regular in form then we can be a lot smarter about the pattern. For example, if the cell identifier is always grid_device_adds__5__ followed by 1 to 4 digits then we could use a pattern like this:
.*grid_device_adds__5__\d{1,4}.*
Now we're really using the power of regular expressions. In my test, which I'll post shortly, this pattern yielded a constant search time that was substantially lower than the brute force search.
Here's the code -- two classes this time.
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CellMatcher {
    private String[] _cells;
    private Pattern _pattern;
    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
        //formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";
        formulaCells = ".*grid_device_adds__5__\\d{1,4}.*";
        _pattern = Pattern.compile(formulaCells, Pattern.CASE_INSENSITIVE);
    }
    // regex force :-)
    public boolean matches(String formula) {
        Matcher matcher = _pattern.matcher(formula);
        return matcher.matches();
    }
    // brute force :-(
    public boolean contains(String formula) {
        formula = formula.toLowerCase();
        for (int i = 0; i < _cells.length; ++i) {
            if (formula.indexOf(_cells[i]) >= 0)
                return true;
        }
        return false;
    }
}
public class CellMatcherTest {
    private static final int LoopCount = 10000;
    public static void main(String[] args) {
        String formulaCells = generateFormulaCells(3000);
        String formula = "=grid_hidden_supt__1__9*grid_device_adds__5__2100*ongoing_IITOA_scalar";
        CellMatcher matcher = new CellMatcher(formulaCells);
        System.out.println("matches: " + matcher.matches(formula));
        System.out.println("contains: " + matcher.contains(formula));
        System.out.println("matches: " + testMatches(matcher, formula) + "ms");
        System.out.println("contains: " + testContains(matcher, formula) + "ms");
    }
    private static String generateFormulaCells(int count) {
        StringBuffer buf = new StringBuffer(25 * count);
        for (int i = 0; i < count; ++i) {
            buf.append("grid_device_adds__5__").append(i + 1).append(",");
        }
        return buf.toString();
    }
    private static long testMatches(CellMatcher matcher, String formula) {
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < LoopCount; ++i) {
            matcher.matches(formula);
        }
        long t2 = System.currentTimeMillis();
        return t2 - t1;
    }
    private static long testContains(CellMatcher matcher, String formula) {
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < LoopCount; ++i) {
            matcher.contains(formula);
        }
        long t2 = System.currentTimeMillis();
        return t2 - t1;
    }
}
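The shape-based pattern can also be used with Matcher.find(), which searches for the pattern anywhere in the input and so makes the leading and trailing ".*" unnecessary. The sketch below is a variation, not the code from the post: the class name is made up, and the trailing (?!\d) lookahead is an added assumption that a name followed by a fifth digit should be rejected. Like the original pattern, it accepts any identifier of the right shape, whether or not that cell is actually in formulaCells.

```java
import java.util.regex.Pattern;

// Hypothetical variation on the shape-based pattern, using find().
final class ShapePattern {
    // (?!\d) is an addition not in the post: it rejects a fifth digit
    static final Pattern CELL = Pattern.compile(
            "grid_device_adds__5__\\d{1,4}(?!\\d)", Pattern.CASE_INSENSITIVE);

    // True if the formula mentions something shaped like a cell identifier
    static boolean mentionsCell(String formula) {
        return CELL.matcher(formula).find();
    }
}
```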
ASKER
Thanks for the code you sent me, I will try this code and let you know about it.
ASKER
Thanks for the reply jim.
I get different formulas with different strings. It won't be the same as "=grid_hidden_supt__1__9*grid_device_adds__5__2100*ongoing_IITOA_scalar" all the time.
Also, in the above example, what happens after the setFormulaCells() call?
What does this line do? //formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";
Thanks once again.
ASKER
Thanks for your response jim.