
How do I know if part of one string matches part of another string?

Hi all,

Here is an example.

I have two strings:

1) String str1 = "abc1, def*12*ghi";

The second string is:

2) String str2 = "zxy,def, mno,qrs,tuv,str";

If I compare these two strings, I should get true because both strings contain "def". How can I achieve this?

Thanks in advance.
0
SriVelagapudi
Asked:
SriVelagapudi
1 Solution
 
bloodredsunCommented:
Use String.contains()
0
 
bloodredsunCommented:
or String.matches() and a regular expression.
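
For reference, a minimal sketch of both suggestions, assuming the shared fragment ("def") is known in advance. The class name SubstringCheck and its method names are made up for illustration and do not come from the thread. Note that String.matches() has to match the whole string, which is why the fragment is wrapped in ".*".

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class SubstringCheck {

    // String.contains() (Java 5+): plain substring test on both strings.
    static boolean bothContain(String a, String b, String fragment) {
        return a.contains(fragment) && b.contains(fragment);
    }

    // String.matches() (Java 1.4+): the regex must cover the entire string,
    // so the fragment is surrounded by ".*". Pattern.quote() (Java 5+)
    // escapes any regex metacharacters in the fragment, and (?s) lets '.'
    // match line terminators too.
    static boolean bothMatch(String a, String b, String fragment) {
        String regex = "(?s).*" + Pattern.quote(fragment) + ".*";
        return a.matches(regex) && b.matches(regex);
    }

    public static void main(String[] args) {
        String str1 = "abc1, def*12*ghi";
        String str2 = "zxy,def, mno,qrs,tuv,str";
        System.out.println(bothContain(str1, str2, "def"));
        System.out.println(bothMatch(str1, str2, "def"));
    }
}
```

Both approaches only answer "do the two strings share this particular fragment"; they do not discover an unknown common fragment.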
0
 
Jim CakalicSenior Developer/ArchitectCommented:
There is also the pre-5.0 way, using String.indexOf:

    boolean matched = str1.indexOf("def") >= 0 && str2.indexOf("def") >= 0;

Regards
Jim Cakalic
0
 
implementITCommented:
You can use the StringTokenizer class with "," as the delimiter and check each token.

You could also use indexOf() on the other string.

http://www.tanul.info
0
 
mdebiasiCommented:
Hi,
If both strings are comma-separated tokens, use StringTokenizer plus the indexOf() function:

// requires: import java.util.StringTokenizer;
String s1 = "ab,cd,de";
String s2 = "xx,yy,cd";
// add these if you want case insensitive comparison:
// s1 = s1.toLowerCase();
// s2 = s2.toLowerCase();
StringTokenizer st = new StringTokenizer(s1, ", " );  // commas and spaces are used as delimiters
while ( st.hasMoreTokens() ) {
  String r = st.nextToken();
  if ( s2.indexOf( r ) >= 0) {
    // WE CAUGHT THE BEAST!
    System.out.println( r );
  }
}

0
 
bloodredsunCommented:
>> Also the pre-5.0 way, use String.indexOf:

String.contains() is 5.0 but String.matches() is 1.4, but you're absolutely correct that indexOf will do the job very well and may be easier in some respects for someone new to Java :-)
0
 
SriVelagapudiAuthor Commented:
My problem is that the first string I mentioned in my last post is dynamic. It keeps changing, and the second string is a big ","-delimited string.

What I would like to know is how to tell whether any part of the first string matches any part of the second string.

Right now I am already using StringTokenizer as mdebiasi suggested, but since the second string is very big, the processing time is long. I need a shortcut way to do this.
0
 
mdebiasiCommented:
Uhm,
Pre-tokenize the second string and put the tokens in a String array:

String tokens[] = new String[ 1024 ];  // up to 1024 tokens, use Vector if you don't know the MAX number of tokens of the second string
int tokencount = 0;
StringTokenizer st = new StringTokenizer( s2 , ", " );
while ( st.hasMoreTokens() ) {
  tokens[ tokencount++ ] = st.nextToken();
}

// now use indexOf on the "mutant" s1 (DON'T TOKENIZE s2 AGAIN)
for (int i = 0; i < tokencount; i++ ) {
  if ( s1.indexOf( tokens[i] ) >= 0 ) {
    // FOUND IT ...
  }
}

If this solution is still too slow, you could try regular expressions: build a regular expression from s2 and test whether the "mutant" s1 matches it.

0
 
SriVelagapudiAuthor Commented:
Right now my code looks like the following.

"formulaCells" below is the ","-delimited string, which holds around 300 strings.

for (int i = 0; i < 2000; i++) {
    StringTokenizer st = new StringTokenizer(formulaCells, ",");
    boolean formulaFlag = false;
    while (formulaCells.equals("") == false && st.hasMoreTokens()) {
        String cellName = st.nextToken().toString();
        String formulaLower = formula.toLowerCase();
        if (formulaLower.indexOf(cellName) != -1) {
            formulaFlag = true;
            break;
        }
    }
}

The process has to run 2000 times, with a different formula and different formulaCells each time. How can I make this faster?
0
 
Jim CakalicSenior Developer/ArchitectCommented:
How about:

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(cells);
    }
    private boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(cells, formula) >= 0;
    }

I would anticipate your using the class this way:

    CellMatcher matcher = new CellMatcher("some,comma,delim,string");
    for (int i = 0; i < formulas.length; i++) {   // formulas: your array of formula strings
        if (matcher.matches(formulas[i])) {
            ...
        }
    }
    ...
    // oops, formulaCells changed!
    matcher.setFormulaCells(formulaCells);

Hope that makes sense.
           
FYI, upon the introduction of regex in JDK 1.4, use of StringTokenizer was officially "discouraged" in favor of String.split. It is retained for compatibility purposes.
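
As a rough illustration of the String.split alternative, here is a minimal sketch that turns a comma-delimited string into an array. The class name SplitExample and the method name cells are invented for this example, and the whitespace-tolerant delimiter is an assumption about the input rather than something stated in the thread.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
final class SplitExample {

    // Split on commas, tolerating optional whitespace around each comma.
    static String[] cells(String formulaCells) {
        String trimmed = formulaCells.trim();
        if (trimmed.length() == 0) {
            // "".split(...) would return one empty string, so handle it here.
            return new String[0];
        }
        return trimmed.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(cells("zxy,def, mno,qrs,tuv,str")));
    }
}
```

One difference worth knowing: StringTokenizer silently skips empty tokens, whereas split keeps empty strings at the start and in the middle of the result and only drops trailing ones.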

Regards,
Jim
0
 
SriVelagapudiAuthor Commented:
Thanks for your reply mdebiasi.


I tried pre-tokenizing as you suggested, but it still does not make much difference.


0
 
Jim CakalicSenior Developer/ArchitectCommented:
Oops. That should have been:

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
    }
    public boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(_cells, formula) >= 0;
    }
}
0
 
SriVelagapudiAuthor Commented:
Thanks for your response, Jim.

I tried your solution, but matches() never returns true, even though the formula contains a matching string.

"formulaCells" contains "grid_device_adds__5__1,grid_device_adds__5__2,grid_device_adds__5__3,grid_device_adds__5__4,grid_device_adds__5__5" (this is only part of the whole string).

"formula" contains "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar"

But I am still not getting true from the matches() method. I don't know why. Any ideas?
0
 
SriVelagapudiAuthor Commented:
I think binarySearch is not working properly because of the numbers in the string.

Any other ideas?
0
 
Jim CakalicSenior Developer/ArchitectCommented:
Hang on. I'm putting a more complete (working) example together.
0
 
Jim CakalicSenior Developer/ArchitectCommented:
OK. Here's what I was thinking. Let's start from the beginning with the CellMatcher class:

import java.util.Arrays;

public class CellMatcher {
    private String[] _cells;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
    }
    // this method is for when you want to see if the formula
    // exactly matches one of the formulaCell values
    public boolean matches(String formula) {
        formula = formula.toLowerCase();
        return Arrays.binarySearch(_cells, formula) >= 0;
    }
    // this method is for when you want to see if any of
    // the defined formulaCells is contained in the formula
    public boolean contains(String formula) {
        for (int i = 0; i < _cells.length; ++i) {
            if (formula.indexOf(_cells[i]) >= 0)
                return true;
        }
        return false;
    }
}

And here's a test class I wrote:

public class CellMatcherTest {

      public static void main(String[] args) {
            String formulaCells = generateFormulaCells(100);
            String formula = "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar";
            CellMatcher matcher = new CellMatcher(formulaCells);
            System.out.println("matches: " + matcher.matches(formula));
            System.out.println("contains: " + matcher.contains(formula));
      }
      
      private static String generateFormulaCells(int count) {
            StringBuffer buf = new StringBuffer(25 * count);
            for (int i = 0; i < count; ++i) {
                  buf.append("grid_device_adds__5__").append(i + 1).append(",");
            }
            return buf.toString();
      }
      
}

The CellMatcher.matches method was looking for an exact match, so I wrote the contains method to invert this and check whether a formulaCell string appears anywhere in the formula string. That's at least closer.

But you may run into problems with this. For example, suppose you had the two formulaCells values grid_device_adds__5__1 and grid_device_adds__5__11. If a formula was "=grid_device_adds__5__111", and grid_device_adds__5__111 had not been defined as a formulaCell, contains would return true anyway because it does a simple substring match. If that isn't a problem then I think you're good to go. If it is a problem we'll need to talk more.
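
If the prefix problem does matter, one possible way around it is to accept a hit only when the characters on either side of it cannot belong to an identifier. This is a sketch under the assumption that cell identifiers consist solely of letters, digits, and underscores. The class name WholeCellMatch and its methods are hypothetical.

```java
// Hypothetical helper for illustration only.
final class WholeCellMatch {

    // Assumption: identifiers are made of letters, digits and underscores.
    static boolean isIdChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // True only if 'cell' occurs in 'formula' as a complete identifier,
    // i.e. not as a prefix, suffix or infix of a longer identifier.
    static boolean containsWhole(String formula, String cell) {
        if (cell.length() == 0) {
            return false;
        }
        int from = 0;
        while (true) {
            int at = formula.indexOf(cell, from);
            if (at < 0) {
                return false;
            }
            int end = at + cell.length();
            boolean startOk = at == 0 || !isIdChar(formula.charAt(at - 1));
            boolean endOk = end == formula.length() || !isIdChar(formula.charAt(end));
            if (startOk && endOk) {
                return true;
            }
            from = at + 1;   // keep looking for a later occurrence
        }
    }
}
```

With this check, grid_device_adds__5__1 is no longer reported as present in "=grid_device_adds__5__111", but it is still found in "=grid_hidden_supt__1__9*grid_device_adds__5__1*ongoing_IITOA_scalar".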

Jim
0
 
SriVelagapudiAuthor Commented:
I was doing the same thing, Jim. I thought we could do the same thing using binarySearch as well.

The for loops are what take so much time.

If you look at the code I posted earlier, I was doing almost the same thing. Is there any way to avoid the for loop when searching for the string?

I also have the outer for loop, which runs nearly 2000 times; adding another for loop for the search on top of that takes a lot of time. Do you see what I mean?
0
 
Jim CakalicSenior Developer/ArchitectCommented:
Well, we could try using regular expressions. The brute force way of doing this is to take the formulaCells and do something like this:
    formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";

That would yield a regular expression which would match any of the formula cell identifiers anywhere in the formula string. In some testing I did it was about twice as fast as a brute force search iterating across all the formula cell identifiers. But only when I manipulated the formula string in such a way that forced it to search at least half of the array of possible values. But the regex method was very predictable. :-)
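
A slightly more defensive sketch of the same alternation idea follows. It assumes the cell identifiers should be treated as literal text, so each one is escaped with Pattern.quote, and it skips empty entries: a trailing comma in formulaCells would otherwise leave an empty alternative such as "(a|b|)", which matches every formula. It also uses Matcher.find() instead of wrapping the group in ".*". The class name AlternationPattern and its methods are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class AlternationPattern {

    // Build "\Qcell1\E|\Qcell2\E|..." from a comma-delimited list.
    static Pattern build(String formulaCells) {
        StringBuilder regex = new StringBuilder();
        for (String cell : formulaCells.split(",")) {
            String name = cell.trim();
            if (name.length() == 0) {
                continue;   // an empty alternative would match everything
            }
            if (regex.length() > 0) {
                regex.append('|');
            }
            regex.append(Pattern.quote(name));
        }
        if (regex.length() == 0) {
            return Pattern.compile("(?!)");   // no cells: a pattern that never matches
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    // find() looks for the pattern anywhere in the formula.
    static boolean containsAny(Pattern cells, String formula) {
        return cells.matcher(formula).find();
    }
}
```

The pattern only needs rebuilding when formulaCells changes, so the compile cost can be shared across all formulas checked against the same cell list. Whether this beats the plain loop for a given data set is something to measure rather than assume.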

If your cell identifiers are very regular in form then we can be a lot smarter about the pattern. For example, if the cell identifier is always grid_device_adds__5__ followed by 1 to 4 digits then we could use a pattern like this:
    .*grid_device_adds__5__\d{1,4}.*

Now we're really using the power of regular expressions. In my test, which I'll post shortly, this pattern yielded a constant search time that was substantially lower than the brute force search.

Here's the code -- two classes this time.

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellMatcher {
    private String[] _cells;
    private Pattern _pattern;

    public CellMatcher(String formulaCells) {
        setFormulaCells(formulaCells);
    }
    public void setFormulaCells(String formulaCells) {
        _cells = formulaCells.split(",");
        Arrays.sort(_cells);
        //formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";
        formulaCells = ".*grid_device_adds__5__\\d{1,4}.*";
        _pattern = Pattern.compile(formulaCells, Pattern.CASE_INSENSITIVE);
    }
    // regex force :-)
    public boolean matches(String formula) {
        Matcher matcher = _pattern.matcher(formula);
        return matcher.matches();
    }
    // brute force :-(
    public boolean contains(String formula) {
        formula = formula.toLowerCase();
        for (int i = 0; i < _cells.length; ++i) {
            if (formula.indexOf(_cells[i]) >= 0)
                return true;
        }
        return false;
    }
}

public class CellMatcherTest {
      private static final int LoopCount = 10000;
      public static void main(String[] args) {
            String formulaCells = generateFormulaCells(3000);
            String formula = "=grid_hidden_supt__1__9*grid_device_adds__5__2100*ongoing_IITOA_scalar";
            CellMatcher matcher = new CellMatcher(formulaCells);
            System.out.println("matches: " + matcher.matches(formula));
            System.out.println("contains: " + matcher.contains(formula));
            System.out.println("matches: " + testMatches(matcher, formula) + "ms");
            System.out.println("contains: " + testContains(matcher, formula) + "ms");
      }
      
      private static String generateFormulaCells(int count) {
            StringBuffer buf = new StringBuffer(25 * count);
            for (int i = 0; i < count; ++i) {
                  buf.append("grid_device_adds__5__").append(i + 1).append(",");
            }
            return buf.toString();
      }
      
      private static long testMatches(CellMatcher matcher, String formula) {
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < LoopCount; ++i) {
                  matcher.matches(formula);
            }
            long t2 = System.currentTimeMillis();
            return t2 - t1;
      }
      
      private static long testContains(CellMatcher matcher, String formula) {
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < LoopCount; ++i) {
                  matcher.contains(formula);
            }
            long t2 = System.currentTimeMillis();
            return t2 - t1;
      }

}
0
 
SriVelagapudiAuthor Commented:
Thanks for the code you sent me, I will try this code and let you know about it.
0
 
SriVelagapudiAuthor Commented:
Thanks for the reply, Jim.

I get different formulas with different strings. It won't always be the same as "=grid_hidden_supt__1__9*grid_device_adds__5__2100*ongoing_IITOA_scalar".

Also, in the above example, what happens after the setFormulaCells() call?

What does this line do?   //formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";

Thanks once again.
0
 
Jim CakalicSenior Developer/ArchitectCommented:
I was experimenting with two different ways of building the expression. The way that is commented out:
    formulaCells = ".*(" + formulaCells.replaceAll(",", "|") + ").*";

creates a regular expression from the comma-delimited cell identifiers by simply putting them in a group (with parens), replacing the commas with the vertical bar (which is OR in regular expressions) and then allowing for any character to precede and succeed the group.

The line after it:
     formulaCells = ".*grid_device_adds__5__\\d{1,4}.*";

is building a different expression that assumes the cell identifiers follow a pattern. According to your last post, that isn't true so that option is out.

I was pondering whether we could parse the formula itself and extract candidate cell identifiers ...
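
One way that idea might look, as a sketch only: pull identifier-like tokens out of the formula with a regular expression and look each one up in a HashSet of cell names. This assumes identifiers start with a letter or underscore and continue with letters, digits, or underscores, and that a match means a whole identifier rather than a substring. The class name CellLookup and its method are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class CellLookup {

    // Assumption: this is what a cell identifier looks like inside a formula.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private final Set<String> cells = new HashSet<String>();

    // Store every non-empty cell name, lower-cased, for constant-time lookup.
    CellLookup(String formulaCells) {
        for (String cell : formulaCells.split(",")) {
            String name = cell.trim().toLowerCase(Locale.ROOT);
            if (name.length() > 0) {
                cells.add(name);
            }
        }
    }

    // Scan the formula once; each candidate identifier costs one set lookup.
    boolean containsAnyCell(String formula) {
        Matcher m = IDENTIFIER.matcher(formula.toLowerCase(Locale.ROOT));
        while (m.find()) {
            if (cells.contains(m.group())) {
                return true;
            }
        }
        return false;
    }
}
```

The work per formula then depends on the length of the formula instead of the number of cells, which should help when formulaCells holds hundreds of entries. The behavior differs from the indexOf loop in one respect: grid_device_adds__5__1 would not be reported for a formula that only mentions grid_device_adds__5__111.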
0
 
SriVelagapudiAuthor Commented:
Thanks for your response jim.
0
