Need the matched pattern string from a regular expression

Posted on 2007-07-26
Last Modified: 2013-11-23
Hi All,

I am splitting a string based on a regular expression, as shown below.

public List<String> splitString(String str, String regularExp) {
    // Split the string and segregate it into chunks...
    String[] splitArray = str.split(regularExp);

    // Want to get the actual pattern occurring each time...

    return new ArrayList<String>(Arrays.asList(splitArray));
}

Invoking the above method:
String regEx = "<(br|p|-- end --)>";
List<String> resultList = new Splitter().splitString(str, regEx);
----------------------------------------------------------------------------

In the above method, I now need to find out which pattern actually matched. It could be <br>, <p> or <-- end -->.

When I try to access the pattern string, I get the regular expression I passed in. Instead, I want the actual string that matched, because I need to append it to the end of my chunk.

In short, I need access to the pattern string that is splitting the string. Is there any method to get it?
Thanks
Question by:Suda_RamanaReddy
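The behaviour described in the question matches the difference between the regex text itself (the string passed in, which `Pattern.pattern()` also returns) and `Matcher.group()`, which returns the text actually matched on each `find()`. A minimal sketch; the class and method names are illustrative and not from the thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterFinder {
    // Returns the text each match of the regex actually consumed, in order.
    static List<String> matchedDelimiters(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<String>();
        while (matcher.find()) {
            found.add(matcher.group()); // the concrete delimiter, e.g. "<br>"
        }
        return found;
    }

    public static void main(String[] args) {
        String regex = "<(br|p|-- end --)>";
        // Pattern.pattern() only echoes the regex source...
        System.out.println("regex:   " + Pattern.compile(regex).pattern());
        // ...while Matcher.group() reports what matched each time
        System.out.println("matched: " + matchedDelimiters("one<br>two<p>three<-- end -->", regex));
    }
}
```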
 
LVL 1

Expert Comment

by:dankuck
ID: 19580201
Unlike Perl, Java does not let you use parentheses to solve this problem (or at least, the API documentation doesn't mention it). In Perl, if you put parentheses around the regex passed to split, the delimiters are included as elements of the resulting list.

One way to solve this in Java would be to use Matcher.find to search for the delimiter and then use Matcher.start and Matcher.end to determine where and what the delimiter was.  By keeping a little extra info as we loop, we can determine what the component between the delimiters was.

Example:

To split the String "Four score\tand seven  years ago" using whitespace as the delimiter, the following code could be used.

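import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceSplit {
      public static void main(String[] args){
            String t = "Four score\tand seven  years ago";

            // \s matches exactly one whitespace character per find()
            Matcher r = Pattern.compile("\\s").matcher(t);

            // Where the previous delimiter ended, i.e. where the next component starts
            int previousEnd = 0;

            while (r.find()){
                  System.out.println("component : \"" + t.substring(previousEnd, r.start()) + "\"");
                  System.out.println("delimiter : \"" + t.substring(r.start(), r.end()) + "\"");
                  previousEnd = r.end();
            }
            // Everything after the last delimiter is the final component
            System.out.println("component : " + t.substring(previousEnd));
      }
}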

The previousEnd variable records where the last match ended and therefore where the next component begins.

When the loop is completed it is likely that one more component remains in the String, so the previousEnd variable can be used again to grab all content from the last delimiter to the end of the String.

Note that \s matches a space, a tab, or a newline character (among other whitespace). It matches each space of the double space between "seven" and "years" separately, which yields a single zero-length component between them.

The output of this code would be:

component : "Four"
delimiter : " "
component : "score"
delimiter : "	"
component : "and"
delimiter : " "
component : "seven"
delimiter : " "
component : ""
delimiter : " "
component : "years"
delimiter : " "
component : ago
LVL 3

Expert Comment

by:asood314
ID: 19580243
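You can also do it using the indexOf() and substring() methods as follows:

public List<String> splitString(String str, String regularExp) {
    // Split the string into chunks (assumes at least one delimiter, i.e. two chunks)
    String[] splitArray = str.split(regularExp);

    // The first delimiter lies between the end of chunk 0 and the start of chunk 1;
    // searching for chunk 1 from firstEnd avoids matching an earlier copy of it
    int firstEnd = str.indexOf(splitArray[0]) + splitArray[0].length();
    String pattern = str.substring(firstEnd, str.indexOf(splitArray[1], firstEnd));
    System.out.println("First delimiter: " + pattern);

    return new ArrayList<String>(Arrays.asList(splitArray));
}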
LVL 1

Expert Comment

by:dankuck
ID: 19580356
indexOf can be used too, but it does not match regular expressions. Also, it may give misleading results if the same token appears in the String twice.
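To make the "same token twice" problem concrete, here is a small sketch comparing an indexOf-based lookup (modeled on the earlier comment, without a fromIndex) with a Matcher-based one. The input "ab<p>b" and the regex `<(p|br)>` are made-up examples: chunk 1 ("b") also occurs inside chunk 0 ("ab"), so `indexOf` finds it too early and `substring` is asked for an end index before its start.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexOfPitfall {
    // indexOf-based lookup: assumes the first delimiter sits between the end of
    // chunk 0 and the first occurrence of chunk 1 anywhere in the string
    static String firstDelimiterByIndexOf(String str, String regex) {
        String[] chunks = str.split(regex);
        int start = str.indexOf(chunks[0]) + chunks[0].length();
        return str.substring(start, str.indexOf(chunks[1]));
    }

    // Matcher-based lookup: reports the text the regex actually matched
    static String firstDelimiterByMatcher(String str, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(str);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String regex = "<(p|br)>";
        System.out.println(firstDelimiterByMatcher("ab<p>b", regex));
        try {
            System.out.println(firstDelimiterByIndexOf("ab<p>b", regex));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("indexOf approach failed: " + e.getMessage());
        }
    }
}
```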
LVL 13

Expert Comment

by:Bart Cremers
ID: 19580510
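  // Returns every substring that matches regularExp (i.e. the delimiters themselves),
  // not the chunks between them. Needs java.util.* and java.util.regex.* imports.
  public static List<String> splitString(String str, String regularExp) {
        Pattern p = Pattern.compile(regularExp);

        List<String> result = new ArrayList<String>();
        Matcher matcher = p.matcher(str);

        while (matcher.find()) {
            result.add(matcher.group()); // the text matched this time, e.g. "<br>"
        }

        return result;
    }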

Author Comment

by:Suda_RamanaReddy
ID: 19582549
Hi,

Here is my code:

import java.io.*;
import java.util.*;

public class Splitter2 {
            
      /**
       * <p>
       * Method which accepts a String and splits it into chunks whenever a regular expression pattern is found.
       * </p>
       * @param inputString - String to split into chunks
       * @param regex - Pattern String on which the String needs to be split
       * @param maximumCount - Maximum number of chunks for a page
       * @param minimumCount - Minimum number of chunks for a page
       * @return - List of chunk strings, one per page
       */
             
      public List splitString(String inputString, String regex, int maximumCount, int minimumCount) {
                  //String patternStr = "(/<?(br|p|-- end --)>/";
            
                  /*
                   * Pattern pattern = Pattern.compile(regex);
                   * Matcher matcher = pattern.matcher(inputString);
                   *
                   * while(matcher.find()){
                   * }
                   */
            
                  String patternStrg = regex;
                  int maxCount = maximumCount;
                  int minCount = minimumCount -1;
                  
                  LinkedList linkedList = new LinkedList();
                  int limit=0;
                                          
                  // Split String into chunks at all occurrences of pattern
                  String[] splitString = inputString.split(patternStrg);  
           
                  // if no of chunks are less than maxCount return a list add details to that...
                   if (splitString.length > maxCount){
                         for (int x=0; x<splitString.length;){
                               String tempStr ="";
                               
                               //System.out.println("SplitString length"+splitString.length);
                               
                               if ((splitString.length - x) <= minCount){
                                           String lastStr = (String)linkedList.removeLast();
                                             lastStr += splitString[splitString.length - minCount];
                                             
                                             //System.out.println("last String"+lastStr);
                                             linkedList.addLast(lastStr);                                                                            
                               }else{
                                     limit = Math.min(maxCount,splitString.length);
                                     int newlimit = ((x+limit) > splitString.length) ? splitString.length:(x+limit);
                                     for (int j = x ; j < newlimit; j++){
                                           //str1 += splitString[j]+patternStrg;
                                           tempStr += splitString[j];
                                     }
                                     linkedList.addLast(tempStr);
                               }
                               x += limit;
                       }
                   }
                   else {
                               String str1="";
                                                        
                               for (int k=0; k<splitString.length; k++){                                   
                                     
                                     if(patternStrg.equalsIgnoreCase("<!-- page break -->")){
                                           System.out.println("page break occured");
                                     }                                     
                                     str1 += splitString[k];
                                     System.out.println("pattern String"+ patternStrg);
                               }
                               linkedList.addLast(str1);
                               System.out.println("Split size if less than mincount" + linkedList.size());
                   }
            return linkedList;      
    }
            
      public static void main(String[] args) {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            System.out.println("Please enter a String: ");
            try{
                  String str = in.readLine();
                  //String regex = "<(Br|p|!-- page break --)>+( <Br>)?"; (Working..)
                  
                  String regex = "<(Br|p|!-- page break --)>+( <Br>)?";
                                    
                  List result = new Splitter2().splitString(str,regex,7,2);
                  
                  Iterator i = result.iterator();
            while (i.hasNext()) {
                System.out.println("Test check...." + i.next());
            }
            }
            catch(IOException ioe){
                  System.out.println(ioe);
            }
            catch(Exception e){
                  System.out.println(e);
            }
      }
}

I need to get the pattern string, because I have to return the list if <!-- page break --> occurs:

Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(str);

while (matcher.find()) {
    count = count + 1;
    String[] splitString = str.split(patternStr);
    // ... do something ...
}

The problem here is that I may get multiple occurrences of pattern strings (<Br>, <p>, or something else).
Earlier, the code just split the string regardless of which pattern string matched, but now if I get <!-- page break --> I have to return the list.

Please let me know the best possible way to achieve this.

Author Comment

by:Suda_RamanaReddy
ID: 19582605
One more problem: with while (matcher.find()) { ... }, the output is repeated once for every matched pattern string in the given String.
LVL 2

Expert Comment

by:freeexpert
ID: 19584155
> dankuck:
>     Unlike Perl, Java does not offer parentheses to solve this problem (or at least, the API documentation doesn't mention it).

Yes, it does. They are called capturing groups. See http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#cg

Coming back to the original question:
Pattern p = Pattern.compile(".*(<(a|p|br|tr)) .*");
Matcher m1 = p.matcher("And you can go to <a href=\"http://www.yahoo.com\">yahoo</a> for details");
boolean matches = m1.matches(); // evaluates to true, modulo my spelling errors
String matchString = m1.group(1); // "<a"
String tag = m1.group(2); // "a"
int start = m1.start(1); // index of '<'
// etc.
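Applied to the tags in the question, a capturing group tells you which alternative matched on each `find()`. A minimal sketch; the regex is modeled on the asker's, and `CASE_INSENSITIVE` is an assumption so that both `<Br>` and `<br>` match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhichTag {
    // Group 1 captures whichever alternative matched inside <...>
    static final Pattern TAG =
            Pattern.compile("<(br|p|!-- page break --)>", Pattern.CASE_INSENSITIVE);

    static List<String> tagNames(String input) {
        Matcher m = TAG.matcher(input);
        List<String> names = new ArrayList<String>();
        while (m.find()) {
            names.add(m.group(1)); // e.g. "p", "Br" or "!-- page break --"
        }
        return names;
    }

    public static void main(String[] args) {
        Matcher m = TAG.matcher("one<p>two<!-- page break -->three<Br>four");
        while (m.find()) {
            // group() is the whole delimiter, group(1) the captured tag, start() its offset
            System.out.println(m.group() + " -> " + m.group(1) + " at " + m.start());
        }
    }
}
```

Checking `m.group(1)` against "!-- page break --" is then enough to decide whether the current delimiter should end a page.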

LVL 1

Expert Comment

by:dankuck
ID: 19586182
> freeexpert:
>     Yes, it does. They are called capturing groups...

Ah, I guess I should have said "I don't understand the API documentation". But I do now, thanks! Anyway, capturing groups don't make String.split return the delimiters, as they do in some other languages.
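A quick illustration of that point, plus one alternative that is not from the thread: splitting on zero-width lookahead/lookbehind assertions around a fixed delimiter such as `<p>`, so that `split` cuts before and after the delimiter instead of removing it. This sketch only uses the single literal `<p>` as the delimiter; treat it as an assumption-laden shortcut rather than a general replacement for a Matcher loop.

```java
import java.util.Arrays;

class SplitDelimiters {
    // A capturing group does not make String.split keep the delimiter
    static String[] splitDroppingDelimiter(String s) {
        return s.split("(<p>)");
    }

    // Zero-width lookahead/lookbehind split *around* "<p>" instead of on it
    static String[] splitKeepingDelimiter(String s) {
        return s.split("(?=<p>)|(?<=<p>)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDroppingDelimiter("a<p>b<p>c")));
        System.out.println(Arrays.toString(splitKeepingDelimiter("a<p>b<p>c")));
    }
}
```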

Suda_RamanaReddy:

I'm sorry, I don't completely understand the purpose of your code above; however, the following method produces a String[] much as split does, except that it also includes the delimiter between each pair of chunks.

Each even-numbered element will be a chunk (0, 2, 4, etc) and each odd-numbered element will be a delimiter (1, 3, 5, etc).  The last element will be a chunk even if it is a zero-length String.

      public static String[] splitIncludingDelimiter(String input, String regex){
            Pattern pattern = Pattern.compile(regex);
            Matcher matcher = pattern.matcher(input);

            List<String> list = new ArrayList<String>();

            int previousEnd = 0;

            while(matcher.find()){
                  String chunk = input.substring(previousEnd, matcher.start()); // chunk ends where the delimiter starts
                  String delimiter = matcher.group();
                  previousEnd = matcher.end();
                  list.add(chunk);
                  list.add(delimiter);
            }
            String chunk = input.substring(previousEnd);
            list.add(chunk);

            String[] results = new String[list.size()];
            list.toArray(results);
            return results;
      }

If you use this method instead of String.split, you'll want to check every odd-numbered element to see if it's the "<!-- page break -->" String you're looking for, and treat every even-numbered element as you would treat a chunk.

You can check whether a number is even by using:
if (number % 2 == 0) {
   /* even: a chunk */
} else {
   /* odd: a delimiter */
}

(Note: the splitIncludingDelimiter method is not optimized, but instead written for understanding.)
(Note: optimally, the splitIncludingDelimiter method would be written in a more Object-Oriented fashion, perhaps using some type of token or iterator, but here it's written for compatibility with the original String.split method.)
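As a usage sketch, here is a self-contained copy of splitIncludingDelimiter run on the asker's first sample input. The copy below ends each chunk at matcher.start(), so a chunk does not also contain its delimiter. The `(?i)` flag (case-insensitive matching, so `<Br>` is accepted) and the simplified regex are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterSplitDemo {
    static String[] splitIncludingDelimiter(String input, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> list = new ArrayList<String>();
        int previousEnd = 0;
        while (matcher.find()) {
            list.add(input.substring(previousEnd, matcher.start())); // chunk
            list.add(matcher.group());                                // delimiter
            previousEnd = matcher.end();
        }
        list.add(input.substring(previousEnd)); // trailing chunk, possibly ""
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] parts = splitIncludingDelimiter(
                "1st Chunk <p> 2nd Chunk <!-- page break --> 3rd Chunk <Br> <Br>",
                "(?i)<(br|p|!-- page break --)>");
        for (int i = 0; i < parts.length; i++) {
            if (i % 2 == 0) {
                System.out.println("chunk:     \"" + parts[i] + "\"");
            } else if (parts[i].equalsIgnoreCase("<!-- page break -->")) {
                System.out.println("page break");
            } else {
                System.out.println("delimiter: " + parts[i]);
            }
        }
    }
}
```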

Author Comment

by:Suda_RamanaReddy
ID: 19593114
Thanks.
I'm sorry, but I need a somewhat different answer.
I'm doing this for a pagination application, where I have to split the input string based on the regular expression. I need to do different validations depending on the type of pattern string.
My string may contain two or more consecutive <Br> tags or <!-- Page Break -->. I have to split it so that I can spread the input across pages. If the pattern string is only <Br>s, I just add chunks to a temp string, and once the chunk count reaches 7, I return it as the first element of the array and continue the process.

The problem is with <!-- Page Break -->. When it occurs, I have to return that string regardless of the count, so that it starts a new page. For this I used the matcher.find() method, which does the validation every time it finds the pattern string. I need this method because I have to find which pattern string occurred so that I can do the comparison. At the same time, I don't want it to repeat the process every time a pattern occurs.

I hope I explained the problem clearly!

My program is:
import java.io.*;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Splitter2 {
            
      /**
       * <p>
       * Method which accepts a String and splits it into chunks whenever a regular expression pattern is found.
       * </p>
       * @param inputString - String to split into chunks
       * @param regex - Pattern String on which the String needs to be split
       * @param maximumCount - Maximum number of chunks for a page
       * @param minimumCount - Minimum number of chunks for a page
       * @return - List of chunk strings, one per page
       */
             
      public List splitString(String inputString, String regex, int maximumCount, int minimumCount) {
                              
                  /*
                   * Pattern pattern = Pattern.compile(regex);
                   * Matcher matcher = pattern.matcher(inputString);
                   *
                   * while(matcher.find()){
                   * }
                   */
                  String patternStrg = regex;
                  int maxCount = maximumCount;
                  int minCount = minimumCount -1;
                  
                  LinkedList linkedList = new LinkedList();
                  int limit=0;
                  String subStr ="";
                                                
                  Pattern pattern = Pattern.compile(regex);
                  Matcher matcher = pattern.matcher(inputString);
                  while(matcher.find()){
            
                        int startIndex = matcher.start();
                        int endIndex = matcher.end();
                        
                        subStr= inputString.substring(startIndex, endIndex);
                        //System.out.println("Actual Pattern String"+inputString.substring(startIndex, endIndex));
                        
                        // Split String into chunks at all occurrences of pattern
                        //String[] splitString = inputString.split(patternStrg,6);  
                  
                        String[] splitString = inputString.split(patternStrg);
                        
                        // if no of chunks are less than maxCount return a list add details to that...
                         if (splitString.length > maxCount){
                               for (int x=0; x<splitString.length;){
                                     String tempStr ="";
                               
                                     //System.out.println("SplitString length"+splitString.length);
                                     
                                     if ((splitString.length - x) <= minCount){
                                                 String lastStr = (String)linkedList.removeLast();
                                                   lastStr += splitString[splitString.length - minCount];
                                                   
                                                   //System.out.println("last String"+lastStr);
                                                   linkedList.addLast(lastStr);                                                                            
                               }else{
                                     limit = Math.min(maxCount,splitString.length);
                                     int newlimit = ((x+limit) > splitString.length) ? splitString.length:(x+limit);
                                     for (int j = x ; j < newlimit; j++){
                                           //str1 += splitString[j]+patternStrg;
                                           tempStr += splitString[j];
                                     }
                                     linkedList.addLast(tempStr);
                               }
                               x += limit;
                       }
                   }
                   else {
                               String str1="";
                               for (int k=0; k<splitString.length; k++){                                   
                                     if(subStr.equalsIgnoreCase("<!-- page break -->")){
                                           str1 += splitString[k];
                                           linkedList.addLast(str1);
                                           System.out.println("page break occured"+subStr);
                                           str1="";
                                     }
                                     else{
                                           str1 += splitString[k];
                                           System.out.println("value of k in else part"+k +"String value"+ str1);
                                     }                                     
                               }
                               linkedList.addLast(str1);
                               System.out.println("Split size if less than mincount" + linkedList.size());
                   }
                  }
            return linkedList;
            
    }
            
      public static void main(String[] args) {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            System.out.println("Please enter a String: ");
            try{
                  String str = in.readLine();
                  //String regex = "<(Br|p|!-- page break --)>+( <Br>)?"; (Working..)
                  
                  String regex = "<(Br|p|!-- page break --)>+( <Br>)*";
                                    
                  List result = new Splitter2().splitString(str,regex,7,2);
                  
                  Iterator i = result.iterator();
            while (i.hasNext()) {
                System.out.println("Test check...." + i.next());
            }
            }
            catch(IOException ioe){
                  System.out.println(ioe);
            }
            catch(Exception e){
                  System.out.println(e);
            }
      }
}
------------------------------------------------------------------------------------------
The input string is: 1st Chunk <p> 2nd Chunk <!-- page break --> 3rd Chunk <Br> <Br>

I should return the split string only once:
First element of the array: 1st Chunk  2nd Chunk
Second element of the array: 3rd Chunk

Similarly, for:
1st Chunk <p> 2nd Chunk <!-- page break --> 3rd Chunk <Br> <Br> 4th Chunk <p> 5th Chunk <p> 6th Chunk <Br> <Br> 7 th Chunk <!-- page break --> 8 th Chunk <!-- page break -->

I should return the split string only once:
First element of the array: 1st Chunk  2nd Chunk
Second element of the array: 3rd Chunk 4th Chunk 5th Chunk 6th Chunk 7 th Chunk
Third element of the array: 8 th Chunk

Thanks in advance for all your help.
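One way to get exactly these results in a single pass is a Matcher loop that closes a page either at a page break or after a fixed number of chunks. This is a sketch, not the asker's final code: the delimiter regex (with `( <br>)*` folding repeated `<Br>` tags into one delimiter), trimming each chunk and joining chunks with a single space are all assumptions, and the minimumCount merging from the original method is not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Paginator {
    // Assumed delimiter: a <br>, <p> or page-break tag, plus any " <br>" repeats
    static final Pattern DELIMITER =
            Pattern.compile("<(br|p|!-- page break --)>( <br>)*", Pattern.CASE_INSENSITIVE);

    // A page ends at a page break or after maxChunks chunks
    static List<String> paginate(String input, int maxChunks) {
        Matcher m = DELIMITER.matcher(input);
        List<String> pages = new ArrayList<String>();
        StringBuilder page = new StringBuilder();
        int count = 0;
        int previousEnd = 0;
        boolean found;
        do {
            found = m.find();
            // Text between the previous delimiter and this one (or the end of input)
            String chunk = input.substring(previousEnd, found ? m.start() : input.length()).trim();
            if (!chunk.isEmpty()) {
                if (page.length() > 0) page.append(' ');
                page.append(chunk);
                count++;
            }
            boolean pageBreak = found && m.group(1).equalsIgnoreCase("!-- page break --");
            if ((pageBreak || count >= maxChunks || !found) && page.length() > 0) {
                pages.add(page.toString());
                page.setLength(0);
                count = 0;
            }
            if (found) previousEnd = m.end();
        } while (found);
        return pages;
    }

    public static void main(String[] args) {
        String input = "1st Chunk <p> 2nd Chunk <!-- page break --> 3rd Chunk <Br> <Br> 4th Chunk <p> "
                + "5th Chunk <p> 6th Chunk <Br> <Br> 7 th Chunk <!-- page break --> 8 th Chunk <!-- page break -->";
        for (String page : paginate(input, 7)) {
            System.out.println("page: " + page);
        }
    }
}
```

Because the page decision is made inside the single find() loop, each delimiter is examined exactly once and String.split is never called.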
LVL 2

Expert Comment

by:freeexpert
ID: 19593677
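It might be easier to do it in two classes:

import java.util.ArrayList;
import java.util.List;

class Chunk {
   String _string;     // text of the chunk
   String _delimiter;  // delimiter that ended it (null for the last chunk)
}

class CoreSplitter {
    CoreSplitter(String input, String delimiterRegex) {
        // you should be able to write this constructor now...
    }
    // you should be able to write this method now...
    // you can also have a hasNext method, or return null from next at the end...
    Chunk next() {
        return null;
    }
}

class PageSplitter {
    static final String PAGE_BREAK = "<!-- page break -->";

    static List<String> split(String input, String delimiterRegex) {
        List<String> result = new ArrayList<String>();
        CoreSplitter splitter = new CoreSplitter(input, delimiterRegex);
        int count = 0;
        StringBuilder page = new StringBuilder();
        Chunk chunk;
        while ((chunk = splitter.next()) != null) {
            page.append(chunk._string);
            // start a new page after 7 chunks, or at a page break
            if (++count > 6 || PAGE_BREAK.equalsIgnoreCase(chunk._delimiter)) {
                result.add(page.toString());
                page = new StringBuilder();
                count = 0;
            }
        }
        if (page.length() > 0) {
            result.add(page.toString()); // the last, partly filled page
        }
        return result;
    }
}

The constructor and next() are left for you to fill in, and there may still be a logical error or two, but you get the idea...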
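One possible way to fill in CoreSplitter with Pattern and Matcher, sketched under assumptions: the field names `_string` and `_delimiter` follow the Chunk sketch so the classes can plug into PageSplitter, the regex in the demo is made up, and the final chunk is reported with a null delimiter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One piece of text plus the delimiter that ended it (null for the final piece)
class Chunk {
    final String _string;
    final String _delimiter;
    Chunk(String string, String delimiter) { _string = string; _delimiter = delimiter; }
}

// Hands out chunks one at a time instead of splitting everything up front
class CoreSplitter {
    private final String input;
    private final Matcher matcher;
    private int previousEnd = 0;
    private boolean done = false;

    CoreSplitter(String input, String delimiterRegex) {
        this.input = input;
        this.matcher = Pattern.compile(delimiterRegex).matcher(input);
    }

    // Returns the next chunk, or null once the input is exhausted
    Chunk next() {
        if (done) return null;
        if (matcher.find()) {
            Chunk chunk = new Chunk(input.substring(previousEnd, matcher.start()), matcher.group());
            previousEnd = matcher.end();
            return chunk;
        }
        done = true; // no more delimiters: the rest of the input is the last chunk
        return new Chunk(input.substring(previousEnd), null);
    }
}

class CoreSplitterDemo {
    public static void main(String[] args) {
        CoreSplitter splitter = new CoreSplitter("one<p>two<!-- page break -->three",
                "<(br|p|!-- page break --)>");
        Chunk chunk;
        while ((chunk = splitter.next()) != null) {
            System.out.println("\"" + chunk._string + "\" ended by " + chunk._delimiter);
        }
    }
}
```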

Author Comment

by:Suda_RamanaReddy
ID: 19602839
Thanks to all. I have now done this without using the split method of the String API.
LVL 2

Accepted Solution

by:freeexpert (earned 1500 total points)
ID: 19603108
> Thanks to all. I have now done this without using the split method of the String API.

Did you use the Pattern and Matcher classes as I suggested? Did you use my approach of splitting either six chunks to a page or until a page break?

If you did, I would appreciate it if you could give me the points, hopefully with a high grade.

If you did not, I am curious to know which approach you eventually took.