  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 530

Java regular expressions to return specific values

I've got a simple question for someone that knows how to do this :)

Suppose I've got a string of text and I want to split it and return just a piece of it, say between two tags.  Something like this:

String text = "here's some text, bla bla bla <tag1>just give me this text</tag1> and then more text bla bla bla";
String pattern = ???;  // this is the part I don't know how to write
String[] result = new String[10]; // don't really know how many entries, but assume just a few for this example

result = text.split(pattern);

for(int i=0; i<result.length; i++)
  System.out.println("token["+i+"]: " +result[i].toString());

Does that make sense?  Can you offer me a little help?

Asked by: dopyiii
 
InteractiveMindCommented:
You could always check the index of the tag substrings, and then extract the substring from there.


However, a regex can be used; you'd need to make use of both positive lookbehind and positive lookahead [I imagine].

Check out "Special constructs (non-capturing)" here:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
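
For reference, the index-based idea mentioned at the top of this comment could look roughly like the sketch below. The class name `TagSubstring` and the helper `between` are made up for illustration (they are not from this thread), and the sketch assumes the tags are well formed and not nested.

```java
public class TagSubstring {
    // Hypothetical helper: returns the text between the first <tag>...</tag> pair,
    // or null if either the opening or the closing tag is missing.
    public static String between(String text, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";

        int start = text.indexOf(open);        // -1 if the opening tag is absent
        if (start < 0) {
            return null;
        }
        start += open.length();                // skip past the opening tag itself

        int end = text.indexOf(close, start);  // search only after the opening tag
        if (end < 0) {
            return null;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "here's some text, bla bla bla <tag1>just give me this text</tag1> and then more text bla bla bla";
        System.out.println(between(text, "tag1")); // prints "just give me this text"
    }
}
```

Using `open.length()` rather than a hard-coded offset avoids off-by-one mistakes if the tag name changes.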
 
InteractiveMindCommented:
Here's a regex which seems to work:

(?<=<tag1>).*(?=</tag1>)
 
Siva Prasanna KumarPrincipal Solutions ArchitectCommented:
String text = "here's some text, bla bla bla <tag1>just give me this text</tag1> and then more text bla bla bla";
String x =  text.substring(text.indexOf("<")+1,text.lastIndexOf(">"));
x=x.substring(x.indexOf(">")+1,x.lastIndexOf("<"));

This simple code will work assuming there is only one tag in the string.

Thank you.
 
CEHJCommented:
>>(?<=<tag1>).*(?=</tag1>)

May we know where you got that one from?
 
InteractiveMindCommented:
What do you mean ?

(BTW, that's not to be used with String#split())
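
To spell out why: the lookaround regex matches the text you want to keep, so passing it to split() would treat that text as the delimiter and throw it away. If you do want to stay with String#split(), one possible sketch is to split on the tags themselves. The class and method names here are hypothetical, and it assumes well-formed, non-nested tags, in which case the text inside the tags ends up at the odd indexes.

```java
public class SplitOnTags {
    // Hypothetical helper: splits on either <tag1> or </tag1>.
    // "</?tag1>" means: '<', an optional '/', then "tag1>".
    public static String[] splitOnTags(String text) {
        return text.split("</?tag1>");
    }

    public static void main(String[] args) {
        String text = "here's some text, bla bla bla <tag1>just give me this text</tag1> and then more text bla bla bla";
        String[] result = splitOnTags(text);

        for (int i = 0; i < result.length; i++) {
            System.out.println("token[" + i + "]: " + result[i]);
        }
        // token[1] is "just give me this text"
    }
}
```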
 
InteractiveMindCommented:
Improving it slightly:

   (?<=<tag1>)[^><]*(?=</tag1>)

And an example of use:


import java.util.regex.* ;

class Example
{
public static void main(String[]a)
{
    String input = "here's some text, bla bla bla <tag1>i want this</tag1> and <tag1>and this !!</tag1> more text bla bla bla";
   
    Pattern p = Pattern.compile( "(?<=<tag1>)[^><]*(?=</tag1>)" ) ;
    Matcher m = p.matcher( input ) ;
   
    int i = 0 ;
    while ( m.find() )
    {
        System.out.println( "Token["+(i++)+"]: "+m.group() ) ;
    }
}
}
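
As to why `[^><]*` is an improvement over `.*`: with more than one tag pair in the input, the greedy `.*` runs from the first opening tag all the way to the last closing tag, while `[^><]*` cannot cross a `<` or `>` and so stops at the nearest closing tag. A small sketch comparing the two (the class name `GreedyVsNegated` and the helper `findAll` are just for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsNegated {
    // Hypothetical helper: collects every match of regex in input.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "x <tag1>i want this</tag1> and <tag1>and this !!</tag1> y";

        // Greedy: one match spanning both tag pairs.
        // prints [i want this</tag1> and <tag1>and this !!]
        System.out.println(findAll("(?<=<tag1>).*(?=</tag1>)", input));

        // Negated character class: one match per tag pair.
        // prints [i want this, and this !!]
        System.out.println(findAll("(?<=<tag1>)[^><]*(?=</tag1>)", input));
    }
}
```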
 
JeffHorribleCommented:
Correct me if I am wrong, but InteractiveMind's Example code above (the one using "(?<=<tag1>)[^><]*(?=</tag1>)") will give you everything including the tags.  Is there a way to get the string between the tags without the tags?
 
JeffHorribleCommented:
My bad.  You are right.  This works just fine.
 
dopyiiiAuthor Commented:
I knew that I'd come to the right place :)

Thanks for the links, CEHJ and InteractiveMind.  I'd looked at those prior to posting, but they weren't making much sense in my little brain.  I reread CEHJ's post, and it made a little more sense the second time around.

I tried shivaspk's substring idea which works well if you only have one tag (per the warning).  But, I improved it a bit:

<<< code >>>
String text = "jsdf jskdfj ksd fj <tag1>some stuff here</tag1>";

// get the text between '<tag1>' and '</tag1>' ("<tag1>" is 6 characters long)
String result = text.substring(text.indexOf("<tag1>") + 6, text.lastIndexOf("</tag1>"));
System.out.println(result); // this prints "some stuff here"

<<< code >>>

And also (possibly) coded up one that will work for multiple tags (haven't tried it):

<<< code >>>
String text = "jsdf jskdfj ksd fj <tag1>some stuff here</tag1> ksjdfksd sdkfj ksdf  <tag1>more here too</tag1> dksfj";
String open = "<tag1>";
String close = "</tag1>";

// these start from the beginning by default; indexOf will return -1 if nothing is found
int startPointer = text.indexOf(open);
int endPointer = (startPointer >= 0) ? text.indexOf(close, startPointer + open.length()) : -1;

// find them all - stop as soon as an opening or closing tag is missing
while (startPointer >= 0 && endPointer >= 0) {
   // skip the whole opening tag (6 characters), not just 5
   String result = text.substring(startPointer + open.length(), endPointer);
   if (!result.equals(""))
      System.out.println("found one: " + result);

   // find the next tag pair, starting after the closing tag we just used
   startPointer = text.indexOf(open, endPointer + close.length());
   if (startPointer >= 0)
      endPointer = text.indexOf(close, startPointer + open.length());
}

<<< code >>>

I actually like InteractiveMind's answer the best (it seems most geeky and I'm a geek so I liked it :)

Thanks a bunch Experts!
 
InteractiveMindCommented:
No probs :-)
Thanks for the points.
 
CEHJCommented:
>>Will give you everything including the tags.  

Yes and no. The lookbehind and lookahead do have to match the tags, but since they are zero-width, the tags never become part of the returned match, so that's of no consequence.

Thanks to IM - you've taught me something. I didn't know lookahead/behind were capable of variable length (which is why I commented earlier) ;-)
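
To make the zero-width point concrete, here is a small sketch that prints where the match starts and ends. The class name `ZeroWidthDemo` and the helper `firstMatchSpan` are made up for this example. Note that both lookarounds in this particular pattern are fixed-length literals (`<tag1>` and `</tag1>`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {
    // Hypothetical helper: returns {start, end} of the first match, or null if none.
    public static int[] firstMatchSpan(String input) {
        Matcher m = Pattern.compile("(?<=<tag1>)[^><]*(?=</tag1>)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(), m.end() };
    }

    public static void main(String[] args) {
        String input = "ab<tag1>xyz</tag1>cd";
        int[] span = firstMatchSpan(input);

        // The opening tag occupies indexes 2..7, so the match begins at 8,
        // right after it, and ends at 11, right before the closing tag.
        // prints "8..11 -> xyz"
        System.out.println(span[0] + ".." + span[1] + " -> " + input.substring(span[0], span[1]));
    }
}
```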
 
dopyiiiAuthor Commented:
I learned a lot too :)
 
InteractiveMindCommented:
IM 1 - 1000 CEHJ

:-)
