• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1268

regex for <body> tag

I'm working on Java code that reads in an HTML file, checks for the <body> tag, and then inserts text after it. The body tag may vary from looking like this: <body> to this:
<body lang=EN-US
style='tab-interval:.5in'>

How do I check for the second version of the body tag? My current regex string is
<body[a-zA-Z0-9]*>  but it never finds a match. Any ideas what the regex needs to look like?
3 Solutions
 
ddrudik Commented:
<body[^>]*>
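
For reference, here is a small sketch comparing this pattern with the one from the question. The class name BodyPatternDemo and the helper found are made up for illustration; the two sample tags are the ones quoted in the question. The original character class allows only letters and digits, so it stops at the first space, while [^>]* accepts anything except ">", including spaces, quotes, and line breaks.

```java
import java.util.regex.Pattern;

public class BodyPatternDemo {
    // Hypothetical helper: true if the regex is found anywhere in the input,
    // ignoring case.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        String plain = "<body>";
        String multiLine = "<body lang=EN-US\r\nstyle='tab-interval:.5in'>";

        // Pattern from the question: letters and digits only between "<body" and ">".
        String original = "<body[a-zA-Z0-9]*>";
        // Suggested pattern: any character except ">" between "<body" and ">".
        String suggested = "<body[^>]*>";

        System.out.println(found(original, plain));      // true
        System.out.println(found(original, multiLine));  // false
        System.out.println(found(suggested, plain));     // true
        System.out.println(found(suggested, multiLine)); // true
    }
}
```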
 
newbieal (Author) Commented:
Thanks, but that doesn't seem to work.
The <body> tag is split over two lines (that may not always be the case, but the regex has to account for it):

<body lang=EN-US
style='tab-interval:.5in'>
 
ddrudik Commented:
That works for me; please show your code.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1{
  public static void main(String[] asd){
    String sourcestring = "source string to match with pattern";
    Pattern re = Pattern.compile("<body[^>]*>",Pattern.CASE_INSENSITIVE);
    Matcher m = re.matcher(sourcestring);
    if(m.find()){
      // group 0 is the whole match; groupCount() does not count it, so use <=
      for( int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++ ){
        System.out.println( "[" + groupIdx + "] = " + m.group(groupIdx));
      }
    }
  }
}


 
Peter Kwan Commented:
You could try removing the carriage returns from your string before matching it against the regular expression.
 
ddrudik Commented:
Any carriage returns would be matched by the character set [^>]*, so I would need to see the code used to understand the issue. I assume the pattern used is not as shown in my code post.
 
newbieal (Author) Commented:
Here is what I have:

String thisLine = "";
String regex = "<body[^>]*>";
Pattern p = Pattern.compile(regex);
Matcher m;
 
while ((thisLine = in.readLine()) != null) {
    m = p.matcher(thisLine.toLowerCase());
    // save each line read to new html file
    out.println(thisLine);
    while (m.find()) {
        // add new content at this string position
        out.println(lineToBeInserted);
    }
}

 
ddrudik Commented:
I would need to see what is in thisLine to see why it is not working. The pattern given is correct, though: it matches a body tag regardless of the content inside the tag.
 
newbieal (Author) Commented:
thisLine contains the first line of this:

<body lang=EN-US
style='tab-interval:.5in'>
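
A minimal sketch of what this implies, assuming the file is read with BufferedReader.readLine() as in the loop posted earlier: each call returns a single line without its line terminator, so the matcher never sees "<body" and the closing ">" in the same string. The class name LineByLineDemo, the method countLineMatches, and the sample HTML are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

public class LineByLineDemo {
    static final Pattern BODY = Pattern.compile("<body[^>]*>", Pattern.CASE_INSENSITIVE);

    // Counts how many individual lines contain a complete opening body tag.
    static int countLineMatches(String html) {
        int count = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(html))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (BODY.matcher(line).find()) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<html>\n<body lang=EN-US\nstyle='tab-interval:.5in'>\n<p>text</p>\n</html>";
        // No single line holds the whole tag, so matching line by line finds nothing.
        System.out.println(countLineMatches(html));    // 0
        // The same pattern applied to the whole text does find the tag.
        System.out.println(BODY.matcher(html).find()); // true
    }
}
```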
 
ddrudik Commented:

import java.util.regex.Pattern;
import java.util.regex.Matcher;
class Module1{
  public static void main(String[] asd){
  String sourcestring = "<body lang=EN-US \r\n"+"style='tab-interval:.5in'>";
  System.out.println(sourcestring);
  Pattern re = Pattern.compile("<body[^>]*>",Pattern.CASE_INSENSITIVE);
  Matcher m = re.matcher(sourcestring);
    if(m.find()){
      System.out.println("[0] = " + m.group(0));
    }
  }
}

 
newbieal (Author) Commented:
Thanks, but I have to keep it more generic than just this:

<body lang=EN-US
style='tab-interval:.5in'>

Some of the docs might just have this:
<body>

Or this:
<body style="">

and so on....
 
ddrudik Commented:
newbieal, read line 7 in 22886439 again (the Pattern.compile call): it matches "<body" followed by anything up to the next ">", so the code shown would match all of your examples.

Feel free to change the sourcestring in my code example to any of your desired body tags and run it to see the sourcestring used and the match found.
 
Peter Kwan Commented:
Of course that does not work, because readLine() gives you

thisLine = "<body lang=EN-US";

and

thisLine = "style='tab-interval:.5in'> "

in two separate loop iterations, so the pattern never sees the whole tag at once. You may consider the following:


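while ((thisLine = in.readLine()) != null) {
    int tagStart = thisLine.toLowerCase().indexOf("<body");
    if (tagStart >= 0) {
        if (thisLine.indexOf('>', tagStart) >= 0) {
            // tag opens and closes on this line: add your content
        } else {
            // tag continues on later lines: read on until the closing '>'
            do {
                thisLine = in.readLine();
            } while (thisLine != null && thisLine.indexOf('>') == -1);
            // add your content
        }
    }
}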
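
An alternative to tracking the tag across lines is to match against the whole document at once, since [^>]* already spans line breaks. The following is only a sketch under assumptions: the file is small enough to hold in memory and has already been read into one String (for example with Files.readString on Java 11 or later), and no attribute value inside the body tag contains a ">" character. The class name InsertAfterBody and the method insertAfterBody are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InsertAfterBody {
    private static final Pattern BODY =
            Pattern.compile("<body[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns html with 'insertion' placed directly after the first opening
    // body tag, or html unchanged if no body tag is found.
    static String insertAfterBody(String html, String insertion) {
        Matcher m = BODY.matcher(html);
        if (!m.find()) {
            return html;
        }
        // m.end() is the index just past the closing '>' of the matched tag.
        return new StringBuilder(html).insert(m.end(), insertion).toString();
    }

    public static void main(String[] args) {
        String html = "<html>\n<body lang=EN-US\nstyle='tab-interval:.5in'>\n"
                + "<p>text</p>\n</body>\n</html>";
        System.out.println(insertAfterBody(html, "\n<p>inserted</p>"));
    }
}
```

Because the match runs on the original text with CASE_INSENSITIVE, there is no need to lower-case the input first, and the rest of the document is written out unchanged.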
