jimmieandersson
asked on
Simple RegEx to get only content between tags over multiple lines
Hi,
I'm new to regular expressions and believe this is really easy to someone that is good at it =)
I'm searching an HTML document for a title that's inside <h1> tags.
It could look something like this:
text text text
text <h1>Here is the title I want
and it could span over multiple lines</h1>
more text
I tried this simple RegEx: <h1>(.*)</h1>
But it doesn't work over multiple lines. And I'm also worried that it will match from the first <h1> tag in document to the last (if multiple) </h1>?
And, I only want the result to be: Here is the title I want and it could span over multiple lines, not including the <h1> and </h1>
My code is:
var titleMatch = new Regex("<h1>(.*)</h1>", RegexOptions.IgnoreCase).Match(htmlInput);
Thanks for any help.
How about:
"<h1>[^<]*</h1>"
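A quick sketch of how this pattern behaves, written in JavaScript for illustration. The sample strings are made up, modelled on the example in the question. Because `[^<]` matches any character except `<` (newlines included), no single-line option is needed and the match cannot run past the first closing tag. The trade-off is that a heading containing a nested tag will not match at all.

```javascript
// Hypothetical input modelled on the question's example.
const html =
  "text text text\n" +
  "text <h1>Here is the title I want\n" +
  "and it could span over multiple lines</h1>\n" +
  "more text";

// [^<] matches every character except "<", line breaks included.
const negated = /<h1>[^<]*<\/h1>/i;

const hit = html.match(negated);
console.log(hit[0]); // the whole heading, tags included

// Limitation: a nested tag puts a "<" inside the heading,
// so the pattern finds nothing.
const nested = "<h1>Title with <em>emphasis</em></h1>";
console.log(negated.test(nested)); // false
```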
Actually I got this to work; escaping is not needed. I am testing against the text in the div.
E.g. use "i" for case insensitivity:
<script>
var titleMatch = new RegExp("<h1>(.*)</h1>","i" );
</script>
<div id="myDiv">
<p>some text </p>
<h1>Some text in the tag
and some more and
some more</h1>
<b>
<p>Some text outside</b> </p>
<h1>Some more text in the tag and some more and some more</h1>
<p>Some randomstuff </p>
</div>
<script>
str=document.getElementById("myDiv").innerHTML
alert(str.match(titleMatch))
</script>
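One caveat worth checking: in JavaScript, `.` does not match line breaks, so `<h1>(.*)</h1>` can only match a heading that sits on a single line. The sketch below is an alternative, not the poster's code. It uses `[\s\S]*?` (any character, lazily) with the `g` flag to collect every heading. The `sample` string is a made-up stand-in for the div's innerHTML.

```javascript
// Hypothetical stand-in for the innerHTML of the div.
const sample = [
  "<p>some text </p>",
  "<h1>Some text in the tag",
  "and some more and",
  "some more</h1>",
  "<h1>Some more text in the tag and some more and some more</h1>",
].join("\n");

// The pattern from the post: "." stops at line breaks, so the
// multi-line heading is skipped and the one-line heading is matched.
const dotPattern = new RegExp("<h1>(.*)</h1>", "i");
const single = sample.match(dotPattern);

// [\s\S] matches any character, newlines included; *? is lazy, so each
// match ends at the nearest </h1>; "g" lets matchAll find every heading.
const anyChar = /<h1>([\s\S]*?)<\/h1>/gi;
const titles = Array.from(sample.matchAll(anyChar), (m) => m[1]);

console.log(single[1]);
console.log(titles);
```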
SOLUTION
P.S.
Do note that my suggestion is only going to work as intended if the <h1>'s are balanced, meaning each <h1> has a corresponding </h1>. If they are not balanced, then you will get unexpected results. Regex is not really good for parsing HTML.
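To illustrate the kind of unexpected result meant here, a small sketch follows. The suggested pattern itself is not visible in this copy of the thread, so this assumes a lazy pattern of the `<h1>(.*?)</h1>` kind, written in JavaScript. The input string is made up, with a first <h1> that is never closed.

```javascript
// Assumed lazy pattern; [\s\S] is used so it can cross line breaks.
const lazy = /<h1>([\s\S]*?)<\/h1>/i;

// Hypothetical input: the first <h1> has no closing tag.
const unbalanced = "<h1>First title\n<p>body</p>\n<h1>Second title</h1>";

const m = unbalanced.match(lazy);

// The match starts at the first <h1> and runs to the only </h1>,
// so the captured text swallows the paragraph and the second <h1>.
console.log(m[1]);
```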
You are far better off using document.getElementsByTagName("h1"), then pulling out the innerHTML of each tag occurrence, e.g. something like this:
<body>
<p>some text </p>
<h1>Some first text in the tag</h1>
<p><b>Some text outside</b> </p>
<h1>Some second text in the tag </h1>
<p>Some randomstuff outside tags </p>
<h1>Some third text in the tag </h1>
<script>
strings=document.getElementsByTagName("h1")
for (i=0;i<strings.length;i++)
alert(strings[i].innerHTML)
</script>
</body>
ASKER
Thank you everyone.
I need to do this server side in .NET, so I haven't tried the JavaScript methods.
But I tried the following suggestions:
\<h1\>(.*)\<\/h1\> (NOT WORKING, ENDS AT LAST </h1>)
<h1>[^<]*</h1> (WORKING)
and this one with SingleLine.
var titleMatch = new Regex("<h1>(.*?)</h1>", RegexOptions.IgnoreCase | RegexOptions.SingleLine).Match(htmlInput); (WORKING)
However, the ones I marked WORKING give me the whole match, tags included:
<h1>Here is the title I want and it could span over multiple lines</h1>
I would only like to get
Here is the title I want and it could span over multiple lines
This must be possible, right? (I do not want to replace or remove parts of the result afterwards.)
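The accepted answer is not shown in this copy of the thread. One common way to get only the inner text, without replacing anything afterwards, is to read the capturing group instead of the whole match. The sketch below shows the idea in JavaScript, the thread's other language. `extractTitle` and `htmlInput` are illustrative names, and the `s` flag plays the role of `RegexOptions.Singleline`. In .NET the same value should be available as `titleMatch.Groups[1].Value` on the match from the WORKING `(.*?)` pattern above.

```javascript
// Illustrative helper: returns the text between the first <h1> and the
// nearest </h1>, or null when there is no heading.
function extractTitle(html) {
  // i = ignore case, s = "." also matches line breaks, *? = lazy.
  const match = /<h1>(.*?)<\/h1>/is.exec(html);
  // match[0] is the whole match with tags; match[1] is group 1 only.
  return match ? match[1] : null;
}

// Hypothetical input modelled on the question's example.
const htmlInput =
  "text text text\n" +
  "text <h1>Here is the title I want\n" +
  "and it could span over multiple lines</h1>\n" +
  "more text";

console.log(extractTitle(htmlInput));
```

The line break inside the heading is kept in the captured text, so collapse whitespace separately if a one-line title is needed.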
ASKER CERTIFIED SOLUTION
ASKER
It works! Thank you
new Regex("\<h1\>(.*)\<\/h1\>"