.NET C# regex for extracting H1-H5 header tag content from an HTML string

I'm looking to extract all the header tags H1 to H5, and their inner text, from an HTML string using Regex in .NET C#.

The H1 tags can appear in forms like these:

<h1>normal h1 heading </h1>
<H1>upper case  h1 heading </H1>
< h1>heading with space before h1</h1>
<h1 class=etc>heading with class reference or other string</h1>
<h1 >with space after h1</h1>

The regex would have to cope with extracting all h1, h2, h3, h4 and h5 tags.

Any help would be appreciated.
Asked by: stephenwilde

Karrtik Iyer (Software Architect) commented:
Are you specifically looking for a regex, or are you open to using an HTML parser in C#, which can do complete DOM-based parsing and also gives you the power of XPath?
There are HTML parsers such as HTMLAgilityPack that do this job for you (http://htmlagilitypack.codeplex.com).
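
For illustration, here is a minimal sketch of the parser route. It assumes the HtmlAgilityPack NuGet package is installed; the class and method names (`HeadingLister`, `ListHeadings`) are made up for this example. It walks the parsed tree in document order, so the headings come back in the order they appear in the HTML string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class HeadingLister
{
    // Element names treated as headings; compared case-insensitively.
    private static readonly HashSet<string> HeadingNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "h1", "h2", "h3", "h4", "h5" };

    // Hypothetical helper: returns "tag: text" for each heading, in document order.
    public static List<string> ListHeadings(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants()                              // depth-first walk of the parsed tree
            .Where(n => HeadingNames.Contains(n.Name))  // keep only h1..h5 elements
            .Select(n => n.Name + ": " + n.InnerText.Trim())
            .ToList();
    }
}
```

A parser follows HTML parsing rules, so whether it accepts a non-standard form such as `< h1>` depends on the parser and should be checked against real input.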
stephenwilde (Author) commented:
Thanks. I'm trying regex first because it will handle the content of all header tags (h1, h2, h3, h4, h5) in one line of code, rather than running several more complex lines, as I want to list the header content in the order it appears in the HTML string.

Plus, a regex will deal with malformed tags or the variations outlined in my original question, provided the HTML is otherwise well formed.

I just don't know enough regex syntax to achieve the desired result.
Fernando Soto (Retired) commented:
Hi stephenwilde,

This Regex pattern will do what you need.
string pattern = "(?i)<(?'h'h)[12345].+(?<!/${h})"; 
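
As a hedged alternative sketch (not the pattern posted above), the version below captures the heading level and the inner text in separate named groups. Two points of .NET regex syntax are relevant: inside a pattern, a backreference to a named group is written `\k<name>`, whereas `${name}` is substitution syntax for replacement strings. The names `HeadingExtractor` and `Extract`, and the pattern itself, are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeadingExtractor
{
    // Illustrative pattern:
    //   <\s*h(?<level>[1-5])\b   opening tag, optional space after '<', level 1-5
    //   [^>]*>                   optional attributes or trailing space
    //   (?<text>.*?)             inner text, as short as possible
    //   <\s*/\s*h\k<level>\s*>   closing tag with the same level
    private static readonly Regex HeadingRegex = new Regex(
        @"<\s*h(?<level>[1-5])\b[^>]*>(?<text>.*?)<\s*/\s*h\k<level>\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns (level, inner text) pairs in the order they occur in the input.
    public static List<(int Level, string Text)> Extract(string html)
    {
        var result = new List<(int Level, string Text)>();
        foreach (Match m in HeadingRegex.Matches(html))
        {
            int level = int.Parse(m.Groups["level"].Value);
            result.Add((level, m.Groups["text"].Value.Trim()));
        }
        return result;
    }
}
```

`Regex.Matches` scans left to right, so the results follow the order of the headings in the string. The sketch does not handle nested headings of the same level or a `>` inside an attribute value.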

stephenwilde (Author) commented:
Thanks very much. It looks like it has worked on all variations.
Fernando Soto (Retired) commented:
Not a problem, glad to help.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.