
Regex in C#

dkim18 asked:
I am trying to remove certain HTML tags and keep the value inside them.

Example: <customerid><style face="normal" font="default">d33333</style></customerid>


There are some other tags I want to remove as well. Some of them have attributes, for example:
<tag att="dfdff" att2="fdfdf">

Can you help?

Meir Rivkin, Full stack Software Engineer
CERTIFIED EXPERT

Commented:
Can you post the HTML?
Which data do you need exactly?
Terry Woods, Web Developer, specialising in WordPress
CERTIFIED EXPERT
Most Valuable Expert 2011
Commented:
A regex will only work reliably here if the tags you are targeting aren't nested.

For example, in the following case we would need to remove the 2nd </div> tag without removing the first one, if we were only targeting div tags with the attribute att="dfdff":

<div att="dfdff" att2="fdfdf"><div att="somethingelse">content</div>other content</div>

Working out that the 2nd </div> needs to be removed but not the first is a task for a parser, not a regex. If you are happy with the limitation that tags can't be nested, then I should be able to provide a regex. Let me know.
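
In the meantime, here is a minimal sketch of what the non-nested approach could look like. The class and method names (TagStripper, StripTag, StripTagWithAttribute) are made up for illustration, and the patterns assume attribute values are double-quoted and never contain a '>' character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Removes every <tagName ...> and </tagName> tag and keeps the text between them.
    // Assumption: no attribute value contains '>'.
    public static string StripTag(string input, string tagName)
    {
        string pattern = @"</?" + Regex.Escape(tagName) + @"\b[^>]*>";
        return Regex.Replace(input, pattern, string.Empty, RegexOptions.IgnoreCase);
    }

    // Unwraps an element that carries attr="value", pairing the opening tag with the
    // NEXT closing tag of the same name. That pairing is only correct when the
    // element does not contain another element of the same name.
    public static string StripTagWithAttribute(string input, string tagName, string attr, string value)
    {
        string name = Regex.Escape(tagName);
        string pattern =
            @"<" + name + @"\b[^>]*\b" + Regex.Escape(attr) +
            @"\s*=\s*""" + Regex.Escape(value) + @"""[^>]*>(.*?)</" + name + @">";
        return Regex.Replace(input, pattern, "$1",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

Calling StripTag on the question's sample for "style" and then for "customerid" leaves just d33333. Calling StripTagWithAttribute on the nested div example above for att="dfdff" gives <div att="somethingelse">contentother content</div>: the first </div> was removed instead of the second, which is exactly the limitation described.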
anarki_jimbel, Senior Developer
CERTIFIED EXPERT
Commented:
How large is your HTML text?
In other words, are you sure that regex is the right solution? Unfortunately regex is known to have pretty bad performance... (see, e.g., http://www.codinghorror.com/blog/2006/01/regex-performance.html).
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015
Commented:
"Unfortunately regex is known to have pretty bad performance..."
That depends, I think, on how you structure the regex. The "catastrophic backtracking" referenced in the article would be an example of a poorly-designed regex.
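
To make that concrete, here is a small sketch for comparing two patterns while guarding against runaway backtracking. The helper name IsMatchWithin is hypothetical; the match-timeout overload of Regex.IsMatch is available from .NET Framework 4.5 onward. How long a badly structured pattern actually runs depends on the input and the runtime version, so treat this as an experiment rather than a benchmark.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Returns true or false when the match completes, or null when the regex
    // engine gives up because the timeout elapsed.
    public static bool? IsMatchWithin(string pattern, string input, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }
}
```

For instance, ^(a+)+$ nests one quantifier inside another, which is the classic shape that invites catastrophic backtracking on an input such as a long run of a's followed by "!". The pattern ^a+$ accepts the same strings without the nesting, so it fails fast on that input.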
Senior consultant
CERTIFIED EXPERT
Commented:
The way to process files with this kind of structure is with the XPath libraries. HTML can be easily converted into XML files, and then parsed in the way you want.

See several examples of C# and .NET here: http://www.java2s.com/Tutorial/CSharp/0540__XML/0380__XmlPathNavigator.htm
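
As a rough illustration of the idea applied to the sample from the question, the sketch below uses LINQ to XML together with the XPath extension methods in System.Xml.XPath. The class and method names (XPathExtract, TryGetValue) are invented for this example, and it assumes the input is already well-formed XML.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathExtract
{
    // Looks up the first element matching the XPath expression and returns its
    // text content with all child tags dropped.
    // Assumption: the input parses as XML; XDocument.Parse throws otherwise.
    public static bool TryGetValue(string xml, string xpath, out string value)
    {
        XDocument doc = XDocument.Parse(xml);
        XElement element = doc.XPathSelectElement(xpath);
        value = element != null ? element.Value : string.Empty;
        return element != null;
    }
}
```

With the question's sample and the expression /customerid, the value comes back as d33333, because XElement.Value concatenates the text of all descendants and ignores the style tag and its attributes.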
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
"HTML can be easily converted into XML files, and then parsed in the way you want."
...provided the HTML is actually valid XML (structurally).
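
A quick way to check that condition before reaching for XPath is to attempt a parse and see whether it fails. This is only a sketch; IsWellFormedXml is a hypothetical helper name, and it tests well-formedness only, not validity against any schema or DTD.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlCheck
{
    // Returns true when the markup parses as a well-formed XML document.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

A fragment like <p>one<br/>two</p> passes, while the common HTML form <p>one<br>two</p> does not, because the unclosed br element leaves the closing tags mismatched.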
Pierre François, Senior consultant
CERTIFIED EXPERT

Commented:
@kaufman: If the HTML is valid XML (XHTML), you don't need to convert it. My statement is that valid HTML can be converted into XHTML, which is valid XML.
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2015

Commented:
And I agree, but your last post doesn't say "valid" HTML  = )
