regex in c#

I am trying to remove certain html tags and save the value.

ex) <customerid><style face="normal" font="default">d33333</style></customerid>


There are some other tags I want to remove. <tag> has some attributes.
<tag att='dfdff" att2="fdfdf">

Can you help?
dkim18Asked:
Who is Participating?
 
Pierre FrançoisConnect With a Mentor Senior consultantCommented:
The way to process files with this kind of structure is with the XPath libraries. HTML can be easily converted into XML files, and then parsed in the way you want.

See several examples of C# and .NET here: http://www.java2s.com/Tutorial/CSharp/0540__XML/0380__XmlPathNavigator.htm
0
 
Meir RivkinFull stack Software EngineerCommented:
can u post the html?
which data you need exactly?
0
 
Terry WoodsConnect With a Mentor IT GuruCommented:
You can only use a regex if your tags aren't nested.

eg with the following case we would need to remove the 2nd </div> tag without removing the first one if we were only targeting div tags with atttribute att="dfdff":

<div att="dfdff" att2="fdfdf"><div att="somethingelse">content</div>other content</div>

Working out that the 2nd </div> needs to be removed but not the first is a task for a parser, not a regex. If you are happy with the limitation that tags can't be nested, then I should be able to provide a regex. Let me know.
0
Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
anarki_jimbelConnect With a Mentor Commented:
Is your html text big enough?
In other words, are you sure that regex is a right solution? Unfortunately regex is known to have pretty bad performance... (e.g., http://www.codinghorror.com/blog/2006/01/regex-performance.html).
0
 
käµfm³d 👽Connect With a Mentor Commented:
Unfortunately regex is known to have pretty bad performance...
That depends, I think, on how you structure the regex. The "catastrophic backtracking" referenced in the article would be an example of a poorly-designed regex.
0
 
käµfm³d 👽Commented:
HTML can be easily converted into XML files, and then parsed in the way you want.
...provided the HTML is actually valid XML (structurally).
0
 
Pierre FrançoisSenior consultantCommented:
@kaufman: If the HTML is valid XML (XHTML), you don't need to convert it. My statement is that valid HTML can be converted into XHTML, which is valid XML.
0
 
käµfm³d 👽Commented:
And I agree, but your last post doesn't say "valid" HTML  = )
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.