regex in c#

I am trying to remove certain html tags and save the value.

ex) <customerid><style face="normal" font="default">d33333</style></customerid>


There are some other tags I want to remove. <tag> has some attributes.
<tag att='dfdff" att2="fdfdf">

Can you help?
dkim18Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Meir RivkinFull stack Software EngineerCommented:
can u post the html?
which data you need exactly?
0
Terry WoodsIT GuruCommented:
You can only use a regex if your tags aren't nested.

eg with the following case we would need to remove the 2nd </div> tag without removing the first one if we were only targeting div tags with atttribute att="dfdff":

<div att="dfdff" att2="fdfdf"><div att="somethingelse">content</div>other content</div>

Working out that the 2nd </div> needs to be removed but not the first is a task for a parser, not a regex. If you are happy with the limitation that tags can't be nested, then I should be able to provide a regex. Let me know.
0
anarki_jimbelSenior DeveloperCommented:
Is your html text big enough?
In other words, are you sure that regex is a right solution? Unfortunately regex is known to have pretty bad performance... (e.g., http://www.codinghorror.com/blog/2006/01/regex-performance.html).
0
Cloud Class® Course: Python 3 Fundamentals

This course will teach participants about installing and configuring Python, syntax, importing, statements, types, strings, booleans, files, lists, tuples, comprehensions, functions, and classes.

käµfm³d 👽Commented:
Unfortunately regex is known to have pretty bad performance...
That depends, I think, on how you structure the regex. The "catastrophic backtracking" referenced in the article would be an example of a poorly-designed regex.
0
Pierre FrançoisSenior consultantCommented:
The way to process files with this kind of structure is with the XPath libraries. HTML can be easily converted into XML files, and then parsed in the way you want.

See several examples of C# and .NET here: http://www.java2s.com/Tutorial/CSharp/0540__XML/0380__XmlPathNavigator.htm
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
käµfm³d 👽Commented:
HTML can be easily converted into XML files, and then parsed in the way you want.
...provided the HTML is actually valid XML (structurally).
0
Pierre FrançoisSenior consultantCommented:
@kaufman: If the HTML is valid XML (XHTML), you don't need to convert it. My statement is that valid HTML can be converted into XHTML, which is valid XML.
0
käµfm³d 👽Commented:
And I agree, but your last post doesn't say "valid" HTML  = )
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
.NET Programming

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.