Solved

regex in c#

Posted on 2012-03-12
8
308 Views
Last Modified: 2012-03-16
I am trying to remove certain html tags and save the value.

ex) <customerid><style face="normal" font="default">d33333</style></customerid>


There are some other tags I want to remove. <tag> has some attributes.
<tag att='dfdff" att2="fdfdf">

Can you help?
0
Comment
Question by:dkim18
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 42

Expert Comment

by:sedgwick
ID: 37711619
can u post the html?
which data you need exactly?
0
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 50 total points
ID: 37711960
You can only use a regex if your tags aren't nested.

eg with the following case we would need to remove the 2nd </div> tag without removing the first one if we were only targeting div tags with atttribute att="dfdff":

<div att="dfdff" att2="fdfdf"><div att="somethingelse">content</div>other content</div>

Working out that the 2nd </div> needs to be removed but not the first is a task for a parser, not a regex. If you are happy with the limitation that tags can't be nested, then I should be able to provide a regex. Let me know.
0
 
LVL 30

Assisted Solution

by:anarki_jimbel
anarki_jimbel earned 50 total points
ID: 37712127
Is your html text big enough?
In other words, are you sure that regex is a right solution? Unfortunately regex is known to have pretty bad performance... (e.g., http://www.codinghorror.com/blog/2006/01/regex-performance.html).
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 50 total points
ID: 37712136
Unfortunately regex is known to have pretty bad performance...
That depends, I think, on how you structure the regex. The "catastrophic backtracking" referenced in the article would be an example of a poorly-designed regex.
0
 
LVL 10

Accepted Solution

by:
pfrancois earned 350 total points
ID: 37713959
The way to process files with this kind of structure is with the XPath libraries. HTML can be easily converted into XML files, and then parsed in the way you want.

See several examples of C# and .NET here: http://www.java2s.com/Tutorial/CSharp/0540__XML/0380__XmlPathNavigator.htm
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37715591
HTML can be easily converted into XML files, and then parsed in the way you want.
...provided the HTML is actually valid XML (structurally).
0
 
LVL 10

Expert Comment

by:pfrancois
ID: 37715679
@kaufman: If the HTML is valid XML (XHTML), you don't need to convert it. My statement is that valid HTML can be converted into XHTML, which is valid XML.
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37715766
And I agree, but your last post doesn't say "valid" HTML  = )
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Many of us here at EE write code. Many of us write exceptional code; just as many of us write exception-prone code. As we all should know, exceptions are a mechanism for handling errors which are typically out of our control. From database errors, t…
Calculating holidays and working days is a function that is often needed yet it is not one found within the Framework. This article presents one approach to building a working-day calculator for use in .NET.

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question