Solved

parse illegal characters

Posted on 2011-02-16
Last Modified: 2012-05-11
Hi Experts,
I have the following HTML text as a sample, which I need to parse for illegal characters.
The parsing should apply only to the HTML text values, not to the HTML tags.

E.g.:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:mce="mce">
<body>
<p class ="test">afa sf  & as ffaf</p><p class="ss">adssfa fa sfasf <br />"<i>sds</i>"</p>
</body>
</html>


Any help would be appreciated.

Thanks
Question by:saloj
10 Comments
 
LVL 9

Expert Comment

by:s_chilkury
ID: 34913824
Check the following:
http://www.codeproject.com/Articles/57176/Parsing-HTML-Tags-in-Csharp.aspx

Also, you can use the Html Agility Pack, which does the same.
 
LVL 11

Expert Comment

by:Sudhakar Pulivarthi
ID: 34914037
Hi Saloj,

I am attaching code that parses the values present in the HTML data.
Please take a look; it might be useful. If the code needs further expansion, I can do that. Which illegal characters are you looking for, and what do you want to do with them?
public static string ParseTheValuesInHTML(string htmlData)
        {
            int startIndex = 0;

            if (htmlData.Contains(">"))
            {
                // Process until no further '>' is found.
                while (startIndex >= 0)
                {
                    int nextCount = htmlData.IndexOf("<", startIndex + 1);
                    int lastIndex = nextCount - htmlData.IndexOf(">", startIndex);

                    if (nextCount > 0 && lastIndex > 0)
                    {
                        // Find the text between > and <. 
                        string value = htmlData.Substring(htmlData.IndexOf(">", startIndex) + 1, lastIndex - 1);

                        if (value != "")
                        {
                            // The value string is the text found between two HTML tags.
                            // Here you can check it for illegal characters and process it as you want.
                        }
                    }

                    // Find the next tag.
                    startIndex = htmlData.IndexOf(">", startIndex + 1);
                }
            }

            return htmlData;
        }

 
LVL 2

Author Comment

by:saloj
ID: 34914039
Hi EE, I have the following string. When I parse it with the following code, it parses all the data, including the HTML tags. But I only need to parse the values.
Can anybody help me with the code?


string strA = "<html xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:mce=\"mce\"><head><style></style><title>sadfafsfsdf asdf asd sadf saf</title></head><body class=\"hhh\"><p style=\"margin-top: 0pt; margin-right: 0pt; margin-bottom: 0pt; margin-left: 0pt\">content has posted the latest <a href=\"http://www.xyz.com/media/763/Dynacor_Gold_Mines_TSX:_DNG_News_Alert/\" target=\"_blank\">video</a> \"News Alert\" for Dynacor Gold Mines Inc. If the link is unavailable, please visit <a href=\"http://www.xyz.com/\" target=\"_blank\">www.xyz.com</a> and enter \"Dynacor\" in the search box.</p><p class=\"hhh\" style=\"margin-top: 0pt; margin-right: 0pt;  margin-left: 0pt; margin-bottom:0pt;\">According to Metanor&apos;s press release:<br class=\"hhh\" />\"<i class=\"hhh\">S of US$500 (the \"Per Ounce Payments\") and the then prevailing market price per ounce of gold. Sandstorm will (i) US$5 million upon signing of the </i><i class=\"hhh\">agreement, (ii) US$9 million,once Metan</i>.\"</p></body></html>";
ParseTheValuesInXML(strA);

public static string ParseTheValuesInXML(string xmlData)
        {
            int startIndex = 0;

            if (xmlData.Contains(">"))
            {
                // Process until no further '>' is found.
                while (startIndex >= 0)
                {
                    int nextCount = xmlData.IndexOf("<", startIndex + 1);
                    //int lastIndex = nextCount - xmlData.IndexOf(">", startIndex);

                    //if (nextCount > 0 && lastIndex > 0)
                    if (nextCount > 0)
                    {
                        // Find the text between > and <. 
                        string value = xmlData.Substring(xmlData.IndexOf(">", startIndex) + 1, nextCount - xmlData.IndexOf(">", startIndex) - 1);
                        if (value != "")
                        {                            
                            // Replace the text with its XML-escaped form.
                            xmlData = xmlData.Replace(value, XmlSpecial(value));
                        }
                    }

                    // Find the next tag.
                    startIndex = xmlData.IndexOf(">", startIndex + 1);
                }
            }
            return xmlData;
        }


    private static string XmlSpecial(string strTxt)
        {
            if (strTxt.Contains("&"))
            {
                // Unescape existing &amp; first so ampersands are not escaped twice.
                strTxt = strTxt.Replace("&amp;", "&");
                // Escape every ampersand, then restore the entities that were already present.
                strTxt = strTxt.Replace("&", "&amp;");
                strTxt = strTxt.Replace("&amp;apos;", "&apos;");
                strTxt = strTxt.Replace("&amp;quot;", "&quot;");
                strTxt = strTxt.Replace("&amp;lt;", "&lt;");
                strTxt = strTxt.Replace("&amp;gt;", "&gt;");
            }
            if (strTxt.Contains("<"))
            {
                strTxt = strTxt.Replace("<", "&lt;");
            }
            if (strTxt.Contains(">"))
            {
                strTxt = strTxt.Replace(">", "&gt;");
            }
            if (strTxt.Contains("\""))
            {
                strTxt = strTxt.Replace("\"", "&quot;");
            }
            if (strTxt.Contains("'"))
            {
                strTxt = strTxt.Replace("'", "&apos;");
            }
            return strTxt;
        }

 
LVL 20

Expert Comment

by:informaniac
ID: 34914085
Html values?
 
LVL 2

Author Comment

by:saloj
ID: 34914095
Hi Sudhakar,
Thanks for your quick response. I am using your code from the previous solution.
In the following code, when value = "\"",
it replaces every " in the document with &quot;, including those inside tags. How can I limit it?

if (value != "")
           {
          htmlData = htmlData.Replace(value, XmlSpecial(value));
         }
 
LVL 11

Accepted Solution

by:
Sudhakar Pulivarthi earned 500 total points
ID: 34914160
Hi Saloj,

Please replace the Replace statement in the code with this one:

xmlData = xmlData.Replace(">" + value + "<", ">" + XmlSpecial(value) + "<");

This will work!!!
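
To make the difference concrete, here is a minimal sketch comparing the original Replace statement with the boundary-based one. The class name ReplaceScopeDemo and the helper EscapeQuote are made up for illustration; EscapeQuote is a simplified stand-in for XmlSpecial that handles only the double quote, and the input is shortened from the sample in the question.

```csharp
using System;

public static class ReplaceScopeDemo
{
    // Simplified stand-in for XmlSpecial (assumption): escapes double quotes only.
    public static string EscapeQuote(string text)
    {
        return text.Replace("\"", "&quot;");
    }

    // Original statement: every occurrence of value is replaced,
    // including the quotes that delimit attribute values inside tags.
    public static string ReplaceEverywhere(string html, string value)
    {
        return html.Replace(value, EscapeQuote(value));
    }

    // Boundary-based statement: value is replaced only where it fills
    // the whole gap between a '>' and the following '<'.
    public static string ReplaceBetweenTags(string html, string value)
    {
        return html.Replace(">" + value + "<", ">" + EscapeQuote(value) + "<");
    }

    public static void Main()
    {
        string html = "<p class=\"ss\"><br />\"<i>sds</i>\"</p>";

        // Prints: <p class=&quot;ss&quot;><br />&quot;<i>sds</i>&quot;</p>
        Console.WriteLine(ReplaceEverywhere(html, "\""));

        // Prints: <p class="ss"><br />&quot;<i>sds</i>&quot;</p>
        Console.WriteLine(ReplaceBetweenTags(html, "\""));
    }
}
```

Note that the boundary-based call still touches every place in the document where the same text sits between a > and a <, which is usually what is wanted here because all of those places are text values.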
 
LVL 11

Expert Comment

by:Sudhakar Pulivarthi
ID: 34914179

Actually, one of the tag values was " (a quote), so in our previous code the " was replaced with &quot; throughout the source string. This replaced every occurrence of ", even inside the tags.

The fix is therefore to include the boundaries (> and <) together with the value, so that the replacement happens only in the parsed value and not in the whole string.
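
As an alternative to calling Replace on the whole string, the same idea can be sketched as a single pass that tracks whether the current character is inside a tag and escapes only what lies outside. This is not the code from this thread; the names TextRunEscaper, EscapeTextRuns and StartsKnownEntity are hypothetical, and the sketch assumes that every < starts a tag and that attribute values contain no > character.

```csharp
using System;
using System.Text;

public static class TextRunEscaper
{
    // Entities that are left untouched so they are not escaped twice.
    private static readonly string[] KnownEntities =
        { "&amp;", "&lt;", "&gt;", "&quot;", "&apos;" };

    // Returns true when one of the known entities starts at the given index.
    private static bool StartsKnownEntity(string text, int index)
    {
        foreach (string entity in KnownEntities)
        {
            if (string.CompareOrdinal(text, index, entity, 0, entity.Length) == 0)
            {
                return true;
            }
        }
        return false;
    }

    // Copies tags verbatim and escapes &, >, " and ' in the text between them.
    public static string EscapeTextRuns(string html)
    {
        var output = new StringBuilder(html.Length + 16);
        bool insideTag = false;

        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];
            if (c == '<')
            {
                insideTag = true;            // assumption: '<' always opens a tag
                output.Append(c);
            }
            else if (c == '>' && insideTag)
            {
                insideTag = false;           // the tag ends here
                output.Append(c);
            }
            else if (insideTag)
            {
                output.Append(c);            // tag names and attributes stay as they are
            }
            else if (c == '&')
            {
                output.Append(StartsKnownEntity(html, i) ? "&" : "&amp;");
            }
            else if (c == '>') output.Append("&gt;");
            else if (c == '"') output.Append("&quot;");
            else if (c == '\'') output.Append("&apos;");
            else output.Append(c);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Prints: <p class="ss">a &amp; b <br />&quot;<i>x</i>&quot;</p>
        Console.WriteLine(EscapeTextRuns("<p class=\"ss\">a & b <br />\"<i>x</i>\"</p>"));
    }
}
```

Because the output is built once from left to right, a value such as " can never leak into the tags, and already escaped text like Metanor&apos;s is left alone. For input that breaks the assumptions above, a real HTML parser is likely the safer choice.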

Happy Working... take care
 
LVL 2

Author Comment

by:saloj
ID: 34923312
Thanks Sudhakar!
you have great technique!

Happy working!
Take care
 
LVL 2

Author Closing Comment

by:saloj
ID: 34923316
EXCELLENT !!!