
  • Status: Solved
  • Priority: Medium
  • Security: Public

How to resolve an &quot; XML parsing issue using C# .NET

How can I resolve an XML parsing issue involving &quot; using C# .NET? For example, I have this node:


<description>&#60;p>&#60;a href="http://us.rd.yahoo.com/dailynews/rss/tech/*http://news.yahoo.com/s/afp/20091230/ts_alt_afp/usitcomputersecurityinternetmcafee">&#60;img src="http://d.yimg.com/a/p/afp/20091230/capt.photo_1262191861138-1-0.jpg?x=130&y=84&q=85&sig=XB8VuNiiCtOC5TYh6H441w--" align="left" height="84" width="130" alt="The logo of social networking website &#39;Facebook&#39;
is displayed on a computer screen.
&quot;The explosion of applications on Facebook and other services will be an ideal vector for cybercriminals,&quot; McAfee said.(AFP/File/Leon Neal)" border="0" />&#60;/a>AFP - Social networks will face increasingly sophisticated hacker attacks in 2010 but law enforcement is expected to make strides in fighting cybercrime, according to Web security firm McAfee Labs.&#60;/p>&#60;br clear="all"/></description>


The string

&#39;Facebook&#39;

or the string

&quot;The explosion of applications on Facebook and other services will be an ideal vector for cybercriminals,&quot;

is causing the page to fail validation in the HTML validator. Some of the errors reported for this line are:

    * Line 378, column > 80: XML Parsing Error: attributes construct error

      …/s/afp/20091230/ts_alt_afp/usitcomputersecurityinternetmcafee"><img src="http…

    * Line 378, column > 80: XML Parsing Error: Couldn't find end of Start Tag img line 378

      …/s/afp/20091230/ts_alt_afp/usitcomputersecurityinternetmcafee"><img src="http…

Error X is not a member of a group specified for any attribute

    * Line 378, column 374: "The" is not a member of a group specified for any attribute

      …isplayed on a computer screen. "The explosion of applications on Facebook and

    * Line 378, column 384: "explosion" is not a member of a group specified for any attribute

      …n a computer screen. "The explosion of applications on Facebook and other ser

    * Line 378, column 387: "of" is not a member of a group specified for any attribute

      … computer screen. "The explosion of applications on Facebook and other servic

    * Line 378, column 400: "applications" is not a member of a group specified for any attribute

      …een. "The explosion of applications on Facebook and other services will be an

    * Line 378, column 403: "on" is not a member of a group specified for any attribute

      …. "The explosion of applications on Facebook and other services will be an id

This is happening because the node has &#39;Facebook&#39; (that is, 'Facebook') in the alt text of the image tag. I have tried code like the following:

        mystring = str.Replace("&apos;", "'");
        mystring = mystring.Replace("&quot;", "\"");


But these two lines generate even more errors. Do you have any suggestions for this specific validation issue where ' is involved? You can find the RSS file at http://rss.news.yahoo.com/rss/tech
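To see where the stray quotes come from, here is a minimal sketch (the class name and the shortened sample node are hypothetical) that loads a description like the one above with XmlDocument. During parsing, the XML entities &#60;, &#39; and &quot; are expanded, so InnerText hands the page raw HTML in which the alt value contains literal double quotes.

```csharp
using System.Xml;

public static class EntityDecodingDemo
{
    // Loads an RSS-style <description> element and returns its InnerText.
    // XmlDocument expands the entity references while parsing
    // (&#60; -> <, &#39; -> ', &quot; -> "), so the result is raw HTML text.
    public static string ReadDescription(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }
}
```

For an input such as <description>&#60;img alt="The logo of &#39;Facebook&#39;. &quot;The explosion&quot; said." /></description>, ReadDescription returns <img alt="The logo of 'Facebook'. "The explosion" said." />, which is exactly the nested-quote pattern the validator rejects.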


Asked by: ashley2009
 
roeib commented:
Put the InnerText inside a CDATA block. That will make the XML valid and tell the C# XML parser that this content does not need to be parsed.
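A minimal sketch of what that suggestion might look like in code, assuming the XML file is being produced with XmlDocument (the class and method names are hypothetical). CreateCDataSection wraps the HTML so the XML parser no longer needs it escaped. Note that this only changes the XML layer: reading InnerText back returns the same HTML string, so double quotes nested inside a double-quoted alt attribute would still reach the page.

```csharp
using System.Xml;

public static class CDataDemo
{
    // Builds <description><![CDATA[ ...html... ]]></description>.
    // Inside a CDATA section, < & ' and " are not escaped for the XML parser.
    public static string WrapInCData(string html)
    {
        var doc = new XmlDocument();
        XmlElement description = doc.CreateElement("description");
        description.AppendChild(doc.CreateCDataSection(html));
        doc.AppendChild(description);
        return doc.OuterXml;
    }

    // Parses the XML again; the CDATA content comes back unchanged as InnerText.
    public static string ReadBack(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }
}
```

A CDATA section cannot itself contain the sequence ]]>, so text that might include it would need to be split across sections.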
 
ashley2009 (Author) commented:
Hello,

Thank you for your response. There is no CDATA section in the specified node. I think my original post was confusing, so I am restating the problem more thoroughly:

The following node does not pass the HTML validator when I process the XML file; the validator reports 21 errors because of this node.

<description>

&#60;p>&#60;a href="http://us.rd.yahoo.com/dailynews/rss/tech/*http://news.yahoo.com/s/afp/20091230/ts_alt_afp/usitcomputersecurityinternetmcafee">&#60;img src="http://d.yimg.com/a/p/afp/20091230/capt.photo_1262191861138-1-0.jpg?x=130&y=84&q=85&sig=XB8VuNiiCtOC5TYh6H441w--" align="left" height="84" width="130" alt="The logo of social networking website &#39;Facebook&#39; is displayed on a computer screen. &quot;The explosion of applications on Facebook and other services will be an ideal vector for cybercriminals,&quot; McAfee said.(AFP/File/Leon Neal)" border="0" />&#60;/a>AFP - Social networks will face increasingly sophisticated hacker attacks in 2010 but law enforcement is expected to make strides in fighting cybercrime, according to Web security firm McAfee Labs.&#60;/p>&#60;br clear="all"/>

</description>  

I believe &quot;The explosion of applications on Facebook and other services will be an ideal vector for cybercriminals,&quot;

is causing the problem: once the XML is parsed, each &quot; becomes a literal double quote, so there are double quotes inside the double-quoted alt attribute.
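The underlying rule is that a double-quoted attribute value cannot contain a raw " character; it has to be written as &quot; (and & as &amp;, < as &lt;). Below is a minimal sketch of such an encoder as a hypothetical helper; in an ASP.NET page, System.Web's HttpUtility.HtmlAttributeEncode may be a better fit. Because it makes a single pass over the characters, the entities it emits are never escaped a second time, which is the trap with a chain of string.Replace calls that starts with Replace("&", "&amp;").

```csharp
using System.Text;

public static class AttributeEncoding
{
    // Hypothetical helper: escapes a value so it can sit inside a
    // double-quoted HTML/XHTML attribute. Single pass, so the entities
    // appended here are never re-escaped.
    public static string EncodeAttributeValue(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            switch (c)
            {
                case '&': sb.Append("&amp;"); break;
                case '<': sb.Append("&lt;"); break;
                case '"': sb.Append("&quot;"); break;
                case '\'': sb.Append("&#39;"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```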

I have tried removing &quot; (replacing it with an empty string) using code like:

str = str.Replace("&quot;", "");

but it does not work, and it causes more problems because it also removes quotes that are needed.

Any ideas? Should I just capture the string

alt="The logo of social networking website &#39;Facebook&#39; is displayed on a computer screen. &quot;The explosion of applications on Facebook and other services will be an ideal vector for cybercriminals,&quot; McAfee said.(AFP/File/Leon Neal)" border="0"

and modify it accordingly, or is there a better way of doing this with the C# .NET Framework?
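Capturing the alt value and re-encoding only its inner quotes is workable. A minimal sketch using a regular expression, under the assumption (true for the Yahoo! sample above, but not guaranteed in general) that the alt value is always followed by whitespace and border=; the class name is hypothetical. The lazy group stops at the first quote that is followed by border=, so the quotes inside the caption end up in the captured text and are turned into &quot;.

```csharp
using System.Text.RegularExpressions;

public static class AltAttributeFixer
{
    // Assumption: the alt value is closed by a quote followed by " border=".
    // Singleline lets '.' match line breaks inside the caption.
    private static readonly Regex AltPattern =
        new Regex("alt=\"(.*?)\"(?=\\s+border=)", RegexOptions.Singleline);

    // Rewrites alt="..." so that any double quotes inside the value become &quot;.
    public static string FixAltAttribute(string html)
    {
        return AltPattern.Replace(html, m =>
            "alt=\"" + m.Groups[1].Value.Replace("\"", "&quot;") + "\"");
    }
}
```

Applied to the decoded description, the nested quotes around "The explosion ... cybercriminals," become &quot; while the rest of the markup is left alone. For markup that varies more than this feed does, an HTML parser would be more robust than a pattern.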

Some more information:

My news.aspx.cs file looks like this:

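using System;
using System.Web.UI;
using System.Xml;

// Code-behind for news.aspx (class name assumed from the page name)
public partial class news : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Load the saved Yahoo! RSS feed and bind every <item> to the repeater
        XmlDocument doc = new XmlDocument();
        doc.Load(@"C:\Users\sharmin\Documents\Visual Studio 2008\WebSites\GoDaddy\yahooNews.xml");
        XmlNodeList nodes = doc.SelectNodes("/rss/channel/item");

        XmlNode title = doc.SelectSingleNode("/rss/channel/title");
        XmlNode link = doc.SelectSingleNode("/rss/channel/link");
        HyperLink1.NavigateUrl = link.InnerText;
        HyperLink1.Text = title.InnerText;
        Repeater1.DataSource = nodes;
        Repeater1.DataBind();
    }

    // Called from the repeater with the InnerText of each <description>
    public string GetString(string str)
    {
        // Every & is escaped first, so the entity replacements below
        // (&#39;, &lt;, &gt;, &#34;, &quot;) no longer find a match
        str = str.Replace("&", "&amp;");
        str = str.Replace("&#39;", "");
        str = str.Replace("'", "");
        str = str.Replace("&lt;", "<");
        str = str.Replace("&gt;", ">");
        str = str.Replace("&#34;", "");
        str = str.Replace("&quot;", "");
        return str;
    }
}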

My news.aspx looks like this:

<asp:repeater id="Repeater1" runat="server">
                <HeaderTemplate></HeaderTemplate>
                <ItemTemplate>
               <h3><a href='<%#((System.Xml.XmlNode)Container.DataItem).SelectSingleNode("link").InnerText%>'>
                <%#((System.Xml.XmlNode)Container.DataItem).SelectSingleNode("title").InnerText%></a></h3>
                <%#GetString(((System.Xml.XmlNode)Container.DataItem).SelectSingleNode("description").InnerText)%>
                </ItemTemplate>
                <FooterTemplate></FooterTemplate>
</asp:repeater>


Please help if there is a faster or better way of solving this XML parsing issue, where &quot; appears inside quotes, using the C# .NET Framework. A VB solution is also okay.




 
Todd Gerbert (IT Consultant) commented:
I think roeib means that you should store your XML as shown below. It shouldn't matter that there are single quotes inside double quotes; that's perfectly valid HTML, and C# won't care where the quotes are, because the text is stored in a variable rather than in a hard-coded string literal.



<description>
  <![CDATA[
    <p><a href="http://us.rd.yahoo.com/dailynews/rss/tech/*http://news.yahoo.com/s/afp/20091230/ts_alt_afp/usitcomputersecurityinternetmcafee"><img src="http://d.yimg.com/a/p/afp/20091230/capt.photo_1262191861138-1-0.jpg?x=130&y=84&q=85&sig=XB8VuNiiCtOC5TYh6H441w--" align="left" height="84" width="130" alt="The logo of social networking website 'Facebook' is displayed on a computer screen. "The explosion of applications on Facebook and other services will be an ideal vector for cybercriminals," McAfee said.(AFP/File/Leon Neal)" border="0" /></a>AFP - Social networks will face increasingly sophisticated hacker attacks in 2010 but law enforcement is expected to make strides in fighting cybercrime, according to Web security firm McAfee Labs.</p><br clear="all"/>
  ]]>
</description>
 
ashley2009 (Author) commented:
Hello,

Thank you for your input and suggestion. I am going to try the CDATA approach as well.

However, I have already solved the problem in the following way, and the HTML validator no longer reports any parsing errors. My method is:

public string GetString(string str)
{
    str = str.Replace("&", "&amp;");
    str = str.Replace("&lt;", "<");
    str = str.Replace("&gt;", ">");
    str = str.Replace("&quot;", "");

    int j = str.IndexOf("alt=");
    if (j != -1)
    {
        // Take the text from alt= up to (not including) border=
        int i = str.IndexOf("border=");
        string newStr = str.Substring(j, i - j);
        int k = newStr.Length;

        // Strip the leading alt=" (5 characters) and the closing characters,
        // then turn the double quotes inside the alt text into single quotes
        string altStr = newStr.Substring(5, k - 8);
        altStr = altStr.Replace("\"", "'");

        // Keep everything from border= through <br clear="all"/> (17 characters)
        int b = str.IndexOf("<br clear=");
        string endstring = str.Substring(i, b - i + 17);

        // Rebuild the string around the cleaned alt attribute
        newStr = "alt=\"" + altStr + "\" ";
        string startString = str.Substring(0, j);
        string finalString = startString + newStr + endstring;

        // Drop the Yahoo! redirect prefix from the link
        string replaceString = "http://us.rd.yahoo.com/dailynews/rss/tech/*";
        finalString = finalString.Replace(replaceString, "");
        return finalString;
    }
    else
    {
        return str;
    }
}

In the code above, str is the text of the description node. If it contains an alt attribute (that is, an image tag), the alt text is rebuilt with its double quotes turned into single quotes; otherwise only the initial replacements are applied.
But I will also try your CDATA approach, because I think it will need less code.

Thank you for your suggestion.