Solved

XML and HTML data separation from an XML file

Posted on 2010-11-26
Last Modified: 2012-05-10
Hi experts,

I have the following XML file and need to separate it into XML data for an xml data type column and HTML data for an nvarchar(max) column in SQL Server.
How can I separate the data in C#, like this:
1.
XML column
from:
<?xml version="1.0" encoding="utf-8"?>
to (everything before <DataContent>):
<Format FormalName="XHTML"/>

2. nvarchar(max) column
from:
<DataContent>
sdfsf
</DataContent>

Any help would be very much appreciated.

Thanks a lot
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE NewsML SYSTEM "http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.dtd">
<NewsML Version="1.2">
  <Catalog Href="http://www-test.com.catalog.xml" />
  <NewsEnvelope>
    <TransmissionId>1328088_</TransmissionId>
    <DateAndTime>20100930T110626+0200</DateAndTime>
    <NewsProduct FormalName="Regulatory Information Service"/>
  </NewsEnvelope>
  <NewsItem>
    <Identification>
      <NewsIdentifier>
        <ProviderId>test.com</ProviderId>
        <DateId>20100930</DateId>
        <NewsItemId>TEST1328088</NewsItemId>
        <RevisionId PreviousRevision="0" Update="N">1</RevisionId>
        <PublicIdentifier>urn:newsml:TEST.com:20100930:XX1328088:1</PublicIdentifier>
      </NewsIdentifier>
    </Identification>
    <NewsManagement>
      <NewsItemType FormalName="News"/>
      <FirstCreated>20100930T110626+0200</FirstCreated>
      <ThisRevisionCreated>20100930T110626+0200</ThisRevisionCreated>
      <Status FormalName="Usable"/>
      <Urgency FormalName="4"/>
      <Property FormalName="sst.3rdPartyStyleGuideVersion" Value="2.0" />
    <Property FormalName="category" Value="N" />
    <Property FormalName="ern" Value="N.A." />
    <Property FormalName="distributor" Value="SSS" />
    </NewsManagement>
    <NewsComponent xml:lang="en" Essential="no" EquivalentsList="no" Duid="NC00001">
      <TopicSet FormalName="Companies">
        <Topic Duid="T000001">
          <TopicType FormalName="Company"/>
          <FormalName Scheme="CompanyLongName"><![CDATA[Test Client]]></FormalName>
          <FormalName Scheme="CompanyShortName"><![CDATA[]]></FormalName>
      <FormalName Scheme="Country"><![CDATA[U.S.A.]]></FormalName>
      <FormalName Scheme="City"><![CDATA[Paris]]></FormalName>
          <FormalName Scheme="TIDM"></FormalName>
      <FormalName Scheme="USTIC"></FormalName>
          <FormalName Scheme="ISIN"></FormalName>
      <FormalName Scheme="ISIC"></FormalName>
          <FormalName Scheme="cRIC"></FormalName>
      <FormalName Scheme="CompanyUrl"></FormalName>
      <FormalName Scheme="GermanWkn"></FormalName>
      <FormalName Scheme="Sedol"></FormalName>
        </Topic>    
      </TopicSet>  
      <Role FormalName="Main"/>
      <NewsLines>
        <HeadLine><![CDATA[TEST RELEASE]]></HeadLine>
        <DateLine>London, September, 30, 2010</DateLine>
      </NewsLines>
      <AdministrativeMetadata>
        <Creator>     
          <Party FormalName="Test Client"/>  
        </Creator>
        <Source>
          <Party FormalName="Test Client"/>
        </Source>
      </AdministrativeMetadata>

      <RightsMetadata/>

      <DescriptiveMetadata>
        <Language FormalName="en"/>
        <TopicOccurrence Topic="#T000001"/>

        <TopicOccurrence Topic="#T00003" HowPresent="FSACategories"/>
        <TopicOccurrence Topic="#T00004" HowPresent="MediumImportance"/>
    
              <SubjectCode Duid="SC#1" HowPresent="Related" >
        <Subject Duid="S#1" HowPresent="Related" FormalName="Economy, Business And Finance" />
              <SubjectMatter Duid="S#1_SM#1" HowPresent="Related" FormalName="Company Information" />
                  <SubjectDetail Duid="S#1_SM#1_SD#1" HowPresent="Related" FormalName="Contract" />
                  <SubjectDetail Duid="S#1_SM#1_SD#2" HowPresent="Related" FormalName="Earnings" />
                    </SubjectCode>
        
    <TopicOccurrence Topic="#ICB_IN" HowPresent="ICBClasification"/>
    <TopicOccurrence Topic="#ICB_SU" HowPresent="ICBClasification"/>
    <TopicOccurrence Topic="#ICB_SE" HowPresent="ICBClasification"/>
      </DescriptiveMetadata>
    
      <ContentItem Duid="CI00001">
        <MediaType FormalName="text"/>
        <Format FormalName="XHTML"/>
<DataContent>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:mce="mce"><head><style>* { font-family: Arial, Verdana, Helvetica; font-size: 13px;}
td { padding: 3px; }
}</style><title>TEST RELEASE</title></head><body class="TEST">   <p align="center" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><i class="TEST"><u class="TEST">You can disregard this test release. </u></i></p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><a class="TEST" href="http://www.test.com/" target="_blank">Test</a> text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test.</p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; 
margin-left: 0cm">Test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text. <a class="TEST" href="http://www.1.com/" target="_blank">www.1.com</a> <a class="TEST" href="mailto:test@1.com" target="_blank">test@1.com</a> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Quelques caractères spéciaux €, à, é, è, ç, ï, ë, í, ñ, %, &amp;, §, #,;, «, »</p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">TABLEAU 1</p>                                                                     </body></html>
</DataContent>
      </ContentItem>
    </NewsComponent>
  </NewsItem>
</NewsML>

Question by:saloj
9 Comments
 
LVL 8

Accepted Solution

by:
WesWilson earned 668 total points
ID: 34219658
I suppose you have a couple of options.

1. You could place CDATA tags inside the <DataContent> tags to surround the HTML content.

2. If you have no control over the data you are receiving, but need to parse it, you could load it into a string and split the string on <DataContent> and </DataContent> as your delimiters. Combine string 0 and 2 to go into your XML column, and place string 1 in your nvarchar column. Just remember to add the <DataContent> tags back into the XML string.

There are other ways to parse it, but the Split function should do well if you know the exact tag name.
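Option 2 could be sketched roughly as follows. This is only an illustration: the class and method names are made up, and it assumes the file contains exactly one <DataContent>...</DataContent> pair written without attributes.

```csharp
using System;

public static class DataContentSplitter
{
    // Hypothetical helper: splits the raw text on the two tags, as described above.
    public static void Split(string source, out string xmlPart, out string htmlPart)
    {
        string[] parts = source.Split(
            new[] { "<DataContent>", "</DataContent>" },
            StringSplitOptions.None);

        // Exactly one open tag and one close tag yield three pieces.
        if (parts.Length != 3)
            throw new FormatException("Expected exactly one <DataContent> element.");

        // Pieces 0 and 2 are the XML; put the (now empty) tags back between them.
        xmlPart = parts[0] + "<DataContent></DataContent>" + parts[2];

        // Piece 1 is the HTML that sat between the tags.
        htmlPart = parts[1].Trim();
    }
}
```

Because this works on plain text, it does not care whether the HTML inside <DataContent> is well-formed XML.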
 
LVL 20

Assisted Solution

by:BuggyCoder
BuggyCoder earned 668 total points
ID: 34219661
You can parse this XML document with the .NET Framework's XML APIs in the System.Xml namespace.
1. First, create an XmlDocument object from the XML data.
2. Then read the DataContent child node and its contents.
3. Store the value of the DataContent node in a local variable.
4. After reading it, remove the node with the RemoveChild method.
5. The rest of the data in the XmlDocument is now your XML data.
Read its InnerXml and save it in the other field.
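A minimal sketch of these steps, assuming the whole file (including the HTML inside <DataContent>) is well-formed XML. The class and method names are illustrative only, and "//DataContent" simply picks the first such element in the document.

```csharp
using System;
using System.Xml;

public static class DataContentExtractor
{
    // Hypothetical helper following the five steps above.
    public static void Extract(string source, out string xmlPart, out string htmlPart)
    {
        var doc = new XmlDocument();
        // The sample declares an external DTD; a null resolver keeps the
        // loader from trying to download it.
        doc.XmlResolver = null;
        doc.LoadXml(source);

        // Steps 2 and 3: locate DataContent and keep its inner markup.
        XmlNode content = doc.SelectSingleNode("//DataContent");
        if (content == null)
            throw new FormatException("No DataContent element found.");
        htmlPart = content.InnerXml.Trim();

        // Step 4: detach the node from its parent.
        content.ParentNode.RemoveChild(content);

        // Step 5: what remains is the XML-only document.
        xmlPart = doc.OuterXml;
    }
}
```

If the database rejects the XML declaration or the DOCTYPE, doc.DocumentElement.OuterXml returns just the root element without them.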

Here's how XmlDocument works:
http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=312
http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx
http://www.codeproject.com/KB/cpp/myXPath.aspx

hope it helps
:-)
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34219693
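Here's a LINQ to XML way:
using System.Linq;
using System.Xml.Linq;

...

// Parse the source document from the input text box.
XElement doc = XElement.Parse(this.textBox1.Text);

// Find the first <DataContent> element anywhere in the tree.
XElement html = doc.Descendants("DataContent").First();

// Detach it so that doc keeps only the remaining XML.
html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.ToString();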


 
LVL 8

Expert Comment

by:WesWilson
ID: 34219699
My concern with BuggyCoder's solution is that you need valid XML to load into the XmlDocument object. If that works, his solution is good, but if you are not guaranteed to have the HTML section parseable as XML, then you would need CDATA tags or another solution.
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 664 total points
ID: 34219703
Forgot to note:

In my example, textBox1 was the source XML string, textBox2 is the XML only, and textBox3 is the HTML. The HTML still has the <DataContent> element wrapping it, so you would have to trim it off. This version trims it as well:
using System.Linq;
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

// Find the first <DataContent> element anywhere in the tree.
XElement html = doc.Descendants("DataContent").First();

// Detach it so that doc keeps only the remaining XML.
html.Remove();
this.textBox2.Text = doc.ToString();
// FirstNode is the first child of <DataContent>, here the <html> element.
this.textBox3.Text = html.FirstNode.ToString();

 
LVL 2

Author Comment

by:saloj
ID: 34230255
Hi guys,

Thank you for your responses.
I have files with the above XML content, which I need to store in SQL Server 2005 and also display on a website.

I tried the following approaches and am still having difficulty figuring out what exactly I should do.

1.
I tried to separate the XML data (xml data type) from the HTML content (nvarchar(max)) in SQL Server 2005 to avoid illegal characters from the HTML content. But the XML content can also contain illegal characters, so I could not store it in an xml data type column.

2.
Next I tried storing the entire XML content in an nvarchar(max) column, and I am having difficulty querying the XML data.
Is it possible to query XML data from an nvarchar column?

3.
What would be the best way to store the XML intact in the database and query it for display on websites?

Any help would be very much appreciated.

Many Thanks


 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34230367
Well your original question asked how to separate in C#. Are you saying that you would rather separate it on the DB side? Given the above methods, you should be able to do an insert query passing the separated parts to the appropriate columns.
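The insert mentioned above might look something like this with ADO.NET. The table name NewsItems and the columns Metadata (xml) and Body (nvarchar(max)) are hypothetical, as are the class and method names. System.Data.SqlClient is part of the .NET Framework; on newer .NET versions it is a separate package.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class NewsItemStore
{
    // Builds a parameterized INSERT; table and column names are placeholders.
    public static SqlCommand BuildInsert(SqlConnection connection, string xmlPart, string htmlPart)
    {
        var command = new SqlCommand(
            "INSERT INTO NewsItems (Metadata, Body) VALUES (@metadata, @body)",
            connection);

        // The XML-only part goes to the xml column.
        command.Parameters.Add("@metadata", SqlDbType.Xml).Value = xmlPart;

        // A size of -1 maps the parameter to nvarchar(max).
        command.Parameters.Add("@body", SqlDbType.NVarChar, -1).Value = htmlPart;
        return command;
    }

    // Opens a connection and runs the insert once.
    public static void Save(string connectionString, string xmlPart, string htmlPart)
    {
        using (var connection = new SqlConnection(connectionString))
        using (SqlCommand command = BuildInsert(connection, xmlPart, htmlPart))
        {
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```

Passing both parts as parameters means quotes or other special characters in the HTML never have to be escaped into the SQL text.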
 
LVL 20

Expert Comment

by:BuggyCoder
ID: 34231484
You can create an XML column to store the XML data. To query it, you need to know a bit of the XQuery language which is fully supported by SQL Server 2005. Here is an article on querying XML data stored in an XML column in the database:
http://www.15seconds.com/issue/050803.htm


@saloj: I would request that you close your current question and post this one in a new thread.

hope it helps
:-)
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34231566
>>  XQuery language which is fully supported by SQL Server 2005

It's not *fully* supported. For example, the "let" clause is not supported in FLWOR expressions.

http://msdn.microsoft.com/en-us/library/ms345122%28SQL.90%29.aspx#sql2k5_xqueryintro_topic3