Solved

xml and html data seperation from xml file

Posted on 2010-11-26
9
688 Views
Last Modified: 2012-05-10
Hi experts,

I have following code and I need to seperate xml data for xml data type column and html data for nvarchar(max) column for sql server.
How can I seperate the data in c# like
1.
XML column
from:
<?xml version="1.0" encoding="utf-8"?>
to:(before <DataContent>)
<Format FormalName="XHTML"/>

2. nvarchar(max) column
from:
<DataContent>
sdfsf
</DataContent>

any help would be very much appreciated

Thanks alot
<?xml version="1.0" encoding="utf-8"?>

<!DOCTYPE NewsML SYSTEM "http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.dtd">

<NewsML Version="1.2">

  <Catalog Href="http://www-test.com.catalog.xml" />

  <NewsEnvelope>

    <TransmissionId>1328088_</TransmissionId>

    <DateAndTime>20100930T110626+0200</DateAndTime>

    <NewsProduct FormalName="Regulatory Information Service"/>

  </NewsEnvelope>

  <NewsItem>

    <Identification>

      <NewsIdentifier>

        <ProviderId>test.com</ProviderId>

        <DateId>20100930</DateId>

        <NewsItemId>TEST1328088</NewsItemId>

        <RevisionId PreviousRevision="0" Update="N">1</RevisionId>

        <PublicIdentifier>urn:newsml:TEST.com:20100930:XX1328088:1</PublicIdentifier>

      </NewsIdentifier>

    </Identification>

    <NewsManagement>

      <NewsItemType FormalName="News"/>

      <FirstCreated>20100930T110626+0200</FirstCreated>

      <ThisRevisionCreated>20100930T110626+0200</ThisRevisionCreated>

      <Status FormalName="Usable"/>

      <Urgency FormalName="4"/>

      <Property FormalName="sst.3rdPartyStyleGuideVersion" Value="2.0" />

    <Property FormalName="category" Value="N" />

    <Property FormalName="ern" Value="N.A." />

    <Property FormalName="distributor" Value="SSS" />

    </NewsManagement>

    <NewsComponent xml:lang="en" Essential="no" EquivalentsList="no" Duid="NC00001">

      <TopicSet FormalName="Companies">

        <Topic Duid="T000001">

          <TopicType FormalName="Company"/>

          <FormalName Scheme="CompanyLongName"><![CDATA[Test Client]]></FormalName>

          <FormalName Scheme="CompanyShortName"><![CDATA[]]></FormalName>

      <FormalName Scheme="Country"><![CDATA[U.S.A.]]></FormalName>

      <FormalName Scheme="City"><![CDATA[Paris]]></FormalName>

          <FormalName Scheme="TIDM"></FormalName>

      <FormalName Scheme="USTIC"></FormalName>

          <FormalName Scheme="ISIN"></FormalName>

      <FormalName Scheme="ISIC"></FormalName>

          <FormalName Scheme="cRIC"></FormalName>

      <FormalName Scheme="CompanyUrl"></FormalName>

      <FormalName Scheme="GermanWkn"></FormalName>

      <FormalName Scheme="Sedol"></FormalName>

        </Topic>    

      </TopicSet>  

      <Role FormalName="Main"/>

      <NewsLines>

        <HeadLine><![CDATA[TEST RELEASE]]></HeadLine>

        <DateLine>London, September, 30, 2010</DateLine>

      </NewsLines>

      <AdministrativeMetadata>

        <Creator>     

          <Party FormalName="Test Client"/>  

        </Creator>

        <Source>

          <Party FormalName="Test Client"/>

        </Source>

      </AdministrativeMetadata>



      <RightsMetadata/>



      <DescriptiveMetadata>

        <Language FormalName="en"/>

        <TopicOccurrence Topic="#T000001"/>



        <TopicOccurrence Topic="#T00003" HowPresent="FSACategories"/>

        <TopicOccurrence Topic="#T00004" HowPresent="MediumImportance"/>

    

              <SubjectCode Duid="SC#1" HowPresent="Related" >

        <Subject Duid="S#1" HowPresent="Related" FormalName="Economy, Business And Finance" />

              <SubjectMatter Duid="S#1_SM#1" HowPresent="Related" FormalName="Company Information" />

                  <SubjectDetail Duid="S#1_SM#1_SD#1" HowPresent="Related" FormalName="Contract" />

                  <SubjectDetail Duid="S#1_SM#1_SD#2" HowPresent="Related" FormalName="Earnings" />

                    </SubjectCode>

        

    <TopicOccurrence Topic="#ICB_IN" HowPresent="ICBClasification"/>

    <TopicOccurrence Topic="#ICB_SU" HowPresent="ICBClasification"/>

    <TopicOccurrence Topic="#ICB_SE" HowPresent="ICBClasification"/>

      </DescriptiveMetadata>

    

      <ContentItem Duid="CI00001">

        <MediaType FormalName="text"/>

        <Format FormalName="XHTML"/>

<DataContent>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:mce="mce"><head><style>* { font-family: Arial, Verdana, Helvetica; font-size: 13px;}

td { padding: 3px; }

}</style><title>TEST RELEASE</title></head><body class="TEST">   <p align="center" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><i class="TEST"><u class="TEST">You can disregard this test release. </u></i></p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><a class="TEST" href="http://www.test.com/" target="_blank">Test</a> text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test.</p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text. <a class="TEST" href="http://www.1.com/" target="_blank">www.1.com</a> <a class="TEST" href="mailto:test@1.com" target="_blank">test@1.com</a> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Quelques caractères spéciaux €, à, é, è, ç, ï, ë, í, ñ, %, &amp;, §, #,;, «, »</p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">TABLEAU 1</p>                                                                     </body></html>

</DataContent>

      </ContentItem>

    </NewsComponent>

  </NewsItem>

</NewsML>

Open in new window

0
Comment
Question by:saloj
  • 4
  • 2
  • 2
  • +1
9 Comments
 
LVL 8

Accepted Solution

by:
WesWilson earned 167 total points
ID: 34219658
I suppose you have a couple of options.

1. You could place CDATA tags inside the <DataContent> tags to surround the HTML content.

2. If you have no control over the data you are receiving, but need to parse it, you could load it into a string and split the string on <DataContent> and </DataContent> as your delimiters. Combine string 0 and 2 to go into your XML column, and place string 1 in your nvarchar column. Just remember to add the <DataContent> tags back into the XML string.

There are other ways to parse it, but the Split function should do well if you know the exact tag name.
0
 
LVL 20

Assisted Solution

by:BuggyCoder
BuggyCoder earned 167 total points
ID: 34219661
You have to parse this XML Document using the powerful .net framework's XML APIs in System.XML Namespace.
1. first create a XMLDocument object out of this xml data.
2. then try to read DataContent ChildNode and its contents.
3. GEt the value of DataContent child node in some of your local variable.
4. After reading, remove the child using removechild function.
5. Now rest of the data in XMLDocument is your XML Data.
try doing its innerXML and save it in another field.

here's how XMLDocument works:-
http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=312
http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx
http://www.codeproject.com/KB/cpp/myXPath.aspx

hope it helps
:-)
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34219693
Here's a Linq to XML way:
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

var html = (from node in doc.Descendants("DataContent")
            select node).First();

html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.ToString();

Open in new window

0
 
LVL 8

Expert Comment

by:WesWilson
ID: 34219699
My concern with BuggyCoder's solution is that you need valid XML to load into the XmlDocument object. If that works, his solution is good, but if you are not guaranteed to have the HTML section parseable as XML, then you would need CDATA tags or another solution.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 74

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 166 total points
ID: 34219703
Forgot to note:

In my example, textBox1 was the source XML string, textBox2 is the XML only, and textBox3 is the HTML. The HTML still has the <DataContent> wrapping it, so you would have to trim it off. This trims it also:
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

var html = (from node in doc.Descendants("DataContent")
            select node).First();

html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.FirstNode.ToString();

Open in new window

0
 
LVL 2

Author Comment

by:saloj
ID: 34230255
HI Guys,

Thank you for your response.
I have the above xml content files, which i need to store into sql server 2005 and also display the xml data for website.

I tried the following ways and still having difficulties to figure out what exactly should I do.

1.
I tried to separate xml data on (xml data type) and html content on (nvarchar(max)) on sql server 2005 to avoid illegal character from html content. But xml content also can have illegal characters so I could not store xml content on xml data type.

2.
Now I tried  to store the all xml content into nvarchar(max) data type and having difficulties to query the xml data.
is it possible to query the xml data from nvarchar column?

3.
what could be the best way to store the xml intact into database and query the xml data do display for websites.

any help would be very much appreciated.

Many Thanks


0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34230367
Well your original question asked how to separate in C#. Are you saying that you would rather separate it on the DB side? Given the above methods, you should be able to do an insert query passing the separated parts to the appropriate columns.
0
 
LVL 20

Expert Comment

by:BuggyCoder
ID: 34231484
you can create an XML Column to store the XML Data, to query the same you need to know a bit of XQuery language which is fully supported by SQL Server 2005. here is an article to query the XML Data strored in XML Column in database:
http://www.15seconds.com/issue/050803.htm


@saloj : i would request you to close your current question and post this one in a new thread.

hope it helps
:-)
0
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34231566
>>  XQuery language which is fully supported by SQL Server 2005

It's not *fully* supported. For example, "let" is not supported in FLOWR clauses.

http://msdn.microsoft.com/en-us/library/ms345122%28SQL.90%29.aspx#sql2k5_xqueryintro_topic3
0

Featured Post

Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

Exception Handling is in the core of any application that is able to dignify its name. In this article, I'll guide you through the process of writing a DRY (Don't Repeat Yourself) Exception Handling mechanism, using Aspect Oriented Programming.
It was really hard time for me to get the understanding of Delegates in C#. I went through many websites and articles but I found them very clumsy. After going through those sites, I noted down the points in a easy way so here I am sharing that unde…
It is a freely distributed piece of software for such tasks as photo retouching, image composition and image authoring. It works on many operating systems, in many languages.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now