Solved

xml and html data seperation from xml file

Posted on 2010-11-26
9
719 Views
Last Modified: 2012-05-10
Hi experts,

I have following code and I need to seperate xml data for xml data type column and html data for nvarchar(max) column for sql server.
How can I seperate the data in c# like
1.
XML column
from:
<?xml version="1.0" encoding="utf-8"?>
to:(before <DataContent>)
<Format FormalName="XHTML"/>

2. nvarchar(max) column
from:
<DataContent>
sdfsf
</DataContent>

any help would be very much appreciated

Thanks alot
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE NewsML SYSTEM "http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.dtd">
<NewsML Version="1.2">
  <Catalog Href="http://www-test.com.catalog.xml" />
  <NewsEnvelope>
    <TransmissionId>1328088_</TransmissionId>
    <DateAndTime>20100930T110626+0200</DateAndTime>
    <NewsProduct FormalName="Regulatory Information Service"/>
  </NewsEnvelope>
  <NewsItem>
    <Identification>
      <NewsIdentifier>
        <ProviderId>test.com</ProviderId>
        <DateId>20100930</DateId>
        <NewsItemId>TEST1328088</NewsItemId>
        <RevisionId PreviousRevision="0" Update="N">1</RevisionId>
        <PublicIdentifier>urn:newsml:TEST.com:20100930:XX1328088:1</PublicIdentifier>
      </NewsIdentifier>
    </Identification>
    <NewsManagement>
      <NewsItemType FormalName="News"/>
      <FirstCreated>20100930T110626+0200</FirstCreated>
      <ThisRevisionCreated>20100930T110626+0200</ThisRevisionCreated>
      <Status FormalName="Usable"/>
      <Urgency FormalName="4"/>
      <Property FormalName="sst.3rdPartyStyleGuideVersion" Value="2.0" />
    <Property FormalName="category" Value="N" />
    <Property FormalName="ern" Value="N.A." />
    <Property FormalName="distributor" Value="SSS" />
    </NewsManagement>
    <NewsComponent xml:lang="en" Essential="no" EquivalentsList="no" Duid="NC00001">
      <TopicSet FormalName="Companies">
        <Topic Duid="T000001">
          <TopicType FormalName="Company"/>
          <FormalName Scheme="CompanyLongName"><![CDATA[Test Client]]></FormalName>
          <FormalName Scheme="CompanyShortName"><![CDATA[]]></FormalName>
      <FormalName Scheme="Country"><![CDATA[U.S.A.]]></FormalName>
      <FormalName Scheme="City"><![CDATA[Paris]]></FormalName>
          <FormalName Scheme="TIDM"></FormalName>
      <FormalName Scheme="USTIC"></FormalName>
          <FormalName Scheme="ISIN"></FormalName>
      <FormalName Scheme="ISIC"></FormalName>
          <FormalName Scheme="cRIC"></FormalName>
      <FormalName Scheme="CompanyUrl"></FormalName>
      <FormalName Scheme="GermanWkn"></FormalName>
      <FormalName Scheme="Sedol"></FormalName>
        </Topic>    
      </TopicSet>  
      <Role FormalName="Main"/>
      <NewsLines>
        <HeadLine><![CDATA[TEST RELEASE]]></HeadLine>
        <DateLine>London, September, 30, 2010</DateLine>
      </NewsLines>
      <AdministrativeMetadata>
        <Creator>     
          <Party FormalName="Test Client"/>  
        </Creator>
        <Source>
          <Party FormalName="Test Client"/>
        </Source>
      </AdministrativeMetadata>

      <RightsMetadata/>

      <DescriptiveMetadata>
        <Language FormalName="en"/>
        <TopicOccurrence Topic="#T000001"/>

        <TopicOccurrence Topic="#T00003" HowPresent="FSACategories"/>
        <TopicOccurrence Topic="#T00004" HowPresent="MediumImportance"/>
    
              <SubjectCode Duid="SC#1" HowPresent="Related" >
        <Subject Duid="S#1" HowPresent="Related" FormalName="Economy, Business And Finance" />
              <SubjectMatter Duid="S#1_SM#1" HowPresent="Related" FormalName="Company Information" />
                  <SubjectDetail Duid="S#1_SM#1_SD#1" HowPresent="Related" FormalName="Contract" />
                  <SubjectDetail Duid="S#1_SM#1_SD#2" HowPresent="Related" FormalName="Earnings" />
                    </SubjectCode>
        
    <TopicOccurrence Topic="#ICB_IN" HowPresent="ICBClasification"/>
    <TopicOccurrence Topic="#ICB_SU" HowPresent="ICBClasification"/>
    <TopicOccurrence Topic="#ICB_SE" HowPresent="ICBClasification"/>
      </DescriptiveMetadata>
    
      <ContentItem Duid="CI00001">
        <MediaType FormalName="text"/>
        <Format FormalName="XHTML"/>
<DataContent>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:mce="mce"><head><style>* { font-family: Arial, Verdana, Helvetica; font-size: 13px;}
td { padding: 3px; }
}</style><title>TEST RELEASE</title></head><body class="TEST">   <p align="center" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><i class="TEST"><u class="TEST">You can disregard this test release. </u></i></p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><a class="TEST" href="http://www.test.com/" target="_blank">Test</a> text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test.</p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text. <a class="TEST" href="http://www.1.com/" target="_blank">www.1.com</a> <a class="TEST" href="mailto:test@1.com" target="_blank">test@1.com</a> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Quelques caractères spéciaux €, à, é, è, ç, ï, ë, í, ñ, %, &amp;, §, #,;, «, »</p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">TABLEAU 1</p>                                                                     </body></html>
</DataContent>
      </ContentItem>
    </NewsComponent>
  </NewsItem>
</NewsML>

Open in new window

0
Comment
Question by:saloj
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
  • 2
  • +1
9 Comments
 
LVL 8

Accepted Solution

by:
WesWilson earned 167 total points
ID: 34219658
I suppose you have a couple of options.

1. You could place CDATA tags inside the <DataContent> tags to surround the HTML content.

2. If you have no control over the data you are receiving, but need to parse it, you could load it into a string and split the string on <DataContent> and </DataContent> as your delimiters. Combine string 0 and 2 to go into your XML column, and place string 1 in your nvarchar column. Just remember to add the <DataContent> tags back into the XML string.

There are other ways to parse it, but the Split function should do well if you know the exact tag name.
0
 
LVL 20

Assisted Solution

by:BuggyCoder
BuggyCoder earned 167 total points
ID: 34219661
You have to parse this XML Document using the powerful .net framework's XML APIs in System.XML Namespace.
1. first create a XMLDocument object out of this xml data.
2. then try to read DataContent ChildNode and its contents.
3. GEt the value of DataContent child node in some of your local variable.
4. After reading, remove the child using removechild function.
5. Now rest of the data in XMLDocument is your XML Data.
try doing its innerXML and save it in another field.

here's how XMLDocument works:-
http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=312
http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx
http://www.codeproject.com/KB/cpp/myXPath.aspx

hope it helps
:-)
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34219693
Here's a Linq to XML way:
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

var html = (from node in doc.Descendants("DataContent")
            select node).First();

html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.ToString();

Open in new window

0
Monthly Recap

May was a big month for new releases from Linux Academy! Take a look at what our team built recently in our blog. You can access the newest releases from our blog.

 
LVL 8

Expert Comment

by:WesWilson
ID: 34219699
My concern with BuggyCoder's solution is that you need valid XML to load into the XmlDocument object. If that works, his solution is good, but if you are not guaranteed to have the HTML section parseable as XML, then you would need CDATA tags or another solution.
0
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 166 total points
ID: 34219703
Forgot to note:

In my example, textBox1 was the source XML string, textBox2 is the XML only, and textBox3 is the HTML. The HTML still has the <DataContent> wrapping it, so you would have to trim it off. This trims it also:
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

var html = (from node in doc.Descendants("DataContent")
            select node).First();

html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.FirstNode.ToString();

Open in new window

0
 
LVL 2

Author Comment

by:saloj
ID: 34230255
HI Guys,

Thank you for your response.
I have the above xml content files, which i need to store into sql server 2005 and also display the xml data for website.

I tried the following ways and still having difficulties to figure out what exactly should I do.

1.
I tried to separate xml data on (xml data type) and html content on (nvarchar(max)) on sql server 2005 to avoid illegal character from html content. But xml content also can have illegal characters so I could not store xml content on xml data type.

2.
Now I tried  to store the all xml content into nvarchar(max) data type and having difficulties to query the xml data.
is it possible to query the xml data from nvarchar column?

3.
what could be the best way to store the xml intact into database and query the xml data do display for websites.

any help would be very much appreciated.

Many Thanks


0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34230367
Well your original question asked how to separate in C#. Are you saying that you would rather separate it on the DB side? Given the above methods, you should be able to do an insert query passing the separated parts to the appropriate columns.
0
 
LVL 20

Expert Comment

by:BuggyCoder
ID: 34231484
you can create an XML Column to store the XML Data, to query the same you need to know a bit of XQuery language which is fully supported by SQL Server 2005. here is an article to query the XML Data strored in XML Column in database:
http://www.15seconds.com/issue/050803.htm


@saloj : i would request you to close your current question and post this one in a new thread.

hope it helps
:-)
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34231566
>>  XQuery language which is fully supported by SQL Server 2005

It's not *fully* supported. For example, "let" is not supported in FLOWR clauses.

http://msdn.microsoft.com/en-us/library/ms345122%28SQL.90%29.aspx#sql2k5_xqueryintro_topic3
0

Featured Post

Instantly Create Instructional Tutorials

Contextual Guidance at the moment of need helps your employees adopt to new software or processes instantly. Boost knowledge retention and employee engagement step-by-step with one easy solution.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article is for Object-Oriented Programming (OOP) beginners. An Interface contains declarations of events, indexers, methods and/or properties. Any class which implements the Interface should provide the concrete implementation for each Inter…
It was really hard time for me to get the understanding of Delegates in C#. I went through many websites and articles but I found them very clumsy. After going through those sites, I noted down the points in a easy way so here I am sharing that unde…
There are cases when e.g. an IT administrator wants to have full access and view into selected mailboxes on Exchange server, directly from his own email account in Outlook or Outlook Web Access. This proves useful when for example administrator want…
Michael from AdRem Software outlines event notifications and Automatic Corrective Actions in network monitoring. Automatic Corrective Actions are scripts, which can automatically run upon discovery of a certain undesirable condition in your network.…

695 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question