Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

xml and html data seperation from xml file

Posted on 2010-11-26
9
Medium Priority
?
734 Views
Last Modified: 2012-05-10
Hi experts,

I have following code and I need to seperate xml data for xml data type column and html data for nvarchar(max) column for sql server.
How can I seperate the data in c# like
1.
XML column
from:
<?xml version="1.0" encoding="utf-8"?>
to:(before <DataContent>)
<Format FormalName="XHTML"/>

2. nvarchar(max) column
from:
<DataContent>
sdfsf
</DataContent>

any help would be very much appreciated

Thanks alot
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE NewsML SYSTEM "http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.dtd">
<NewsML Version="1.2">
  <Catalog Href="http://www-test.com.catalog.xml" />
  <NewsEnvelope>
    <TransmissionId>1328088_</TransmissionId>
    <DateAndTime>20100930T110626+0200</DateAndTime>
    <NewsProduct FormalName="Regulatory Information Service"/>
  </NewsEnvelope>
  <NewsItem>
    <Identification>
      <NewsIdentifier>
        <ProviderId>test.com</ProviderId>
        <DateId>20100930</DateId>
        <NewsItemId>TEST1328088</NewsItemId>
        <RevisionId PreviousRevision="0" Update="N">1</RevisionId>
        <PublicIdentifier>urn:newsml:TEST.com:20100930:XX1328088:1</PublicIdentifier>
      </NewsIdentifier>
    </Identification>
    <NewsManagement>
      <NewsItemType FormalName="News"/>
      <FirstCreated>20100930T110626+0200</FirstCreated>
      <ThisRevisionCreated>20100930T110626+0200</ThisRevisionCreated>
      <Status FormalName="Usable"/>
      <Urgency FormalName="4"/>
      <Property FormalName="sst.3rdPartyStyleGuideVersion" Value="2.0" />
    <Property FormalName="category" Value="N" />
    <Property FormalName="ern" Value="N.A." />
    <Property FormalName="distributor" Value="SSS" />
    </NewsManagement>
    <NewsComponent xml:lang="en" Essential="no" EquivalentsList="no" Duid="NC00001">
      <TopicSet FormalName="Companies">
        <Topic Duid="T000001">
          <TopicType FormalName="Company"/>
          <FormalName Scheme="CompanyLongName"><![CDATA[Test Client]]></FormalName>
          <FormalName Scheme="CompanyShortName"><![CDATA[]]></FormalName>
      <FormalName Scheme="Country"><![CDATA[U.S.A.]]></FormalName>
      <FormalName Scheme="City"><![CDATA[Paris]]></FormalName>
          <FormalName Scheme="TIDM"></FormalName>
      <FormalName Scheme="USTIC"></FormalName>
          <FormalName Scheme="ISIN"></FormalName>
      <FormalName Scheme="ISIC"></FormalName>
          <FormalName Scheme="cRIC"></FormalName>
      <FormalName Scheme="CompanyUrl"></FormalName>
      <FormalName Scheme="GermanWkn"></FormalName>
      <FormalName Scheme="Sedol"></FormalName>
        </Topic>    
      </TopicSet>  
      <Role FormalName="Main"/>
      <NewsLines>
        <HeadLine><![CDATA[TEST RELEASE]]></HeadLine>
        <DateLine>London, September, 30, 2010</DateLine>
      </NewsLines>
      <AdministrativeMetadata>
        <Creator>     
          <Party FormalName="Test Client"/>  
        </Creator>
        <Source>
          <Party FormalName="Test Client"/>
        </Source>
      </AdministrativeMetadata>

      <RightsMetadata/>

      <DescriptiveMetadata>
        <Language FormalName="en"/>
        <TopicOccurrence Topic="#T000001"/>

        <TopicOccurrence Topic="#T00003" HowPresent="FSACategories"/>
        <TopicOccurrence Topic="#T00004" HowPresent="MediumImportance"/>
    
              <SubjectCode Duid="SC#1" HowPresent="Related" >
        <Subject Duid="S#1" HowPresent="Related" FormalName="Economy, Business And Finance" />
              <SubjectMatter Duid="S#1_SM#1" HowPresent="Related" FormalName="Company Information" />
                  <SubjectDetail Duid="S#1_SM#1_SD#1" HowPresent="Related" FormalName="Contract" />
                  <SubjectDetail Duid="S#1_SM#1_SD#2" HowPresent="Related" FormalName="Earnings" />
                    </SubjectCode>
        
    <TopicOccurrence Topic="#ICB_IN" HowPresent="ICBClasification"/>
    <TopicOccurrence Topic="#ICB_SU" HowPresent="ICBClasification"/>
    <TopicOccurrence Topic="#ICB_SE" HowPresent="ICBClasification"/>
      </DescriptiveMetadata>
    
      <ContentItem Duid="CI00001">
        <MediaType FormalName="text"/>
        <Format FormalName="XHTML"/>
<DataContent>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:mce="mce"><head><style>* { font-family: Arial, Verdana, Helvetica; font-size: 13px;}
td { padding: 3px; }
}</style><title>TEST RELEASE</title></head><body class="TEST">   <p align="center" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><i class="TEST"><u class="TEST">You can disregard this test release. </u></i></p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><a class="TEST" href="http://www.test.com/" target="_blank">Test</a> text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test.</p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text. <a class="TEST" href="http://www.1.com/" target="_blank">www.1.com</a> <a class="TEST" href="mailto:test@1.com" target="_blank">test@1.com</a> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Quelques caractères spéciaux €, à, é, è, ç, ï, ë, í, ñ, %, &amp;, §, #,;, «, »</p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">TABLEAU 1</p>                                                                     </body></html>
</DataContent>
      </ContentItem>
    </NewsComponent>
  </NewsItem>
</NewsML>

Open in new window

0
Comment
Question by:saloj
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
  • 2
  • +1
9 Comments
 
LVL 8

Accepted Solution

by:
WesWilson earned 668 total points
ID: 34219658
I suppose you have a couple of options.

1. You could place CDATA tags inside the <DataContent> tags to surround the HTML content.

2. If you have no control over the data you are receiving, but need to parse it, you could load it into a string and split the string on <DataContent> and </DataContent> as your delimiters. Combine string 0 and 2 to go into your XML column, and place string 1 in your nvarchar column. Just remember to add the <DataContent> tags back into the XML string.

There are other ways to parse it, but the Split function should do well if you know the exact tag name.
0
 
LVL 20

Assisted Solution

by:BuggyCoder
BuggyCoder earned 668 total points
ID: 34219661
You have to parse this XML Document using the powerful .net framework's XML APIs in System.XML Namespace.
1. first create a XMLDocument object out of this xml data.
2. then try to read DataContent ChildNode and its contents.
3. GEt the value of DataContent child node in some of your local variable.
4. After reading, remove the child using removechild function.
5. Now rest of the data in XMLDocument is your XML Data.
try doing its innerXML and save it in another field.

here's how XMLDocument works:-
http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=312
http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx
http://www.codeproject.com/KB/cpp/myXPath.aspx

hope it helps
:-)
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34219693
Here's a Linq to XML way:
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

var html = (from node in doc.Descendants("DataContent")
            select node).First();

html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.ToString();

Open in new window

0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 8

Expert Comment

by:WesWilson
ID: 34219699
My concern with BuggyCoder's solution is that you need valid XML to load into the XmlDocument object. If that works, his solution is good, but if you are not guaranteed to have the HTML section parseable as XML, then you would need CDATA tags or another solution.
0
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 664 total points
ID: 34219703
Forgot to note:

In my example, textBox1 was the source XML string, textBox2 is the XML only, and textBox3 is the HTML. The HTML still has the <DataContent> wrapping it, so you would have to trim it off. This trims it also:
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

var html = (from node in doc.Descendants("DataContent")
            select node).First();

html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.FirstNode.ToString();

Open in new window

0
 
LVL 2

Author Comment

by:saloj
ID: 34230255
HI Guys,

Thank you for your response.
I have the above xml content files, which i need to store into sql server 2005 and also display the xml data for website.

I tried the following ways and still having difficulties to figure out what exactly should I do.

1.
I tried to separate xml data on (xml data type) and html content on (nvarchar(max)) on sql server 2005 to avoid illegal character from html content. But xml content also can have illegal characters so I could not store xml content on xml data type.

2.
Now I tried  to store the all xml content into nvarchar(max) data type and having difficulties to query the xml data.
is it possible to query the xml data from nvarchar column?

3.
what could be the best way to store the xml intact into database and query the xml data do display for websites.

any help would be very much appreciated.

Many Thanks


0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34230367
Well your original question asked how to separate in C#. Are you saying that you would rather separate it on the DB side? Given the above methods, you should be able to do an insert query passing the separated parts to the appropriate columns.
0
 
LVL 20

Expert Comment

by:BuggyCoder
ID: 34231484
you can create an XML Column to store the XML Data, to query the same you need to know a bit of XQuery language which is fully supported by SQL Server 2005. here is an article to query the XML Data strored in XML Column in database:
http://www.15seconds.com/issue/050803.htm


@saloj : i would request you to close your current question and post this one in a new thread.

hope it helps
:-)
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34231566
>>  XQuery language which is fully supported by SQL Server 2005

It's not *fully* supported. For example, "let" is not supported in FLOWR clauses.

http://msdn.microsoft.com/en-us/library/ms345122%28SQL.90%29.aspx#sql2k5_xqueryintro_topic3
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

We all know that functional code is the leg that any good program stands on when it comes right down to it, however, if your program lacks a good user interface your product may not have the appeal needed to keep your customers happy. This issue can…
Create a Windows 10 custom Image with custom task bar and custom start menu using XML for deployment.
This tutorial will teach you the special effect of super speed similar to the fictional character Wally West aka "The Flash" After Shake : http://www.videocopilot.net/presets/after_shake/ All lightning effects with instructions : http://www.mediaf…
Sometimes it takes a new vantage point, apart from our everyday security practices, to truly see our Active Directory (AD) vulnerabilities. We get used to implementing the same techniques and checking the same areas for a breach. This pattern can re…
Suggested Courses

618 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question