Solved

xml and html data seperation from xml file

Posted on 2010-11-26
9
704 Views
Last Modified: 2012-05-10
Hi experts,

I have following code and I need to seperate xml data for xml data type column and html data for nvarchar(max) column for sql server.
How can I seperate the data in c# like
1.
XML column
from:
<?xml version="1.0" encoding="utf-8"?>
to:(before <DataContent>)
<Format FormalName="XHTML"/>

2. nvarchar(max) column
from:
<DataContent>
sdfsf
</DataContent>

any help would be very much appreciated

Thanks alot
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE NewsML SYSTEM "http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.dtd">
<NewsML Version="1.2">
  <Catalog Href="http://www-test.com.catalog.xml" />
  <NewsEnvelope>
    <TransmissionId>1328088_</TransmissionId>
    <DateAndTime>20100930T110626+0200</DateAndTime>
    <NewsProduct FormalName="Regulatory Information Service"/>
  </NewsEnvelope>
  <NewsItem>
    <Identification>
      <NewsIdentifier>
        <ProviderId>test.com</ProviderId>
        <DateId>20100930</DateId>
        <NewsItemId>TEST1328088</NewsItemId>
        <RevisionId PreviousRevision="0" Update="N">1</RevisionId>
        <PublicIdentifier>urn:newsml:TEST.com:20100930:XX1328088:1</PublicIdentifier>
      </NewsIdentifier>
    </Identification>
    <NewsManagement>
      <NewsItemType FormalName="News"/>
      <FirstCreated>20100930T110626+0200</FirstCreated>
      <ThisRevisionCreated>20100930T110626+0200</ThisRevisionCreated>
      <Status FormalName="Usable"/>
      <Urgency FormalName="4"/>
      <Property FormalName="sst.3rdPartyStyleGuideVersion" Value="2.0" />
    <Property FormalName="category" Value="N" />
    <Property FormalName="ern" Value="N.A." />
    <Property FormalName="distributor" Value="SSS" />
    </NewsManagement>
    <NewsComponent xml:lang="en" Essential="no" EquivalentsList="no" Duid="NC00001">
      <TopicSet FormalName="Companies">
        <Topic Duid="T000001">
          <TopicType FormalName="Company"/>
          <FormalName Scheme="CompanyLongName"><![CDATA[Test Client]]></FormalName>
          <FormalName Scheme="CompanyShortName"><![CDATA[]]></FormalName>
      <FormalName Scheme="Country"><![CDATA[U.S.A.]]></FormalName>
      <FormalName Scheme="City"><![CDATA[Paris]]></FormalName>
          <FormalName Scheme="TIDM"></FormalName>
      <FormalName Scheme="USTIC"></FormalName>
          <FormalName Scheme="ISIN"></FormalName>
      <FormalName Scheme="ISIC"></FormalName>
          <FormalName Scheme="cRIC"></FormalName>
      <FormalName Scheme="CompanyUrl"></FormalName>
      <FormalName Scheme="GermanWkn"></FormalName>
      <FormalName Scheme="Sedol"></FormalName>
        </Topic>    
      </TopicSet>  
      <Role FormalName="Main"/>
      <NewsLines>
        <HeadLine><![CDATA[TEST RELEASE]]></HeadLine>
        <DateLine>London, September, 30, 2010</DateLine>
      </NewsLines>
      <AdministrativeMetadata>
        <Creator>     
          <Party FormalName="Test Client"/>  
        </Creator>
        <Source>
          <Party FormalName="Test Client"/>
        </Source>
      </AdministrativeMetadata>

      <RightsMetadata/>

      <DescriptiveMetadata>
        <Language FormalName="en"/>
        <TopicOccurrence Topic="#T000001"/>

        <TopicOccurrence Topic="#T00003" HowPresent="FSACategories"/>
        <TopicOccurrence Topic="#T00004" HowPresent="MediumImportance"/>
    
              <SubjectCode Duid="SC#1" HowPresent="Related" >
        <Subject Duid="S#1" HowPresent="Related" FormalName="Economy, Business And Finance" />
              <SubjectMatter Duid="S#1_SM#1" HowPresent="Related" FormalName="Company Information" />
                  <SubjectDetail Duid="S#1_SM#1_SD#1" HowPresent="Related" FormalName="Contract" />
                  <SubjectDetail Duid="S#1_SM#1_SD#2" HowPresent="Related" FormalName="Earnings" />
                    </SubjectCode>
        
    <TopicOccurrence Topic="#ICB_IN" HowPresent="ICBClasification"/>
    <TopicOccurrence Topic="#ICB_SU" HowPresent="ICBClasification"/>
    <TopicOccurrence Topic="#ICB_SE" HowPresent="ICBClasification"/>
      </DescriptiveMetadata>
    
      <ContentItem Duid="CI00001">
        <MediaType FormalName="text"/>
        <Format FormalName="XHTML"/>
<DataContent>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:mce="mce"><head><style>* { font-family: Arial, Verdana, Helvetica; font-size: 13px;}
td { padding: 3px; }
}</style><title>TEST RELEASE</title></head><body class="TEST">   <p align="center" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><i class="TEST"><u class="TEST">You can disregard this test release. </u></i></p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 0pt; margin-left: 36pt"> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 36pt; text-indent: -18pt">· TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT TEST TEXT </p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm"><a class="TEST" href="http://www.test.com/" target="_blank">Test</a> text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test.</p> <p align="justify" class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text test text. <a class="TEST" href="http://www.1.com/" target="_blank">www.1.com</a> <a class="TEST" href="mailto:test@1.com" target="_blank">test@1.com</a> </p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">Quelques caractères spéciaux €, à, é, è, ç, ï, ë, í, ñ, %, &amp;, §, #,;, «, »</p> <p class="TEST" style="margin-top: 0cm; margin-right: 0cm; margin-bottom: 10pt; margin-left: 0cm">TABLEAU 1</p>                                                                     </body></html>
</DataContent>
      </ContentItem>
    </NewsComponent>
  </NewsItem>
</NewsML>

Open in new window

0
Comment
Question by:saloj
  • 4
  • 2
  • 2
  • +1
9 Comments
 
LVL 8

Accepted Solution

by:
WesWilson earned 167 total points
ID: 34219658
I suppose you have a couple of options.

1. You could place CDATA tags inside the <DataContent> tags to surround the HTML content.

2. If you have no control over the data you are receiving, but need to parse it, you could load it into a string and split the string on <DataContent> and </DataContent> as your delimiters. Combine string 0 and 2 to go into your XML column, and place string 1 in your nvarchar column. Just remember to add the <DataContent> tags back into the XML string.

There are other ways to parse it, but the Split function should do well if you know the exact tag name.
0
 
LVL 20

Assisted Solution

by:BuggyCoder
BuggyCoder earned 167 total points
ID: 34219661
You have to parse this XML Document using the powerful .net framework's XML APIs in System.XML Namespace.
1. first create a XMLDocument object out of this xml data.
2. then try to read DataContent ChildNode and its contents.
3. GEt the value of DataContent child node in some of your local variable.
4. After reading, remove the child using removechild function.
5. Now rest of the data in XMLDocument is your XML Data.
try doing its innerXML and save it in another field.

here's how XMLDocument works:-
http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=312
http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx
http://www.codeproject.com/KB/cpp/myXPath.aspx

hope it helps
:-)
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34219693
Here's a Linq to XML way:
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

var html = (from node in doc.Descendants("DataContent")
            select node).First();

html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.ToString();

Open in new window

0
MIM Survival Guide for Service Desk Managers

Major incidents can send mastered service desk processes into disorder. Systems and tools produce the data needed to resolve these incidents, but your challenge is getting that information to the right people fast. Check out the Survival Guide and begin bringing order to chaos.

 
LVL 8

Expert Comment

by:WesWilson
ID: 34219699
My concern with BuggyCoder's solution is that you need valid XML to load into the XmlDocument object. If that works, his solution is good, but if you are not guaranteed to have the HTML section parseable as XML, then you would need CDATA tags or another solution.
0
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 166 total points
ID: 34219703
Forgot to note:

In my example, textBox1 was the source XML string, textBox2 is the XML only, and textBox3 is the HTML. The HTML still has the <DataContent> wrapping it, so you would have to trim it off. This trims it also:
using System.Xml.Linq;

...

XElement doc = XElement.Parse(this.textBox1.Text);

var html = (from node in doc.Descendants("DataContent")
            select node).First();

html.Remove();
this.textBox2.Text = doc.ToString();
this.textBox3.Text = html.FirstNode.ToString();

Open in new window

0
 
LVL 2

Author Comment

by:saloj
ID: 34230255
HI Guys,

Thank you for your response.
I have the above xml content files, which i need to store into sql server 2005 and also display the xml data for website.

I tried the following ways and still having difficulties to figure out what exactly should I do.

1.
I tried to separate xml data on (xml data type) and html content on (nvarchar(max)) on sql server 2005 to avoid illegal character from html content. But xml content also can have illegal characters so I could not store xml content on xml data type.

2.
Now I tried  to store the all xml content into nvarchar(max) data type and having difficulties to query the xml data.
is it possible to query the xml data from nvarchar column?

3.
what could be the best way to store the xml intact into database and query the xml data do display for websites.

any help would be very much appreciated.

Many Thanks


0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34230367
Well your original question asked how to separate in C#. Are you saying that you would rather separate it on the DB side? Given the above methods, you should be able to do an insert query passing the separated parts to the appropriate columns.
0
 
LVL 20

Expert Comment

by:BuggyCoder
ID: 34231484
you can create an XML Column to store the XML Data, to query the same you need to know a bit of XQuery language which is fully supported by SQL Server 2005. here is an article to query the XML Data strored in XML Column in database:
http://www.15seconds.com/issue/050803.htm


@saloj : i would request you to close your current question and post this one in a new thread.

hope it helps
:-)
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 34231566
>>  XQuery language which is fully supported by SQL Server 2005

It's not *fully* supported. For example, "let" is not supported in FLOWR clauses.

http://msdn.microsoft.com/en-us/library/ms345122%28SQL.90%29.aspx#sql2k5_xqueryintro_topic3
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article describes a simple method to resize a control at runtime.  It includes ready-to-use source code and a complete sample demonstration application.  We'll also talk about C# Extension Methods. Introduction In one of my applications…
This article introduced a TextBox that supports transparent background.   Introduction TextBox is the most widely used control component in GUI design. Most GUI controls do not support transparent background and more or less do not have the…
This video shows how to quickly and easily add an email signature for all users on Exchange 2016. The resulting signature is applied on a server level by Exchange Online. The email signature template has been downloaded from: www.mail-signatures…
This video shows how to use Hyena, from SystemTools Software, to bulk import 100 user accounts from an external text file. View in 1080p for best video quality.

860 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question