Solved

Produce XML file from flat file

Posted on 2013-01-31
Last Modified: 2013-02-04
Using a C# executable, how can I convert

<bnd action="A" sectyType="9" symbol=".SXXX.920RI" exch="SK" curr="SEK" sess="NORM" dfltInd="1" issuerName="SXXX 920RI" issuShortDesc="" sedol="" isin="QQQ" cusip="" mat="2013-01-30" longIssuerName="SXXX 920RI" issuLongDesc="" localCode="GG" localId="5" ric="HH" ricOriginal="II" />

into

   <n>
      <sedol />
      <curr>SEK</curr>
      <localId>5</localId>
      <issuShortDesc />
      <exch>SK</exch>
      <longIssuerName>SXXX 920RI</longIssuerName>
      <symbol>.SXXX.920RI</symbol>
      <action>A</action>
      <sectyType>9</sectyType>
      <cusip />
      <issuLongDesc />
      <localCode>GG</localCode>
      <dfltInd>1</dfltInd>
      <mat>2013-01-30</mat>
      <isin>QQQ</isin>
      <sess>NORM</sess>
      <issuerName>SXXX 920RI</issuerName>
   </n>
Question by:AlHal2

Expert Comment

by:sedgwick
ID: 38838920
c:\temp\1.txt contains the flat XML.
The code treats it as XML with a single element and multiple attributes: for each attribute it creates a new XElement and adds it to the new document, newXml.

            // requires: using System.Linq; using System.Xml.Linq;
            var root = XElement.Load(@"c:\temp\1.txt");
            var newXml = XElement.Parse("<n/>");
            var elements = root.Attributes().Select(n => new XElement(n.Name, n.Value));
            newXml.Add(elements);
            newXml.Save(@"c:\temp\1.xml");

 

Author Comment

by:AlHal2
ID: 38838939
The whole file is nearly 3 GB. One section contains bond data and I want to ignore the rest; in other words, I only want the data between <bond> and </bond>.
I then need to write the results to a separate file.
Does this require anything extra?

Expert Comment

by:sedgwick
ID: 38838954
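If the file contains only one <bond> ... </bond> section, you can use the following:
            // requires: using System; using System.IO; using System.Linq; using System.Xml.Linq;
            var data = File.ReadAllText(@"c:\temp\1.txt"); // loads the whole file into memory
            // cut out the single section (assumes the opening tag is exactly <bond>)
            var start = data.IndexOf("<bond>", StringComparison.Ordinal);
            var end = data.IndexOf("</bond>", start, StringComparison.Ordinal) + "</bond>".Length;
            var root = XElement.Parse(data.Substring(start, end - start));
            // one <n> per <bnd>, with one child element per attribute
            var newXml = new XElement("bond",
                root.Elements("bnd").Select(b => new XElement("n",
                    b.Attributes().Select(a => new XElement(a.Name, a.Value)))));
            newXml.Save(@"c:\temp\1.xml");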

 

Author Comment

by:AlHal2
ID: 38838999
I get an out of memory exception.  Any way of ingesting the data bit by bit?

Expert Comment

by:Robert Schutt
ID: 38850587
Give this a spin:
        // requires: using System; using System.IO; using System.Text; using System.Xml;
        // call with: ParseXmlBond(@"input.xml", @"output.xml");

        string[] arrAttr = {
                                "sedol",
                                "curr",
                                "localId",
                                "issuShortDesc",
                                "exch",
                                "longIssuerName",
                                "symbol",
                                "action",
                                "sectyType",
                                "cusip",
                                "issuLongDesc",
                                "localCode",
                                "dfltInd",
                                "mat",
                                "isin",
                                "sess",
                                "issuerName"
                           };

        // code partly taken from http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/

        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    bool blnParsing = false;

                    while (r.Read()) {
                        // debug logging for every node; console output is slow, so remove this for large files
                        Console.WriteLine("{0}: {1} (parsing = {2})\n", r.NodeType, r.LocalName, blnParsing);
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = true;
                                        break;
                                    case "bnd":
                                        if (blnParsing) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                // GetAttribute already returns a string; a missing attribute (null) is written as an empty element
                                                w.WriteElementString(a, r.GetAttribute(a));
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = false;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }


 

Author Comment

by:AlHal2
ID: 38850749
It's still running after an hour.  Any way to speed it up?  It seems to be stuck on equity and options data.

Expert Comment

by:Robert Schutt
ID: 38850813
Depending on the general structure of your input document it may be a lot quicker if you exit the read loop after hitting the </bond> tag.

Like this for example:
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    while (r.Read() && intParsingState < 2) {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                w.WriteElementString(a, r.GetAttribute(a)); // no ToString(): it throws if the attribute is missing
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }

I'd have to go back and check, but if this doesn't help it may be possible to skip an entire level. In that case I'd need at least some information about where the <bond> tag sits in your input document.
 

Author Comment

by:AlHal2
ID: 38851153
I think the order is equities, indices, options, bonds. At the moment I want bonds. Later on I will want indices too.

Accepted Solution

by:Robert Schutt
ID: 38851368
OK, so these are all on the main level. This new code skips any element on that level that isn't already handled. I also changed the loop structure: the example code I found at several sites did work, but the logging I added earlier suggested that the reader was not following the structure of the file consistently.
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    // changed the loop structure to start at root (not really necessary) but also to be able to use "continue" to skip r.Read() after r.Skip()...
                    do {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                w.WriteElementString(a, r.GetAttribute(a)); // no ToString(): it throws if the attribute is missing
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // skip children of other elements on the same depth as the <bond> element
                                        if (r.Depth >= 1) {
                                            r.Skip();
                                            continue; // don't call r.Read() in this case...
                                        }
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2; // break do-while
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                        if (intParsingState < 2 && !r.Read())
                            break;
                    } while (intParsingState < 2);
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }
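
For comparison, the same streaming idea can be sketched more compactly with XmlReader.ReadToFollowing and XmlReader.ReadSubtree. This is only a sketch under assumptions, not a drop-in replacement: the names SectionExtractor and ExtractSection are made up for illustration, the section and row element names are passed in so that another section (such as the indices mentioned earlier, whatever its row element is called) could be handled the same way, and the input is assumed to have no DTD and no namespaces. Unlike the code above, it writes every attribute in the order it appears in the input instead of the fixed arrAttr order.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SectionExtractor
{
    // Hypothetical helper: copies every <rowName> inside the first <sectionName>
    // element to an <n> element, turning each attribute into a child element.
    // Returns the number of rows written.
    //
    // Possible call for files (names are placeholders):
    //   using (var input = File.OpenText(@"input.xml"))
    //   using (var output = File.CreateText(@"output.xml"))
    //       SectionExtractor.ExtractSection(input, output, "bond", "bnd");
    public static int ExtractSection(TextReader input, TextWriter output,
                                     string sectionName, string rowName)
    {
        int rows = 0;
        var settings = new XmlWriterSettings { Indent = true };

        using (XmlReader r = XmlReader.Create(input))
        using (XmlWriter w = XmlWriter.Create(output, settings))
        {
            w.WriteStartElement("output");

            // Forward-only scan until the section's start tag is reached.
            if (r.ReadToFollowing(sectionName))
            {
                // The subtree reader only sees this section, so the loop
                // stops at the section's end tag instead of reading on.
                using (XmlReader section = r.ReadSubtree())
                {
                    while (section.ReadToFollowing(rowName))
                    {
                        w.WriteStartElement("n");
                        // Walk the attributes of the current row in document order.
                        while (section.MoveToNextAttribute())
                        {
                            w.WriteElementString(section.LocalName, section.Value);
                        }
                        w.WriteEndElement();
                        rows++;
                    }
                }
            }

            w.WriteEndElement();
        }
        return rows;
    }
}
```

Nothing is buffered beyond the current node, so memory use should stay flat regardless of file size, but the reader still has to parse everything that comes before the section.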

 

Author Closing Comment

by:AlHal2
ID: 38851468
This is great.  Thanks.