Solved

Produce XML file from flat file

Posted on 2013-01-31
Last Modified: 2013-02-04
Using a C# executable, how can I convert

<bnd action="A" sectyType="9" symbol=".SXXX.920RI" exch="SK" curr="SEK" sess="NORM" dfltInd="1" issuerName="SXXX 920RI" issuShortDesc="" sedol="" isin="QQQ" cusip="" mat="2013-01-30" longIssuerName="SXXX 920RI" issuLongDesc="" localCode="GG" localId="5" ric="HH" ricOriginal="II" />

into

<n>
      <sedol />
      <curr>SEK</curr>
      <localId>5</localId>
      <issuShortDesc />
      <exch>SK</exch>
      <longIssuerName>SXXX 920RI</longIssuerName>
      <symbol>.SXXX.920RI</symbol>
      <action>A</action>
      <sectyType>9</sectyType>
      <cusip />
      <issuLongDesc />
      <localCode>GG</localCode>
      <dfltInd>1</dfltInd>
      <mat>2013-01-30</mat>
      <isin>QQQ</isin>
      <sess>NORM</sess>
      <issuerName>SXXX 920RI</issuerName>
   </n>
Question by:AlHal2

Expert Comment

by:sedgwick
ID: 38838920
c:\temp\1.txt contains the flat XML.
Basically, the code treats it as XML with a single element and multiple attributes.
For each attribute it creates a new XElement and adds it to the new XML document, newXml.


            var root = XElement.Load(@"c:\temp\1.txt");
            var newXml = XElement.Parse("<n/>");
            var elements = root.Attributes().Select(n => new XElement(n.Name, n.Value));
            newXml.Add(elements);
            newXml.Save(@"c:\temp\1.xml");
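
The attribute-to-element idea above can be tried on a small in-memory string before pointing it at a file. This is a minimal sketch, not the poster's code: the names AttributeFlattener and ToElements are made up for illustration. It also assumes you want empty attributes to come out as <sedol /> rather than <sedol></sedol>, so it creates those elements without any content instead of passing an empty string.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class AttributeFlattener
{
    // Hypothetical helper: turns every attribute of one flat element
    // into a child element of a new <n> element.
    public static XElement ToElements(string flatXml)
    {
        XElement source = XElement.Parse(flatXml);
        return new XElement("n",
            source.Attributes().Select(a =>
                a.Value.Length == 0
                    ? new XElement(a.Name)            // no content, so it serializes self-closed
                    : new XElement(a.Name, a.Value)));
    }
}
```

Calling AttributeFlattener.ToElements(line).Save(path) would write one record. Like the original snippet, this holds the whole input in memory, so it only suits small inputs.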

 

Author Comment

by:AlHal2
ID: 38838939
The total file is nearly 3GB.  There is a section which contains bonds data.  I want to ignore the rest.  In other words I only want data between <bond> and </bond>.
I then need to print the results to a separate file.
Does this require anything extra?

Expert Comment

by:sedgwick
ID: 38838954
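If the file contains only one bond section, written as a single <bond ... /> element, then you can use the following:
            // Assumes exactly one self-closing <bond ... /> element, with some text before it
            // (so that its attributes end up in tokens[1]). Note that this reads the whole file into memory.
            var newXml = XElement.Parse("<bond/>");
            var data = File.ReadAllText(@"c:\temp\1.txt");
            var tokens = data.Split(new string[] { "<bond", "/>" }, StringSplitOptions.RemoveEmptyEntries);
            var bond_data = tokens[1];
            // Rebuild the element so that bond_data is parsed as attributes, not as text content
            var root = XElement.Parse("<bond " + bond_data + "/>");
            var elements = root.Attributes().Select(n => new XElement(n.Name, n.Value));
            newXml.Add(elements);
            newXml.Save(@"c:\temp\1.xml");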

 

Author Comment

by:AlHal2
ID: 38838999
I get an out of memory exception.  Any way of ingesting the data bit by bit?

Expert Comment

by:Robert Schutt
ID: 38850587
Give this a spin:
        // call with: ParseXmlBond(@"input.xml", @"output.xml");

        string[] arrAttr = {
                                "sedol",
                                "curr",
                                "localId",
                                "issuShortDesc",
                                "exch",
                                "longIssuerName",
                                "symbol",
                                "action",
                                "sectyType",
                                "cusip",
                                "issuLongDesc",
                                "localCode",
                                "dfltInd",
                                "mat",
                                "isin",
                                "sess",
                                "issuerName"
                           };

        // code partly taken from http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/

        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    bool blnParsing = false;

                    while (r.Read()) {
                        Console.WriteLine("{0}: {1} (parsing = {2})\n", r.NodeType, r.LocalName, blnParsing);
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = true;
                                        break;
                                    case "bnd":
                                        if (blnParsing) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
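                                                // GetAttribute returns null for a missing attribute; write an empty element instead of throwing
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);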
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = false;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }
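
If the fixed order of arrAttr is not important, the attribute list does not have to be hard-coded. The following is a sketch of an alternative, not part of the code above: AttributeWriter and WriteAttributesAsElements are hypothetical names, and it assumes the record element carries no namespace declarations (those would show up as attributes too). It walks whatever attributes the current element has, in document order, and writes each one as a child element.

```csharp
using System.Xml;

public static class AttributeWriter
{
    // Hypothetical helper: the reader must be positioned on an element.
    // Writes each attribute as <name>value</name> and returns how many were written.
    public static int WriteAttributesAsElements(XmlReader reader, XmlWriter writer)
    {
        int written = 0;
        if (reader.MoveToFirstAttribute())
        {
            do
            {
                writer.WriteElementString(reader.LocalName, reader.Value);
                written++;
            } while (reader.MoveToNextAttribute());

            // Step back onto the owning element so the caller's Read() loop continues normally
            reader.MoveToElement();
        }
        return written;
    }
}
```

Inside the "bnd" case this would replace the foreach over arrAttr. The trade-off is that the output order then follows the input file rather than the order requested in the question.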


 

Author Comment

by:AlHal2
ID: 38850749
It's still running after an hour.  Any way to speed it up?  It seems to be stuck on equity and options data.

Expert Comment

by:Robert Schutt
ID: 38850813
Depending on the general structure of your input document it may be a lot quicker if you exit the read loop after hitting the </bond> tag.

Like this for example:
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    while (r.Read() && intParsingState < 2) {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
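                                                // GetAttribute returns null for a missing attribute; write an empty element instead of throwing
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);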
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }


I'd have to go back and check but if this doesn't help, maybe it's possible to skip an entire level. In that case I'd at least need some info on the position of the <bond> tag in your input document.
 

Author Comment

by:AlHal2
ID: 38851153
I think the order is equities, indices, options, bonds.  At the moment I want bonds.  Later on I will want indices too.

Accepted Solution

by:
Robert Schutt earned 200 total points
ID: 38851368
OK, so these are all at the top level. This new code skips any element at that level that isn't already handled. I changed the loop structure: the example code I found on several sites did work, but the logging I added earlier seemed to show that the reader was not following the structure of the file consistently.
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    // changed the loop structure to start at root (not really necessary) but also to be able to use "continue" to skip r.Read() after r.Skip()...
                    do {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
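                                                // GetAttribute returns null for a missing attribute; write an empty element instead of throwing
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);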
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // skip children of other elements on the same depth as the <bond> element
                                        if (r.Depth >= 1) {
                                            r.Skip();
                                            continue; // don't call r.Read() in this case...
                                        }
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2; // break do-while
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                        if (intParsingState < 2 && !r.Read())
                            break;
                    } while (intParsingState < 2);
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }
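
Since indices will be wanted later as well, the same streaming approach can be parameterised by section and record name. What follows is a rough sketch under assumptions, not the accepted code: SectionExtractor and Extract are made-up names, it uses XmlReader.ReadToFollowing and ReadSubtree instead of the hand-written state machine, and it assumes the section name (for example "bond") does not also occur as an element name earlier in the file. The element names for the indices section are not given in this thread, so they would have to be checked against the real file.

```csharp
using System.Xml;

public static class SectionExtractor
{
    // Hypothetical helper: streams the first <sectionName> element, writes every
    // <recordName> inside it as <n> with one child element per requested attribute,
    // and returns the number of records written.
    public static int Extract(XmlReader reader, XmlWriter writer,
                              string sectionName, string recordName, string[] attributes)
    {
        int count = 0;
        writer.WriteStartElement("output");

        // Advance to the section without building an in-memory tree
        if (reader.ReadToFollowing(sectionName))
        {
            // The subtree reader ends at the section's closing tag,
            // so nothing after the section is processed
            using (XmlReader section = reader.ReadSubtree())
            {
                while (section.Read())
                {
                    if (section.NodeType != XmlNodeType.Element || section.LocalName != recordName)
                        continue;

                    writer.WriteStartElement("n");
                    foreach (string a in attributes)
                    {
                        // A missing attribute becomes an empty element
                        writer.WriteElementString(a, section.GetAttribute(a) ?? string.Empty);
                    }
                    writer.WriteEndElement();
                    count++;
                }
            }
        }

        writer.WriteEndElement();
        return count;
    }
}
```

It could be called with XmlReader.Create(inputPath) and XmlWriter.Create(outputPath), passing "bond", "bnd" and arrAttr. The reader still has to tokenise everything that precedes the section, so this is unlikely to be faster than the r.Skip() version; the gain is mainly that one method can be reused for other sections.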

 

Author Closing Comment

by:AlHal2
ID: 38851468
This is great.  Thanks.