• Status: Solved

Produce XML file from flat file

Using a C# executable, how can I convert

<bnd action="A" sectyType="9" symbol=".SXXX.920RI" exch="SK" curr="SEK" sess="NORM" dfltInd="1" issuerName="SXXX 920RI" issuShortDesc="" sedol="" isin="QQQ" cusip="" mat="2013-01-30" longIssuerName="SXXX 920RI" issuLongDesc="" localCode="GG" localId="5" ric="HH" ricOriginal="II" />

into

<n>
      <sedol />
      <curr>SEK</curr>
      <localId>5</localId>
      <issuShortDesc />
      <exch>SK</exch>
      <longIssuerName>SXXX 920RI</longIssuerName>
      <symbol>.SXXX.920RI</symbol>
      <action>A</action>
      <sectyType>9</sectyType>
      <cusip />
      <issuLongDesc />
      <localCode>GG</localCode>
      <dfltInd>1</dfltInd>
      <mat>2013-01-30</mat>
      <isin>QQQ</isin>
      <sess>NORM</sess>
      <issuerName>SXXX 920RI</issuerName>
   </n>
Asked by: AlHal2
 
Meir Rivkin (Full stack Software Engineer) commented:
c:\temp\1.txt contains the flat XML.
Basically, the code treats it as XML with a single element and multiple attributes.
For each attribute it creates a new XElement and adds it to the new XML, newXml.


            // requires: using System.Linq; using System.Xml.Linq;
            var root = XElement.Load(@"c:\temp\1.txt");
            var newXml = new XElement("n");
            var elements = root.Attributes().Select(n => new XElement(n.Name, n.Value));
            newXml.Add(elements);
            newXml.Save(@"c:\temp\1.xml");
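A minimal, self-contained sketch of the same attribute-to-element idea, run against an in-memory string instead of c:\temp\1.txt. The helper name AttributesToElements and the shortened sample record are illustrative assumptions, not part of the original answer, and it is written as a top-level program, which needs C# 9 or later. One detail it adds: in LINQ to XML, an XElement created with an empty string as content serializes as <sedol></sedol>, while one created with no content serializes as <sedol />, so the sketch omits the content for empty attribute values to match the requested output.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: turn every attribute of one flat element into a child of <n>.
// Empty attribute values become content-free elements so they serialize as <name />.
static XElement AttributesToElements(XElement flat)
{
    return new XElement("n",
        flat.Attributes().Select(a => a.Value.Length == 0
            ? new XElement(a.Name)
            : new XElement(a.Name, a.Value)));
}

// Shortened sample record (assumption); the real <bnd> line carries more attributes.
var flat = XElement.Parse(
    "<bnd action=\"A\" curr=\"SEK\" sedol=\"\" mat=\"2013-01-30\" />");

var converted = AttributesToElements(flat);
Console.WriteLine(converted);
```

The child elements come out in the order the attributes appear in the input, which may differ from the order shown in the question.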

AlHal2 (Author) commented:
The total file is nearly 3GB.  There is a section which contains bonds data.  I want to ignore the rest.  In other words I only want data between <bond> and </bond>.
I then need to print the results to a separate file.
Does this require anything extra?

Meir Rivkin (Full stack Software Engineer) commented:
If only one section of the file contains <bond> ... </bond>, you can use the following:
            // requires: using System; using System.IO; using System.Linq; using System.Xml.Linq;
            var newXml = XElement.Parse("<bond/>");
            // note: File.ReadAllText loads the whole file into memory at once
            var data = File.ReadAllText(@"c:\temp\1.txt");
            var tokens = data.Split(new string[] { "<bond", "/>" }, StringSplitOptions.RemoveEmptyEntries);
            var bond_data = tokens[1];
            var root = XElement.Parse("<bond> " + bond_data + "</bond>");
            var elements = root.Attributes().Select(n => new XElement(n.Name, n.Value));
            newXml.Add(elements);
            newXml.Save(@"c:\temp\1.xml");

AlHal2 (Author) commented:
I get an out of memory exception.  Any way of ingesting the data bit by bit?

Robert Schutt (Software Engineer) commented:
Give this a spin:
        // requires: using System; using System.IO; using System.Text; using System.Xml;
        // call with: ParseXmlBond(@"input.xml", @"output.xml");

        string[] arrAttr = {
                                "sedol",
                                "curr",
                                "localId",
                                "issuShortDesc",
                                "exch",
                                "longIssuerName",
                                "symbol",
                                "action",
                                "sectyType",
                                "cusip",
                                "issuLongDesc",
                                "localCode",
                                "dfltInd",
                                "mat",
                                "isin",
                                "sess",
                                "issuerName"
                           };

        // code partly taken from http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/

        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    bool blnParsing = false;

                    while (r.Read()) {
                        Console.WriteLine("{0}: {1} (parsing = {2})\n", r.NodeType, r.LocalName, blnParsing);
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = true;
                                        break;
                                    case "bnd":
                                        if (blnParsing) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
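                                                // GetAttribute returns null for a missing attribute; write an empty element instead of throwing
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);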
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = false;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }

AlHal2 (Author) commented:
It's still running after an hour.  Any way to speed it up?  It seems to be stuck on equity and options data.

Robert Schutt (Software Engineer) commented:
Depending on the general structure of your input document it may be a lot quicker if you exit the read loop after hitting the </bond> tag.

Like this for example:
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    while (r.Read() && intParsingState < 2) {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
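                                                // GetAttribute returns null for a missing attribute; write an empty element instead of throwing
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);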
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }

I'd have to go back and check, but if this doesn't help, it may be possible to skip an entire level. For that I'd at least need some information on the position of the <bond> tag in your input document.

AlHal2 (Author) commented:
I think the order is equities, indices, options, bonds.  At the moment I want bonds.  Later on I will want indices too.

Robert Schutt (Software Engineer) commented:
OK, so these are all on the main level. This new code skips anything on that level that isn't already handled. I changed the loop structure: the example code I found on several sites did work, but the logging I added earlier seemed to indicate that the structure of the file was not being followed consistently.
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    // changed the loop structure to start at root (not really necessary) but also to be able to use "continue" to skip r.Read() after r.Skip()...
                    do {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
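                                                // GetAttribute returns null for a missing attribute; write an empty element instead of throwing
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);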
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // skip children of other elements on the same depth as the <bond> element
                                        if (r.Depth >= 1) {
                                            r.Skip();
                                            continue; // don't call r.Read() in this case...
                                        }
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2; // break do-while
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                        if (intParsingState < 2 && !r.Read())
                            break;
                    } while (intParsingState < 2);
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }
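
For the later request to pull other sections such as indices, the same streaming idea can be parameterized by section name. The following is a sketch under stated assumptions, not a drop-in replacement for the code above: it swaps in XmlReader.Create with ReadToFollowing and ReadSubtree rather than the XmlTextReader loop, the function name ExtractSection is hypothetical, and the element names root, equity, eq, option and opt in the sample are invented for illustration (only bond and bnd come from this thread). It is written as a top-level program, which needs C# 9 or later.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: stream to the first <section> element, then write each
// <record> inside it as <n> with one child element per attribute.
// Returns the number of records written.
static int ExtractSection(TextReader input, TextWriter output, string section, string record)
{
    int count = 0;
    var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
    using (XmlReader r = XmlReader.Create(input))
    using (XmlWriter w = XmlWriter.Create(output, settings))
    {
        w.WriteStartElement("output");
        if (r.ReadToFollowing(section))
        {
            // The subtree reader ends at </section>, so nothing after it is read.
            using (XmlReader sub = r.ReadSubtree())
            {
                while (sub.ReadToFollowing(record))
                {
                    w.WriteStartElement("n");
                    while (sub.MoveToNextAttribute())
                        w.WriteElementString(sub.LocalName, sub.Value);
                    w.WriteEndElement();
                    count++;
                }
            }
        }
        w.WriteEndElement();
    }
    return count;
}

// Invented sample layout (assumption): sections sit directly under one root.
string xml =
    "<root>" +
    "<equity><eq symbol=\"X\" /></equity>" +
    "<bond><bnd symbol=\".SXXX.920RI\" curr=\"SEK\" sedol=\"\" />" +
    "<bnd symbol=\"B2\" curr=\"EUR\" sedol=\"\" /></bond>" +
    "<option><opt symbol=\"Y\" /></option>" +
    "</root>";

var result = new StringWriter();
int written = ExtractSection(new StringReader(xml), result, "bond", "bnd");
Console.WriteLine(written);
Console.WriteLine(result);
```

Two caveats: the reader still has to read through every section that precedes the one requested, and attributes are written in the order they appear in the input rather than in the fixed arrAttr order. ReadToFollowing also matches the first element with that name at any depth, so it assumes the section name is not reused elsewhere in the file.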

AlHal2 (Author) commented:
This is great.  Thanks.

Question has a verified solution.