Solved

Produce XML file from flat file

Posted on 2013-01-31
Last Modified: 2013-02-04
Using a C# executable, how can I convert

<bnd action="A" sectyType="9" symbol=".SXXX.920RI" exch="SK" curr="SEK" sess="NORM" dfltInd="1" issuerName="SXXX 920RI" issuShortDesc="" sedol="" isin="QQQ" cusip="" mat="2013-01-30" longIssuerName="SXXX 920RI" issuLongDesc="" localCode="GG" localId="5" ric="HH" ricOriginal="II" />

into

<n>
      <sedol />
      <curr>SEK</curr>
      <localId>5</localId>
      <issuShortDesc />
      <exch>SK</exch>
      <longIssuerName>SXXX 920RI</longIssuerName>
      <symbol>.SXXX.920RI</symbol>
      <action>A</action>
      <sectyType>9</sectyType>
      <cusip />
      <issuLongDesc />
      <localCode>GG</localCode>
      <dfltInd>1</dfltInd>
      <mat>2013-01-30</mat>
      <isin>QQQ</isin>
      <sess>NORM</sess>
      <issuerName>SXXX 920RI</issuerName>
   </n>
Question by:AlHal2
10 Comments
 
LVL 42

Expert Comment

by:sedgwick
ID: 38838920
c:\temp\1.txt contains the flat XML.
Basically, the code treats it as XML with a single element and multiple attributes.
For each attribute it creates a new XElement and adds it to the new XML element, newXml.


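            // requires: using System.Linq; using System.Xml.Linq;
            var root = XElement.Load(@"c:\temp\1.txt");
            var newXml = XElement.Parse("<n/>");
            // one child element per attribute, named after the attribute
            var elements = root.Attributes().Select(n => new XElement(n.Name, n.Value));
            newXml.Add(elements);
            newXml.Save(@"c:\temp\1.xml");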
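
To try the idea without touching the file system, here is a minimal sketch that applies the same attribute-to-element conversion to an in-memory element. The names `FlattenDemo` and `AttributesToElements` are made up for illustration and are not part of the answer above.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class FlattenDemo
{
    // Hypothetical helper: turns every attribute of 'source' into a
    // child element of a new <n> element, keeping the attribute order.
    public static XElement AttributesToElements(XElement source)
    {
        return new XElement("n",
            source.Attributes().Select(a => new XElement(a.Name, a.Value)));
    }
}
```

For example, `FlattenDemo.AttributesToElements(XElement.Parse("<bnd action=\"A\" curr=\"SEK\" sedol=\"\" />"))` yields an `<n>` element with three children. Note that LINQ to XML serializes an element built from an empty string as `<sedol></sedol>` rather than `<sedol />`; the two forms are equivalent XML.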

 

Author Comment

by:AlHal2
ID: 38838939
The total file is nearly 3GB.  There is a section which contains bonds data.  I want to ignore the rest.  In other words I only want data between <bond> and </bond>.
I then need to print the results to a separate file.
Does this require anything extra?
 
LVL 42

Expert Comment

by:sedgwick
ID: 38838954
If the file contains only one <bond> ... </bond> section, you can use the following:
            // Assumes a single <bond> ... </bond> section whose opening tag has no attributes.
            // Note: this loads the whole file into memory.
            var data = File.ReadAllText(@"c:\temp\1.txt");
            int start = data.IndexOf("<bond>");
            int end = data.IndexOf("</bond>", start) + "</bond>".Length;
            var root = XElement.Parse(data.Substring(start, end - start));
            var newXml = new XElement("bond");
            // one <n> per <bnd>, with each attribute turned into a child element
            foreach (var bnd in root.Elements("bnd"))
                newXml.Add(new XElement("n", bnd.Attributes().Select(a => new XElement(a.Name, a.Value))));
            newXml.Save(@"c:\temp\1.xml");


 

Author Comment

by:AlHal2
ID: 38838999
I get an out of memory exception.  Any way of ingesting the data bit by bit?
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38850587
Give this a spin:
        // call with: ParseXmlBond(@"input.xml", @"output.xml");
        // requires: using System; using System.IO; using System.Text; using System.Xml;

        string[] arrAttr = {
                                "sedol",
                                "curr",
                                "localId",
                                "issuShortDesc",
                                "exch",
                                "longIssuerName",
                                "symbol",
                                "action",
                                "sectyType",
                                "cusip",
                                "issuLongDesc",
                                "localCode",
                                "dfltInd",
                                "mat",
                                "isin",
                                "sess",
                                "issuerName"
                           };

        // code partly taken from http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/

        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    bool blnParsing = false;

                    while (r.Read()) {
                        // debug logging only: writing a console line per node is slow on a large file
                        Console.WriteLine("{0}: {1} (parsing = {2})\n", r.NodeType, r.LocalName, blnParsing);
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = true;
                                        break;
                                    case "bnd":
                                        if (blnParsing) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                w.WriteElementString(a, r.GetAttribute(a) ?? ""); // GetAttribute returns null for a missing attribute
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = false;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }

 

Author Comment

by:AlHal2
ID: 38850749
It's still running after an hour.  Any way to speed it up?  It seems to be stuck on equity and options data.
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38850813
Depending on the general structure of your input document it may be a lot quicker if you exit the read loop after hitting the </bond> tag.

Like this for example:
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    while (r.Read() && intParsingState < 2) {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                w.WriteElementString(a, r.GetAttribute(a) ?? ""); // GetAttribute returns null for a missing attribute
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }


I'd have to go back and check but if this doesn't help, maybe it's possible to skip an entire level. In that case I'd at least need some info on the position of the <bond> tag in your input document.
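
As a rough illustration of what skipping an entire level could look like, the sketch below records which elements the reader actually visits when every depth-1 element other than the wanted one is passed over with `XmlReader.Skip()`. The class name `SkipDemo`, the method `VisitedElements`, and the assumption that the sections sit directly under the root element are mine, not something confirmed by the input file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SkipDemo
{
    // Returns the names of the elements the reader visits when every
    // depth-1 element other than 'wanted' is skipped with its children.
    public static List<string> VisitedElements(string xml, string wanted)
    {
        var visited = new List<string>();
        using (XmlReader r = XmlReader.Create(new StringReader(xml)))
        {
            r.MoveToContent();        // position on the root element
            bool more = r.Read();     // step inside the root
            while (more)
            {
                if (r.NodeType == XmlNodeType.Element)
                {
                    visited.Add(r.LocalName);
                    if (r.Depth == 1 && r.LocalName != wanted)
                    {
                        r.Skip();        // moves past the whole subtree
                        more = !r.EOF;   // Skip() already advanced, so no Read() here
                        continue;
                    }
                }
                more = r.Read();
            }
        }
        return visited;
    }
}
```

Calling `SkipDemo.VisitedElements("<root><equities><eq/><eq/></equities><bond><bnd/></bond></root>", "bond")` visits `equities`, `bond` and `bnd`, but never the `eq` children. The important detail is that `Skip()` leaves the reader on the next node, so calling `Read()` straight afterwards would jump over that node.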
 

Author Comment

by:AlHal2
ID: 38851153
I think the order is equities, indices, options, bonds.  At the moment I want bonds.  Later on I will want indices too.
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 200 total points
ID: 38851368
Ok, so these are all on the main level. This new code will skip anything on that level that's not already handled. I changed the loop structure because, although the example code I found at several sites did work, the logging I added earlier seemed to indicate that the reader was not following the structure of the file consistently.
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    // changed the loop structure to start at root (not really necessary) but also to be able to use "continue" to skip r.Read() after r.Skip()...
                    do {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                w.WriteElementString(a, r.GetAttribute(a) ?? ""); // GetAttribute returns null for a missing attribute
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // skip children of other elements on the same depth as the <bond> element
                                        if (r.Depth >= 1) {
                                            r.Skip();
                                            continue; // don't call r.Read() in this case...
                                        }
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2; // break do-while
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                        if (intParsingState < 2 && !r.Read())
                            break;
                    } while (intParsingState < 2);
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }
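
If the fixed attribute list ever becomes a burden, one possible variation is to walk whatever attributes the current element has, using `MoveToFirstAttribute()` and `MoveToNextAttribute()`. This is only a sketch: `AttributeWriter`, `WriteAttributesAsElements` and `Convert` are hypothetical names, and unlike the code above it writes the attributes in source order rather than in the order of `arrAttr`.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class AttributeWriter
{
    // Writes the current element's attributes as child elements of <n>.
    // Assumes 'r' is positioned on an element node.
    public static void WriteAttributesAsElements(XmlReader r, XmlWriter w)
    {
        w.WriteStartElement("n");
        if (r.MoveToFirstAttribute())
        {
            do
            {
                w.WriteElementString(r.LocalName, r.Value);
            } while (r.MoveToNextAttribute());
            r.MoveToElement();   // return to the owning element
        }
        w.WriteEndElement();
    }

    // Small in-memory driver for trying the helper on a single element.
    public static string Convert(string elementXml)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader r = XmlReader.Create(new StringReader(elementXml)))
        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            r.MoveToContent();
            WriteAttributesAsElements(r, w);
        }
        return sb.ToString();
    }
}
```

In the loop above, the `foreach (string a in arrAttr)` block inside the "bnd" case could be swapped for a call to `WriteAttributesAsElements(r, w)` (which also writes the surrounding `<n>` element), provided the output order of the fields does not matter.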

 

Author Closing Comment

by:AlHal2
ID: 38851468
This is great.  Thanks.