Solved

Produce XML file from flat file

Posted on 2013-01-31
Last Modified: 2013-02-04
Using a C# executable, how can I convert

<bnd action="A" sectyType="9" symbol=".SXXX.920RI" exch="SK" curr="SEK" sess="NORM" dfltInd="1" issuerName="SXXX 920RI" issuShortDesc="" sedol="" isin="QQQ" cusip="" mat="2013-01-30" longIssuerName="SXXX 920RI" issuLongDesc="" localCode="GG" localId="5" ric="HH" ricOriginal="II" />

into

<n>
      <sedol />
      <curr>SEK</curr>
      <localId>5</localId>
      <issuShortDesc />
      <exch>SK</exch>
      <longIssuerName>SXXX 920RI</longIssuerName>
      <symbol>.SXXX.920RI</symbol>
      <action>A</action>
      <sectyType>9</sectyType>
      <cusip />
      <issuLongDesc />
      <localCode>GG</localCode>
      <dfltInd>1</dfltInd>
      <mat>2013-01-30</mat>
      <isin>QQQ</isin>
      <sess>NORM</sess>
      <issuerName>SXXX 920RI</issuerName>
   </n>
Question by:AlHal2
10 Comments
 
LVL 42

Expert Comment

by:Meir Rivkin
ID: 38838920
c:\temp\1.txt contains the flat XML.
Basically, the code treats it as XML with a single element and multiple attributes.
For each attribute it creates a new XElement and adds it to the new XML document, newXml.

            // requires: using System.Linq; using System.Xml.Linq;
            var root = XElement.Load(@"c:\temp\1.txt");
            var newXml = new XElement("n");
            // turn every attribute of the root element into a child element
            var elements = root.Attributes().Select(n => new XElement(n.Name, n.Value));
            newXml.Add(elements);
            newXml.Save(@"c:\temp\1.xml");

 

Author Comment

by:AlHal2
ID: 38838939
The total file is nearly 3GB.  There is a section which contains bonds data.  I want to ignore the rest.  In other words I only want data between <bond> and </bond>.
I then need to print the results to a separate file.
Does this require anything extra?
 
LVL 42

Expert Comment

by:Meir Rivkin
ID: 38838954
If only one section of the file contains <bond> ... </bond>, you can use the following:
            // requires: using System; using System.IO; using System.Linq; using System.Xml.Linq;
            // assumes <bond> has no attributes and wraps the <bnd .../> elements
            var data = File.ReadAllText(@"c:\temp\1.txt");
            int start = data.IndexOf("<bond>", StringComparison.Ordinal);
            int end = data.IndexOf("</bond>", start, StringComparison.Ordinal) + "</bond>".Length;
            var root = XElement.Parse(data.Substring(start, end - start));
            // one <n> per <bnd>, with each attribute turned into a child element
            var newXml = new XElement("bond",
                root.Elements("bnd").Select(b => new XElement("n",
                    b.Attributes().Select(a => new XElement(a.Name, a.Value)))));
            newXml.Save(@"c:\temp\1.xml");


 

Author Comment

by:AlHal2
ID: 38838999
I get an out of memory exception.  Any way of ingesting the data bit by bit?
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38850587
Give this a spin:
        // call with: ParseXmlBond(@"input.xml", @"output.xml");

        string[] arrAttr = {
                                "sedol",
                                "curr",
                                "localId",
                                "issuShortDesc",
                                "exch",
                                "longIssuerName",
                                "symbol",
                                "action",
                                "sectyType",
                                "cusip",
                                "issuLongDesc",
                                "localCode",
                                "dfltInd",
                                "mat",
                                "isin",
                                "sess",
                                "issuerName"
                           };

        // code partly taken from http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/

        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    bool blnParsing = false;

                    while (r.Read()) {
                        // debug trace of every node; console output is slow, so remove it for large inputs
                        Console.WriteLine("{0}: {1} (parsing = {2})\n", r.NodeType, r.LocalName, blnParsing);
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = true;
                                        break;
                                    case "bnd":
                                        if (blnParsing) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                // GetAttribute returns null when the attribute is absent
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        blnParsing = false;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }
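
If the attribute list is not known in advance, the hard-coded arrAttr loop could be swapped for a walk over whatever attributes the current element carries. The following is only a minimal sketch of that idea: the class name AttributeFlattener and the Flatten helper are hypothetical and not part of the posted code, and attributes come out in document order rather than in the fixed order of arrAttr.

```csharp
using System.IO;
using System.Xml;

public static class AttributeFlattener
{
    // Writes <elementName> with one child element per attribute of the
    // element the reader is currently positioned on.
    public static void WriteAttributesAsElements(XmlReader reader, XmlWriter writer, string elementName)
    {
        writer.WriteStartElement(elementName);
        if (reader.MoveToFirstAttribute())
        {
            do
            {
                // On an attribute node, LocalName is the attribute name and Value its value.
                writer.WriteElementString(reader.LocalName, reader.Value);
            } while (reader.MoveToNextAttribute());
            reader.MoveToElement(); // go back to the element that owns the attributes
        }
        writer.WriteEndElement();
    }

    // Hypothetical helper for trying the method on a small in-memory fragment.
    public static string Flatten(string xmlFragment, string elementName)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(xmlFragment)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            reader.MoveToContent();
            WriteAttributesAsElements(reader, writer, elementName);
        }
        return output.ToString(); // the writer has been flushed by Dispose at this point
    }
}
```

XmlTextReader and XmlTextWriter derive from XmlReader and XmlWriter, so inside ParseXmlBond the body of the "bnd" case could call AttributeFlattener.WriteAttributesAsElements(r, w, "n") instead of looping over arrAttr. Keep arrAttr if the output must follow the exact element order shown in the question.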

 

Author Comment

by:AlHal2
ID: 38850749
It's still running after an hour.  Any way to speed it up?  It seems to be stuck on equity and options data.
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38850813
Depending on the general structure of your input document it may be a lot quicker if you exit the read loop after hitting the </bond> tag.

Like this for example:
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    while (r.Read() && intParsingState < 2) {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                // GetAttribute returns null when the attribute is absent
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2;
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                    }
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }


I'd have to go back and check but if this doesn't help, maybe it's possible to skip an entire level. In that case I'd at least need some info on the position of the <bond> tag in your input document.
 

Author Comment

by:AlHal2
ID: 38851153
I think the order is equities, indices, options, bonds.  At the moment I want bonds.  Later on I will want indices too.
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 800 total points
ID: 38851368
OK, so these sections are all on the main level. This new code skips anything on that level that isn't already handled. I changed the loop structure because, although the example code I found on several sites did work, the logging I added earlier suggested that the structure of the file was not being followed consistently.
        private void ParseXmlBond(string fn, string fnout) {

            using (Stream stream = File.OpenRead(fn)) {
                XmlTextReader r = new XmlTextReader(stream);
                r.WhitespaceHandling = WhitespaceHandling.None;
                r.MoveToContent();

                using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
                    w.Formatting = Formatting.Indented;
                    w.Indentation = 1;
                    w.IndentChar = '\t';
                    w.WriteStartDocument();
                    w.WriteStartElement("output");

                    int intParsingState = 0;

                    // changed the loop structure to start at root (not really necessary) but also to be able to use "continue" to skip r.Read() after r.Skip()...
                    do {
                        switch (r.NodeType) {
                            case XmlNodeType.Element:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 1;
                                        break;
                                    case "bnd":
                                        if (intParsingState > 0) {
                                            w.WriteStartElement("n");
                                            foreach (string a in arrAttr) {
                                                // GetAttribute returns null when the attribute is absent
                                                w.WriteElementString(a, r.GetAttribute(a) ?? string.Empty);
                                            }
                                            w.WriteEndElement();
                                        }
                                        break;
                                    default:
                                        // skip children of other elements on the same depth as the <bond> element
                                        if (r.Depth >= 1) {
                                            r.Skip();
                                            continue; // don't call r.Read() in this case...
                                        }
                                        break;
                                }
                                break;
                            case XmlNodeType.EndElement:
                                switch (r.LocalName) {
                                    case "bond":
                                        intParsingState = 2; // break do-while
                                        break;
                                    default:
                                        // ignore other nodes
                                        break;
                                }
                                break;
                            default:
                                break;
                        }
                        if (intParsingState < 2 && !r.Read())
                            break;
                    } while (intParsingState < 2);
                    w.WriteEndElement();
                    w.WriteEndDocument();
                    w.Flush();
                    w.Close();
                }
                r.Close();
            }
        }
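
Since indices will be needed later as well, the same skip-and-stop idea can be generalised to a set of wanted section names. What follows is a rough sketch under stated assumptions, not a drop-in replacement: the class SectionScanner and the callback shape are hypothetical, and it assumes each wanted section appears once, directly under the root element, so reading can stop after the last one. The section name used for indices in the usage note is a guess, since the thread never shows that tag.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SectionScanner
{
    // Calls onRecord(sectionName, reader) for each direct child element of every
    // wanted section. The callback should only inspect attributes, so the reader
    // stays on that element. Reading stops once all wanted sections were seen.
    public static void ForEachRecord(TextReader input, IEnumerable<string> wantedSections,
                                     Action<string, XmlReader> onRecord)
    {
        var remaining = new HashSet<string>(wantedSections);
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };

        using (XmlReader r = XmlReader.Create(input, settings))
        {
            r.MoveToContent(); // root element (depth 0)
            r.Read();          // first node inside the root (depth 1)

            while (!r.EOF && remaining.Count > 0)
            {
                if (r.NodeType != XmlNodeType.Element || r.Depth != 1)
                {
                    r.Read();                 // comments, text, end tags
                }
                else if (remaining.Remove(r.LocalName))
                {
                    ReadSection(r, onRecord); // wanted section
                }
                else
                {
                    r.Skip();                 // unwanted section: move past its end tag
                }
            }
        }
    }

    private static void ReadSection(XmlReader r, Action<string, XmlReader> onRecord)
    {
        string section = r.LocalName;
        int sectionDepth = r.Depth;
        bool isEmpty = r.IsEmptyElement;

        // Walk the content until the section's end tag, which has the same depth as its start tag.
        while (!isEmpty && r.Read() && r.Depth > sectionDepth)
        {
            if (r.NodeType == XmlNodeType.Element && r.Depth == sectionDepth + 1)
            {
                onRecord(section, r);
            }
        }
        r.Read(); // move past the section's end tag (or the empty element)
    }
}
```

A call such as SectionScanner.ForEachRecord(File.OpenText(fn), new[] { "bond", "index" }, (section, r) => ...) would then visit the records of both sections, with the callback reading values through r.GetAttribute as in ParseXmlBond. Here "index" stands in for whatever the indices section is really called.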

 

Author Closing Comment

by:AlHal2
ID: 38851468
This is great.  Thanks.