AlHal2
Produce XML file from flat file
Using a C# executable, how can I convert
<bnd action="A" sectyType="9" symbol=".SXXX.920RI" exch="SK" curr="SEK" sess="NORM" dfltInd="1" issuerName="SXXX 920RI" issuShortDesc="" sedol="" isin="QQQ" cusip="" mat="2013-01-30" longIssuerName="SXXX 920RI" issuLongDesc="" localCode="GG" localId="5" ric="HH" ricOriginal="II" />
into
<n>
<sedol />
<curr>SEK</curr>
<localId>5</localId>
<issuShortDesc />
<exch>SK</exch>
<longIssuerName>SXXX 920RI</longIssuerName>
<symbol>.SXXX.920RI</symbol>
<action>A</action>
<sectyType>9</sectyType>
<cusip />
<issuLongDesc />
<localCode>GG</localCode>
<dfltInd>1</dfltInd>
<mat>2013-01-30</mat>
<isin>QQQ</isin>
<sess>NORM</sess>
<issuerName>SXXX 920RI</issuerName>
</n>
ASKER
The total file is nearly 3 GB. It has a section that contains bond data, and I want to ignore the rest; in other words, I only want the data between <bond> and </bond>.
I then need to print the results to a separate file.
Does this require anything extra?
If the file contains only one <bond> ... </bond> section, you can use the following:
// requires: using System.Linq; using System.Xml.Linq;
// Note: XDocument.Load reads the whole file into memory, so this only suits small inputs.
var doc = XDocument.Load(@"c:\temp\1.txt");
// take the first (and assumed only) <bond> section
var bond = doc.Descendants("bond").First();
// each <bnd .../> becomes an <n> element whose children mirror its attributes
var newXml = new XElement("bond",
    bond.Elements("bnd").Select(b =>
        new XElement("n", b.Attributes().Select(a => new XElement(a.Name, a.Value)))));
newXml.Save(@"c:\temp\1.xml");
ASKER
I get an out-of-memory exception. Is there any way of ingesting the data bit by bit?
Give this a spin:
// requires: using System; using System.IO; using System.Text; using System.Xml;
// call with: ParseXmlBond(@"input.xml", @"output.xml");
// attributes to copy, in output order (ric and ricOriginal are not copied)
string[] arrAttr = {
    "sedol",
    "curr",
    "localId",
    "issuShortDesc",
    "exch",
    "longIssuerName",
    "symbol",
    "action",
    "sectyType",
    "cusip",
    "issuLongDesc",
    "localCode",
    "dfltInd",
    "mat",
    "isin",
    "sess",
    "issuerName"
};
// code partly taken from http://programmersnotebook.wordpress.com/2010/02/27/c-xmltextreader-tutorial/
private void ParseXmlBond(string fn, string fnout) {
    using (Stream stream = File.OpenRead(fn)) {
        XmlTextReader r = new XmlTextReader(stream);
        r.WhitespaceHandling = WhitespaceHandling.None;
        r.MoveToContent();
        using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
            w.Formatting = Formatting.Indented;
            w.Indentation = 1;
            w.IndentChar = '\t';
            w.WriteStartDocument();
            w.WriteStartElement("output");
            bool blnParsing = false; // true while the reader is inside <bond> ... </bond>
            while (r.Read()) {
                // debug trace: one console line per node, which is very slow on a multi-gigabyte file
                Console.WriteLine("{0}: {1} (parsing = {2})\n", r.NodeType, r.LocalName, blnParsing);
                switch (r.NodeType) {
                    case XmlNodeType.Element:
                        switch (r.LocalName) {
                            case "bond":
                                blnParsing = true;
                                break;
                            case "bnd":
                                if (blnParsing) {
                                    // one <n> element per <bnd>, one child element per listed attribute
                                    w.WriteStartElement("n");
                                    foreach (string a in arrAttr) {
                                        // GetAttribute returns null for a missing attribute, so fall back to ""
                                        w.WriteElementString(a, r.GetAttribute(a) ?? "");
                                    }
                                    w.WriteEndElement();
                                }
                                break;
                            default:
                                // ignore other nodes
                                break;
                        }
                        break;
                    case XmlNodeType.EndElement:
                        switch (r.LocalName) {
                            case "bond":
                                blnParsing = false;
                                break;
                            default:
                                // ignore other nodes
                                break;
                        }
                        break;
                    default:
                        break;
                }
            }
            w.WriteEndElement();
            w.WriteEndDocument();
            w.Flush();
            w.Close();
        }
        r.Close();
    }
}
ASKER
It's still running after an hour. Is there any way to speed it up? It seems to be stuck on the equity and options data.
Depending on the general structure of your input document, it may be a lot quicker to exit the read loop as soon as you hit the </bond> tag.
Like this for example:
// uses the arrAttr array from the previous snippet
private void ParseXmlBond(string fn, string fnout) {
    using (Stream stream = File.OpenRead(fn)) {
        XmlTextReader r = new XmlTextReader(stream);
        r.WhitespaceHandling = WhitespaceHandling.None;
        r.MoveToContent();
        using (XmlTextWriter w = new XmlTextWriter(fnout, Encoding.UTF8)) {
            w.Formatting = Formatting.Indented;
            w.Indentation = 1;
            w.IndentChar = '\t';
            w.WriteStartDocument();
            w.WriteStartElement("output");
            // 0 = before <bond>, 1 = inside <bond>, 2 = past </bond> (stop reading)
            int intParsingState = 0;
            // check the state first so no extra node is read once </bond> has been seen
            while (intParsingState < 2 && r.Read()) {
                switch (r.NodeType) {
                    case XmlNodeType.Element:
                        switch (r.LocalName) {
                            case "bond":
                                intParsingState = 1;
                                break;
                            case "bnd":
                                if (intParsingState > 0) {
                                    w.WriteStartElement("n");
                                    foreach (string a in arrAttr) {
                                        // GetAttribute returns null for a missing attribute, so fall back to ""
                                        w.WriteElementString(a, r.GetAttribute(a) ?? "");
                                    }
                                    w.WriteEndElement();
                                }
                                break;
                            default:
                                // ignore other nodes
                                break;
                        }
                        break;
                    case XmlNodeType.EndElement:
                        switch (r.LocalName) {
                            case "bond":
                                intParsingState = 2;
                                break;
                            default:
                                // ignore other nodes
                                break;
                        }
                        break;
                    default:
                        break;
                }
            }
            w.WriteEndElement();
            w.WriteEndDocument();
            w.Flush();
            w.Close();
        }
        r.Close();
    }
}
I'd have to go back and check, but if this doesn't help, it may be possible to skip an entire level. In that case I'd at least need some information on the position of the <bond> tag in your input document.
ASKER
I think the order is equities, indices, options, bonds. At the moment I want bonds; later on I will want indices too.
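Since <bond> comes last (after equities, indices and options), one way to "skip an entire level" is to let XmlReader jump forward with ReadToFollowing, walk the <bnd> records with ReadToDescendant and ReadToNextSibling, and stop as soon as the section ends. This is a minimal sketch under stated assumptions: the <bnd> elements are direct children of the first <bond> element, and the class and method names (SectionExtractor, Extract) are hypothetical. A forward-only reader still has to parse every byte before the section; the saving comes from doing very little work per node (no switch, no console output) until <bond> is reached, and from stopping right after </bond>.

```csharp
using System.IO;
using System.Xml;

public static class SectionExtractor
{
    // Hypothetical helper: copies each <itemName .../> element found directly inside the
    // first <sectionName> element to `output` as an <n> element, then stops reading.
    // Only the attributes in attrNames are copied, in that order; a missing attribute
    // is written as an empty element. Returns the number of records written.
    public static int Extract(TextReader input, TextWriter output,
                              string sectionName, string itemName, string[] attrNames)
    {
        var readerSettings = new XmlReaderSettings { IgnoreWhitespace = true, IgnoreComments = true };
        var writerSettings = new XmlWriterSettings { Indent = true, IndentChars = "\t" };
        int count = 0;
        using (XmlReader r = XmlReader.Create(input, readerSettings))
        using (XmlWriter w = XmlWriter.Create(output, writerSettings))
        {
            w.WriteStartDocument();
            w.WriteStartElement("output");
            // Jump to the section, then to its first item (false if either is absent).
            if (r.ReadToFollowing(sectionName) && r.ReadToDescendant(itemName))
            {
                do
                {
                    w.WriteStartElement("n");
                    foreach (string a in attrNames)
                        w.WriteElementString(a, r.GetAttribute(a) ?? "");
                    w.WriteEndElement();
                    count++;
                } while (r.ReadToNextSibling(itemName)); // false at the section's end tag
            }
            w.WriteEndElement();
            w.WriteEndDocument();
        } // the reader is disposed here, so nothing after the section is read
        return count;
    }
}
```

For the real files, input could be File.OpenText on the 3 GB file and output a StreamWriter on the result path, with arrAttr as the attribute list. The same method could later be pointed at the indices section, but that section's element names are not given in the thread, so they would have to be filled in.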
ASKER CERTIFIED SOLUTION
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
This is great. Thanks.
Basically, the code treats the input as XML with a single element and multiple attributes: for each attribute it creates a new XElement and adds it to the new XML element, newXml.
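To make that attribute-to-element mapping concrete, here is a minimal sketch for a single record held in memory. The class name BndConverter, the method ToN and its optional keep list are illustrative names, not part of the thread. The keep list plays the same role as arrAttr above, selecting and ordering the attributes (for example to leave out ric and ricOriginal); an attribute that is empty or missing becomes a self-closing element such as <sedol />, matching the desired output in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BndConverter
{
    // Hypothetical helper: turns one <bnd .../> element into an <n> element where every
    // attribute becomes a child element with the same name and the value as its text.
    // If keep is given, only those attributes are used, in that order (like arrAttr);
    // a missing attribute is treated as empty.
    public static XElement ToN(string bndXml, params string[] keep)
    {
        XElement bnd = XElement.Parse(bndXml);
        IEnumerable<XAttribute> attrs = keep.Length == 0
            ? bnd.Attributes()
            : keep.Select(name => bnd.Attribute(name) ?? new XAttribute(name, ""));
        // Empty values produce self-closing elements such as <sedol />.
        return new XElement("n", attrs.Select(a =>
            a.Value.Length == 0 ? new XElement(a.Name) : new XElement(a.Name, a.Value)));
    }
}
```

Because it works on one element at a time, it could be fed individual <bnd> records taken from a streaming reader (for example with XmlReader.ReadOuterXml) instead of loading the whole file.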