Convert xml to csv 3

Posted on 2015-01-27
Last Modified: 2015-01-28
Please see

http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_28603967.html
http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_28600021.html

The file is very large, so these methods are killing the memory even with a filter.  Any suggestions?
Also, I'd like to put some OrganizationIDs into a text file and have the program filter the output based on that text file.
Question by:AlHal2
LVL 64

Expert Comment

by:Fernando Soto
ID: 40572948
Hi AlHal2;

I made a couple of changes to the code snippet so that you can filter on more than one OrganizationId at a time. In the code snippet below, the file holds one organization ID per line, and all of them are read into an array in memory. If your file is formatted differently, you will need to extract the IDs into an array or a List. You will also need to change the file paths and names to meet your needs.

// The values to filter on.
// OrigIds.txt in this case contains one OrganizationId per line.
// If your file uses a different format, extract the IDs so that
// each element of the array or List<> holds exactly one ID.
// Requires: using System; using System.IO; using System.Linq; using System.Text; using System.Xml.Linq;
string[] origIDs = File.ReadAllLines(@"C:\Working Directory\OrigIds.txt");
string typeName = "AKA";
string effectiveTo = "2005-08-18T04:00:00";

XElement doc = XElement.Load(@"C:\Working Directory\OAOrganization-File.xml");
string csv = (from el in doc.Descendants()
              where el.Name.LocalName == "Organization"
              // Use the element's own namespace; this also works when there is no namespace.
              let ns = el.Name.Namespace
              let orgName = el.Element(ns + "OrganizationName")
              // (string) casts return null instead of throwing when an element or attribute is missing.
              where origIDs.Contains((string)el.Element(ns + "OrganizationId")) ||
                    (string)orgName?.Attribute("organizationNameTypeCode") == typeName ||
                    (string)orgName?.Attribute("effectiveTo") == effectiveTo
              // Three columns; Environment.NewLine ends the row without adding an empty fourth column.
              select String.Format("{0},{1},{2}{3}",
                  (string)el.Element(ns + "OrganizationId"),
                  (string)el.Element(ns + "AdminStatus")?.Attribute("effectiveFrom"),
                  (string)el.Element(ns + "AdminStatus"),
                  Environment.NewLine))
              .Aggregate(new StringBuilder(), (sb, s) => sb.Append(s), sb => sb.ToString());

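The query calls origIDs.Contains once for every Organization element, and Contains on an array scans the whole array each time. If the ID list is long, loading the IDs into a HashSet<string> makes each lookup constant time. A minimal sketch, assuming one ID per line; LoadIds is a hypothetical helper name, and the sample IDs below are placeholders, not real data:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: reads one OrganizationId per line from any TextReader
// (for the real file, pass File.OpenText(@"C:\Working Directory\OrigIds.txt")),
// trims surrounding whitespace, skips blank lines, and returns a HashSet
// so that each Contains check is a constant-time lookup.
HashSet<string> LoadIds(TextReader reader)
{
    var ids = new HashSet<string>(StringComparer.Ordinal);
    string? line;
    while ((line = reader.ReadLine()) != null)
    {
        string id = line.Trim();
        if (id.Length > 0)
            ids.Add(id);
    }
    return ids;
}

// Example with an in-memory stand-in for OrigIds.txt (placeholder IDs).
var origIds = LoadIds(new StringReader("1001\n  1002 \n\n1003\n"));
Console.WriteLine(origIds.Count);              // 3
Console.WriteLine(origIds.Contains("1002"));   // True
```

The resulting set can be used in place of the origIDs array in the query above, since HashSet<string> has its own Contains method.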

 
LVL 64

Expert Comment

by:Fernando Soto
ID: 40572954
Also can you please explain what you mean by this statement, "The file is very large, so these methods are killing the memory even with a filter"?
 

Author Comment

by:AlHal2
ID: 40573036
Thanks for this.
I mean the program goes through a 30 MB file in seconds, but with an 8 GB file it is still running after more than an hour. The memory usage is enormous.
 

Author Comment

by:AlHal2
ID: 40573092
I think the program treats the file like one long string.
 
LVL 64

Expert Comment

by:Fernando Soto
ID: 40573138
The entire 8 GB file is being loaded into memory so that the query can operate on it, and the in-memory XElement tree is typically larger than the file on disk. If there is not enough physical memory, some of it has to be paged out to virtual memory, which lengthens the run time because those paged-out parts must be reloaded whenever they are needed. If this file continues to grow, the situation will only get worse.
 

Author Comment

by:AlHal2
ID: 40573189
Would you be able to suggest some SQL to ingest the file into an SQL Server database?
I'm open to any other suggestions.
 
LVL 64

Expert Comment

by:Fernando Soto
ID: 40573780
Storing the data in a SQL database would help, since the database works on its tables and returns only the needed information. The issue then is getting the data into tables in the database, and I don't know of any program that can do this directly from your XML.
 
LVL 36

Accepted Solution

by:Miguel Oz
ID: 40574451
I do not think SQL will help you, because you would be adding extra resources (a database and an import step) to finish your task.

To avoid loading the whole XML file into memory and instead process it node by node, you could use the following approach suggested on MSDN.

Basically, you load only the Organization nodes, one by one, using this method:
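        // Requires: using System.Collections.Generic; using System.Xml; using System.Xml.Linq;
        static IEnumerable<XElement> SimpleStreamAxis(
                       string filename, string matchName)
        {
            using (XmlReader reader = XmlReader.Create(filename))
            {
                reader.MoveToContent();
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element &&
                        reader.LocalName == matchName)
                    {
                        // ReadFrom consumes the whole element and leaves the reader on
                        // the following node, so Read() must not be called here;
                        // otherwise an immediately adjacent sibling would be skipped.
                        XElement el = XNode.ReadFrom(reader) as XElement;
                        if (el != null)
                            yield return el;
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }
        }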


Then in the query code replace the following
XElement doc = XElement.Load(@"f:\temp\C--OAOrganization-File.xml");
string csv = (from el in doc.Descendants()
                          where el.Name.LocalName == "Organization"


with:
string csv = (from el in SimpleStreamAxis(@"f:\temp\C--OAOrganization-File.xml", "Organization")



This replaces both the doc instance and the where condition in your original code.
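Even with streamed input, the query still builds the entire CSV in one StringBuilder through Aggregate, so for an 8 GB input the output string can itself grow very large. A minimal sketch that writes each row as soon as it is produced is below. It is an illustration under stated assumptions, not the accepted answer's code: StreamElements and WriteCsv are hypothetical names, they take a TextReader/TextWriter so the sketch can run against an in-memory sample, only the OrganizationId filter is applied (add the typeName and effectiveTo conditions as needed), and values are not quoted or escaped for CSV.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical streaming reader: yields each <matchName> element one at a time,
// so only the current element is held in memory.
IEnumerable<XElement> StreamElements(TextReader source, string matchName)
{
    using (XmlReader reader = XmlReader.Create(source))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == matchName)
            {
                // ReadFrom already advances past the element; no extra Read() here.
                if (XNode.ReadFrom(reader) is XElement el)
                    yield return el;
            }
            else
            {
                reader.Read();
            }
        }
    }
}

// Hypothetical writer: one CSV row per Organization whose ID is in the set,
// written straight to the output instead of being collected into a string.
void WriteCsv(TextReader xml, TextWriter csv, ISet<string> ids)
{
    foreach (XElement el in StreamElements(xml, "Organization"))
    {
        XNamespace ns = el.Name.Namespace;
        string? id = (string?)el.Element(ns + "OrganizationId");
        if (id == null || !ids.Contains(id))
            continue;
        XElement? admin = el.Element(ns + "AdminStatus");
        // Missing values become empty columns (string.Join treats null as "").
        csv.WriteLine(string.Join(",", id, (string?)admin?.Attribute("effectiveFrom"), (string?)admin));
    }
}
```

For the real files, pass File.OpenText(...) for the XML and File.CreateText(...) for the CSV, each inside a using block so the output is flushed and the files are closed when the loop finishes.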
 

Author Closing Comment

by:AlHal2
ID: 40574889
Thanks.