
Convert xml to csv 3

Posted on 2015-01-27
Last Modified: 2015-01-28
Please see

http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_28603967.html
http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_28600021.html

The file is very large, so these methods are killing the memory even with a filter.  Any suggestions?
Also, I'd like to put some OrganizationIDs into a text file and have the program filter the output based on that text file.
Question by:AlHal2

Expert Comment

by:Fernando Soto
ID: 40572948
Hi AlHal2,

I made a couple of changes to the code snippet so that you can filter on more than one OrganizationId at a time. In the code snippet below, I used one organization ID per line and read them all into memory. If your file is formatted differently, you will need to extract the IDs into an array or a List. You will also need to change the file paths and names to meet your needs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// The values to filter on.
// The file OrigIds.txt in this case contains one OrganizationId per line.
// If your file has a different format, extract the IDs so that you have
// one ID per element of the collection. A HashSet gives fast lookups.
var origIDs = new HashSet<string>(File.ReadAllLines(@"C:\Working Directory\OrigIds.txt"));
string typeName = "AKA";
string effectiveTo = "2005-08-18T04:00:00";

XElement doc = XElement.Load(@"C:\Working Directory\OAOrganization-File.xml");
string csv = (from el in doc.Descendants()
              where el.Name.LocalName == "Organization"
              let ns = el.Name.Namespace
              let orgName = el.Element(ns + "OrganizationName")
              // Casting to string yields null instead of throwing when an
              // element or attribute is missing (unlike .Value).
              where origIDs.Contains((string)el.Element(ns + "OrganizationId")) ||
                    (string)orgName?.Attribute("organizationNameTypeCode") == typeName ||
                    (string)orgName?.Attribute("effectiveTo") == effectiveTo
              select String.Format("{0},{1},{2}{3}",
                  (string)el.Element(ns + "OrganizationId"),
                  (string)el.Element(ns + "AdminStatus")?.Attribute("effectiveFrom"),
                  (string)el.Element(ns + "AdminStatus"),
                  Environment.NewLine))
             .Aggregate(new StringBuilder(), (sb, s) => sb.Append(s), sb => sb.ToString());
```
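A small refinement for the ID file, assuming one ID per line: trimming each line and skipping blank ones avoids silent mismatches caused by trailing spaces or an empty last line. This is a minimal sketch; OrgIdFilter and LoadIds are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class OrgIdFilter
{
    // Hypothetical helper: reads one OrganizationId per line, ignoring blank
    // lines and surrounding whitespace, into a set for constant-time lookups.
    public static HashSet<string> LoadIds(string path)
    {
        var ids = new HashSet<string>(StringComparer.Ordinal);
        // File.ReadLines reads the file lazily, one line at a time.
        foreach (string line in File.ReadLines(path))
        {
            string id = line.Trim();
            if (id.Length > 0)
                ids.Add(id); // duplicates are ignored by the set
        }
        return ids;
    }
}
```

You could then call OrgIdFilter.LoadIds(@"C:\Working Directory\OrigIds.txt") in place of the File.ReadAllLines call.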


Expert Comment

by:Fernando Soto
ID: 40572954
Also can you please explain what you mean by this statement, "The file is very large, so these methods are killing the memory even with a filter"?

Author Comment

by:AlHal2
ID: 40573036
Thanks for this.
I mean the program goes through a 30 MB file in seconds, but with an 8 GB file it is still running after more than an hour. The memory usage is enormous.

Author Comment

by:AlHal2
ID: 40573092
I think the program treats the file like one long string.

Expert Comment

by:Fernando Soto
ID: 40573138
The entire 8 GB file is being loaded into memory so that the query can operate on it. If there is not enough physical memory, some of it has to be paged out to virtual memory, which lengthens the run time because those parts must be reloaded whenever they are needed. If this file continues to grow, the situation will only get worse.

Author Comment

by:AlHal2
ID: 40573189
Would you be able to suggest some SQL to ingest the file into an SQL Server database?
I'm open to any other suggestions.

Expert Comment

by:Fernando Soto
ID: 40573780
Storing the data in a SQL database would help, since the database works on its tables and returns only the needed information. The issue is getting the data into tables in the database in the first place, and I don't know of any program that can do this directly from your XML.

Accepted Solution

by:Miguel Oz (earned 500 total points)
ID: 40574451
I do not think SQL will help you, because loading the data into a database first only adds extra resources to finish your task.

To avoid loading the whole XML file into memory and instead process it node by node, you could use the following MSDN suggestion.

Basically you load only the organization nodes one by one using this method:
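```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

static IEnumerable<XElement> SimpleStreamAxis(string filename, string matchName)
{
    // XmlReader.Create is the recommended replacement for new XmlTextReader(...).
    using (XmlReader reader = XmlReader.Create(filename))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == matchName)
            {
                // ReadFrom loads only this element and leaves the reader on the
                // node that follows it, so Read() must not be called here;
                // otherwise a directly adjacent sibling element would be skipped.
                XElement el = XNode.ReadFrom(reader) as XElement;
                if (el != null)
                    yield return el;
            }
            else
            {
                reader.Read();
            }
        }
        // No explicit Close() is needed: the using block disposes the reader.
    }
}
```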

Then, in the query code, replace the following:

```csharp
XElement doc = XElement.Load(@"f:\temp\C--OAOrganization-File.xml");
string csv = (from el in doc.Descendants()
                          where el.Name.LocalName == "Organization"
```

with:

```csharp
string csv = (from el in SimpleStreamAxis(@"f:\temp\C--OAOrganization-File.xml", "Organization")
```


This line replaces both the doc instance and the Organization where condition in your original code; the rest of the query stays the same.
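Even with the streaming reader, the Aggregate call still collects the whole CSV into one string before anything is written. Below is a minimal sketch that writes each matching row to a file as soon as it is read, so memory use stays roughly constant. It filters only on the ID set, assumes the element names used in this thread (Organization, OrganizationId, AdminStatus), and OrgCsvExport, StreamElements and ConvertToCsv are hypothetical names. Values are written without CSV quoting, as in the original query.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class OrgCsvExport
{
    // Yields each element named matchName one at a time, so roughly one
    // Organization subtree is held in memory at any moment.
    static IEnumerable<XElement> StreamElements(string path, string matchName)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == matchName)
                {
                    // ReadFrom consumes the whole element and leaves the reader on
                    // the following node, so Read() is not called on this path.
                    XElement el = XNode.ReadFrom(reader) as XElement;
                    if (el != null)
                        yield return el;
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Hypothetical entry point: keeps Organizations whose ID is in orgIds and
    // writes one CSV row per match to outputPath. Returns the row count.
    public static int ConvertToCsv(string xmlPath, ISet<string> orgIds, string outputPath)
    {
        int rows = 0;
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (XElement el in StreamElements(xmlPath, "Organization"))
            {
                XNamespace ns = el.Name.Namespace;
                string id = (string)el.Element(ns + "OrganizationId");
                if (id == null || !orgIds.Contains(id))
                    continue;
                XElement status = el.Element(ns + "AdminStatus");
                // Missing values are written as empty fields.
                writer.WriteLine("{0},{1},{2}", id,
                    (string)status?.Attribute("effectiveFrom"), (string)status);
                rows++;
            }
        }
        return rows;
    }
}
```

For example, OrgCsvExport.ConvertToCsv(@"C:\Working Directory\OAOrganization-File.xml", ids, @"C:\Working Directory\Organizations.csv"), with ids loaded from the ID file; the output path is hypothetical. The typeName and effectiveTo conditions from the earlier query could be added to the if statement in the same way.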
Author Closing Comment

by:AlHal2
ID: 40574889
Thanks.