Solved

Convert xml to csv 3

Posted on 2015-01-27
Last Modified: 2015-01-28
Please see

http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_28603967.html
http://www.experts-exchange.com/Programming/Languages/C_Sharp/Q_28600021.html

The file is very large, so these methods are killing the memory even with a filter.  Any suggestions?
Also, I'd like to put some OrganizationIDs into a text file and have the program filter the output based on that text file.
Question by:AlHal2
LVL 63

Expert Comment

by:Fernando Soto
ID: 40572948
Hi AlHal2;

I made a couple of changes to the code snippet so that you can filter on more than one OrganizationId at a time. In the code snippet below I put one organization ID per line in a text file and read them into a collection in memory. If your file is formatted differently, you will need to extract the IDs into an array or a List. You will also need to change the file paths and names to meet your needs.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// The values to filter on.
// The file OrigIds.txt in this case contains one OrganizationId per line.
// If you have a different format in the file you will need to extract the IDs
// so that you have one ID per element of the collection.
// A HashSet keeps each lookup fast even when the file holds many IDs.
var origIDs = new HashSet<string>(
    File.ReadAllLines(@"C:\Working Directory\OrigIds.txt")
        .Select(line => line.Trim())
        .Where(line => line.Length > 0));
string typeName = "AKA";
string effectiveTo = "2005-08-18T04:00:00";

XElement doc = XElement.Load(@"C:\Working Directory\OAOrganization-File.xml");
string csv = (from el in doc.Descendants()
              where el.Name.LocalName == "Organization"
              // XNamespace + string builds the XName, with or without a namespace
              let ns = el.Name.Namespace
              let orgName = el.Element(ns + "OrganizationName")
              // The (string) casts return null instead of throwing when a node is missing
              where origIDs.Contains((string)el.Element(ns + "OrganizationId")) ||
                    (string)orgName?.Attribute("organizationNameTypeCode") == typeName ||
                    (string)orgName?.Attribute("effectiveTo") == effectiveTo
              let status = el.Element(ns + "AdminStatus")
              // Three columns followed by a line break (no trailing comma)
              select String.Format("{0},{1},{2}{3}",
                  (string)el.Element(ns + "OrganizationId"),
                  (string)status?.Attribute("effectiveFrom"),
                  (string)status,
                  Environment.NewLine))
              .Aggregate(new StringBuilder(), (sb, s) => sb.Append(s), sb => sb.ToString());

LVL 63

Expert Comment

by:Fernando Soto
ID: 40572954
Also, can you please explain what you mean by this statement: "The file is very large, so these methods are killing the memory even with a filter"?

Author Comment

by:AlHal2
ID: 40573036
Thanks for this.
I mean the program goes through a 30 MB file in seconds, but I left an 8 GB file running for over an hour. The memory usage is enormous.

Author Comment

by:AlHal2
ID: 40573092
I think the program treats the file like one long string.
LVL 63

Expert Comment

by:Fernando Soto
ID: 40573138
The whole file, all 8 GB, is being loaded into memory so that the query can operate on it, and the XElement tree needs more memory than the text it was parsed from. If there is not enough physical memory, part of it is paged out to virtual memory, and the run time grows because those parts have to be read back in whenever they are needed. If this file continues to grow, the situation will only get worse.

Author Comment

by:AlHal2
ID: 40573189
Would you be able to suggest some SQL to ingest the file into an SQL Server database?
I'm open to any other suggestions.
LVL 63

Expert Comment

by:Fernando Soto
ID: 40573780
Storing the data in a SQL database would help, because the database works on its tables and returns only the information you need. The issue then is getting the data from your XML into tables in the database, and I don't know of any program available that does this directly from your XML.
LVL 36

Accepted Solution

by:Miguel Oz
Earned 500 total points
ID: 40574451
I do not think SQL will help you, because it adds extra resources just to finish your task.

Instead of loading the whole XML file into memory, you can process the file node by node using the following MSDN suggestion.

Basically, you load only the Organization nodes, one at a time, using this method:
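        // Requires: using System.Collections.Generic; using System.Xml; using System.Xml.Linq;
        static IEnumerable<XElement> SimpleStreamAxis(
                       string filename, string matchName)
        {
            using (XmlTextReader reader = new XmlTextReader(filename))
            {
                reader.MoveToContent();   // position on the root element
                reader.Read();            // step inside the root
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element &&
                        reader.LocalName == matchName)
                    {
                        // ReadFrom loads only this element into memory and leaves the
                        // reader on the node after it, so Read() is not called here;
                        // otherwise an element directly after it would be skipped.
                        XElement el = XElement.ReadFrom(reader) as XElement;
                        if (el != null)
                            yield return el;
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }   // the using block closes the reader
        }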

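To try the streaming approach without an 8 GB file, here is a minimal sketch. It assumes a variant of SimpleStreamAxis, called StreamElements here (a hypothetical name), that takes a TextReader instead of a file name so it can run against a small in-memory string. It also shows one detail of XNode.ReadFrom: the call already moves the reader past the element it loads. A loop that calls Read() unconditionally after ReadFrom() can therefore skip an Organization element that follows another with no whitespace in between.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical variant of SimpleStreamAxis that reads from any TextReader,
// so the demo below runs against an in-memory string instead of a file.
static IEnumerable<XElement> StreamElements(TextReader input, string matchName)
{
    using (XmlReader reader = XmlReader.Create(input))
    {
        reader.MoveToContent();   // position on the root element
        reader.Read();            // step inside the root
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == matchName)
            {
                // ReadFrom materialises only this element and already advances
                // the reader past it, so Read() must not be called again here.
                if (XNode.ReadFrom(reader) is XElement el)
                    yield return el;
            }
            else
            {
                reader.Read();
            }
        }
    }
}

// Hypothetical sample: two adjacent Organization elements with no whitespace
// between them, then an unrelated element, then a third Organization.
string xml =
    "<Root xmlns=\"urn:example\">" +
    "<Organization><OrganizationId>1</OrganizationId></Organization>" +
    "<Organization><OrganizationId>2</OrganizationId></Organization>" +
    "<Other/>" +
    "<Organization><OrganizationId>3</OrganizationId></Organization>" +
    "</Root>";

List<string> ids = StreamElements(new StringReader(xml), "Organization")
    .Select(el => (string)el.Element(el.Name.Namespace + "OrganizationId"))
    .ToList();

Console.WriteLine(string.Join(",", ids));   // prints: 1,2,3
```

Only one Organization element is in memory at a time, so memory use depends on the size of a single record rather than the size of the whole file.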

Then, in the query code, replace the following:
XElement doc = XElement.Load(@"f:\temp\C--OAOrganization-File.xml");
string csv = (from el in doc.Descendants()
                          where el.Name.LocalName == "Organization"


with:
string csv = (from el in SimpleStreamAxis(@"f:\temp\C--OAOrganization-File.xml", "Organization")



This replaces the doc instance and the where condition in your original code.
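Even with streaming input, the Aggregate/StringBuilder step at the end of the query still builds the whole CSV text in memory before anything is written. Below is a minimal sketch of an alternative, assuming the output can be written line by line to a file. It reads the IDs from a text file (one per line) into a HashSet and writes each matching row with a StreamWriter as soon as it is produced. ConvertToCsv, the temp-file paths, and the sample data are hypothetical. Only the OrganizationId filter is shown. The organizationNameTypeCode and effectiveTo conditions could be added to the same check.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Streams matching elements one at a time (same idea as SimpleStreamAxis).
static IEnumerable<XElement> StreamElements(string path, string matchName)
{
    using (FileStream stream = File.OpenRead(path))
    using (XmlReader reader = XmlReader.Create(stream))
    {
        reader.MoveToContent();
        reader.Read();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == matchName)
            {
                if (XNode.ReadFrom(reader) is XElement el)
                    yield return el;   // ReadFrom already advanced the reader
            }
            else
            {
                reader.Read();
            }
        }
    }
}

// Hypothetical helper: keeps organizations whose ID is listed in idsPath and
// writes each CSV row immediately, so neither the XML tree nor the CSV text
// is held in memory as a whole. Returns the number of rows written.
static int ConvertToCsv(string xmlPath, string idsPath, string csvPath)
{
    var ids = new HashSet<string>(File.ReadLines(idsPath)
        .Select(l => l.Trim())
        .Where(l => l.Length > 0));
    int written = 0;
    using (var output = new StreamWriter(csvPath))
    {
        foreach (XElement org in StreamElements(xmlPath, "Organization"))
        {
            XNamespace ns = org.Name.Namespace;
            string id = (string)org.Element(ns + "OrganizationId");
            if (id == null || !ids.Contains(id))
                continue;
            XElement status = org.Element(ns + "AdminStatus");
            output.WriteLine("{0},{1},{2}", id,
                (string)status?.Attribute("effectiveFrom"), (string)status);
            written++;
        }
    }
    return written;
}

// Demo with small hypothetical files in the temp directory.
string dir = Path.Combine(Path.GetTempPath(), "xml2csv-demo");
Directory.CreateDirectory(dir);
string xmlPath = Path.Combine(dir, "orgs.xml");
string idsPath = Path.Combine(dir, "ids.txt");
string csvPath = Path.Combine(dir, "out.csv");

File.WriteAllText(xmlPath,
    "<Root xmlns=\"urn:example\">" +
    "<Organization><OrganizationId>10</OrganizationId>" +
    "<AdminStatus effectiveFrom=\"2001-01-01\">Active</AdminStatus></Organization>" +
    "<Organization><OrganizationId>20</OrganizationId>" +
    "<AdminStatus effectiveFrom=\"2002-02-02\">Inactive</AdminStatus></Organization>" +
    "</Root>");
File.WriteAllLines(idsPath, new[] { "20", "" });   // blank lines are ignored

int count = ConvertToCsv(xmlPath, idsPath, csvPath);
string[] lines = File.ReadAllLines(csvPath);
Console.WriteLine($"{count} row(s): {string.Join(" | ", lines)}");
```

With this structure, memory use stays roughly at the size of the ID set plus one Organization record, whatever the size of the input file.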

Author Closing Comment

by:AlHal2
ID: 40574889
Thanks.