Solved

Parsing v. large nested xml file.

Posted on 2015-01-30
4
63 Views
Last Modified: 2016-02-18
I’m using the attached c# code to parse the attached xml file.  The problem is that within each organization tag there are multiple organizationName tags.  This is because organizations change their names over time.  The code is only getting the most recent whereas I would like all the history.
Ideally, I’d also like to be able to filter on specific elements and/or attributes.  However this is lower priority as I can just iterate through the file created by the code.
C--OAOrganization-File2.txtC--OAOrganization-SourceCode.txt
0
Comment
Question by:AlHal2
  • 3
4 Comments
 
LVL 63

Expert Comment

by:Fernando Soto
ID: 40579847
Hi AlHal2;

Is something like this that you are looking for?
// Load document into memory
XDocument xdoc = XDocument.Load(@"Path to XML File\C--OAOrganization-File2.xml");

// XML NameSpace used in documents
XNamespace ns = xdoc.Root.GetDefaultNamespace();
XNamespace env = xdoc.Root.GetNamespaceOfPrefix("env");

// Query for needed information
var results = (from o in xdoc.Descendants(ns + "Organization")
               from orgn in o.Elements(ns + "OrganizationName") 
               select new
               {
                   Id = o.Element(ns + "OrganizationId").Value,
                   Name = orgn.Value
               });
               
Console.WriteLine("Id                  Name");               
foreach (var org in results)
{
    Console.WriteLine("{0}   {1}", org.Id, org.Name);
}               

Open in new window

Result of above Linq query
Id           Name
4295904866   S. Y. BANCORP, INC.
4295904866   STOCK YARDS BANCORP, INC.
4295904866   Stock Yards
4295904882   SAUL CENTERS, INC.
4295904882   Saul Centers
4295904889   SCHUFF STEEL CO
4295904889   SCHUFF INTERNATIONAL, INC.
4295904889   Schuff Intl

Open in new window

0
 

Author Comment

by:AlHal2
ID: 40581772
The advantage of the cc# code is that it parses the file bit by bit.  If I ingest the entire 8GB file into memory the program will not run.
0
 

Accepted Solution

by:
AlHal2 earned 0 total points
ID: 40586030
This code from a colleague worked for me.

using System;
using System.Xml;

namespace ReadXMLfromFile
{
    /// <summary>
    
    /// </summary>
    class pdaXMLParser
    {
        static void Main(string[] args)
        {
            XmlTextReader reader = new XmlTextReader("c:\\temp\\file2.xml");
            string csvRoot = "";
            string sep = "|";

            //write the output file header string (overwrite any existing file)
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\temp\OrganizationNameParsed.txt"))
            {
                file.WriteLine("OrganizationID|entityCreatedDate|entityModifiedDate|OrganizationName|OrganizationName_effectiveFrom|OrganizationName_effectiveTo|OrganizationName_organizationNameTypeCode|organizationName_LanguageID|OrganizationName_organizationNameLocalNormalized");
            }

            while (reader.Read())
            {
                // Only detect start elements.
		        if (reader.IsStartElement())
		        {
		            // Get element name and switch on it.
		            switch (reader.Name)
		            {
			        case "Data":
			            // Detect this element.
			            //Console.WriteLine("Start <data> element.");
			            break;

                    case "Organization":
                        //start a new csv string for use later... 
                        csvRoot = "";

                        // Detect the Organization element and extract the required attributes
                        string attribute = reader["entityCreatedDate"];            
			            if (attribute != null)
			            {
                            csvRoot += attribute;
			            }
                        else { csvRoot += sep; }

                        attribute = reader["entityModifiedDate"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  entityModifiedDate: " + attribute);
                            csvRoot += sep + attribute;
			            }
                        else { csvRoot += sep; }

			            break;

                    case "OrganizationId":
                        // Detect the Organization element and extract the required data from the next record
                        if (reader.Read())
                        {
                        //Console.WriteLine("  Organization ID: " + reader.Value.Trim());
                        //prefix the root data with the OrgID
                        csvRoot = reader.Value.Trim()+sep + csvRoot;
                        }
                        else { csvRoot = sep + csvRoot; }

                        break;

                    case "OrganizationName":
                        // Detect the Organization element and extract the required data from the attributes
                        
                        //reset the details field as there may be >1 Organization name per OrganizationID
                        string csvNameDetails = "";

                        attribute = reader["effectiveFrom"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  effectiveFrom: " + attribute);
                            csvNameDetails += sep + attribute;
			            }
                        else { csvNameDetails += sep; }
        
                        attribute = reader["effectiveTo"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  effectiveTo: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["organizationNameTypeCode"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  organizationNameTypeCode: " + attribute);
                            csvNameDetails += sep + attribute;
			            }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["languageId"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  languageId: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["organizationNameLocalNormalized"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  organizationNameLocalNormalized: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        // read ahead to get the Organization Name text and prefix this to the attribute data
                        if (reader.Read())
                        {
                            //Console.WriteLine("  Organization ID: " + reader.Value.Trim());
                            csvNameDetails = reader.Value.Trim() + csvNameDetails ;
                        }
                        else { csvNameDetails += sep; }

                        //write the root details for the OrganizationID along with the current OrganizationName details
                       // Console.WriteLine(csvRoot+sep+csvNameDetails);
                        
                        //write the output file data string (append to an existing file)
                        using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\temp\OrganizationNameParsed.txt",true))
                        {
                            file.WriteLine(csvRoot + sep+ csvNameDetails);
                        }
                        break;

		            }
		        }
	    
                }
            Console.ReadLine();
        }
    }
}

Open in new window

0
 

Author Closing Comment

by:AlHal2
ID: 40596630
it works.
0

Featured Post

Networking for the Cloud Era

Join Microsoft and Riverbed for a discussion and demonstration of enhancements to SteelConnect:
-One-click orchestration and cloud connectivity in Azure environments
-Tight integration of SD-WAN and WAN optimization capabilities
-Scalability and resiliency equal to a data center

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

We all know that functional code is the leg that any good program stands on when it comes right down to it, however, if your program lacks a good user interface your product may not have the appeal needed to keep your customers happy. This issue can…
I was working on a PowerPoint add-in the other day and a client asked me "can you implement a feature which processes a chart when it's pasted into a slide from another deck?". It got me wondering how to hook into built-in ribbon events in Office.
This video shows how to use Hyena, from SystemTools Software, to bulk import 100 user accounts from an external text file. View in 1080p for best video quality.
The Email Laundry PDF encryption service allows companies to send confidential encrypted  emails to anybody. The PDF document can also contain attachments that are embedded in the encrypted PDF. The password is randomly generated by The Email Laundr…

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question