Solved

Parsing v. large nested xml file.

Posted on 2015-01-30
4
56 Views
Last Modified: 2016-02-18
I’m using the attached c# code to parse the attached xml file.  The problem is that within each organization tag there are multiple organizationName tags.  This is because organizations change their names over time.  The code is only getting the most recent whereas I would like all the history.
Ideally, I’d also like to be able to filter on specific elements and/or attributes.  However this is lower priority as I can just iterate through the file created by the code.
C--OAOrganization-File2.txtC--OAOrganization-SourceCode.txt
0
Comment
Question by:AlHal2
  • 3
4 Comments
 
LVL 62

Expert Comment

by:Fernando Soto
Comment Utility
Hi AlHal2;

Is something like this that you are looking for?
// Load document into memory
XDocument xdoc = XDocument.Load(@"Path to XML File\C--OAOrganization-File2.xml");

// XML NameSpace used in documents
XNamespace ns = xdoc.Root.GetDefaultNamespace();
XNamespace env = xdoc.Root.GetNamespaceOfPrefix("env");

// Query for needed information
var results = (from o in xdoc.Descendants(ns + "Organization")
               from orgn in o.Elements(ns + "OrganizationName") 
               select new
               {
                   Id = o.Element(ns + "OrganizationId").Value,
                   Name = orgn.Value
               });
               
Console.WriteLine("Id                  Name");               
foreach (var org in results)
{
    Console.WriteLine("{0}   {1}", org.Id, org.Name);
}               

Open in new window

Result of above Linq query
Id           Name
4295904866   S. Y. BANCORP, INC.
4295904866   STOCK YARDS BANCORP, INC.
4295904866   Stock Yards
4295904882   SAUL CENTERS, INC.
4295904882   Saul Centers
4295904889   SCHUFF STEEL CO
4295904889   SCHUFF INTERNATIONAL, INC.
4295904889   Schuff Intl

Open in new window

0
 

Author Comment

by:AlHal2
Comment Utility
The advantage of the cc# code is that it parses the file bit by bit.  If I ingest the entire 8GB file into memory the program will not run.
0
 

Accepted Solution

by:
AlHal2 earned 0 total points
Comment Utility
This code from a colleague worked for me.

using System;
using System.Xml;

namespace ReadXMLfromFile
{
    /// <summary>
    
    /// </summary>
    class pdaXMLParser
    {
        static void Main(string[] args)
        {
            XmlTextReader reader = new XmlTextReader("c:\\temp\\file2.xml");
            string csvRoot = "";
            string sep = "|";

            //write the output file header string (overwrite any existing file)
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\temp\OrganizationNameParsed.txt"))
            {
                file.WriteLine("OrganizationID|entityCreatedDate|entityModifiedDate|OrganizationName|OrganizationName_effectiveFrom|OrganizationName_effectiveTo|OrganizationName_organizationNameTypeCode|organizationName_LanguageID|OrganizationName_organizationNameLocalNormalized");
            }

            while (reader.Read())
            {
                // Only detect start elements.
		        if (reader.IsStartElement())
		        {
		            // Get element name and switch on it.
		            switch (reader.Name)
		            {
			        case "Data":
			            // Detect this element.
			            //Console.WriteLine("Start <data> element.");
			            break;

                    case "Organization":
                        //start a new csv string for use later... 
                        csvRoot = "";

                        // Detect the Organization element and extract the required attributes
                        string attribute = reader["entityCreatedDate"];            
			            if (attribute != null)
			            {
                            csvRoot += attribute;
			            }
                        else { csvRoot += sep; }

                        attribute = reader["entityModifiedDate"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  entityModifiedDate: " + attribute);
                            csvRoot += sep + attribute;
			            }
                        else { csvRoot += sep; }

			            break;

                    case "OrganizationId":
                        // Detect the Organization element and extract the required data from the next record
                        if (reader.Read())
                        {
                        //Console.WriteLine("  Organization ID: " + reader.Value.Trim());
                        //prefix the root data with the OrgID
                        csvRoot = reader.Value.Trim()+sep + csvRoot;
                        }
                        else { csvRoot = sep + csvRoot; }

                        break;

                    case "OrganizationName":
                        // Detect the Organization element and extract the required data from the attributes
                        
                        //reset the details field as there may be >1 Organization name per OrganizationID
                        string csvNameDetails = "";

                        attribute = reader["effectiveFrom"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  effectiveFrom: " + attribute);
                            csvNameDetails += sep + attribute;
			            }
                        else { csvNameDetails += sep; }
        
                        attribute = reader["effectiveTo"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  effectiveTo: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["organizationNameTypeCode"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  organizationNameTypeCode: " + attribute);
                            csvNameDetails += sep + attribute;
			            }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["languageId"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  languageId: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["organizationNameLocalNormalized"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  organizationNameLocalNormalized: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        // read ahead to get the Organization Name text and prefix this to the attribute data
                        if (reader.Read())
                        {
                            //Console.WriteLine("  Organization ID: " + reader.Value.Trim());
                            csvNameDetails = reader.Value.Trim() + csvNameDetails ;
                        }
                        else { csvNameDetails += sep; }

                        //write the root details for the OrganizationID along with the current OrganizationName details
                       // Console.WriteLine(csvRoot+sep+csvNameDetails);
                        
                        //write the output file data string (append to an existing file)
                        using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\temp\OrganizationNameParsed.txt",true))
                        {
                            file.WriteLine(csvRoot + sep+ csvNameDetails);
                        }
                        break;

		            }
		        }
	    
                }
            Console.ReadLine();
        }
    }
}

Open in new window

0
 

Author Closing Comment

by:AlHal2
Comment Utility
it works.
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Introduction Although it is an old technology, serial ports are still being used by many hardware manufacturers. If you develop applications in C#, Microsoft .NET framework has SerialPort class to communicate with the serial ports.  I needed to…
It was really hard time for me to get the understanding of Delegates in C#. I went through many websites and articles but I found them very clumsy. After going through those sites, I noted down the points in a easy way so here I am sharing that unde…
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now