
Parsing a very large nested XML file

Posted on 2015-01-30
Last Modified: 2016-02-18
I'm using the attached C# code to parse the attached XML file. The problem is that each Organization element contains multiple OrganizationName elements, because organizations change their names over time. The code only captures the most recent name, whereas I would like the full history.
Ideally, I'd also like to be able to filter on specific elements and/or attributes. However, this is lower priority, as I can just iterate through the file the code creates.
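For the filtering part, one option that keeps memory flat is to filter while streaming with XmlReader instead of post-processing the output file. The sketch below is a hypothetical helper (OrgNameFilter.NamesWhere is not from this thread) that yields only the OrganizationName texts whose given attribute (for example languageId or organizationNameTypeCode, the attribute names used in the accepted solution) has a given value. It assumes each OrganizationName contains only text, because ReadElementContentAsString throws on child elements.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class OrgNameFilter
{
    // Hypothetical helper: reads the input forward-only and yields the trimmed text of every
    // OrganizationName element whose attribute `attributeName` equals `expected`.
    public static IEnumerable<string> NamesWhere(TextReader input, string attributeName, string expected)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == "OrganizationName"
                    && reader.GetAttribute(attributeName) == expected)
                {
                    // Consumes the whole element and leaves the reader on the next node,
                    // so we must not call Read() here or a sibling element would be skipped.
                    yield return reader.ReadElementContentAsString().Trim();
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Because the helper takes a TextReader, the caller owns the file handle: open it with File.OpenText inside a using block and pass the reader in.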
Attachments: C--OAOrganization-File2.txt, C--OAOrganization-SourceCode.txt
Question by: AlHal2

Expert Comment

by:Fernando Soto
ID: 40579847
Hi AlHal2;

Is something like this what you are looking for?
using System;
using System.Linq;
using System.Xml.Linq;

// Load the whole document into memory
XDocument xdoc = XDocument.Load(@"Path to XML File\C--OAOrganization-File2.xml");

// XML namespace used by the Organization elements (the root's default namespace)
XNamespace ns = xdoc.Root.GetDefaultNamespace();

// One result per OrganizationName, so every historical name is returned, not just the latest
var results = from o in xdoc.Descendants(ns + "Organization")
              from orgn in o.Elements(ns + "OrganizationName")
              select new
              {
                  Id = o.Element(ns + "OrganizationId").Value,
                  Name = orgn.Value
              };

Console.WriteLine("Id           Name");
foreach (var org in results)
{
    Console.WriteLine("{0}   {1}", org.Id, org.Name);
}


Result of the above LINQ query:
Id           Name
4295904866   S. Y. BANCORP, INC.
4295904866   STOCK YARDS BANCORP, INC.
4295904866   Stock Yards
4295904882   SAUL CENTERS, INC.
4295904882   Saul Centers
4295904889   SCHUFF STEEL CO
4295904889   SCHUFF INTERNATIONAL, INC.
4295904889   Schuff Intl


Author Comment

by:AlHal2
ID: 40581772
The advantage of the C# code is that it parses the file incrementally, one node at a time. If I load the entire 8 GB file into memory, the program will not run.
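A middle ground between the two approaches is to stream the file with XmlReader but materialize one Organization element at a time with XNode.ReadFrom, so a LINQ query like the one above still works while memory stays bounded by the largest single Organization. This is a minimal sketch, not code from the thread: the class and method names (OrgStreaming, StreamElements, OrganizationNames) are hypothetical, and it assumes Organization elements are not nested inside one another.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class OrgStreaming
{
    // Yields each element with the given local name as a small XElement,
    // reading forward-only so only one such element is in memory at a time.
    public static IEnumerable<XElement> StreamElements(TextReader input, string localName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                {
                    // ReadFrom consumes the whole element and leaves the reader on the
                    // following node, so calling Read() here would skip an adjacent sibling.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    // Flattens each Organization into (Id, Name) pairs, one pair per OrganizationName.
    public static IEnumerable<(string Id, string Name)> OrganizationNames(TextReader input)
    {
        foreach (XElement org in StreamElements(input, "Organization"))
        {
            XNamespace ns = org.Name.Namespace;
            string id = (string)org.Element(ns + "OrganizationId");
            foreach (XElement name in org.Elements(ns + "OrganizationName"))
                yield return (id, name.Value.Trim());
        }
    }
}
```

Any Where clause (for example on an OrganizationName attribute) can be applied to the XElement stream before flattening, which also covers the lower-priority filtering requirement.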

Accepted Solution

by: AlHal2
ID: 40586030
This code from a colleague worked for me.

using System;
using System.IO;
using System.Xml;

namespace ReadXMLfromFile
{
    /// <summary>
    /// Streams a large Organization XML file with XmlReader and writes one
    /// pipe-delimited row per OrganizationName, so every historical name is kept.
    /// </summary>
    class pdaXMLParser
    {
        const string Sep = "|";

        static void Main(string[] args)
        {
            string csvRoot = "";

            // XmlReader is forward-only, so memory use stays flat even for an 8 GB file.
            using (XmlReader reader = XmlReader.Create(@"c:\temp\file2.xml"))
            // Open the output once (overwriting any existing file) instead of reopening it for every row.
            using (StreamWriter file = new StreamWriter(@"C:\temp\OrganizationNameParsed.txt"))
            {
                file.WriteLine("OrganizationID|entityCreatedDate|entityModifiedDate|OrganizationName|OrganizationName_effectiveFrom|OrganizationName_effectiveTo|OrganizationName_organizationNameTypeCode|organizationName_LanguageID|OrganizationName_organizationNameLocalNormalized");

                while (reader.Read())
                {
                    // Only react to start elements.
                    if (reader.NodeType != XmlNodeType.Element)
                        continue;

                    switch (reader.LocalName)
                    {
                        case "Organization":
                            // Start a new row prefix: entityCreatedDate|entityModifiedDate.
                            // A missing attribute becomes an empty column, so the column count never shifts.
                            csvRoot = (reader["entityCreatedDate"] ?? "") + Sep
                                    + (reader["entityModifiedDate"] ?? "");
                            break;

                        case "OrganizationId":
                            // The ID is the element's text; prefix it to the root data.
                            csvRoot = ReadText(reader) + Sep + csvRoot;
                            break;

                        case "OrganizationName":
                            // There may be more than one name per OrganizationId, so build each row from scratch.
                            // Read the attributes first: ReadText moves the reader off this element.
                            string attrs = Sep + (reader["effectiveFrom"] ?? "")
                                         + Sep + (reader["effectiveTo"] ?? "")
                                         + Sep + (reader["organizationNameTypeCode"] ?? "")
                                         + Sep + (reader["languageId"] ?? "")
                                         + Sep + (reader["organizationNameLocalNormalized"] ?? "");
                            string name = ReadText(reader);

                            // Root details for the OrganizationId followed by this OrganizationName's details.
                            file.WriteLine(csvRoot + Sep + name + attrs);
                            break;
                    }
                }
            }
            Console.ReadLine();
        }

        // Moves to the element's first child and returns its trimmed text,
        // or "" if the element is empty or its first child is not text.
        static string ReadText(XmlReader reader)
        {
            if (reader.IsEmptyElement)
                return "";
            if (reader.Read() && (reader.NodeType == XmlNodeType.Text || reader.NodeType == XmlNodeType.CDATA))
                return reader.Value.Trim();
            return "";
        }
    }
}


Author Closing Comment

by:AlHal2
ID: 40596630
It works.