Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Parsing v. large nested xml file.

Posted on 2015-01-30
4
Medium Priority
?
73 Views
Last Modified: 2016-02-18
I’m using the attached c# code to parse the attached xml file.  The problem is that within each organization tag there are multiple organizationName tags.  This is because organizations change their names over time.  The code is only getting the most recent whereas I would like all the history.
Ideally, I’d also like to be able to filter on specific elements and/or attributes.  However this is lower priority as I can just iterate through the file created by the code.
C--OAOrganization-File2.txtC--OAOrganization-SourceCode.txt
0
Comment
Question by:AlHal2
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
4 Comments
 
LVL 64

Expert Comment

by:Fernando Soto
ID: 40579847
Hi AlHal2;

Is something like this that you are looking for?
// Load document into memory
XDocument xdoc = XDocument.Load(@"Path to XML File\C--OAOrganization-File2.xml");

// XML NameSpace used in documents
XNamespace ns = xdoc.Root.GetDefaultNamespace();
XNamespace env = xdoc.Root.GetNamespaceOfPrefix("env");

// Query for needed information
var results = (from o in xdoc.Descendants(ns + "Organization")
               from orgn in o.Elements(ns + "OrganizationName") 
               select new
               {
                   Id = o.Element(ns + "OrganizationId").Value,
                   Name = orgn.Value
               });
               
Console.WriteLine("Id                  Name");               
foreach (var org in results)
{
    Console.WriteLine("{0}   {1}", org.Id, org.Name);
}               

Open in new window

Result of above Linq query
Id           Name
4295904866   S. Y. BANCORP, INC.
4295904866   STOCK YARDS BANCORP, INC.
4295904866   Stock Yards
4295904882   SAUL CENTERS, INC.
4295904882   Saul Centers
4295904889   SCHUFF STEEL CO
4295904889   SCHUFF INTERNATIONAL, INC.
4295904889   Schuff Intl

Open in new window

0
 

Author Comment

by:AlHal2
ID: 40581772
The advantage of the cc# code is that it parses the file bit by bit.  If I ingest the entire 8GB file into memory the program will not run.
0
 

Accepted Solution

by:
AlHal2 earned 0 total points
ID: 40586030
This code from a colleague worked for me.

using System;
using System.Xml;

namespace ReadXMLfromFile
{
    /// <summary>
    
    /// </summary>
    class pdaXMLParser
    {
        static void Main(string[] args)
        {
            XmlTextReader reader = new XmlTextReader("c:\\temp\\file2.xml");
            string csvRoot = "";
            string sep = "|";

            //write the output file header string (overwrite any existing file)
            using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\temp\OrganizationNameParsed.txt"))
            {
                file.WriteLine("OrganizationID|entityCreatedDate|entityModifiedDate|OrganizationName|OrganizationName_effectiveFrom|OrganizationName_effectiveTo|OrganizationName_organizationNameTypeCode|organizationName_LanguageID|OrganizationName_organizationNameLocalNormalized");
            }

            while (reader.Read())
            {
                // Only detect start elements.
		        if (reader.IsStartElement())
		        {
		            // Get element name and switch on it.
		            switch (reader.Name)
		            {
			        case "Data":
			            // Detect this element.
			            //Console.WriteLine("Start <data> element.");
			            break;

                    case "Organization":
                        //start a new csv string for use later... 
                        csvRoot = "";

                        // Detect the Organization element and extract the required attributes
                        string attribute = reader["entityCreatedDate"];            
			            if (attribute != null)
			            {
                            csvRoot += attribute;
			            }
                        else { csvRoot += sep; }

                        attribute = reader["entityModifiedDate"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  entityModifiedDate: " + attribute);
                            csvRoot += sep + attribute;
			            }
                        else { csvRoot += sep; }

			            break;

                    case "OrganizationId":
                        // Detect the Organization element and extract the required data from the next record
                        if (reader.Read())
                        {
                        //Console.WriteLine("  Organization ID: " + reader.Value.Trim());
                        //prefix the root data with the OrgID
                        csvRoot = reader.Value.Trim()+sep + csvRoot;
                        }
                        else { csvRoot = sep + csvRoot; }

                        break;

                    case "OrganizationName":
                        // Detect the Organization element and extract the required data from the attributes
                        
                        //reset the details field as there may be >1 Organization name per OrganizationID
                        string csvNameDetails = "";

                        attribute = reader["effectiveFrom"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  effectiveFrom: " + attribute);
                            csvNameDetails += sep + attribute;
			            }
                        else { csvNameDetails += sep; }
        
                        attribute = reader["effectiveTo"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  effectiveTo: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["organizationNameTypeCode"];
			            if (attribute != null)
			            {
                            //Console.WriteLine("  organizationNameTypeCode: " + attribute);
                            csvNameDetails += sep + attribute;
			            }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["languageId"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  languageId: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        attribute = reader["organizationNameLocalNormalized"];
                        if (attribute != null)
                        {
                            //Console.WriteLine("  organizationNameLocalNormalized: " + attribute);
                            csvNameDetails += sep + attribute;
                        }
                        else
                        { csvNameDetails += sep; }

                        // read ahead to get the Organization Name text and prefix this to the attribute data
                        if (reader.Read())
                        {
                            //Console.WriteLine("  Organization ID: " + reader.Value.Trim());
                            csvNameDetails = reader.Value.Trim() + csvNameDetails ;
                        }
                        else { csvNameDetails += sep; }

                        //write the root details for the OrganizationID along with the current OrganizationName details
                       // Console.WriteLine(csvRoot+sep+csvNameDetails);
                        
                        //write the output file data string (append to an existing file)
                        using (System.IO.StreamWriter file = new System.IO.StreamWriter(@"C:\temp\OrganizationNameParsed.txt",true))
                        {
                            file.WriteLine(csvRoot + sep+ csvNameDetails);
                        }
                        break;

		            }
		        }
	    
                }
            Console.ReadLine();
        }
    }
}

Open in new window

0
 

Author Closing Comment

by:AlHal2
ID: 40596630
it works.
0

Featured Post

How to Use the Help Bell

Need to boost the visibility of your question for solutions? Use the Experts Exchange Help Bell to confirm priority levels and contact subject-matter experts for question attention.  Check out this how-to article for more information.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The Client Need Led Us to RSS I recently had an investment company ask me how they might notify their constituents about their newsworthy publications.  Probably you would think "Facebook" or "Twitter" but this is an interesting client.  Their cons…
Entity Framework is a powerful tool to help you interact with the DataBase but still doesn't help much when we have a Stored Procedure that returns more than one resultset. The solution takes some of out-of-the-box thinking; read on!
Do you want to know how to make a graph with Microsoft Access? First, create a query with the data for the chart. Then make a blank form and add a chart control. This video also shows how to change what data is displayed on the graph as well as form…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…

715 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question