Link to home
Start Free TrialLog in
Avatar of Brian Pringle
Brian PringleFlag for United States of America

asked on

C# - Read Through Non-Standard XML File

I am trying to create a C# application that can read the data from a non-standard XML file and have not been successful in getting it to work.  We have different platforms that create different types of XML documents, but none of them are in the format that I am used to seeing.  I am trying to play with one from a generic Tableau workbook.  

In the attached XML document, I need to get the name of each worksheet that is contained in the workbook (think of tabs in Excel).  Then, I need to get a list of each data source for each tab.

For this example...

There is one workbook.  The workbook contains two worksheets (tabs) and uses two data sources.

The two data sources are "Expedite_Today (SQLTABLENAME)" and "Employee Table".

There are two worksheets (tabs) -- "Uses both tables" and "Uses only one table"

The "Uses both tables" uses both of the data sources.  The "Uses only one table" only uses the "Employee Table" data source.  

I am trying to make a hierarchy of the worksheets and list the data sources used by each sheet.  Essentially, I want something like the following.

The elements that I need are buried under the following

Data Source - workbook \ worksheets \ worksheet \ table \ view \ datasources \ datasource --> caption='NEED THIS'
Worksheet - workbook \ worksheets \ worksheet --> name='NEED THIS'


Workbook
|
 -- Uses both tables
|  |
|    -- Expedite_Today (SQLTABLENAME)
|    -- Employee Table
|
 -- Uses only one table
    |
     -- Employee Table



I found the following program for Ruby, but need to learn how to do this in C#.  

http://tableaufriction.blogspot.ca/2015/03/danger-dont-delete-that-data-source.html

Does anyone have any idea of how I can build a C# program that can read through this XML document and get that type of output?
Tableau.xml
Avatar of Fernando Soto
Fernando Soto
Flag of United States of America image

Hi Brian;

The following code snippet will get the nodes you want.
using System.Xml.Linq;

// Set the file location
XDocument xdoc = XDocument.Load(@"PathTo\Tableau.xml");

// Get the datasource nodes
var result = from ws in xdoc.Descendants("worksheet")
             where ws.Attribute("name") != null
             select ws.Descendants("datasource");

// Show the results
foreach (IEnumerable<XElement> datasource in result)
{
    Console.WriteLine("Uses {0} table", datasource.Count());
    foreach (XElement source in datasource)
    {
        Console.WriteLine("    Caption: {0} \n    Name: {1}\n", source.Attribute("caption").Value, source.Attribute("name").Value);
    }
}

// Result of running above code
Uses 2 table
    Caption: Expedite_Today (SQLTABLENAME) 
    Name: sqlserver.0wdlw5t1ar9mx91fm4e2612rcd97

    Caption: Employee Table 
    Name: excel-direct.030aebq1mqhz6s174luhm0wdblj5

Uses 1 table
    Caption: Employee Table 
    Name: excel-direct.030aebq1mqhz6s174luhm0wdblj5

Open in new window

Avatar of Brian Pringle

ASKER

@Fernando Soto - Thanks!  I am going to try this out later today and will let you know the outcome.
@Fernando Soto - That is close to what I need, but how can I get something that looks like the following?  I want the name of the worksheet and then a list of all of the data sources that are used.

"Uses only one table" and "Uses both tables" are actually the names of the worksheets.

Workbook
|
 -- Uses both tables
|  |
|    -- Expedite_Today (SQLTABLENAME)
|    -- Employee Table
|
 -- Uses only one table
    |
     -- Employee Table

Open in new window

It is just a matter of reformatting the string values that are being returned by the query in the variable result. Something like this by rewriting the foreach loops.
// Show the results
Console.WriteLine ("Workbook\n|" );
foreach ( IEnumerable<XElement> datasource in result ) {
    switch ( datasource.Count() ) {
        case 1:
            Console.WriteLine ( " -- Uses only one table" );
            break;
        case 2:
            Console.WriteLine ( " -- Uses both tables" );
            break;
        default:
            continue;
    }
    Console.WriteLine ( "|    |" );
    foreach ( XElement source in datasource ) {
        Console.WriteLine ( "|    -- {0}", source.Attribute ( "caption" ).Value);
    }
}

Open in new window

I think I didn't explain something.  The example that I gave has a worksheet with one table and another worksheet with two tables.  However, that will not always be the case.  I need the program to go through the XML document, find the name of the worksheet, and the list the datasources.  The program needs to figure out the name of the worksheet, not assign one.

I need this program to work for any Tableau xml document that we throw at it.
I would also be okay with dumping the results into an array and then I can format it from there.  

I have been playing with what you sent and I can get the name of the worksheet or the name of the datasource / caption, but can't quite get them at the same time.
ASKER CERTIFIED SOLUTION
Avatar of Fernando Soto
Fernando Soto
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I was travelling yesterday and will try this later today.
I've requested that this question be deleted for the following reason:

Not enough information to confirm an answer.
Hi angellll;

Please see my last post in this thread. I have been working with Brian and got it down to this solution where he stated he would try it. I am sure that if it did not do what he wanted he would have posted something.
Fernando -- It works, but I still do not get all of the modules.  However, it got me going enough to be able to work through the rest of it.
Thanks Brian. If you still need help with this please post another question.

Have a great day.