Solved

more XML to Excel data manipulation

Posted on 2011-02-14
11
400 Views
Last Modified: 2012-06-21
Hi,

Had an expert help create the code snippet attached and it does a nice job of pulling values from a particular tag out of a bunch of XML files.

However the snippet doesn't appear to look for the actual tag name "Text" which happens to be the first tag in each node and from where we are pulling our values.

But what if we wanted to pull out the 7th tag values instead of the first? Again, I don't seem to see where the tag name is spelled out anywhere. This new tag is named CID and is in the 7th position IF present, in the XML nodes in each file.

What am I missing? Where would I point to a different tag value within my XML nodes?

Thanks

Bill
Private Sub CommandButton1_Click()
    Dim strPath As String, strCurrentFile As String
    Dim MyData As String, strData() As String, strFile As String
    Dim temp() As String
    Dim I As Long, LastRow As Long
    Dim ws As Worksheet
    
    '~~> Change the path to the folder where the XML's are stored
    strPath = "C:\Temp\"
    
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    
    strCurrentFile = Dir(strPath & "c*.xml")
    
    '~~> Loop through the folder to get the xml files
    Do While strCurrentFile <> ""
        strFile = strPath & strCurrentFile
        
        Set ws = Sheets.Add
        ws.Name = Replace(strCurrentFile, ".xml", "", , , vbTextCompare)
        ws.Range("A1") = "File"
        ws.Range("B1") = "Text"
        
        '~~> Open XML as binary for faster reading in one go
        Open strFile For Binary As #1
        MyData = Space$(LOF(1))
        Get #1, , MyData
        Close #1
        
        '~~> Split the data and store it in an array
        strData() = Split(MyData, vbCrLf)
        
        '~~> Get the next available row
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
        
        '~~> Generate output WITH DUPLICATES
        For I = 0 To UBound(strData())
            If InStr(strData(I), "<node template=") Then
                temp = Split(strData(I), """")
                ws.Range("A" & LastRow) = strFile
                ws.Range("B" & LastRow) = temp(3)
                LastRow = LastRow + 1
            End If
        Next
        
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
        
        '~~> Remove DUPLICATES
        ws.Range("$A$1:$B$" & LastRow).RemoveDuplicates Columns:=Array(1, 2), Header _
        :=xlYes
        
        ws.Cells.EntireColumn.AutoFit
        
        Application.ScreenUpdating = True
        Application.DisplayAlerts = True
    
        strCurrentFile = Dir
    Loop
End Sub

Open in new window

0
Comment
Question by:billium99
  • 6
  • 5
11 Comments
 
LVL 11

Expert Comment

by:jkpieterse
Comment Utility
Without a sample xml file and a clear indication what needs to be extracted this is hard to answer!
0
 
LVL 1

Author Comment

by:billium99
Comment Utility
Hmm - OK here is the link to the previous question:

http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_26803892.html

But the code above is modified from the final code in this previous answer.

I'm also re-attaching the XML file, but again, I just need to understand in the current code, how I would change to a different tag on each node.

Thanks for the help!

Bill
sample.xml
0
 
LVL 11

Expert Comment

by:jkpieterse
Comment Utility
I would use the Microsoft XML library to properly navigate the XML.
But unfortunately, it doesn't like the structure of the XML.

After removing this part entirely:

<?dctm xml_app="ignore"?>
<!DOCTYPE tree [
<!ELEMENT node EMPTY>
<!ATTLIST node
  template CDATA #REQUIRED
  text CDATA #REQUIRED
      ID CDATA #REQUIRED
      Log CDATA #REQUIRED
  ShowCheckBox CDATA #REQUIRED
  EndLevel2Code CDATA #REQUIRED
      PID CDATA #REQUIRED
      Doc_Link CDATA ""
      CID CDATA ""
      Book CDATA "">
<!ELEMENT tree (node)+>
]>

I could use the code shown below to get a message of each value in the Text element of each node tag.
Option Explicit

Sub ImportXML()
'Set a reference to the Microsoft XML 6.0 library
    Dim oDOM As MSXML2.DOMDocument60
    Dim vFilename As Variant
    Dim nNodeList As MSXML2.IXMLDOMNodeList
    Dim nNode As MSXML2.IXMLDOMNode
    vFilename = Application.GetOpenFilename("XML Files (*.xml), *.xml", , "Please select an XML file", , False)
    If TypeName(vFilename) = "String" Then
        If Len(Dir(vFilename)) = 0 Then Exit Sub
        Set oDOM = New DOMDocument60
        oDOM.Load vFilename
        Set nNodeList = oDOM.getElementsByTagName("node")
        If nNodeList.Length > 0 Then
            For Each nNode In nNodeList
                MsgBox nNode.Attributes(1).NodeValue
            Next
        End If
    End If
End Sub

Open in new window

0
 
LVL 1

Author Comment

by:billium99
Comment Utility
OK - my first post has a code snippet.

How is that snippet grabbing the values in the "text" tags - in other words, each XML node has text="some value" but the code snippet does not appear to look for a tag name. So it's somehow finding the text="values" and making a list of unique values in Excel. All I'm looking for is to take a different tag - this time CID="values" which does not appear in every node, but otherwise is essentially the same requirement I had for the first code snippet.

I was thinking this is a couple of code line changes and nothing more, but anything I tried seemed to break the code.

Any advice on the existing snippet?

Thanks

Bill
0
 
LVL 11

Expert Comment

by:jkpieterse
Comment Utility
The detection of which line it wants is here:
  If InStr(strData(I), "<node template=") Then

Change that to:

  If InStr(strData(I), " CID=""") Then

and go from there
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 1

Author Comment

by:billium99
Comment Utility
I'm sorry - that's not correct. I went ahead and tried what you said nonetheless, and it pulls the exact same data.

Here is a sample node:

<node template="Doc_Frame" text="Please run the Recovery Solution from the partition. Click here for instructions." ID="3.2.1" Log="Trying Recovery from the partition" ShowCheckBox="False" EndLevel2Code="" PID="3.2" CID="c01895783" Book="#RestoreWithoutWindows"  />

The current code grabs the value in text=

I'm wanting the CID="c01895783"

Does that make sense?

Changing the line to:

 If InStr(strData(I), "CID=") Then

gives me exactly the same result, which are the values in the text= tag.

Thanks

Bill
0
 
LVL 1

Author Comment

by:billium99
Comment Utility
It may be that the original code worked because it simply grabbed the first tag= value after <node-template=" - it doesn't grab the node-template value - it grabs the subsequent text="value" - so perhaps the current code can't simply be4 modified to search for CID= since CID tag comes much later in each node...
0
 
LVL 11

Accepted Solution

by:
jkpieterse earned 500 total points
Comment Utility
Hi,

OK, lets combine both methods into one which:
- goes through all files
- reads them and removes the parts Microsoft XML doe snot like
- finds all nodes named "node"
- in such a node, find an attribute called "CID"
- Store its content
- removes duplicates.

Note you'll have to edit the path in my code and the file filter
Option Explicit

Private Sub CommandButton1_Click()
    Dim strPath As String, strCurrentFile As String
    Dim MyData As String, strData() As String, strFile As String
    Dim temp() As String
    Dim I As Long, LastRow As Long
    Dim ws As Worksheet
    'Set a reference to the Microsoft XML 6.0 library
    Dim oDOM As MSXML2.DOMDocument60
    Dim vFilename As Variant
    Dim nNodeList As MSXML2.IXMLDOMNodeList
    Dim nNode As MSXML2.IXMLDOMNode
    Dim bTreeFound As Boolean
    Dim sXML As String
    Dim aAttrib As MSXML2.IXMLDOMAttribute
    '~~> Change the path to the folder where the XML's are stored
    'Adjust!!!
    strPath = "C:\Temp\"

    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    'Adjust!!!
    strCurrentFile = Dir(strPath & "s*.xml")

    '~~> Loop through the folder to get the xml files
    Do While strCurrentFile <> ""
        strFile = strPath & strCurrentFile

        Set ws = Sheets.Add
        ws.Name = Replace(strCurrentFile, ".xml", "", , , vbTextCompare)
        ws.Range("A1") = "File"
        ws.Range("B1") = "Text"

        '~~> Open XML as binary for faster reading in one go
        Open strFile For Binary As #1
        MyData = Space$(LOF(1))
        Get #1, , MyData
        Close #1

        '~~> Split the data and store it in an array
        strData() = Split(MyData, vbCrLf)

        '~~> Get the next available row
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1

        'Only keep the lines so we end with proper XML
        For I = 0 To UBound(strData())
            If InStr(strData(I), "<tree ") Then
                bTreeFound = True
            End If
            If I = 0 Or bTreeFound Then
                sXML = sXML & strData(I) & vbNewLine
            End If
        Next
        sXML = Left(sXML, Len(sXML) - 1)
        Set oDOM = New DOMDocument60
        'Load XML into DOM object
        oDOM.LoadXML sXML
        'Get the nodes we need
        Set nNodeList = oDOM.getElementsByTagName("node")
        If nNodeList.Length > 0 Then
            For Each nNode In nNodeList
                For Each aAttrib In nNode.Attributes
                    'find the attribute having cid name
                    If LCase(aAttrib.nodeName) = "cid" Then
                        ws.Range("A" & LastRow) = strFile
                        ws.Range("B" & LastRow) = aAttrib.nodeValue
                        LastRow = LastRow + 1
                    End If
                Next
            Next
        End If
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
        '~~> Remove DUPLICATES
        ws.Range("$A$1:$B$" & LastRow).RemoveDuplicates Columns:=Array(1, 2), _
                                                        Header:=xlYes
        ws.Cells.EntireColumn.AutoFit
        strCurrentFile = Dir
    Loop
    Application.ScreenUpdating = True
    Application.DisplayAlerts = True
End Sub

Open in new window

0
 
LVL 1

Author Comment

by:billium99
Comment Utility
Hmm - I don't wish to change the XML file at all.

Our proprietary system uses that information while publishing the file for better or worse.
0
 
LVL 11

Expert Comment

by:jkpieterse
Comment Utility
The XML file is NOT changed. The code reads the file into memory, then -in memory- removes the bit of the XML that the remainder of the code does not like (and is of no importance to the actual content). Then the XML is stepped through to find the nodes you need and it puts them in some cells in Excel.
But the XML file itself remains untouched.
0
 
LVL 1

Author Closing Comment

by:billium99
Comment Utility
Thanks for the help!
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Convert between Excel file formats (.XLS, .XLSX, .XLSM) with/without macro option David Miller (dlmille) Intro Over this past Fall, I've had the opportunity to see several similar requests and have developed a couple related solutions associate…
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
The view will learn how to download and install SIMTOOLS and FORMLIST into Excel, how to use SIMTOOLS to generate a Monte Carlo simulation of 30 sales calls, and how to calculate the conditional probability based on the results of the Monte Carlo …
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now