Link to home
Start Free TrialLog in
Avatar of Bill Henderson
Bill HendersonFlag for United States of America

asked on

more XML to Excel data manipulation


Had an expert help create the code snippet attached and it does a nice job of pulling values from a particular tag out of a bunch of XML files.

However the snippet doesn't appear to look for the actual tag name "Text" which happens to be the first tag in each node and from where we are pulling our values.

But what if we wanted to pull out the 7th tag values instead of the first? Again, I don't seem to see where the tag name is spelled out anywhere. This new tag is named CID and is in the 7th position IF present, in the XML nodes in each file.

What am I missing? Where would I point to a different tag value within my XML nodes?


Private Sub CommandButton1_Click()
    Dim strPath As String, strCurrentFile As String
    Dim MyData As String, strData() As String, strFile As String
    Dim temp() As String
    Dim I As Long, LastRow As Long
    Dim ws As Worksheet
    '~~> Change the path to the folder where the XML's are stored
    strPath = "C:\Temp\"
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    strCurrentFile = Dir(strPath & "c*.xml")
    '~~> Loop through the folder to get the xml files
    Do While strCurrentFile <> ""
        strFile = strPath & strCurrentFile
        Set ws = Sheets.Add
        ws.Name = Replace(strCurrentFile, ".xml", "", , , vbTextCompare)
        ws.Range("A1") = "File"
        ws.Range("B1") = "Text"
        '~~> Open XML as binary for faster reading in one go
        Open strFile For Binary As #1
        MyData = Space$(LOF(1))
        Get #1, , MyData
        Close #1
        '~~> Split the data and store it in an array
        strData() = Split(MyData, vbCrLf)
        '~~> Get the next available row
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
        '~~> Generate output WITH DUPLICATES
        For I = 0 To UBound(strData())
            If InStr(strData(I), "<node template=") Then
                temp = Split(strData(I), """")
                ws.Range("A" & LastRow) = strFile
                ws.Range("B" & LastRow) = temp(3)
                LastRow = LastRow + 1
            End If
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
        '~~> Remove DUPLICATES
        ws.Range("$A$1:$B$" & LastRow).RemoveDuplicates Columns:=Array(1, 2), Header _
        Application.ScreenUpdating = True
        Application.DisplayAlerts = True
        strCurrentFile = Dir
End Sub

Open in new window

Avatar of Jan Karel Pieterse
Jan Karel Pieterse
Flag of Netherlands image

Without a sample xml file and a clear indication what needs to be extracted this is hard to answer!
Avatar of Bill Henderson


Hmm - OK here is the link to the previous question:

But the code above is modified from the final code in this previous answer.

I'm also re-attaching the XML file, but again, I just need to understand in the current code, how I would change to a different tag on each node.

Thanks for the help!

I would use the Microsoft XML library to properly navigate the XML.
But unfortunately, it doesn't like the structure of the XML.

After removing this part entirely:

<?dctm xml_app="ignore"?>
<!DOCTYPE tree [
<!ATTLIST node
  template CDATA #REQUIRED
      Doc_Link CDATA ""
      CID CDATA ""
      Book CDATA "">
<!ELEMENT tree (node)+>

I could use the code shown below to get a message of each value in the Text element of each node tag.
Option Explicit

Sub ImportXML()
'Set a reference to the Microsoft XML 6.0 library
    Dim oDOM As MSXML2.DOMDocument60
    Dim vFilename As Variant
    Dim nNodeList As MSXML2.IXMLDOMNodeList
    Dim nNode As MSXML2.IXMLDOMNode
    vFilename = Application.GetOpenFilename("XML Files (*.xml), *.xml", , "Please select an XML file", , False)
    If TypeName(vFilename) = "String" Then
        If Len(Dir(vFilename)) = 0 Then Exit Sub
        Set oDOM = New DOMDocument60
        oDOM.Load vFilename
        Set nNodeList = oDOM.getElementsByTagName("node")
        If nNodeList.Length > 0 Then
            For Each nNode In nNodeList
                MsgBox nNode.Attributes(1).NodeValue
        End If
    End If
End Sub

Open in new window

OK - my first post has a code snippet.

How is that snippet grabbing the values in the "text" tags - in other words, each XML node has text="some value" but the code snippet does not appear to look for a tag name. So it's somehow finding the text="values" and making a list of unique values in Excel. All I'm looking for is to take a different tag - this time CID="values" which does not appear in every node, but otherwise is essentially the same requirement I had for the first code snippet.

I was thinking this is a couple of code line changes and nothing more, but anything I tried seemed to break the code.

Any advice on the existing snippet?


The detection of which line it wants is here:
  If InStr(strData(I), "<node template=") Then

Change that to:

  If InStr(strData(I), " CID=""") Then

and go from there
I'm sorry - that's not correct. I went ahead and tried what you said nonetheless, and it pulls the exact same data.

Here is a sample node:

<node template="Doc_Frame" text="Please run the Recovery Solution from the partition. Click here for instructions." ID="3.2.1" Log="Trying Recovery from the partition" ShowCheckBox="False" EndLevel2Code="" PID="3.2" CID="c01895783" Book="#RestoreWithoutWindows"  />

The current code grabs the value in text=

I'm wanting the CID="c01895783"

Does that make sense?

Changing the line to:

 If InStr(strData(I), "CID=") Then

gives me exactly the same result, which are the values in the text= tag.


It may be that the original code worked because it simply grabbed the first tag= value after <node-template=" - it doesn't grab the node-template value - it grabs the subsequent text="value" - so perhaps the current code can't simply be4 modified to search for CID= since CID tag comes much later in each node...
Avatar of Jan Karel Pieterse
Jan Karel Pieterse
Flag of Netherlands image

Link to home
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hmm - I don't wish to change the XML file at all.

Our proprietary system uses that information while publishing the file for better or worse.
The XML file is NOT changed. The code reads the file into memory, then -in memory- removes the bit of the XML that the remainder of the code does not like (and is of no importance to the actual content). Then the XML is stepped through to find the nodes you need and it puts them in some cells in Excel.
But the XML file itself remains untouched.
Thanks for the help!