Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 469
  • Last Modified:

more XML to Excel data manipulation

Hi,

Had an expert help create the code snippet attached and it does a nice job of pulling values from a particular tag out of a bunch of XML files.

However the snippet doesn't appear to look for the actual tag name "Text" which happens to be the first tag in each node and from where we are pulling our values.

But what if we wanted to pull out the 7th tag values instead of the first? Again, I don't seem to see where the tag name is spelled out anywhere. This new tag is named CID and is in the 7th position IF present, in the XML nodes in each file.

What am I missing? Where would I point to a different tag value within my XML nodes?

Thanks

Bill
Private Sub CommandButton1_Click()
    Dim strPath As String, strCurrentFile As String
    Dim MyData As String, strData() As String, strFile As String
    Dim temp() As String
    Dim I As Long, LastRow As Long
    Dim ws As Worksheet
    
    '~~> Change the path to the folder where the XML's are stored
    strPath = "C:\Temp\"
    
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    
    strCurrentFile = Dir(strPath & "c*.xml")
    
    '~~> Loop through the folder to get the xml files
    Do While strCurrentFile <> ""
        strFile = strPath & strCurrentFile
        
        Set ws = Sheets.Add
        ws.Name = Replace(strCurrentFile, ".xml", "", , , vbTextCompare)
        ws.Range("A1") = "File"
        ws.Range("B1") = "Text"
        
        '~~> Open XML as binary for faster reading in one go
        Open strFile For Binary As #1
        MyData = Space$(LOF(1))
        Get #1, , MyData
        Close #1
        
        '~~> Split the data and store it in an array
        strData() = Split(MyData, vbCrLf)
        
        '~~> Get the next available row
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
        
        '~~> Generate output WITH DUPLICATES
        For I = 0 To UBound(strData())
            If InStr(strData(I), "<node template=") Then
                temp = Split(strData(I), """")
                ws.Range("A" & LastRow) = strFile
                ws.Range("B" & LastRow) = temp(3)
                LastRow = LastRow + 1
            End If
        Next
        
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
        
        '~~> Remove DUPLICATES
        ws.Range("$A$1:$B$" & LastRow).RemoveDuplicates Columns:=Array(1, 2), Header _
        :=xlYes
        
        ws.Cells.EntireColumn.AutoFit
        
        Application.ScreenUpdating = True
        Application.DisplayAlerts = True
    
        strCurrentFile = Dir
    Loop
End Sub

Open in new window

0
billium99
Asked:
billium99
  • 6
  • 5
1 Solution
 
jkpieterseCommented:
Without a sample xml file and a clear indication what needs to be extracted this is hard to answer!
0
 
billium99Author Commented:
Hmm - OK here is the link to the previous question:

http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_26803892.html

But the code above is modified from the final code in this previous answer.

I'm also re-attaching the XML file, but again, I just need to understand in the current code, how I would change to a different tag on each node.

Thanks for the help!

Bill
sample.xml
0
 
jkpieterseCommented:
I would use the Microsoft XML library to properly navigate the XML.
But unfortunately, it doesn't like the structure of the XML.

After removing this part entirely:

<?dctm xml_app="ignore"?>
<!DOCTYPE tree [
<!ELEMENT node EMPTY>
<!ATTLIST node
  template CDATA #REQUIRED
  text CDATA #REQUIRED
      ID CDATA #REQUIRED
      Log CDATA #REQUIRED
  ShowCheckBox CDATA #REQUIRED
  EndLevel2Code CDATA #REQUIRED
      PID CDATA #REQUIRED
      Doc_Link CDATA ""
      CID CDATA ""
      Book CDATA "">
<!ELEMENT tree (node)+>
]>

I could use the code shown below to get a message of each value in the Text element of each node tag.
Option Explicit

Sub ImportXML()
'Set a reference to the Microsoft XML 6.0 library
    Dim oDOM As MSXML2.DOMDocument60
    Dim vFilename As Variant
    Dim nNodeList As MSXML2.IXMLDOMNodeList
    Dim nNode As MSXML2.IXMLDOMNode
    vFilename = Application.GetOpenFilename("XML Files (*.xml), *.xml", , "Please select an XML file", , False)
    If TypeName(vFilename) = "String" Then
        If Len(Dir(vFilename)) = 0 Then Exit Sub
        Set oDOM = New DOMDocument60
        oDOM.Load vFilename
        Set nNodeList = oDOM.getElementsByTagName("node")
        If nNodeList.Length > 0 Then
            For Each nNode In nNodeList
                MsgBox nNode.Attributes(1).NodeValue
            Next
        End If
    End If
End Sub

Open in new window

0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
billium99Author Commented:
OK - my first post has a code snippet.

How is that snippet grabbing the values in the "text" tags - in other words, each XML node has text="some value" but the code snippet does not appear to look for a tag name. So it's somehow finding the text="values" and making a list of unique values in Excel. All I'm looking for is to take a different tag - this time CID="values" which does not appear in every node, but otherwise is essentially the same requirement I had for the first code snippet.

I was thinking this is a couple of code line changes and nothing more, but anything I tried seemed to break the code.

Any advice on the existing snippet?

Thanks

Bill
0
 
jkpieterseCommented:
The detection of which line it wants is here:
  If InStr(strData(I), "<node template=") Then

Change that to:

  If InStr(strData(I), " CID=""") Then

and go from there
0
 
billium99Author Commented:
I'm sorry - that's not correct. I went ahead and tried what you said nonetheless, and it pulls the exact same data.

Here is a sample node:

<node template="Doc_Frame" text="Please run the Recovery Solution from the partition. Click here for instructions." ID="3.2.1" Log="Trying Recovery from the partition" ShowCheckBox="False" EndLevel2Code="" PID="3.2" CID="c01895783" Book="#RestoreWithoutWindows"  />

The current code grabs the value in text=

I'm wanting the CID="c01895783"

Does that make sense?

Changing the line to:

 If InStr(strData(I), "CID=") Then

gives me exactly the same result, which are the values in the text= tag.

Thanks

Bill
0
 
billium99Author Commented:
It may be that the original code worked because it simply grabbed the first tag= value after <node-template=" - it doesn't grab the node-template value - it grabs the subsequent text="value" - so perhaps the current code can't simply be4 modified to search for CID= since CID tag comes much later in each node...
0
 
jkpieterseCommented:
Hi,

OK, lets combine both methods into one which:
- goes through all files
- reads them and removes the parts Microsoft XML doe snot like
- finds all nodes named "node"
- in such a node, find an attribute called "CID"
- Store its content
- removes duplicates.

Note you'll have to edit the path in my code and the file filter
Option Explicit

Private Sub CommandButton1_Click()
    Dim strPath As String, strCurrentFile As String
    Dim MyData As String, strData() As String, strFile As String
    Dim temp() As String
    Dim I As Long, LastRow As Long
    Dim ws As Worksheet
    'Set a reference to the Microsoft XML 6.0 library
    Dim oDOM As MSXML2.DOMDocument60
    Dim vFilename As Variant
    Dim nNodeList As MSXML2.IXMLDOMNodeList
    Dim nNode As MSXML2.IXMLDOMNode
    Dim bTreeFound As Boolean
    Dim sXML As String
    Dim aAttrib As MSXML2.IXMLDOMAttribute
    '~~> Change the path to the folder where the XML's are stored
    'Adjust!!!
    strPath = "C:\Temp\"

    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    'Adjust!!!
    strCurrentFile = Dir(strPath & "s*.xml")

    '~~> Loop through the folder to get the xml files
    Do While strCurrentFile <> ""
        strFile = strPath & strCurrentFile

        Set ws = Sheets.Add
        ws.Name = Replace(strCurrentFile, ".xml", "", , , vbTextCompare)
        ws.Range("A1") = "File"
        ws.Range("B1") = "Text"

        '~~> Open XML as binary for faster reading in one go
        Open strFile For Binary As #1
        MyData = Space$(LOF(1))
        Get #1, , MyData
        Close #1

        '~~> Split the data and store it in an array
        strData() = Split(MyData, vbCrLf)

        '~~> Get the next available row
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1

        'Only keep the lines so we end with proper XML
        For I = 0 To UBound(strData())
            If InStr(strData(I), "<tree ") Then
                bTreeFound = True
            End If
            If I = 0 Or bTreeFound Then
                sXML = sXML & strData(I) & vbNewLine
            End If
        Next
        sXML = Left(sXML, Len(sXML) - 1)
        Set oDOM = New DOMDocument60
        'Load XML into DOM object
        oDOM.LoadXML sXML
        'Get the nodes we need
        Set nNodeList = oDOM.getElementsByTagName("node")
        If nNodeList.Length > 0 Then
            For Each nNode In nNodeList
                For Each aAttrib In nNode.Attributes
                    'find the attribute having cid name
                    If LCase(aAttrib.nodeName) = "cid" Then
                        ws.Range("A" & LastRow) = strFile
                        ws.Range("B" & LastRow) = aAttrib.nodeValue
                        LastRow = LastRow + 1
                    End If
                Next
            Next
        End If
        LastRow = ws.Range("A" & Rows.Count).End(xlUp).Row + 1
        '~~> Remove DUPLICATES
        ws.Range("$A$1:$B$" & LastRow).RemoveDuplicates Columns:=Array(1, 2), _
                                                        Header:=xlYes
        ws.Cells.EntireColumn.AutoFit
        strCurrentFile = Dir
    Loop
    Application.ScreenUpdating = True
    Application.DisplayAlerts = True
End Sub

Open in new window

0
 
billium99Author Commented:
Hmm - I don't wish to change the XML file at all.

Our proprietary system uses that information while publishing the file for better or worse.
0
 
jkpieterseCommented:
The XML file is NOT changed. The code reads the file into memory, then -in memory- removes the bit of the XML that the remainder of the code does not like (and is of no importance to the actual content). Then the XML is stepped through to find the nodes you need and it puts them in some cells in Excel.
But the XML file itself remains untouched.
0
 
billium99Author Commented:
Thanks for the help!
0

Featured Post

Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

  • 6
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now