Link to home
Start Free TrialLog in
Avatar of katerina-p
katerina-p

asked on

Complex XPATH filter checking child node count and value, MS ACCESS VBA

Hi All,

I would like help trying to get this working.

The background of the requirement: I'm receiving a 100MB (35,000 unique ID nodes) response XML file from the source, containing a list of errors. I have no control over the XML's creation. Looping through the XML takes about 13 minutes, at about 45nodes/s. From what I understand of DOMDocument it has poor performance with large files.

I can significantly cut down the nodes that actually need to be parsed by ignoring IDs with single insignificant errors I know I wish to ignore. Unfortunately at the moment an 'insignificant' error is being returned for almost every ID. Some IDs will then also have other errors.

I'd like to know whether I can use xPath to return a nodelist collection of <RepeatingContainer> excluding those<RepeatingContainer> nodes which have a single <Errors> child node with a specific error code for its child node <ErrorCode>. If I can I can then bulk-process the excluded nodes by process of elimination, as I also have a list of the IDs being returned.

So in the example XML I provided, ID 001 would be ignored however ID 002 would not as it has 2 errors and the 2nd error might be significant.

Private Sub TestXPath()
Const ErrorCodeRequiredAsOnlyNode As Long = 21917091
Const AnOtherError As Long = 191
Const XML As String = _
  "<RootContainer>" & _
    "<RepeatingContainer>" & _
      "<ID>001</ID><NodeN1>Data</NodeN1><NodeN2>Data</NodeN2>" & _
      "<Errors>" & _
        "<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & ErrorCodeRequiredAsOnlyNode & "</ErrorCode>" & _
      "</Errors>" & _
      "<NodeN3>Data</NodeN3><NodeN4>Data</NodeN4>" & _
    "</RepeatingContainer>" & _
    "<RepeatingContainer>" & _
      "<ID>002</ID><NodeN1>Data</NodeN1><NodeN2>Data</NodeN2>" & _
      "<Errors>" & _
        "<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & ErrorCodeRequiredAsOnlyNode & "</ErrorCode>" & _
      "</Errors>" & _
      "<Errors>" & _
        "<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & AnOtherError & "</ErrorCode>" & _
      "</Errors>" & _
      "<NodeN3>Data</NodeN3><NodeN4>Data</NodeN4>" & _
    "</RepeatingContainer>" & _
"</RootContainer>"



'Const XPATH As String = "//RepeatingContainer"
'Const XPATH As String = "//RepeatingContainer[Errors/NodeE1='Data']"
'Const XPATH As String = "//RepeatingContainer[Errors/ErrorCode='" & ErrorCodeRequiredAsOnlyNode & "']"
Const XPATH As String = "//RepeatingContainer[Errors/ErrorCode='" & AnOtherError & "']"
'Const XPATH As String = "//RepeatingContainer[count(Errors/ErrorCode)>1]"


Dim o As New DOMDocument
Dim NL         As IXMLDOMNodeList
Dim Node               As IXMLDOMNode



o.loadXML XML

Set NL = o.selectNodes(XPATH)
For Each Node In NL
  If NL.length > 0 Then
   Debug.Print "Matched ID " & Node.selectSingleNode("ID").Text & " for xpath " & XPATH
  Else
     Debug.Print "Not (NL.length > 0)"
  End If
Next Node
Set o = Nothing

End Sub

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of zc2
zc2
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of katerina-p
katerina-p

ASKER

zc2, thanks for your fast response. A slight misunderstanding but it's academic - I need where only one node so I've removed the 'not' in your xpath

However, Access throws exception on the line Set NL = o.selectNodes(XPATH)

?err.Number, err.Description
-2147467259   Unknown method.
/RootContainer/RepeatingContainer[(-->count(Errors<--) = 1 and normalize-space(Errors/ErrorCode)='21917091')]

Open in new window


It appears to be with the count function. Is that valid within VBA and DOMDocument?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Try to insert the following line before the o.selectNodes()

o.SetProperty "SelectionLanguage", "XPath"

Very good! That does that trick. However, this does not work if the XML has the initial line              

<?xml version="1.0" encoding="UTF-8"?>

Open in new window


Can you explain why/resolve?

Thank you!

K.
No I can not explain that. What exactly does not work? The selectNodes() method returns other that expected?
I tried to execute your script with that XML declaration prepended, it gives the same result.
The only change I made, I replaced double quotes with single quotes to avoid escaping in the VB string.

Const XML As String = _
  "<?xml version='1.0' encoding='UTF-8'?><RootContainer>" & _
I'm getting varying changes which I don't understand.

Hopefully I'm doing something very silly! xPath(2) is (I think) what I'm looking for. Have attached actual file snippet. So I really want to return only the final node of the XML, for CorrelationID 9780300124606

Private Sub TestXPath_AFPI()
Const ErrorCodeRequiredAsOnlyNode As Long = 21917164
Const NumToCheck As Integer = 2
Dim x As Integer
Dim xPath(0 To NumToCheck) As String
xPath(0) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse/Ack"
xPath(1) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning')]"
xPath(2) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning' and count(Errors) = 1 and normalize-space(Errors/ErrorCode)='" & ErrorCodeRequiredAsOnlyNode & "')]"

Dim o As New DOMDocument
Dim NL As IXMLDOMNodeList
Dim Node As IXMLDOMNode

  o.Load "H:/AFPI_responses.xml"
  o.SetProperty "SelectionLanguage", "XPath"
  For x = 0 To NumToCheck
  Set NL = o.selectNodes(xPath(x))
  If NL.length > 0 Then
    For Each Node In NL
      Debug.Print "Matched ID " & Node.selectSingleNode("CorrelationID").Text & " for xpath(" & x & ") " & xPath(x)
    Next Node
  Else
    Debug.Print "No matching nodes for xpath(" & x & ") " & xPath(x)
  End If
  Next x
  Set o = Nothing
End Sub

Open in new window


My output:
No matching nodes for xpath(0) /BulkDataExchangeResponses/AddFixedPriceItemResponse/Ack
No matching nodes for xpath(1) /BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning')]
No matching nodes for xpath(2) /BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning' and count(Errors) = 1 and normalize-space(Errors/ErrorCode)='21917164')]
AFPI-responses.xml
I see, there is a different XML input now. And this time it has a default namespace xmlns="urn:ebay:apis:eBLBaseComponents"

To make queries in an XML with a namespace,
you have to set the "SelectionNamespaces" property first.

To do so call the o.SetProperty() method, pass the property name ("SelectionNamespaces")
and the namespace with some alias (I named it "e") like follows:
o.SetProperty "SelectionNamespaces", "xmlns:e='urn:ebay:apis:eBLBaseComponents'"

Open in new window

Then use the alias in your XPath expressions:
xPath(1) = "/e:BulkDataExchangeResponses/e:AddFixedPriceItemResponse[(e:Ack='Warning')]"

Open in new window

zc2, thank you.

Indeed, you already answered my original, more generic question in 40720865 and 40721209. I had seemingly then not properly escaped the double quotes. Thank you for that help. I was previously able to parse this xml without needing to specify anything about the namespace therefore nor use aliases I guess. If you know: can you bypass this and/or will there be a performance issue?

Thank you for your help! Much appreciated
You are welcome.
You need to set the namespace with an alias only if you work with XPath using the selectNodes() and selectSingleNode() methods. You don't need the namespace alias if you traversing the loaded tree using the childNodes collections.
I have no idea how having a namespace affects the performance.