Complex XPATH filter checking child node count and value, MS ACCESS VBA

Hi All,

I would like help trying to get this working.

The background of the requirement: I'm receiving a 100MB (35,000 unique ID nodes) response XML file from the source, containing a list of errors. I have no control over the XML's creation. Looping through the XML takes about 13 minutes, at about 45nodes/s. From what I understand of DOMDocument it has poor performance with large files.

I can significantly cut down the nodes that actually need to be parsed by ignoring IDs with single insignificant errors I know I wish to ignore. Unfortunately at the moment an 'insignificant' error is being returned for almost every ID. Some IDs will then also have other errors.

I'd like to know whether I can use xPath to return a nodelist collection of <RepeatingContainer> excluding those<RepeatingContainer> nodes which have a single <Errors> child node with a specific error code for its child node <ErrorCode>. If I can I can then bulk-process the excluded nodes by process of elimination, as I also have a list of the IDs being returned.

So in the example XML I provided, ID 001 would be ignored however ID 002 would not as it has 2 errors and the 2nd error might be significant.

Private Sub TestXPath()
Const ErrorCodeRequiredAsOnlyNode As Long = 21917091
Const AnOtherError As Long = 191
Const XML As String = _
  "<RootContainer>" & _
    "<RepeatingContainer>" & _
      "<ID>001</ID><NodeN1>Data</NodeN1><NodeN2>Data</NodeN2>" & _
      "<Errors>" & _
        "<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & ErrorCodeRequiredAsOnlyNode & "</ErrorCode>" & _
      "</Errors>" & _
      "<NodeN3>Data</NodeN3><NodeN4>Data</NodeN4>" & _
    "</RepeatingContainer>" & _
    "<RepeatingContainer>" & _
      "<ID>002</ID><NodeN1>Data</NodeN1><NodeN2>Data</NodeN2>" & _
      "<Errors>" & _
        "<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & ErrorCodeRequiredAsOnlyNode & "</ErrorCode>" & _
      "</Errors>" & _
      "<Errors>" & _
        "<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & AnOtherError & "</ErrorCode>" & _
      "</Errors>" & _
      "<NodeN3>Data</NodeN3><NodeN4>Data</NodeN4>" & _
    "</RepeatingContainer>" & _
"</RootContainer>"



'Const XPATH As String = "//RepeatingContainer"
'Const XPATH As String = "//RepeatingContainer[Errors/NodeE1='Data']"
'Const XPATH As String = "//RepeatingContainer[Errors/ErrorCode='" & ErrorCodeRequiredAsOnlyNode & "']"
Const XPATH As String = "//RepeatingContainer[Errors/ErrorCode='" & AnOtherError & "']"
'Const XPATH As String = "//RepeatingContainer[count(Errors/ErrorCode)>1]"


Dim o As New DOMDocument
Dim NL         As IXMLDOMNodeList
Dim Node               As IXMLDOMNode



o.loadXML XML

Set NL = o.selectNodes(XPATH)
For Each Node In NL
  If NL.length > 0 Then
   Debug.Print "Matched ID " & Node.selectSingleNode("ID").Text & " for xpath " & XPATH
  Else
     Debug.Print "Not (NL.length > 0)"
  End If
Next Node
Set o = Nothing

End Sub

Open in new window

katerina-pAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

zc2Commented:
If I understood you correctly, the following XPath should select only those RepeatingContainer which have either more than one Errors or even if there only one Errors, the ErrorCode is equal to the value of ErrorCodeRequiredAsOnlyNode:
"/RootContainer/RepeatingContainer[not(count(Errors) = 1 and normalize-space(Errors/ErrorCode)='" & ErrorCodeRequiredAsOnlyNode & "')]

Open in new window

"
Note, that I replaced the "//" with "/RootContainer/". If the input file is huge you should avoid those expressions which will make the processor to look on every node of the tree.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
katerina-pAuthor Commented:
zc2, thanks for your fast response. A slight misunderstanding but it's academic - I need where only one node so I've removed the 'not' in your xpath

However, Access throws exception on the line Set NL = o.selectNodes(XPATH)

?err.Number, err.Description
-2147467259   Unknown method.
/RootContainer/RepeatingContainer[(-->count(Errors<--) = 1 and normalize-space(Errors/ErrorCode)='21917091')]

Open in new window


It appears to be with the count function. Is that valid within VBA and DOMDocument?
zc2Commented:
Try to insert the following line before the o.selectNodes()
o.SetProperty "SelectionLanguage", "XPath"

Open in new window

Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

katerina-pAuthor Commented:
Try to insert the following line before the o.selectNodes()

o.SetProperty "SelectionLanguage", "XPath"

Very good! That does that trick. However, this does not work if the XML has the initial line              

<?xml version="1.0" encoding="UTF-8"?>

Open in new window


Can you explain why/resolve?

Thank you!

K.
zc2Commented:
No I can not explain that. What exactly does not work? The selectNodes() method returns other that expected?
I tried to execute your script with that XML declaration prepended, it gives the same result.
The only change I made, I replaced double quotes with single quotes to avoid escaping in the VB string.

Const XML As String = _
  "<?xml version='1.0' encoding='UTF-8'?><RootContainer>" & _
katerina-pAuthor Commented:
I'm getting varying changes which I don't understand.

Hopefully I'm doing something very silly! xPath(2) is (I think) what I'm looking for. Have attached actual file snippet. So I really want to return only the final node of the XML, for CorrelationID 9780300124606

Private Sub TestXPath_AFPI()
Const ErrorCodeRequiredAsOnlyNode As Long = 21917164
Const NumToCheck As Integer = 2
Dim x As Integer
Dim xPath(0 To NumToCheck) As String
xPath(0) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse/Ack"
xPath(1) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning')]"
xPath(2) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning' and count(Errors) = 1 and normalize-space(Errors/ErrorCode)='" & ErrorCodeRequiredAsOnlyNode & "')]"

Dim o As New DOMDocument
Dim NL As IXMLDOMNodeList
Dim Node As IXMLDOMNode

  o.Load "H:/AFPI_responses.xml"
  o.SetProperty "SelectionLanguage", "XPath"
  For x = 0 To NumToCheck
  Set NL = o.selectNodes(xPath(x))
  If NL.length > 0 Then
    For Each Node In NL
      Debug.Print "Matched ID " & Node.selectSingleNode("CorrelationID").Text & " for xpath(" & x & ") " & xPath(x)
    Next Node
  Else
    Debug.Print "No matching nodes for xpath(" & x & ") " & xPath(x)
  End If
  Next x
  Set o = Nothing
End Sub

Open in new window


My output:
No matching nodes for xpath(0) /BulkDataExchangeResponses/AddFixedPriceItemResponse/Ack
No matching nodes for xpath(1) /BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning')]
No matching nodes for xpath(2) /BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning' and count(Errors) = 1 and normalize-space(Errors/ErrorCode)='21917164')]
AFPI-responses.xml
zc2Commented:
I see, there is a different XML input now. And this time it has a default namespace xmlns="urn:ebay:apis:eBLBaseComponents"

To make queries in an XML with a namespace,
you have to set the "SelectionNamespaces" property first.

To do so call the o.SetProperty() method, pass the property name ("SelectionNamespaces")
and the namespace with some alias (I named it "e") like follows:
o.SetProperty "SelectionNamespaces", "xmlns:e='urn:ebay:apis:eBLBaseComponents'"

Open in new window

Then use the alias in your XPath expressions:
xPath(1) = "/e:BulkDataExchangeResponses/e:AddFixedPriceItemResponse[(e:Ack='Warning')]"

Open in new window

katerina-pAuthor Commented:
zc2, thank you.

Indeed, you already answered my original, more generic question in 40720865 and 40721209. I had seemingly then not properly escaped the double quotes. Thank you for that help. I was previously able to parse this xml without needing to specify anything about the namespace therefore nor use aliases I guess. If you know: can you bypass this and/or will there be a performance issue?

Thank you for your help! Much appreciated
zc2Commented:
You are welcome.
You need to set the namespace with an alias only if you work with XPath using the selectNodes() and selectSingleNode() methods. You don't need the namespace alias if you traversing the loaded tree using the childNodes collections.
I have no idea how having a namespace affects the performance.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
XML

From novice to tech pro — start learning today.