katerina-p
asked on
Complex XPATH filter checking child node count and value, MS ACCESS VBA
Hi All,
I would like help trying to get this working.
The background of the requirement: I'm receiving a 100MB (35,000 unique ID nodes) response XML file from the source, containing a list of errors. I have no control over the XML's creation. Looping through the XML takes about 13 minutes, at about 45nodes/s. From what I understand of DOMDocument it has poor performance with large files.
I can significantly cut down the nodes that actually need to be parsed by ignoring IDs with single insignificant errors I know I wish to ignore. Unfortunately at the moment an 'insignificant' error is being returned for almost every ID. Some IDs will then also have other errors.
I'd like to know whether I can use xPath to return a nodelist collection of <RepeatingContainer> excluding those<RepeatingContainer> nodes which have a single <Errors> child node with a specific error code for its child node <ErrorCode>. If I can I can then bulk-process the excluded nodes by process of elimination, as I also have a list of the IDs being returned.
So in the example XML I provided, ID 001 would be ignored however ID 002 would not as it has 2 errors and the 2nd error might be significant.
I would like help trying to get this working.
The background of the requirement: I'm receiving a 100MB (35,000 unique ID nodes) response XML file from the source, containing a list of errors. I have no control over the XML's creation. Looping through the XML takes about 13 minutes, at about 45nodes/s. From what I understand of DOMDocument it has poor performance with large files.
I can significantly cut down the nodes that actually need to be parsed by ignoring IDs with single insignificant errors I know I wish to ignore. Unfortunately at the moment an 'insignificant' error is being returned for almost every ID. Some IDs will then also have other errors.
I'd like to know whether I can use xPath to return a nodelist collection of <RepeatingContainer> excluding those<RepeatingContainer> nodes which have a single <Errors> child node with a specific error code for its child node <ErrorCode>. If I can I can then bulk-process the excluded nodes by process of elimination, as I also have a list of the IDs being returned.
So in the example XML I provided, ID 001 would be ignored however ID 002 would not as it has 2 errors and the 2nd error might be significant.
Private Sub TestXPath()
Const ErrorCodeRequiredAsOnlyNode As Long = 21917091
Const AnOtherError As Long = 191
Const XML As String = _
"<RootContainer>" & _
"<RepeatingContainer>" & _
"<ID>001</ID><NodeN1>Data</NodeN1><NodeN2>Data</NodeN2>" & _
"<Errors>" & _
"<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & ErrorCodeRequiredAsOnlyNode & "</ErrorCode>" & _
"</Errors>" & _
"<NodeN3>Data</NodeN3><NodeN4>Data</NodeN4>" & _
"</RepeatingContainer>" & _
"<RepeatingContainer>" & _
"<ID>002</ID><NodeN1>Data</NodeN1><NodeN2>Data</NodeN2>" & _
"<Errors>" & _
"<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & ErrorCodeRequiredAsOnlyNode & "</ErrorCode>" & _
"</Errors>" & _
"<Errors>" & _
"<NodeE1>Data</NodeE1><NodeE2>Data</NodeE2><ErrorCode>" & AnOtherError & "</ErrorCode>" & _
"</Errors>" & _
"<NodeN3>Data</NodeN3><NodeN4>Data</NodeN4>" & _
"</RepeatingContainer>" & _
"</RootContainer>"
'Const XPATH As String = "//RepeatingContainer"
'Const XPATH As String = "//RepeatingContainer[Errors/NodeE1='Data']"
'Const XPATH As String = "//RepeatingContainer[Errors/ErrorCode='" & ErrorCodeRequiredAsOnlyNode & "']"
Const XPATH As String = "//RepeatingContainer[Errors/ErrorCode='" & AnOtherError & "']"
'Const XPATH As String = "//RepeatingContainer[count(Errors/ErrorCode)>1]"
Dim o As New DOMDocument
Dim NL As IXMLDOMNodeList
Dim Node As IXMLDOMNode
o.loadXML XML
Set NL = o.selectNodes(XPATH)
For Each Node In NL
If NL.length > 0 Then
Debug.Print "Matched ID " & Node.selectSingleNode("ID").Text & " for xpath " & XPATH
Else
Debug.Print "Not (NL.length > 0)"
End If
Next Node
Set o = Nothing
End Sub
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Try to insert the following line before the o.selectNodes()
o.SetProperty "SelectionLanguage", "XPath"
Very good! That does that trick. However, this does not work if the XML has the initial line
<?xml version="1.0" encoding="UTF-8"?>
Can you explain why/resolve?
Thank you!
K.
No I can not explain that. What exactly does not work? The selectNodes() method returns other that expected?
I tried to execute your script with that XML declaration prepended, it gives the same result.
The only change I made, I replaced double quotes with single quotes to avoid escaping in the VB string.
Const XML As String = _
"<?xml version='1.0' encoding='UTF-8'?><RootCon tainer>" & _
I tried to execute your script with that XML declaration prepended, it gives the same result.
The only change I made, I replaced double quotes with single quotes to avoid escaping in the VB string.
Const XML As String = _
"<?xml version='1.0' encoding='UTF-8'?><RootCon
ASKER
I'm getting varying changes which I don't understand.
Hopefully I'm doing something very silly! xPath(2) is (I think) what I'm looking for. Have attached actual file snippet. So I really want to return only the final node of the XML, for CorrelationID 9780300124606
My output:
Hopefully I'm doing something very silly! xPath(2) is (I think) what I'm looking for. Have attached actual file snippet. So I really want to return only the final node of the XML, for CorrelationID 9780300124606
Private Sub TestXPath_AFPI()
Const ErrorCodeRequiredAsOnlyNode As Long = 21917164
Const NumToCheck As Integer = 2
Dim x As Integer
Dim xPath(0 To NumToCheck) As String
xPath(0) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse/Ack"
xPath(1) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning')]"
xPath(2) = "/BulkDataExchangeResponses/AddFixedPriceItemResponse[(Ack='Warning' and count(Errors) = 1 and normalize-space(Errors/ErrorCode)='" & ErrorCodeRequiredAsOnlyNode & "')]"
Dim o As New DOMDocument
Dim NL As IXMLDOMNodeList
Dim Node As IXMLDOMNode
o.Load "H:/AFPI_responses.xml"
o.SetProperty "SelectionLanguage", "XPath"
For x = 0 To NumToCheck
Set NL = o.selectNodes(xPath(x))
If NL.length > 0 Then
For Each Node In NL
Debug.Print "Matched ID " & Node.selectSingleNode("CorrelationID").Text & " for xpath(" & x & ") " & xPath(x)
Next Node
Else
Debug.Print "No matching nodes for xpath(" & x & ") " & xPath(x)
End If
Next x
Set o = Nothing
End Sub
My output:
No matching nodes for xpath(0) /BulkDataExchangeResponsesAFPI-responses.xml/AddFixedP riceItemRe sponse/Ack
No matching nodes for xpath(1) /BulkDataExchangeResponses/AddFixedP riceItemRe sponse[(Ac k='Warning ')]
No matching nodes for xpath(2) /BulkDataExchangeResponses/AddFixedP riceItemRe sponse[(Ac k='Warning ' and count(Errors) = 1 and normalize-space(Errors/Err orCode)='2 1917164')]
I see, there is a different XML input now. And this time it has a default namespace xmlns="urn:ebay:apis:eBLBa seComponen ts"
To make queries in an XML with a namespace,
you have to set the "SelectionNamespaces" property first.
To do so call the o.SetProperty() method, pass the property name ("SelectionNamespaces")
and the namespace with some alias (I named it "e") like follows:
To make queries in an XML with a namespace,
you have to set the "SelectionNamespaces" property first.
To do so call the o.SetProperty() method, pass the property name ("SelectionNamespaces")
and the namespace with some alias (I named it "e") like follows:
o.SetProperty "SelectionNamespaces", "xmlns:e='urn:ebay:apis:eBLBaseComponents'"
Then use the alias in your XPath expressions:xPath(1) = "/e:BulkDataExchangeResponses/e:AddFixedPriceItemResponse[(e:Ack='Warning')]"
ASKER
zc2, thank you.
Indeed, you already answered my original, more generic question in 40720865 and 40721209. I had seemingly then not properly escaped the double quotes. Thank you for that help. I was previously able to parse this xml without needing to specify anything about the namespace therefore nor use aliases I guess. If you know: can you bypass this and/or will there be a performance issue?
Thank you for your help! Much appreciated
Indeed, you already answered my original, more generic question in 40720865 and 40721209. I had seemingly then not properly escaped the double quotes. Thank you for that help. I was previously able to parse this xml without needing to specify anything about the namespace therefore nor use aliases I guess. If you know: can you bypass this and/or will there be a performance issue?
Thank you for your help! Much appreciated
You are welcome.
You need to set the namespace with an alias only if you work with XPath using the selectNodes() and selectSingleNode() methods. You don't need the namespace alias if you traversing the loaded tree using the childNodes collections.
I have no idea how having a namespace affects the performance.
You need to set the namespace with an alias only if you work with XPath using the selectNodes() and selectSingleNode() methods. You don't need the namespace alias if you traversing the loaded tree using the childNodes collections.
I have no idea how having a namespace affects the performance.
ASKER
However, Access throws exception on the line Set NL = o.selectNodes(XPATH)
Open in new window
It appears to be with the count function. Is that valid within VBA and DOMDocument?