Help setting up script and query for XML

Hi,

I would like to set up a script to query a publicly available XML file. I have used Microsoft Access in the past with scripts etc. but it's been a while, so I am hoping someone can point me in the right direction.

There is a publicly available XML file which is released from the Australian Business Register every few weeks.

https://data.gov.au/dataset/abn-bulk-extract (available from the links on the page titled 'ABN Bulk Extract Part 1' and 'ABN Bulk Extract Part 2')

It lists all of the Australian Business Numbers (ABNs) for Australian businesses.

What I would like to do is extract the new Australian Private Companies which have been registered in the state of the ACT (Australian Capital Territory) during the last release period.

I envisaged I would be able to do this by querying the following fields in the XML file:

EntityTypeIn = PRV
EntityTypeText = Australian Private Company
State = ACT
ABNStatusFromDate = 20180117

The latest release date is the 31 January 2018 and the previous release date was the 16 January 2018, so I thought I could extract the dates for businesses registered in between those release dates, 20180117, 20180118, 20180119, and so forth.

I have downloaded the XML and imported using the Get External Data function, however it doesn't import the folder structures correctly so I couldn't query the ABNStatusFromDate.

I realize it is possible to work around this by using a script but I am not familiar enough to be able to do this. The files are quite large and I wasn't able to import into Excel because it ran out of memory.

If someone can help with this it would be greatly appreciated.

Many thanks Cathryn
LetsLearnAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

John TsioumprisSoftware & Systems EngineerCommented:
I think you need to take a look at the guide from Microsoft..the paragraph "How Do I Traverse a Document?" is what you are looking.
1
aikimarkCommented:
Have you written any XPath or worked with an DOM Document?
0
aikimarkCommented:
State = ACT
Do you mean status=ACT
0
Introducing the "443 Security Simplified" Podcast

This new podcast puts you inside the minds of leading white-hat hackers and security researchers. Hosts Marc Laliberte and Corey Nachreiner turn complex security concepts into easily understood and actionable insights on the latest cyber security headlines and trends.

aikimarkCommented:
If the following works for the first XML file in the downloaded zip file, then it can be packaged to loop through all the XML files once they've been extracted from the zip file.  Such functional packaging should include a parameter for the date, unless the date can be determined programmatically.
select-xml -Path "C:\Users\Mark\Downloads\20180131_Public01.xml" `
    -XPath '//ABR[ABN[@status="ACT" and @ABNStatusFromDate>"20180116"] | EntityType[EntityTypeIn="PRV" and EntityTypeText="Australian Private Company"]]' |
% {$_.node.outerxml} > "C:\Users\Mark\Downloads\20180131_Public01_New.xml"

Open in new window

The PS script produced a 1225 line file that looks like this:
<ABR recordLastUpdatedDate="20180129" replaced="N"><ABN status="ACT" ABNStatusFromDate="20180127">11001845059</ABN><EntityType><EntityTypeInd>IND</EntityTypeInd><EntityTypeText>Individual/Sole Trader</EntityTypeText></EntityType><LegalEntity><IndividualName type="LGL"><GivenName>ADVAITA</GivenName><GivenName>JUBILEE</GivenName><FamilyName>FOORD</FamilyName></IndividualName><BusinessAddress><AddressDetails><State>NSW</State><Postcode>2093</Postcode></AddressDetails></BusinessAddress></LegalEntity><OtherEntity><NonIndividualName type="TRD"><NonIndividualNameText>EXPRESS DISPENSARY</NonIndividualNameText></NonIndividualName></OtherEntity></ABR>
<ABR recordLastUpdatedDate="20180130" replaced="N"><ABN status="ACT" ABNStatusFromDate="20180130">11008259859</ABN><EntityType><EntityTypeInd>IND</EntityTypeInd><EntityTypeText>Individual/Sole Trader</EntityTypeText></EntityType><LegalEntity><IndividualName type="LGL"><GivenName>ANNE</GivenName><GivenName>VERONICA</GivenName><FamilyName>LONDON</FamilyName></IndividualName><BusinessAddress><AddressDetails><State>WA</State><Postcode>6208</Postcode></AddressDetails></BusinessAddress></LegalEntity><OtherEntity><NonIndividualName type="BN"><NonIndividualNameText>AROMA BODY BALANCE</NonIndividualNameText></NonIndividualName></OtherEntity></ABR>
<ABR recordLastUpdatedDate="20180124" replaced="N"><ABN status="ACT" ABNStatusFromDate="20180123">11030625056</ABN><EntityType><EntityTypeInd>IND</EntityTypeInd><EntityTypeText>Individual/Sole Trader</EntityTypeText></EntityType><LegalEntity><IndividualName type="LGL"><GivenName>LAI MAN</GivenName><FamilyName>LO</FamilyName></IndividualName><BusinessAddress><AddressDetails><State>VIC</State><Postcode>3805</Postcode></AddressDetails></BusinessAddress></LegalEntity><OtherEntity><NonIndividualName type="TRD"><NonIndividualNameText>Asia Pacific Trading</NonIndividualNameText></NonIndividualName></OtherEntity></ABR>

Open in new window

0
aikimarkCommented:
output file produced
20180131_Public01_New.xml
0
LetsLearnAuthor Commented:
Hi aikimark,

Thanks so much for your help with this.

In answer to your earlier question no I haven't written any XPath or worked with an DOM Document previously. I have done some modules in Access years ago, so it's been a while.

It should be <State>ACT</State> from the Business Address rather than status="ACT".

If I am adding this in as a module what should I have as the opening and closing statements for the module? Sounds basic, but it's a long time.

Thanks once again,
Cathryn
0
aikimarkCommented:
This is Powershell, not VBA
0
LetsLearnAuthor Commented:
Thank you aikimark, works perfectly, really appreciate your help.
0
aikimarkCommented:
I think there's a problem with the XPath string.  Please test this version.
select-xml -Path "C:\Users\Mark\Downloads\20180131_Public01.xml" `
    -XPath '//ABR[./ABN[@ABNStatusFromDate>"20180116"]][./EntityType[EntityTypeInd="PRV" and EntityTypeText="Australian Private Company"]][.//AddressDetails[State="ACT"]]'
% {$_.node.outerxml} > "C:\Users\Mark\Downloads\20180131_Public01_New.xml"

Open in new window

I didn't have the State="ACT" condition in the accepted comment.

I've been playing with a VBA version of this, trying to tweak the performance.
0
LetsLearnAuthor Commented:
I was able to revise the string to return the ABN[@ABNStatusFromDate="20180116", however I have struggled to be able to include the other fields.

I tried the above string but it is returning an empty file.

When I revised the string to return the first field I closed the question thinking I would be able to work out the remaining fields but have struggled.
0
aikimarkCommented:
please request that the question be reopened
0
aikimarkCommented:
to return the first field
When it works properly, it will output the ABR node/element text, based on the date and other criteria.  What are you trying to output?
0
LetsLearnAuthor Commented:
I would like to output the lines where:

State = ACT
ABNStatusFromDate= "20180116" through to "20180131"
EntityTypeInd="PRV"
EntityTypeText="Australian Private Company"
0
LetsLearnAuthor Commented:
This works for extracting an XML with the date criteria only. Not sure how to add in the extra criteria/fields.

select-xml -Path "C:\Users\cathr\Desktop\extract\20180131_Public01.xml" `
    -XPath '//ABR[ABN[@ABNStatusFromDate="20180116"]]' |
% {$_.node.outerxml} > "C:\Users\cathr\Desktop\extract\20180116_Public01_1.xml"

Open in new window


I'll request to have the question re-opened.
0
aikimarkCommented:
This is a version that uses your paths.  I broke up the XPath onto separate lines.  Since you want to include 20180116, I changed the date from my prior comment.  Running this on my laptop, it completed in 38 seconds and produced three lines. (see below)

Note: The 38 seconds is about as fast as I can get on my laptop in any programming environment.
select-xml -Path "C:\Users\cathr\Desktop\extract\20180131_Public01.xml" `
        -XPath ('//ABR' + `
                '[./ABN[@ABNStatusFromDate>"20180115"]]' + `
                '[./EntityType[EntityTypeInd="PRV" and EntityTypeText="Australian Private Company"]]' + `
                '[.//AddressDetails[State="ACT"]]') |
% {$_.node.outerxml} > "C:\Users\cathr\Desktop\extract\20180116_Public01_1.xml"

Open in new window


Output:
<ABR recordLastUpdatedDate="20180124" replaced="N"><ABN status="ACT" ABNStatusFromDate="20180124">11623790189</ABN><EntityType><EntityTypeInd>PRV</EntityTypeInd><EntityTypeText>Australian Private Company</EntityTypeText></EntityType><MainEntity><NonIndividualName type="MN"><NonIndividualNameText>PMG &amp; CMC HOLDINGS PTY LIMITED</NonIndividualNameText></NonIndividualName><BusinessAddress><AddressDetails><State>ACT</State><Postcode>2612</Postcode></AddressDetails></BusinessAddress></MainEntity><ASICNumber ASICNumberType="undetermined">623790189</ASICNumber></ABR>
<ABR recordLastUpdatedDate="20180122" replaced="N"><ABN status="ACT" ABNStatusFromDate="20180122">12623892259</ABN><EntityType><EntityTypeInd>PRV</EntityTypeInd><EntityTypeText>Australian Private Company</EntityTypeText></EntityType><MainEntity><NonIndividualName type="MN"><NonIndividualNameText>RUNTO INVESTMENT PTY LTD</NonIndividualNameText></NonIndividualName><BusinessAddress><AddressDetails><State>ACT</State><Postcode>2913</Postcode></AddressDetails></BusinessAddress></MainEntity><ASICNumber ASICNumberType="undetermined">623892259</ASICNumber></ABR>
<ABR recordLastUpdatedDate="20180117" replaced="N"><ABN status="ACT" ABNStatusFromDate="20180117">14623872391</ABN><EntityType><EntityTypeInd>PRV</EntityTypeInd><EntityTypeText>Australian Private Company</EntityTypeText></EntityType><MainEntity><NonIndividualName type="MN"><NonIndividualNameText>SHINING PAINTING SERVICES PTY LTD</NonIndividualNameText></NonIndividualName><BusinessAddress><AddressDetails><State>ACT</State><Postcode>2617</Postcode></AddressDetails></BusinessAddress></MainEntity><ASICNumber ASICNumberType="undetermined">623872391</ASICNumber><GST status="ACT" GSTStatusFromDate="20180117" /></ABR>

Open in new window

1

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
LetsLearnAuthor Commented:
That's perfect Mark! Thank you!
0
aikimarkCommented:
When they reopen the question, accept the most recent solution comment.
0
LetsLearnAuthor Commented:
I sent through a support request to have the question re-opened, and received an email referring me to use a 'help bell' on the page. I can't see a help bell, or am I looking in the wrong place?
0
aikimarkCommented:
I don't know what 'bell' they are referring to, either.  I usually click the Report Question link near your question text.  Maybe they will clarify if you reply to their email.
0
LetsLearnAuthor Commented:
Thanks Scott
0
LetsLearnAuthor Commented:
Thanks Mark this works perfectly
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Access

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.