Large XML file extraction

I have some very large XML files which are provided from an external source. I need to search through the files to isolate certain lines based on search criteria, eg ABNStatusFromDate="20180101", <State>ACT</State> and <EntityTypeText>Australian Private Company</EntityTypeText>.

I have tried opening the files in Excel and the Windows XML Notepad reader. They crash due to the size. I can open them in the FirstObject reader very quickly and I can search one by one by each date and then double check to see if the state and entity type matches the two above criteria. But this will obviously take forever.

Is there a way I can do this quickly and extract by date and matching criteria into a smaller XML file or Excel?

I won't attach the file because of the size. Below are a couple of lines from the XML file.

Many thanks for any assistance which can be given.

<ABR recordLastUpdatedDate="20171121" replaced="N"><ABN status="ACT" ABNStatusFromDate="20171109">14622745297</ABN><EntityType><EntityTypeInd>PRV</EntityTypeInd><EntityTypeText>Australian Private Company</EntityTypeText></EntityType><MainEntity><NonIndividualName type="MN"><NonIndividualNameText>FUMAO PROPERTY PTY LTD</NonIndividualNameText></NonIndividualName><BusinessAddress><AddressDetails><State>VIC</State><Postcode>3149</Postcode></AddressDetails></BusinessAddress></MainEntity><ASICNumber ASICNumberType="undetermined">622745297</ASICNumber><GST status="ACT" GSTStatusFromDate="20171109" /></ABR>
<ABR recordLastUpdatedDate="20000701" replaced="N"><ABN status="CAN" ABNStatusFromDate="20000228">14622746156</ABN><EntityType><EntityTypeInd>CUT</EntityTypeInd><EntityTypeText>Corporate Unit Trust</EntityTypeText></EntityType><MainEntity><NonIndividualName type="MN"><NonIndividualNameText>THE JOKA UNIT TRUST</NonIndividualNameText></NonIndividualName><BusinessAddress><AddressDetails><State>NSW</State><Postcode>2320</Postcode></AddressDetails></BusinessAddress></MainEntity></ABR>
<ABR recordLastUpdatedDate="20150202" replaced="N"><ABN status="CAN" ABNStatusFromDate="20150202">14622746237</ABN><EntityType><EntityTypeInd>IND</EntityTypeInd><EntityTypeText>Individual/Sole Trader</EntityTypeText></EntityType><LegalEntity><IndividualName type="LGL"><GivenName>BOUASRI</GivenName><FamilyName>VILLAROSA</FamilyName></IndividualName><BusinessAddress><AddressDetails><State>NSW</State><Postcode>2560</Postcode></AddressDetails></BusinessAddress></LegalEntity><OtherEntity><NonIndividualName type="TRD"><NonIndividualNameText>Villarosa Photography</NonIndividualNameText></NonIndividualName></OtherEntity></ABR>
<ABR recordLastUpdatedDate="20171109" replaced="N"><ABN status="ACT" ABNStatusFromDate="20171109">14622749795</ABN><EntityType><EntityTypeInd>PRV</EntityTypeInd><EntityTypeText>Australian Private Company</EntityTypeText></EntityType><MainEntity><NonIndividualName type="MN"><NonIndividualNameText>3 DU YOGA PTY LTD</NonIndividualNameText></NonIndividualName><BusinessAddress><AddressDetails><State>NSW</State><Postcode>2150</Postcode></AddressDetails></BusinessAddress></MainEntity><ASICNumber ASICNumberType="undetermined">622749795</ASICNumber></ABR>
<ABR recordLastUpdatedDate="20030601" replaced="N"><ABN status="CAN" ABNStatusFromDate="20030601">14622750886</ABN><EntityType><EntityTypeInd>IND</EntityTypeInd><EntityTypeText>Individual/Sole Trader</EntityTypeText></EntityType><LegalEntity><IndividualName type="LGL"><NameTitle>MS</NameTitle><GivenName>JANETTE</GivenName><GivenName>ROSEMARY</GivenName><FamilyName>CAMUGLIA</FamilyName></IndividualName><BusinessAddress><AddressDetails><State>QLD</State><Postcode>4059</Postcode></AddressDetails></BusinessAddress></LegalEntity></ABR>
LetsLearnAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Ryan ChongBusiness Systems Analyst , ex-Senior Application EngineerCommented:
Is there a way I can do this quickly and extract by date and matching criteria into a smaller XML file or Excel?
not sure about loading XML to Excel, but certainly it can be loaded into MS SQL, from there we probably can do a query thing to filter what you want or what you don't want.

hope that make sense?
0
LetsLearnAuthor Commented:
I didn't think of doing that Ryan. Yes that would be great.
0
Ryan ChongBusiness Systems Analyst , ex-Senior Application EngineerCommented:
something like this:

declare @Output VARCHAR(MAX)
set @Output = '
<ABR recordLastUpdatedDate="20171121" replaced="N">
	<ABN status="ACT" ABNStatusFromDate="20171109">14622745297</ABN>
	<EntityType>
		<EntityTypeInd>PRV</EntityTypeInd>
		<EntityTypeText>Australian Private Company</EntityTypeText>
	</EntityType>
	<MainEntity>
		<NonIndividualName type="MN">
			<NonIndividualNameText>FUMAO PROPERTY PTY LTD</NonIndividualNameText>
		</NonIndividualName>
		<BusinessAddress>
			<AddressDetails>
				<State>VIC</State>
				<Postcode>3149</Postcode>
			</AddressDetails>
		</BusinessAddress>
	</MainEntity>
	<ASICNumber ASICNumberType="undetermined">622745297</ASICNumber>
	<GST status="ACT" GSTStatusFromDate="20171109" />
</ABR>
<ABR recordLastUpdatedDate="20000701" replaced="N">
	<ABN status="CAN" ABNStatusFromDate="20000228">14622746156</ABN>
	<EntityType>
		<EntityTypeInd>CUT</EntityTypeInd>
		<EntityTypeText>Corporate Unit Trust</EntityTypeText>
	</EntityType>
	<MainEntity>
		<NonIndividualName type="MN">
			<NonIndividualNameText>THE JOKA UNIT TRUST</NonIndividualNameText>
		</NonIndividualName>
		<BusinessAddress>
			<AddressDetails>
				<State>NSW</State>
				<Postcode>2320</Postcode>
			</AddressDetails>
		</BusinessAddress>
	</MainEntity>
</ABR>
<ABR recordLastUpdatedDate="20150202" replaced="N">
	<ABN status="CAN" ABNStatusFromDate="20150202">14622746237</ABN>
	<EntityType>
		<EntityTypeInd>IND</EntityTypeInd>
		<EntityTypeText>Individual/Sole Trader</EntityTypeText>
	</EntityType>
	<LegalEntity>
		<IndividualName type="LGL">
			<GivenName>BOUASRI</GivenName>
			<FamilyName>VILLAROSA</FamilyName>
		</IndividualName>
		<BusinessAddress>
			<AddressDetails>
				<State>NSW</State>
				<Postcode>2560</Postcode>
			</AddressDetails>
		</BusinessAddress>
	</LegalEntity>
	<OtherEntity>
		<NonIndividualName type="TRD">
			<NonIndividualNameText>Villarosa Photography</NonIndividualNameText>
		</NonIndividualName>
	</OtherEntity>
</ABR>
<ABR recordLastUpdatedDate="20171109" replaced="N">
	<ABN status="ACT" ABNStatusFromDate="20171109">14622749795</ABN>
	<EntityType>
		<EntityTypeInd>PRV</EntityTypeInd>
		<EntityTypeText>Australian Private Company</EntityTypeText>
	</EntityType>
	<MainEntity>
		<NonIndividualName type="MN">
			<NonIndividualNameText>3 DU YOGA PTY LTD</NonIndividualNameText>
		</NonIndividualName>
		<BusinessAddress>
			<AddressDetails>
				<State>NSW</State>
				<Postcode>2150</Postcode>
			</AddressDetails>
		</BusinessAddress>
	</MainEntity>
	<ASICNumber ASICNumberType="undetermined">622749795</ASICNumber>
</ABR>
<ABR recordLastUpdatedDate="20030601" replaced="N">
	<ABN status="CAN" ABNStatusFromDate="20030601">14622750886</ABN>
	<EntityType>
		<EntityTypeInd>IND</EntityTypeInd>
		<EntityTypeText>Individual/Sole Trader</EntityTypeText>
	</EntityType>
	<LegalEntity>
		<IndividualName type="LGL">
			<NameTitle>MS</NameTitle>
			<GivenName>JANETTE</GivenName>
			<GivenName>ROSEMARY</GivenName>
			<FamilyName>CAMUGLIA</FamilyName>
		</IndividualName>
		<BusinessAddress>
			<AddressDetails>
				<State>QLD</State>
				<Postcode>4059</Postcode>
			</AddressDetails>
		</BusinessAddress>
	</LegalEntity>
</ABR>
'

DECLARE @XML XML

SET @XML = CONVERT (XML, @Output)
	
SELECT 
	c.value('(./@recordLastUpdatedDate)', 'VARCHAR(MAX)') recordLastUpdatedDate,
	c.value('(./@replaced)', 'VARCHAR(MAX)') replaced,
	c.value('ABN[1]', 'nvarchar(max)') ABN_Text,
	c.value('(ABN/@status)[1]', 'VARCHAR(MAX)') ABN_status,
	c.value('(ABN/@ABNStatusFromDate)[1]', 'VARCHAR(MAX)') ABNStatusFromDate,
	c.value('(EntityType/EntityTypeInd)[1]', 'nvarchar(max)') EntityTypeInd,
	c.value('(EntityType/EntityTypeText)[1]', 'nvarchar(max)') EntityTypeText
	-- other fields ...
FROM @XML.nodes('//ABR') t(c)

Open in new window


SnapShot.png
0
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

LetsLearnAuthor Commented:
Many thanks for doing this Ryan, I'll test it and let you know how I get on.
0
LetsLearnAuthor Commented:
Hi Ryan,

I was thinking of using this in MS Access as the files open reasonably quickly. Could I trouble you for one extra piece of help, how would I run this in Access? It's been a while since I have done anything with modules, classes and queries in Access.

Thanks in advance,
Cathryn
0
Ryan ChongBusiness Systems Analyst , ex-Senior Application EngineerCommented:
you probably can try the existing Access wizard to import the XML file:

Import XML File In Access 2010
https://www.addictivetips.com/microsoft-office/import-xml-file-in-access-2010/

but personally seldom using this feature before.

see if you can achieve that using the Wizard mentioned in article above?

anyway, still think MS SQL should be pretty fast as well but just that you need to familiar yourself with the SQL scripts I have posted above, which could be the more difficult part.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
LetsLearnAuthor Commented:
Thanks Ryan, I'm closing this as the best solution.

I might ask a similar question in the Microsoft Access area as well as it is a program I am more familiar with.

Thank you for your help.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
XML

From novice to tech pro — start learning today.