Powershell: Using a select-xml command to get node value and tags from a .xml file

simon vitali
simon vitali used Ask the Experts™
on
HI this is a continuation of thread from before
old thread

the conditions for this project has changed a little rendering the script that Michelangelo had provided no longer useful for my cause although it worked for the previous purpose

heres what changed:
the files in questions are XML format (but are not named .xml they are .901 files)
i need to remove a whole node from the *.xml files ,  including the tag

i.e.
 <errsql yadadayda > 
random sql error the statement has terminated
</errsql>

this whole line need to be removed form the file

here is a snippet of the xml file:

<chgsql ts="07:01:58.971" op="O" jb="ADD" tb="CLL_TAB" sc="BATCH"> <fld fd="F1000" af="PAL"/>
<fld fd="F1148" af="43000135163"/>
<idx fd="F1577" af="5167599565"/>
<idx fd="F1578" af="LINK"/>
<fld fd="F1001" af="1"/>
<fld fd="F902" af="00003814"/>
</chgsql>
<errsql ts="07:01:58.986" op="O" lv="50" ob="">
<sql>INSERT INTO CLL_TAB [F1000,F1148,F1577,F1578,F1001,F902] VALUES [:F1000,:F1148,:F1577,:F1578,:F1001,:F902] </sql>
<prm>PAL,43000135163,5167599565,LINK,1,00003814,</prm>
</errsql>
<chgsql ts="07:01:58.986" op="O" jb="ADD" tb="CLL_TAB" sc="BATCH">
<fld fd="F1000" af="PAL"/>
<fld fd="F1148" af="43000135164"/>
<idx fd="F1577" af="5166282554"/>
<idx fd="F1578" af="LINK"/>
<fld fd="F1001" af="1"/>
<fld fd="F902" af="00003814"/>
</chgsql>
<errsql ts="07:01:58.986" op="O" lv="50" ob="">
<sql>INSERT INTO CLL_TAB [F1000,F1148,F1577,F1578,F1001,F902] VALUES [:F1000,:F1148,:F1577,:F1578,:F1001,:F902] </sql>
<prm>PAL,43000135164,5166282554,LINK,1,00003814,</prm>
</errsql>


here is what i have as a command thats not working so far which is Michealangelo's script modified to search using xpath

$root = 'C:\storeman\Office\ZY999901\*.*'
# set exclusions: none 
$exclude = $null
$SearchString = select-xml -Path 'C:\storeman\Office\ZY999901\*.*' -XPath //errsql | Select-Object -ExpandProperty node
# find all files which have $searchstring inside 
$object = Get-ChildItem -Path $root -Recurse -exclude $exclude | Select-String -pattern $SearchString  -list # list: first occurence
# delete all lines which contain $searchstring and overwrite source file
$object | % { (get-content  $_.path) | select-string -pattern $SearchString -notmatch | Set-Content $_.path}

Open in new window



my xpath command works on its own just fine except it doesn't include the <errsql> </errsql> tags in results , i could live with this

when i run the command in powershell ISE it runs for a very long time and doesn't return and results or errors but it doesn't do anything
pls help, thank you
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
Don't treat XML as text.
This will remove all <errsql> nodes from all .901 documents that in the folder specified.
Matching documents will be overwritten without warning.
As usual, try it on a test folder first ...
$root = 'C:\Temp\29157334'
$filter = '*.901'
Get-ChildItem -Path $root -Filter $filter |
	Select-String -Pattern '<errsql' -CaseSensitive -List |
	ForEach-Object {
		Write-Host "Processing '$($_.Path)'"
		Try {
			$xml = [xml](Get-Content -Path $_.Path)
			$xml.SelectNodes('//errsql') | ForEach-Object {
				[void]$_.ParentNode.RemoveChild($_)
			}
			$xml.Save($_.Path)
		} Catch {
			$_ | Write-Error
		}
	}

Open in new window

Author

Commented:
Hi Obda
thank you for that, it worked great! my only issue is that it takes a heck of a long time to run on one 434MB file(2hr50min) the file did get down to 175MB after the script ran.  Is there anyway with using a different file encoding that the script run time could be dramtically sped up?
Most Valuable Expert 2018
Distinguished Expert 2018

Commented:
I'm afraid I somewhat doubt the "dramatically".
Can you attach a zipped sample file of, say, ~1MB expanded size (a full file, not a snippet), or do they contain sensitive information?
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Author

Commented:
no there is no sensitive data, i will upload that
thank you

Author

Commented:
here is the data file in question, a bit larger than i hoped but i was breaking the formatting when try to go smaller, sorry about that
90727999.901
Most Valuable Expert 2018
Distinguished Expert 2018
Commented:
This should be a bit faster:
$root = 'C:\Temp\29157334'
$filter = '*.901'
$xml = New-Object -TypeName System.Xml.XmlDocument
Get-ChildItem -Path $root -Filter $filter |
	Select-String -Pattern '<errsql' -CaseSensitive -List |
	ForEach-Object {
		Write-Host "Processing '$($_.Path)'"
		Try {
			$xml.Load($_.Path)
			ForEach ($node in $xml.SelectNodes('//errsql')) {
				[void]$node.ParentNode.RemoveChild($node)
			}
			$xml.Save($_.Path)
		} Catch {
			$_ | Write-Error
		}
	}

Open in new window

Author

Commented:
Well i would say you missed the mark on one thing!, it is working dramatically faster :)
Processed one file of 434MB in about 3 minutes saving alot of time.
I saw the differences in your script and while im not that powershell techsavvy yet to fully understand I continue to learn in hopes of one day being able to write scripts half this good.  Thank you for your invaluable help it is greatly appreciated.
Cheers

Author

Commented:
Nailed it

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial