Solved

Powershell XML parsing

Posted on 2011-03-21
10
1,337 Views
Last Modified: 2012-05-11
I want to be able to parse some XML.
We have a file that contains some employee info that I want to query.
Here is a sample of the file.
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID=0000002>
</ArrayOfEmployeeInfo>

Part1: I want to be able to search for a field or combination of fields of my choosing.
Say for instance: FIRST and LAST name. And return all info for that person.

Part 2: there is a glitch with the data in that file. Example: An employee's title changes and the XML record for that employee is duplicated (appended) to the file as another record.
I would love to be able to parse the ugly file into a new file that weeds out the obsolete records.
We do have EmployeeID to use as a unique field. The newer records are at the bottom of the file, as it is just appended as time goes by.

Thanks in advance.

0
Comment
Question by:FDiskWizard
  • 5
  • 5
10 Comments
 
LVL 13

Accepted Solution

by:
soostibi earned 500 total points
ID: 35182561
Part1:

$s = [xml] @"
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID="0000002"/>
</ArrayOfEmployeeInfo>
"@


$s | Select-Xml -XPath "//EmployeeInfo[@firstName='John' and @lastName='Doe']" | %{$_.node}

Open in new window

0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35183298
Good stuff.
it took me a few minutes to figure out how to do it from a file.
I think I just had a typo somewhere, but finally got it corrected.

This works:
  [xml] @(Get-Content 'C:\Files\EmpFile.xml') | Select-Xml -XPath "//EmployeeInfo[@firstName='Johne' and @lastName='Doe']" | %{$_.node}


0
 
LVL 13

Expert Comment

by:soostibi
ID: 35183825
For Part2 please check this:
$s = Get-Content 'C:\Files\EmpFile.xml'

$h = @{}
$i = 0
$s | % -process {
    if($_ -match 'EmployeeID="(\d{7})"'){ 
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}
        $i++
    }
    else{
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}
        $i++
    }
 }
 
$toleave = @($h.keys | %{$h.$_} | ?{$_.id} | Group-Object -Property id | 
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 } | Select-Object -ExpandProperty line) +
    @($h.keys | %{$h.$_} | ?{!$_.id} | Select-Object -ExpandProperty line )
$toleave | Sort-Object

$toleave | %{$h.$_} | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35190786
Thanks. But, that isn't working for me. It seems to output the same as the original, but the file is reversed. The XML header is at the bottom of the result, and the </ArrayOfEmployeeInfo> is at the top.

I can't really begin to debug THAT :) You're doing a number of things there, that I just can't quite follow.
It also puts header columns like:
ID  Text   Line
--- ----- -------

Could you help my newbie Powershell self understand some of the script you posted?
Looks like some good stuff happening there that I have not seen.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 35191207
Sorry, I copied my wrong prototype.

But I have a question. Are you sure, that your sample is correct? Actually in PowerShell it seems not to be valid XML file, as EmployeeIDs should be between quation marks, and the EmployeeInfo tags should be closed.

So instead of
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
It shoud be
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
 
So you might have to change my code accordingly. Here is the corrected, simplified code, so if it works I'll explain.
$s = Get-Content 'C:\Files\EmpFile.xml' 
$i = 0  
$lines = $s | % -process {  
    if($_ -match 'EmployeeID=(\d{7})'){   # if there are quationmarks at the ID, 'EmployeeID=(\d{7})' should be change to 'EmployeeID="(\d{7})"'
        New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}  
        $i++  
    }  
    else{  
        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}  
        $i++  
    }  
 }  
   
$toleave = @($lines | ?{$_.id} | Group-Object -Property id |   
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +  
    @($lines| ?{!$_.id})  
$toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 12

Author Comment

by:FDiskWizard
ID: 35192173
Outstanding!!! it worked on a small test file I created.

I had to figure out why it wasn't working on the full file... it was the field length. The real field was only 6 characters, and I gave you 7. I changed the 7 to a 6 in the script and it worked! Good stuff!

I did add the quotes also:
'EmployeeID="(\d{6})"'




0
 
LVL 13

Expert Comment

by:soostibi
ID: 35192568
Do you need an explanation?
0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35193201
That would great, if you could provide some info on what's happening.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 35193893
read content as text into $s:  $s = Get-Content 'C:\Files\EmpFile.xml'  
initialize $i: $i = 0    
create new object from each line : $lines = $s | % -process {    
if there is a pattern of EmployeeID with equal sign and seven digits:    if($_ -match 'EmployeeID=(\d{7})'){        
then the new object will include the following properties: original text, the digits from the found pattern and the line number, to preseve the right sequence and determine the last row of duplicate employees: New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}    
increase $i:       $i++    
    }    
    else{    

otherwise create a very similar object, except the EmployeeID is empty:        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}    
increase $i:        $i++    
    }    
 }    
 
   
select the lines with EmployeeID in them and group them by this ID: $toleave = @($lines | ?{$_.id} | Group-Object -Property id |    
within each group sort them by line number in descending order and select the first only:    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +    
Plus keep all the line without any EmployeeID:    @($lines| ?{!$_.id})  
and restore the original line sequence an take the original text form the customized object and write them to file: $toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'
0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35197958
Thanks a million, man! Hopefully I can do something a little more complex now on my next script.
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Microsoft Windows Server Update Service (WSUS) is free for everyone, but it lacks of some desirable features like send an e-mail to the administrator with the status of all computers on the WSUS server. This article is based on my PowerShell script …
In this previous article (https://oddytee.wordpress.com/2016/05/05/provision-new-office-365-user-and-mailbox-from-exchange-hybrid-via-powershell/), we made basic license assignments to users in O365. When I say basic, the method is the simplest way …
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.
When you create an app prototype with Adobe XD, you can insert system screens -- sharing or Control Center, for example -- with just a few clicks. This video shows you how. You can take the full course on Experts Exchange at http://bit.ly/XDcourse.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now