Powershell XML parsing

I want to be able to parse some XML.
We have a file that contains some employee info that I want to query.
Here is a sample of the file.
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID=0000002>
</ArrayOfEmployeeInfo>

Part1: I want to be able to search for a field or combination of fields of my choosing.
Say for instance: FIRST and LAST name. And return all info for that person.

Part 2: there is a glitch with the data in that file. Example: An employee's title changes and the XML record for that employee is duplicated (appended) to the file as another record.
I would love to be able to parse the ugly file into a new file that weeds out the obsolete records.
We do have EmployeeID to use as a unique field. The newer records are at the bottom of the file, as it is just appended as time goes by.

Thanks in advance.

LVL 12
FDiskWizardAsked:
Who is Participating?

[Webinar] Streamline your web hosting managementRegister Today

x
 
soostibiConnect With a Mentor Commented:
Part1:

$s = [xml] @"
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID="0000002"/>
</ArrayOfEmployeeInfo>
"@


$s | Select-Xml -XPath "//EmployeeInfo[@firstName='John' and @lastName='Doe']" | %{$_.node}

Open in new window

0
 
FDiskWizardAuthor Commented:
Good stuff.
it took me a few minutes to figure out how to do it from a file.
I think I just had a typo somewhere, but finally got it corrected.

This works:
  [xml] @(Get-Content 'C:\Files\EmpFile.xml') | Select-Xml -XPath "//EmployeeInfo[@firstName='Johne' and @lastName='Doe']" | %{$_.node}


0
 
soostibiCommented:
For Part2 please check this:
$s = Get-Content 'C:\Files\EmpFile.xml'

$h = @{}
$i = 0
$s | % -process {
    if($_ -match 'EmployeeID="(\d{7})"'){ 
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}
        $i++
    }
    else{
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}
        $i++
    }
 }
 
$toleave = @($h.keys | %{$h.$_} | ?{$_.id} | Group-Object -Property id | 
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 } | Select-Object -ExpandProperty line) +
    @($h.keys | %{$h.$_} | ?{!$_.id} | Select-Object -ExpandProperty line )
$toleave | Sort-Object

$toleave | %{$h.$_} | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
Problems using Powershell and Active Directory?

Managing Active Directory does not always have to be complicated.  If you are spending more time trying instead of doing, then it's time to look at something else. For nearly 20 years, AD admins around the world have used one tool for day-to-day AD management: Hyena. Discover why

 
FDiskWizardAuthor Commented:
Thanks. But, that isn't working for me. It seems to output the same as the original, but the file is reversed. The XML header is at the bottom of the result, and the </ArrayOfEmployeeInfo> is at the top.

I can't really begin to debug THAT :) You're doing a number of things there, that I just can't quite follow.
It also puts header columns like:
ID  Text   Line
--- ----- -------

Could you help my newbie Powershell self understand some of the script you posted?
Looks like some good stuff happening there that I have not seen.
0
 
soostibiCommented:
Sorry, I copied my wrong prototype.

But I have a question. Are you sure, that your sample is correct? Actually in PowerShell it seems not to be valid XML file, as EmployeeIDs should be between quation marks, and the EmployeeInfo tags should be closed.

So instead of
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
It shoud be
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
 
So you might have to change my code accordingly. Here is the corrected, simplified code, so if it works I'll explain.
$s = Get-Content 'C:\Files\EmpFile.xml' 
$i = 0  
$lines = $s | % -process {  
    if($_ -match 'EmployeeID=(\d{7})'){   # if there are quationmarks at the ID, 'EmployeeID=(\d{7})' should be change to 'EmployeeID="(\d{7})"'
        New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}  
        $i++  
    }  
    else{  
        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}  
        $i++  
    }  
 }  
   
$toleave = @($lines | ?{$_.id} | Group-Object -Property id |   
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +  
    @($lines| ?{!$_.id})  
$toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
 
FDiskWizardAuthor Commented:
Outstanding!!! it worked on a small test file I created.

I had to figure out why it wasn't working on the full file... it was the field length. The real field was only 6 characters, and I gave you 7. I changed the 7 to a 6 in the script and it worked! Good stuff!

I did add the quotes also:
'EmployeeID="(\d{6})"'




0
 
soostibiCommented:
Do you need an explanation?
0
 
FDiskWizardAuthor Commented:
That would great, if you could provide some info on what's happening.
0
 
soostibiCommented:
read content as text into $s:  $s = Get-Content 'C:\Files\EmpFile.xml'  
initialize $i: $i = 0    
create new object from each line : $lines = $s | % -process {    
if there is a pattern of EmployeeID with equal sign and seven digits:    if($_ -match 'EmployeeID=(\d{7})'){        
then the new object will include the following properties: original text, the digits from the found pattern and the line number, to preseve the right sequence and determine the last row of duplicate employees: New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}    
increase $i:       $i++    
    }    
    else{    

otherwise create a very similar object, except the EmployeeID is empty:        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}    
increase $i:        $i++    
    }    
 }    
 
   
select the lines with EmployeeID in them and group them by this ID: $toleave = @($lines | ?{$_.id} | Group-Object -Property id |    
within each group sort them by line number in descending order and select the first only:    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +    
Plus keep all the line without any EmployeeID:    @($lines| ?{!$_.id})  
and restore the original line sequence an take the original text form the customized object and write them to file: $toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'
0
 
FDiskWizardAuthor Commented:
Thanks a million, man! Hopefully I can do something a little more complex now on my next script.
0
All Courses

From novice to tech pro — start learning today.