Solved

Powershell XML parsing

Posted on 2011-03-21
10
1,358 Views
Last Modified: 2012-05-11
I want to be able to parse some XML.
We have a file that contains some employee info that I want to query.
Here is a sample of the file.
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID=0000002>
</ArrayOfEmployeeInfo>

Part1: I want to be able to search for a field or combination of fields of my choosing.
Say for instance: FIRST and LAST name. And return all info for that person.

Part 2: there is a glitch with the data in that file. Example: An employee's title changes and the XML record for that employee is duplicated (appended) to the file as another record.
I would love to be able to parse the ugly file into a new file that weeds out the obsolete records.
We do have EmployeeID to use as a unique field. The newer records are at the bottom of the file, as it is just appended as time goes by.

Thanks in advance.

0
Comment
Question by:FDiskWizard
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 5
10 Comments
 
LVL 13

Accepted Solution

by:
soostibi earned 500 total points
ID: 35182561
Part1:

$s = [xml] @"
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID="0000002"/>
</ArrayOfEmployeeInfo>
"@


$s | Select-Xml -XPath "//EmployeeInfo[@firstName='John' and @lastName='Doe']" | %{$_.node}

Open in new window

0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35183298
Good stuff.
it took me a few minutes to figure out how to do it from a file.
I think I just had a typo somewhere, but finally got it corrected.

This works:
  [xml] @(Get-Content 'C:\Files\EmpFile.xml') | Select-Xml -XPath "//EmployeeInfo[@firstName='Johne' and @lastName='Doe']" | %{$_.node}


0
 
LVL 13

Expert Comment

by:soostibi
ID: 35183825
For Part2 please check this:
$s = Get-Content 'C:\Files\EmpFile.xml'

$h = @{}
$i = 0
$s | % -process {
    if($_ -match 'EmployeeID="(\d{7})"'){ 
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}
        $i++
    }
    else{
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}
        $i++
    }
 }
 
$toleave = @($h.keys | %{$h.$_} | ?{$_.id} | Group-Object -Property id | 
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 } | Select-Object -ExpandProperty line) +
    @($h.keys | %{$h.$_} | ?{!$_.id} | Select-Object -ExpandProperty line )
$toleave | Sort-Object

$toleave | %{$h.$_} | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
Does Powershell have you tied up in knots?

Managing Active Directory does not always have to be complicated.  If you are spending more time trying instead of doing, then it's time to look at something else. For nearly 20 years, AD admins around the world have used one tool for day-to-day AD management: Hyena. Discover why

 
LVL 12

Author Comment

by:FDiskWizard
ID: 35190786
Thanks. But, that isn't working for me. It seems to output the same as the original, but the file is reversed. The XML header is at the bottom of the result, and the </ArrayOfEmployeeInfo> is at the top.

I can't really begin to debug THAT :) You're doing a number of things there, that I just can't quite follow.
It also puts header columns like:
ID  Text   Line
--- ----- -------

Could you help my newbie Powershell self understand some of the script you posted?
Looks like some good stuff happening there that I have not seen.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 35191207
Sorry, I copied my wrong prototype.

But I have a question. Are you sure, that your sample is correct? Actually in PowerShell it seems not to be valid XML file, as EmployeeIDs should be between quation marks, and the EmployeeInfo tags should be closed.

So instead of
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
It shoud be
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
 
So you might have to change my code accordingly. Here is the corrected, simplified code, so if it works I'll explain.
$s = Get-Content 'C:\Files\EmpFile.xml' 
$i = 0  
$lines = $s | % -process {  
    if($_ -match 'EmployeeID=(\d{7})'){   # if there are quationmarks at the ID, 'EmployeeID=(\d{7})' should be change to 'EmployeeID="(\d{7})"'
        New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}  
        $i++  
    }  
    else{  
        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}  
        $i++  
    }  
 }  
   
$toleave = @($lines | ?{$_.id} | Group-Object -Property id |   
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +  
    @($lines| ?{!$_.id})  
$toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35192173
Outstanding!!! it worked on a small test file I created.

I had to figure out why it wasn't working on the full file... it was the field length. The real field was only 6 characters, and I gave you 7. I changed the 7 to a 6 in the script and it worked! Good stuff!

I did add the quotes also:
'EmployeeID="(\d{6})"'




0
 
LVL 13

Expert Comment

by:soostibi
ID: 35192568
Do you need an explanation?
0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35193201
That would great, if you could provide some info on what's happening.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 35193893
read content as text into $s:  $s = Get-Content 'C:\Files\EmpFile.xml'  
initialize $i: $i = 0    
create new object from each line : $lines = $s | % -process {    
if there is a pattern of EmployeeID with equal sign and seven digits:    if($_ -match 'EmployeeID=(\d{7})'){        
then the new object will include the following properties: original text, the digits from the found pattern and the line number, to preseve the right sequence and determine the last row of duplicate employees: New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}    
increase $i:       $i++    
    }    
    else{    

otherwise create a very similar object, except the EmployeeID is empty:        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}    
increase $i:        $i++    
    }    
 }    
 
   
select the lines with EmployeeID in them and group them by this ID: $toleave = @($lines | ?{$_.id} | Group-Object -Property id |    
within each group sort them by line number in descending order and select the first only:    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +    
Plus keep all the line without any EmployeeID:    @($lines| ?{!$_.id})  
and restore the original line sequence an take the original text form the customized object and write them to file: $toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'
0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35197958
Thanks a million, man! Hopefully I can do something a little more complex now on my next script.
0

Featured Post

Guide to Performance: Optimization & Monitoring

Nowadays, monitoring is a mixture of tools, systems, and codes—making it a very complex process. And with this complexity, comes variables for failure. Get DZone’s new Guide to Performance to learn how to proactively find these variables and solve them before a disruption occurs.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Set OWA language and time zone in Exchange for individuals, all users or per database.
A procedure for exporting installed hotfix details of remote computers using powershell
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an antispam), the admini…

756 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question