Solved

Powershell XML parsing

Posted on 2011-03-21
10
1,369 Views
Last Modified: 2012-05-11
I want to be able to parse some XML.
We have a file that contains some employee info that I want to query.
Here is a sample of the file.
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID=0000002>
</ArrayOfEmployeeInfo>

Part1: I want to be able to search for a field or combination of fields of my choosing.
Say for instance: FIRST and LAST name. And return all info for that person.

Part 2: there is a glitch with the data in that file. Example: An employee's title changes and the XML record for that employee is duplicated (appended) to the file as another record.
I would love to be able to parse the ugly file into a new file that weeds out the obsolete records.
We do have EmployeeID to use as a unique field. The newer records are at the bottom of the file, as it is just appended as time goes by.

Thanks in advance.

0
Comment
Question by:FDiskWizard
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 5
10 Comments
 
LVL 13

Accepted Solution

by:
soostibi earned 500 total points
ID: 35182561
Part1:

$s = [xml] @"
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID="0000002"/>
</ArrayOfEmployeeInfo>
"@


$s | Select-Xml -XPath "//EmployeeInfo[@firstName='John' and @lastName='Doe']" | %{$_.node}

Open in new window

0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35183298
Good stuff.
it took me a few minutes to figure out how to do it from a file.
I think I just had a typo somewhere, but finally got it corrected.

This works:
  [xml] @(Get-Content 'C:\Files\EmpFile.xml') | Select-Xml -XPath "//EmployeeInfo[@firstName='Johne' and @lastName='Doe']" | %{$_.node}


0
 
LVL 13

Expert Comment

by:soostibi
ID: 35183825
For Part2 please check this:
$s = Get-Content 'C:\Files\EmpFile.xml'

$h = @{}
$i = 0
$s | % -process {
    if($_ -match 'EmployeeID="(\d{7})"'){ 
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}
        $i++
    }
    else{
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}
        $i++
    }
 }
 
$toleave = @($h.keys | %{$h.$_} | ?{$_.id} | Group-Object -Property id | 
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 } | Select-Object -ExpandProperty line) +
    @($h.keys | %{$h.$_} | ?{!$_.id} | Select-Object -ExpandProperty line )
$toleave | Sort-Object

$toleave | %{$h.$_} | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
Guide to Performance: Optimization & Monitoring

Nowadays, monitoring is a mixture of tools, systems, and codes—making it a very complex process. And with this complexity, comes variables for failure. Get DZone’s new Guide to Performance to learn how to proactively find these variables and solve them before a disruption occurs.

 
LVL 12

Author Comment

by:FDiskWizard
ID: 35190786
Thanks. But, that isn't working for me. It seems to output the same as the original, but the file is reversed. The XML header is at the bottom of the result, and the </ArrayOfEmployeeInfo> is at the top.

I can't really begin to debug THAT :) You're doing a number of things there, that I just can't quite follow.
It also puts header columns like:
ID  Text   Line
--- ----- -------

Could you help my newbie Powershell self understand some of the script you posted?
Looks like some good stuff happening there that I have not seen.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 35191207
Sorry, I copied my wrong prototype.

But I have a question. Are you sure, that your sample is correct? Actually in PowerShell it seems not to be valid XML file, as EmployeeIDs should be between quation marks, and the EmployeeInfo tags should be closed.

So instead of
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
It shoud be
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
 
So you might have to change my code accordingly. Here is the corrected, simplified code, so if it works I'll explain.
$s = Get-Content 'C:\Files\EmpFile.xml' 
$i = 0  
$lines = $s | % -process {  
    if($_ -match 'EmployeeID=(\d{7})'){   # if there are quationmarks at the ID, 'EmployeeID=(\d{7})' should be change to 'EmployeeID="(\d{7})"'
        New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}  
        $i++  
    }  
    else{  
        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}  
        $i++  
    }  
 }  
   
$toleave = @($lines | ?{$_.id} | Group-Object -Property id |   
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +  
    @($lines| ?{!$_.id})  
$toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35192173
Outstanding!!! it worked on a small test file I created.

I had to figure out why it wasn't working on the full file... it was the field length. The real field was only 6 characters, and I gave you 7. I changed the 7 to a 6 in the script and it worked! Good stuff!

I did add the quotes also:
'EmployeeID="(\d{6})"'




0
 
LVL 13

Expert Comment

by:soostibi
ID: 35192568
Do you need an explanation?
0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35193201
That would great, if you could provide some info on what's happening.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 35193893
read content as text into $s:  $s = Get-Content 'C:\Files\EmpFile.xml'  
initialize $i: $i = 0    
create new object from each line : $lines = $s | % -process {    
if there is a pattern of EmployeeID with equal sign and seven digits:    if($_ -match 'EmployeeID=(\d{7})'){        
then the new object will include the following properties: original text, the digits from the found pattern and the line number, to preseve the right sequence and determine the last row of duplicate employees: New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}    
increase $i:       $i++    
    }    
    else{    

otherwise create a very similar object, except the EmployeeID is empty:        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}    
increase $i:        $i++    
    }    
 }    
 
   
select the lines with EmployeeID in them and group them by this ID: $toleave = @($lines | ?{$_.id} | Group-Object -Property id |    
within each group sort them by line number in descending order and select the first only:    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +    
Plus keep all the line without any EmployeeID:    @($lines| ?{!$_.id})  
and restore the original line sequence an take the original text form the customized object and write them to file: $toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'
0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35197958
Thanks a million, man! Hopefully I can do something a little more complex now on my next script.
0

Featured Post

Is Your AD Toolbox Looking More Like a Toybox?

Managing Active Directory can get complicated.  Often, the native tools for managing AD are just not up to the task.  The largest Active Directory installations in the world have relied on one tool to manage their day-to-day administration tasks: Hyena. Start your trial today.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

In previous parts of this Nano Server deployment series, we learned how to create, deploy and configure Nano Server as a Hyper-V host. In this part, we will look for a clustering option. We will create a Hyper-V cluster of 3 Nano Server host nodes w…
Auditing domain password hashes is a commonly overlooked but critical requirement to ensuring secure passwords practices are followed. Methods exist to extract hashes directly for a live domain however this article describes a process to extract u…
Nobody understands Phishing better than an anti-spam company. That’s why we are providing Phishing Awareness Training to our customers. According to a report by Verizon, only 3% of targeted users report malicious emails to management. With compan…
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an antispam), the admini…

710 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question