Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Powershell XML parsing

Posted on 2011-03-21
10
Medium Priority
?
1,421 Views
Last Modified: 2012-05-11
I want to be able to parse some XML.
We have a file that contains some employee info that I want to query.
Here is a sample of the file.
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID=0000002>
</ArrayOfEmployeeInfo>

Part1: I want to be able to search for a field or combination of fields of my choosing.
Say for instance: FIRST and LAST name. And return all info for that person.

Part 2: there is a glitch with the data in that file. Example: An employee's title changes and the XML record for that employee is duplicated (appended) to the file as another record.
I would love to be able to parse the ugly file into a new file that weeds out the obsolete records.
We do have EmployeeID to use as a unique field. The newer records are at the bottom of the file, as it is just appended as time goes by.

Thanks in advance.

0
Comment
Question by:FDiskWizard
  • 5
  • 5
10 Comments
 
LVL 13

Accepted Solution

by:
soostibi earned 2000 total points
ID: 35182561
Part1:

$s = [xml] @"
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfEmployeeInfo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
  <EmployeeInfo firstName="Jane" middleName="Q" lastName="Doe" Phone="555-5552" EmployeeID="0000002"/>
</ArrayOfEmployeeInfo>
"@


$s | Select-Xml -XPath "//EmployeeInfo[@firstName='John' and @lastName='Doe']" | %{$_.node}

Open in new window

0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35183298
Good stuff.
it took me a few minutes to figure out how to do it from a file.
I think I just had a typo somewhere, but finally got it corrected.

This works:
  [xml] @(Get-Content 'C:\Files\EmpFile.xml') | Select-Xml -XPath "//EmployeeInfo[@firstName='Johne' and @lastName='Doe']" | %{$_.node}


0
 
LVL 13

Expert Comment

by:soostibi
ID: 35183825
For Part2 please check this:
$s = Get-Content 'C:\Files\EmpFile.xml'

$h = @{}
$i = 0
$s | % -process {
    if($_ -match 'EmployeeID="(\d{7})"'){ 
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}
        $i++
    }
    else{
        $h.$i = New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}
        $i++
    }
 }
 
$toleave = @($h.keys | %{$h.$_} | ?{$_.id} | Group-Object -Property id | 
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 } | Select-Object -ExpandProperty line) +
    @($h.keys | %{$h.$_} | ?{!$_.id} | Select-Object -ExpandProperty line )
$toleave | Sort-Object

$toleave | %{$h.$_} | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 
LVL 12

Author Comment

by:FDiskWizard
ID: 35190786
Thanks. But, that isn't working for me. It seems to output the same as the original, but the file is reversed. The XML header is at the bottom of the result, and the </ArrayOfEmployeeInfo> is at the top.

I can't really begin to debug THAT :) You're doing a number of things there, that I just can't quite follow.
It also puts header columns like:
ID  Text   Line
--- ----- -------

Could you help my newbie Powershell self understand some of the script you posted?
Looks like some good stuff happening there that I have not seen.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 35191207
Sorry, I copied my wrong prototype.

But I have a question. Are you sure, that your sample is correct? Actually in PowerShell it seems not to be valid XML file, as EmployeeIDs should be between quation marks, and the EmployeeInfo tags should be closed.

So instead of
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID=0000001>
It shoud be
  <EmployeeInfo firstName="John" middleName="Q" lastName="Doe" Phone="555-5551" EmployeeID="0000001"/>
 
So you might have to change my code accordingly. Here is the corrected, simplified code, so if it works I'll explain.
$s = Get-Content 'C:\Files\EmpFile.xml' 
$i = 0  
$lines = $s | % -process {  
    if($_ -match 'EmployeeID=(\d{7})'){   # if there are quationmarks at the ID, 'EmployeeID=(\d{7})' should be change to 'EmployeeID="(\d{7})"'
        New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}  
        $i++  
    }  
    else{  
        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}  
        $i++  
    }  
 }  
   
$toleave = @($lines | ?{$_.id} | Group-Object -Property id |   
    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +  
    @($lines| ?{!$_.id})  
$toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'

Open in new window

0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35192173
Outstanding!!! it worked on a small test file I created.

I had to figure out why it wasn't working on the full file... it was the field length. The real field was only 6 characters, and I gave you 7. I changed the 7 to a 6 in the script and it worked! Good stuff!

I did add the quotes also:
'EmployeeID="(\d{6})"'




0
 
LVL 13

Expert Comment

by:soostibi
ID: 35192568
Do you need an explanation?
0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35193201
That would great, if you could provide some info on what's happening.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 35193893
read content as text into $s:  $s = Get-Content 'C:\Files\EmpFile.xml'  
initialize $i: $i = 0    
create new object from each line : $lines = $s | % -process {    
if there is a pattern of EmployeeID with equal sign and seven digits:    if($_ -match 'EmployeeID=(\d{7})'){        
then the new object will include the following properties: original text, the digits from the found pattern and the line number, to preseve the right sequence and determine the last row of duplicate employees: New-Object -TypeName PSObject -Property @{text = $_; id = $matches[1]; line = $i}    
increase $i:       $i++    
    }    
    else{    

otherwise create a very similar object, except the EmployeeID is empty:        New-Object -TypeName PSObject -Property @{text = $_; id = $null; line = $i}    
increase $i:        $i++    
    }    
 }    
 
   
select the lines with EmployeeID in them and group them by this ID: $toleave = @($lines | ?{$_.id} | Group-Object -Property id |    
within each group sort them by line number in descending order and select the first only:    %{$_.group | Sort-Object -Property line -Descending | Select-Object -first 1 })  +    
Plus keep all the line without any EmployeeID:    @($lines| ?{!$_.id})  
and restore the original line sequence an take the original text form the customized object and write them to file: $toleave | Sort-Object -Property line | Select-Object -ExpandProperty text | Out-File 'C:\Files\NewEmpFile.xml'
0
 
LVL 12

Author Comment

by:FDiskWizard
ID: 35197958
Thanks a million, man! Hopefully I can do something a little more complex now on my next script.
0

Featured Post

Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A walk-through example of how to obtain and apply new DID phone numbers to your cloud PBX enabled users that are configured in Office 365. Whether you have 1, 10 or 100+ users in your tenant, it's quite easy to get them phone-enabled and making/rece…
In this post, I will showcase the steps for how to create groups in Office 365. Office 365 groups allow for ease of flexibility and collaboration between staff members.
Exchange organizations may use the Journaling Agent of the Transport Service to archive messages going through Exchange. However, if the Transport Service is integrated with some email content management application (such as an antispam), the admini…
Loops Section Overview

877 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question