Solved

PowerShell Script to remove rows in a CSV that have a duplicate value in specified column.

Posted on 2013-12-13
11
2,385 Views
Last Modified: 2013-12-16
Hey guys,

I am starting to learn PowerShell 3.0 and have successfully managed to massage and filter data from a CSV into a new format I want but I am struggling with this particular task.

I have a CSV with 51 columns and no header row. I only need the data in columns 1, 4 and 7 and I need to filter out all rows that have a duplicate entry in column 1.

I have tried using 'select - unique' but I think I have been trying to use it incorrectly on the values rather than the parameters. I think I need to use a 'where' but I am really unsure.

Here is what I have been working with.
$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv

Open in new window


And help would be greatly appreciated.

Thanks in advance.
AC
0
Comment
Question by:defecta
  • 6
  • 5
11 Comments
 
LVL 40

Accepted Solution

by:
footech earned 500 total points
ID: 39718295
There's no automatic way that I know of.  All the commands that check for uniqueness compare all properties.  Since you want to only check one we've got to contrive something.  A good way is with the Group-Object cmdlet and then check the number of items found.
$import | Select "1","4","7" | group "1" | % {if ($_.count -gt 1){$_.group[0]}else{$_.group}}

Open in new window

0
 

Author Comment

by:defecta
ID: 39718307
Thanks for the quick response footech

That seems to do most of what I was trying to do. I was leaving off the quotes around the columns i was trying to select and getting an error that I couldn't make sense of, so thanks for that.

However the $_.group variable is causing an error saying there is no method called 'group'

But if l leave off that 3rd pipe starting with "%" I can see the data being grouped correctly.

I just need to figure out how to get the rows I need. Perhaps using a 'where'?
0
 
LVL 40

Expert Comment

by:footech
ID: 39718371
I tested the code above.  Are you sure you copied it correctly?
The "group" property is available from the objects output by the Group-Object cmdlet.
If you're still getting an error, please post a screenshot of your console showing the command and error.
0
Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

 

Author Closing Comment

by:defecta
ID: 39718388
my bad. I just noticed that I was using brackets "()" where I should have been using square brackets "[]" because I couldn't copy/paste it at the time.

That's awesome, thank you so much!
0
 

Author Comment

by:defecta
ID: 39718426
If I can just confirm something for my understanding?

I've done a bit of reading on the function of the different types of brackets or braces and I still don't quite understand what function "[0]" at the end of the first "$_.group" is playing in the code. Would you care to elaborate for me?
0
 
LVL 40

Expert Comment

by:footech
ID: 39718969
Square brackets are used to reference a particular element of an array, so inside them is the index number (or numbers, or range) for the element you want to retrieve.  The "group" property returns a kind of array.
So I used it to call only the first element in the array if more than one is present in order to avoid duplicates.  Actually, reviewing it now, we could change it to remove the If-Else statements and always just call the first element regardless of the number found, since there will always be at least one element.
$import | Select "1","4","7" | Group "1" | %{$_.group[0]}

Open in new window

0
 

Author Comment

by:defecta
ID: 39719417
Awesome! And thanks for your easy to understand explanation too. =)
0
 
LVL 40

Expert Comment

by:footech
ID: 39720188
You're welcome!  I'm always happy to explain what I can.
0
 

Author Comment

by:defecta
ID: 39720792
I just found out that I didn't understand the requirements correctly but I am pretty sure that I have since modified the script to acheive the desired output but I would appreciate you casting an eye over it?

My understanding now is that we want to to keep all columns of data but exlude all rows that have values in columns 1, 4 and 7 that are not unique. So I have modified the code as such.

$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv | group "1","4","7" | foreach-object {$_.group[0]} |Export-Csv c:\temp\fileclean.csv -NoTypeInformation

Open in new window


If I am wrong and this requires some more input I will open a new question (with some more points. :) )
0
 
LVL 40

Expert Comment

by:footech
ID: 39720893
The basic idea looks like it would work, but I can't say I fully understand the requirements.  For example, do all the properties 1, 4, and 7 have to be unique for each row?   What if for a particular row 1 and 4 are unique, but 7 isn't?

A problem with your code -
If you're going to pipe to Export-CSV, there's no point in assigning it to a variable.

Test what you have, and if it's not producing what you expect I'd advise opening a new question.  I should see the question in the PowerShell TA and I'll help if I can.
0
 

Author Comment

by:defecta
ID: 39722602
Thanks footech.

Yeah as long as the 'group' of values in 1, 4 and 7 are unique they are to be included in the export. Its being tested this morning so I will know shortly and report back wither way.

Whoops, your right. The variable was a hangover from a previous attempt at a solution. Thanks again.
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

A procedure for exporting installed hotfix details of remote computers using powershell
In threads here at EE, each comment has a unique Identifier (ID). It is easy to get the full path for an ID via the right-click context menu. However, we often want to post a short link within a thread rather than the full link. This article shows a…
This tutorial will walk an individual through the process of transferring the five major, necessary Active Directory Roles, commonly referred to as the FSMO roles to another domain controller. Log onto the new domain controller with a user account t…
This tutorial will walk an individual through setting the global and backup job media overwrite and protection periods in Backup Exec 2012. Log onto the Backup Exec Central Administration Server. Examine the services. If all or most of them are stop…

679 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question