• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2669
  • Last Modified:

PowerShell Script to remove rows in a CSV that have a duplicate value in specified column.

Hey guys,

I am starting to learn PowerShell 3.0 and have successfully managed to massage and filter data from a CSV into a new format I want but I am struggling with this particular task.

I have a CSV with 51 columns and no header row. I only need the data in columns 1, 4 and 7 and I need to filter out all rows that have a duplicate entry in column 1.

I have tried using 'select - unique' but I think I have been trying to use it incorrectly on the values rather than the parameters. I think I need to use a 'where' but I am really unsure.

Here is what I have been working with.
$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv

Open in new window


And help would be greatly appreciated.

Thanks in advance.
AC
0
defecta
Asked:
defecta
  • 6
  • 5
1 Solution
 
footechCommented:
There's no automatic way that I know of.  All the commands that check for uniqueness compare all properties.  Since you want to only check one we've got to contrive something.  A good way is with the Group-Object cmdlet and then check the number of items found.
$import | Select "1","4","7" | group "1" | % {if ($_.count -gt 1){$_.group[0]}else{$_.group}}

Open in new window

0
 
defectaAuthor Commented:
Thanks for the quick response footech

That seems to do most of what I was trying to do. I was leaving off the quotes around the columns i was trying to select and getting an error that I couldn't make sense of, so thanks for that.

However the $_.group variable is causing an error saying there is no method called 'group'

But if l leave off that 3rd pipe starting with "%" I can see the data being grouped correctly.

I just need to figure out how to get the rows I need. Perhaps using a 'where'?
0
 
footechCommented:
I tested the code above.  Are you sure you copied it correctly?
The "group" property is available from the objects output by the Group-Object cmdlet.
If you're still getting an error, please post a screenshot of your console showing the command and error.
0
Creating Active Directory Users from a Text File

If your organization has a need to mass-create AD user accounts, watch this video to see how its done without the need for scripting or other unnecessary complexities.

 
defectaAuthor Commented:
my bad. I just noticed that I was using brackets "()" where I should have been using square brackets "[]" because I couldn't copy/paste it at the time.

That's awesome, thank you so much!
0
 
defectaAuthor Commented:
If I can just confirm something for my understanding?

I've done a bit of reading on the function of the different types of brackets or braces and I still don't quite understand what function "[0]" at the end of the first "$_.group" is playing in the code. Would you care to elaborate for me?
0
 
footechCommented:
Square brackets are used to reference a particular element of an array, so inside them is the index number (or numbers, or range) for the element you want to retrieve.  The "group" property returns a kind of array.
So I used it to call only the first element in the array if more than one is present in order to avoid duplicates.  Actually, reviewing it now, we could change it to remove the If-Else statements and always just call the first element regardless of the number found, since there will always be at least one element.
$import | Select "1","4","7" | Group "1" | %{$_.group[0]}

Open in new window

0
 
defectaAuthor Commented:
Awesome! And thanks for your easy to understand explanation too. =)
0
 
footechCommented:
You're welcome!  I'm always happy to explain what I can.
0
 
defectaAuthor Commented:
I just found out that I didn't understand the requirements correctly but I am pretty sure that I have since modified the script to acheive the desired output but I would appreciate you casting an eye over it?

My understanding now is that we want to to keep all columns of data but exlude all rows that have values in columns 1, 4 and 7 that are not unique. So I have modified the code as such.

$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv | group "1","4","7" | foreach-object {$_.group[0]} |Export-Csv c:\temp\fileclean.csv -NoTypeInformation

Open in new window


If I am wrong and this requires some more input I will open a new question (with some more points. :) )
0
 
footechCommented:
The basic idea looks like it would work, but I can't say I fully understand the requirements.  For example, do all the properties 1, 4, and 7 have to be unique for each row?   What if for a particular row 1 and 4 are unique, but 7 isn't?

A problem with your code -
If you're going to pipe to Export-CSV, there's no point in assigning it to a variable.

Test what you have, and if it's not producing what you expect I'd advise opening a new question.  I should see the question in the PowerShell TA and I'll help if I can.
0
 
defectaAuthor Commented:
Thanks footech.

Yeah as long as the 'group' of values in 1, 4 and 7 are unique they are to be included in the export. Its being tested this morning so I will know shortly and report back wither way.

Whoops, your right. The variable was a hangover from a previous attempt at a solution. Thanks again.
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

  • 6
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now