Solved

PowerShell Script to remove rows in a CSV that have a duplicate value in specified column.

Posted on 2013-12-13
11
2,320 Views
Last Modified: 2013-12-16
Hey guys,

I am starting to learn PowerShell 3.0 and have successfully managed to massage and filter data from a CSV into a new format I want but I am struggling with this particular task.

I have a CSV with 51 columns and no header row. I only need the data in columns 1, 4 and 7 and I need to filter out all rows that have a duplicate entry in column 1.

I have tried using 'select - unique' but I think I have been trying to use it incorrectly on the values rather than the parameters. I think I need to use a 'where' but I am really unsure.

Here is what I have been working with.
$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv

Open in new window


And help would be greatly appreciated.

Thanks in advance.
AC
0
Comment
Question by:defecta
  • 6
  • 5
11 Comments
 
LVL 39

Accepted Solution

by:
footech earned 500 total points
ID: 39718295
There's no automatic way that I know of.  All the commands that check for uniqueness compare all properties.  Since you want to only check one we've got to contrive something.  A good way is with the Group-Object cmdlet and then check the number of items found.
$import | Select "1","4","7" | group "1" | % {if ($_.count -gt 1){$_.group[0]}else{$_.group}}

Open in new window

0
 

Author Comment

by:defecta
ID: 39718307
Thanks for the quick response footech

That seems to do most of what I was trying to do. I was leaving off the quotes around the columns i was trying to select and getting an error that I couldn't make sense of, so thanks for that.

However the $_.group variable is causing an error saying there is no method called 'group'

But if l leave off that 3rd pipe starting with "%" I can see the data being grouped correctly.

I just need to figure out how to get the rows I need. Perhaps using a 'where'?
0
 
LVL 39

Expert Comment

by:footech
ID: 39718371
I tested the code above.  Are you sure you copied it correctly?
The "group" property is available from the objects output by the Group-Object cmdlet.
If you're still getting an error, please post a screenshot of your console showing the command and error.
0
3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

 

Author Closing Comment

by:defecta
ID: 39718388
my bad. I just noticed that I was using brackets "()" where I should have been using square brackets "[]" because I couldn't copy/paste it at the time.

That's awesome, thank you so much!
0
 

Author Comment

by:defecta
ID: 39718426
If I can just confirm something for my understanding?

I've done a bit of reading on the function of the different types of brackets or braces and I still don't quite understand what function "[0]" at the end of the first "$_.group" is playing in the code. Would you care to elaborate for me?
0
 
LVL 39

Expert Comment

by:footech
ID: 39718969
Square brackets are used to reference a particular element of an array, so inside them is the index number (or numbers, or range) for the element you want to retrieve.  The "group" property returns a kind of array.
So I used it to call only the first element in the array if more than one is present in order to avoid duplicates.  Actually, reviewing it now, we could change it to remove the If-Else statements and always just call the first element regardless of the number found, since there will always be at least one element.
$import | Select "1","4","7" | Group "1" | %{$_.group[0]}

Open in new window

0
 

Author Comment

by:defecta
ID: 39719417
Awesome! And thanks for your easy to understand explanation too. =)
0
 
LVL 39

Expert Comment

by:footech
ID: 39720188
You're welcome!  I'm always happy to explain what I can.
0
 

Author Comment

by:defecta
ID: 39720792
I just found out that I didn't understand the requirements correctly but I am pretty sure that I have since modified the script to acheive the desired output but I would appreciate you casting an eye over it?

My understanding now is that we want to to keep all columns of data but exlude all rows that have values in columns 1, 4 and 7 that are not unique. So I have modified the code as such.

$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv | group "1","4","7" | foreach-object {$_.group[0]} |Export-Csv c:\temp\fileclean.csv -NoTypeInformation

Open in new window


If I am wrong and this requires some more input I will open a new question (with some more points. :) )
0
 
LVL 39

Expert Comment

by:footech
ID: 39720893
The basic idea looks like it would work, but I can't say I fully understand the requirements.  For example, do all the properties 1, 4, and 7 have to be unique for each row?   What if for a particular row 1 and 4 are unique, but 7 isn't?

A problem with your code -
If you're going to pipe to Export-CSV, there's no point in assigning it to a variable.

Test what you have, and if it's not producing what you expect I'd advise opening a new question.  I should see the question in the PowerShell TA and I'll help if I can.
0
 

Author Comment

by:defecta
ID: 39722602
Thanks footech.

Yeah as long as the 'group' of values in 1, 4 and 7 are unique they are to be included in the export. Its being tested this morning so I will know shortly and report back wither way.

Whoops, your right. The variable was a hangover from a previous attempt at a solution. Thanks again.
0

Featured Post

Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Possible fixes for Windows 7 and Windows Server 2008 updating problem. Solutions mentioned are from Microsoft themselves. I started a case with them from our Microsoft Silver Partner option to open a case and get direct support from Microsoft. If s…
Restoring deleted objects in Active Directory has been a standard feature in Active Directory for many years, yet some admins may not know what is available.
To efficiently enable the rotation of USB drives for backups, storage pools need to be created. This way no matter which USB drive is installed, the backups will successfully write without any administrative intervention. Multiple USB devices need t…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

785 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question