Solved

PowerShell Script to remove rows in a CSV that have a duplicate value in specified column.

Posted on 2013-12-13
11
2,358 Views
Last Modified: 2013-12-16
Hey guys,

I am starting to learn PowerShell 3.0 and have successfully managed to massage and filter data from a CSV into a new format I want but I am struggling with this particular task.

I have a CSV with 51 columns and no header row. I only need the data in columns 1, 4 and 7 and I need to filter out all rows that have a duplicate entry in column 1.

I have tried using 'select - unique' but I think I have been trying to use it incorrectly on the values rather than the parameters. I think I need to use a 'where' but I am really unsure.

Here is what I have been working with.
$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv

Open in new window


And help would be greatly appreciated.

Thanks in advance.
AC
0
Comment
Question by:defecta
  • 6
  • 5
11 Comments
 
LVL 40

Accepted Solution

by:
footech earned 500 total points
ID: 39718295
There's no automatic way that I know of.  All the commands that check for uniqueness compare all properties.  Since you want to only check one we've got to contrive something.  A good way is with the Group-Object cmdlet and then check the number of items found.
$import | Select "1","4","7" | group "1" | % {if ($_.count -gt 1){$_.group[0]}else{$_.group}}

Open in new window

0
 

Author Comment

by:defecta
ID: 39718307
Thanks for the quick response footech

That seems to do most of what I was trying to do. I was leaving off the quotes around the columns i was trying to select and getting an error that I couldn't make sense of, so thanks for that.

However the $_.group variable is causing an error saying there is no method called 'group'

But if l leave off that 3rd pipe starting with "%" I can see the data being grouped correctly.

I just need to figure out how to get the rows I need. Perhaps using a 'where'?
0
 
LVL 40

Expert Comment

by:footech
ID: 39718371
I tested the code above.  Are you sure you copied it correctly?
The "group" property is available from the objects output by the Group-Object cmdlet.
If you're still getting an error, please post a screenshot of your console showing the command and error.
0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 

Author Closing Comment

by:defecta
ID: 39718388
my bad. I just noticed that I was using brackets "()" where I should have been using square brackets "[]" because I couldn't copy/paste it at the time.

That's awesome, thank you so much!
0
 

Author Comment

by:defecta
ID: 39718426
If I can just confirm something for my understanding?

I've done a bit of reading on the function of the different types of brackets or braces and I still don't quite understand what function "[0]" at the end of the first "$_.group" is playing in the code. Would you care to elaborate for me?
0
 
LVL 40

Expert Comment

by:footech
ID: 39718969
Square brackets are used to reference a particular element of an array, so inside them is the index number (or numbers, or range) for the element you want to retrieve.  The "group" property returns a kind of array.
So I used it to call only the first element in the array if more than one is present in order to avoid duplicates.  Actually, reviewing it now, we could change it to remove the If-Else statements and always just call the first element regardless of the number found, since there will always be at least one element.
$import | Select "1","4","7" | Group "1" | %{$_.group[0]}

Open in new window

0
 

Author Comment

by:defecta
ID: 39719417
Awesome! And thanks for your easy to understand explanation too. =)
0
 
LVL 40

Expert Comment

by:footech
ID: 39720188
You're welcome!  I'm always happy to explain what I can.
0
 

Author Comment

by:defecta
ID: 39720792
I just found out that I didn't understand the requirements correctly but I am pretty sure that I have since modified the script to acheive the desired output but I would appreciate you casting an eye over it?

My understanding now is that we want to to keep all columns of data but exlude all rows that have values in columns 1, 4 and 7 that are not unique. So I have modified the code as such.

$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv | group "1","4","7" | foreach-object {$_.group[0]} |Export-Csv c:\temp\fileclean.csv -NoTypeInformation

Open in new window


If I am wrong and this requires some more input I will open a new question (with some more points. :) )
0
 
LVL 40

Expert Comment

by:footech
ID: 39720893
The basic idea looks like it would work, but I can't say I fully understand the requirements.  For example, do all the properties 1, 4, and 7 have to be unique for each row?   What if for a particular row 1 and 4 are unique, but 7 isn't?

A problem with your code -
If you're going to pipe to Export-CSV, there's no point in assigning it to a variable.

Test what you have, and if it's not producing what you expect I'd advise opening a new question.  I should see the question in the PowerShell TA and I'll help if I can.
0
 

Author Comment

by:defecta
ID: 39722602
Thanks footech.

Yeah as long as the 'group' of values in 1, 4 and 7 are unique they are to be included in the export. Its being tested this morning so I will know shortly and report back wither way.

Whoops, your right. The variable was a hangover from a previous attempt at a solution. Thanks again.
0

Featured Post

Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Set OWA language and time zone in Exchange for individuals, all users or per database.
This article explains how to install and use the NTBackup utility that comes with Windows Server.
The viewer will learn the basics of jQuery, including how to invoke it on a web page. Reference your jQuery libraries: (CODE) Include your new external js/jQuery file: (CODE) Write your first lines of code to setup your site for jQuery.: (CODE)
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question