Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

PowerShell Script to remove rows in a CSV that have a duplicate value in specified column.

Posted on 2013-12-13
11
Medium Priority
?
2,543 Views
Last Modified: 2013-12-16
Hey guys,

I am starting to learn PowerShell 3.0 and have successfully managed to massage and filter data from a CSV into a new format I want but I am struggling with this particular task.

I have a CSV with 51 columns and no header row. I only need the data in columns 1, 4 and 7 and I need to filter out all rows that have a duplicate entry in column 1.

I have tried using 'select - unique' but I think I have been trying to use it incorrectly on the values rather than the parameters. I think I need to use a 'where' but I am really unsure.

Here is what I have been working with.
$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv

Open in new window


And help would be greatly appreciated.

Thanks in advance.
AC
0
Comment
Question by:defecta
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
11 Comments
 
LVL 41

Accepted Solution

by:
footech earned 2000 total points
ID: 39718295
There's no automatic way that I know of.  All the commands that check for uniqueness compare all properties.  Since you want to only check one we've got to contrive something.  A good way is with the Group-Object cmdlet and then check the number of items found.
$import | Select "1","4","7" | group "1" | % {if ($_.count -gt 1){$_.group[0]}else{$_.group}}

Open in new window

0
 

Author Comment

by:defecta
ID: 39718307
Thanks for the quick response footech

That seems to do most of what I was trying to do. I was leaving off the quotes around the columns i was trying to select and getting an error that I couldn't make sense of, so thanks for that.

However the $_.group variable is causing an error saying there is no method called 'group'

But if l leave off that 3rd pipe starting with "%" I can see the data being grouped correctly.

I just need to figure out how to get the rows I need. Perhaps using a 'where'?
0
 
LVL 41

Expert Comment

by:footech
ID: 39718371
I tested the code above.  Are you sure you copied it correctly?
The "group" property is available from the objects output by the Group-Object cmdlet.
If you're still getting an error, please post a screenshot of your console showing the command and error.
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 

Author Closing Comment

by:defecta
ID: 39718388
my bad. I just noticed that I was using brackets "()" where I should have been using square brackets "[]" because I couldn't copy/paste it at the time.

That's awesome, thank you so much!
0
 

Author Comment

by:defecta
ID: 39718426
If I can just confirm something for my understanding?

I've done a bit of reading on the function of the different types of brackets or braces and I still don't quite understand what function "[0]" at the end of the first "$_.group" is playing in the code. Would you care to elaborate for me?
0
 
LVL 41

Expert Comment

by:footech
ID: 39718969
Square brackets are used to reference a particular element of an array, so inside them is the index number (or numbers, or range) for the element you want to retrieve.  The "group" property returns a kind of array.
So I used it to call only the first element in the array if more than one is present in order to avoid duplicates.  Actually, reviewing it now, we could change it to remove the If-Else statements and always just call the first element regardless of the number found, since there will always be at least one element.
$import | Select "1","4","7" | Group "1" | %{$_.group[0]}

Open in new window

0
 

Author Comment

by:defecta
ID: 39719417
Awesome! And thanks for your easy to understand explanation too. =)
0
 
LVL 41

Expert Comment

by:footech
ID: 39720188
You're welcome!  I'm always happy to explain what I can.
0
 

Author Comment

by:defecta
ID: 39720792
I just found out that I didn't understand the requirements correctly but I am pretty sure that I have since modified the script to acheive the desired output but I would appreciate you casting an eye over it?

My understanding now is that we want to to keep all columns of data but exlude all rows that have values in columns 1, 4 and 7 that are not unique. So I have modified the code as such.

$header = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51
$import = Import-Csv -header $header C:\temp\file.csv | group "1","4","7" | foreach-object {$_.group[0]} |Export-Csv c:\temp\fileclean.csv -NoTypeInformation

Open in new window


If I am wrong and this requires some more input I will open a new question (with some more points. :) )
0
 
LVL 41

Expert Comment

by:footech
ID: 39720893
The basic idea looks like it would work, but I can't say I fully understand the requirements.  For example, do all the properties 1, 4, and 7 have to be unique for each row?   What if for a particular row 1 and 4 are unique, but 7 isn't?

A problem with your code -
If you're going to pipe to Export-CSV, there's no point in assigning it to a variable.

Test what you have, and if it's not producing what you expect I'd advise opening a new question.  I should see the question in the PowerShell TA and I'll help if I can.
0
 

Author Comment

by:defecta
ID: 39722602
Thanks footech.

Yeah as long as the 'group' of values in 1, 4 and 7 are unique they are to be included in the export. Its being tested this morning so I will know shortly and report back wither way.

Whoops, your right. The variable was a hangover from a previous attempt at a solution. Thanks again.
0

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In previous parts of this Nano Server deployment series, we learned how to create, deploy and configure Nano Server as a Hyper-V host. In this part, we will look for a clustering option. We will create a Hyper-V cluster of 3 Nano Server host nodes w…
Recently we ran in to an issue while running some SQL jobs where we were trying to process the cubes.  We got an error saying failure stating 'NT SERVICE\SQLSERVERAGENT does not have access to Analysis Services. So this is a way to automate that wit…
This tutorial will show how to configure a single USB drive with a separate folder for each day of the week. This will allow each of the backups to be kept separate preventing the previous day’s backup from being overwritten. The USB drive must be s…
The viewer will learn how to count occurrences of each item in an array.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question