?
Solved

Windows batch or Powershell script to filter out duplicate records

Posted on 2014-01-15
6
Medium Priority
?
912 Views
Last Modified: 2014-01-19
I have a daily csv (comma separated value) text file :
need  a Windows batch (preferred) or Powershell
script to sort by 2nd & 3rd columns (which are the
user date+time & userid values) : if there's duplicate
(by the primary+secondary keys ie col 2 & 3), only
extract out the 1st record & output (ie append >> )
to a file & continue with the rest of the records
& output (ie append) to the output file.


If necessary, a Shell script is welcome too, just
that I hv to hunt for a Linux box to do this filtering
job
0
Comment
Question by:sunhux
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
6 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39784536
Can you provide a sample?
0
 

Author Comment

by:sunhux
ID: 39784781
Correction, the primary key is column 1 while column 5 is
the secondary key

Attachment 1 is a sample csv file that has duplicate entries.

To make it more readable,
a) I converted attachment 1 into an xls file: the colored records
   of blue are duplicate of each other, green are duplicate of each
   other & red are duplicate of each other ==> attachment 2
b) After running the filtering script, four records will be removed,
    leaving what's shown in attachment 3 (in xls format for ease
    of reading)
Samplepassreset.csv
Samplepassreset1.xls
Samplepassreset2.xls
0
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 1200 total points
ID: 39785150
Try this, in powershell:

Import-Csv X:\path\to\input.csv -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv X:\path\to\output.csv -NoTypeInformation

Open in new window

Of course, change paths accordingly.

HTH,
Dan
0
Not sure which OpenStack Certification to get?

So you’ve realized you might want to get certified in OpenStack, but you’re not sure what the benefits might be or even which one you should take. You know there are several certification courses you can choose from, but how do you know which one is right for you?

 
LVL 56

Assisted Solution

by:Bill Prew
Bill Prew earned 800 total points
ID: 39785187
Here is a small BAT script that should do the job.  Save as a BAT and run passing it two file names, the existing input CSV file to be processed, and then the new output filename to write the de-duplicated results to, for example:

EE28339972.bat Samplepassreset.csv out.csv

@echo off
setlocal EnableDelayedExpansion

if "%~2" EQU "" (
  echo USAGE %~nx0 : [input-file] [output-file]
  exit /b
)

set InFile=%~1
set OutFile=%~2

(
  for /f "usebackq tokens=*" %%A in ("%InFile%") do (
    for /f "tokens=1,5 delims=," %%B in ("%%~A") do (
      set Key=%%~B_%%~C
      set Key=!Key: =_!
      if not defined !Key! (
        set !Key!=Y
        echo %%~A
      )
    )
  )
) > "%OutFIle%"

Open in new window

~bp
0
 

Author Comment

by:sunhux
ID: 39788365
Thanks chaps.

Btw, how do I run the 1-liner Powershell?  Just issue that
1 line exactly as shown?
0
 
LVL 35

Accepted Solution

by:
Dan Craciun earned 1200 total points
ID: 39788376
In Powershell, yes.

Or, you can use it as a script:

Param (
[string]$inputPath = "input.csv",
[string]$outputPath = "output.csv"
)

Import-Csv $inputPath -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv $outputPath -NoTypeInformation

Open in new window


Save it as whatever.ps1 and then run it as
whatever.ps1 X:\path\to\input.csv X:\path\to\output.csv
0

Featured Post

The Orion Papers

Are you interested in becoming an AWS Certified Solutions Architect?

Discover a new interactive way of training for the exam.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Background Still having to process all these year-end "csv" files received from all these sources (including Government entities), sometimes we have the need to examine the contents due to data error, etc... As a "Unix" shop, our only readily …
Recently, an awarded photographer, Selina De Maeyer (http://www.selinademaeyer.com/), completed a photo shoot of a beautiful event (http://www.sintjacobantwerpen.be/verslag-en-fotoreportage-van-de-sacramentsprocessie-door-antwerpen#thumbnails) in An…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …
Suggested Courses

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question