Solved

Windows batch or Powershell script to filter out duplicate records

Posted on 2014-01-15
839 Views
I have a daily csv (comma separated value) text file :
need  a Windows batch (preferred) or Powershell
script to sort by 2nd & 3rd columns (which are the
user date+time & userid values) : if there's duplicate
(by the primary+secondary keys ie col 2 & 3), only
extract out the 1st record & output (ie append >> )
to a file & continue with the rest of the records
& output (ie append) to the output file.

If necessary, a Shell script is welcome too, just
that I hv to hunt for a Linux box to do this filtering
job
0
Question by:sunhux
• 3
• 2

LVL 34

Expert Comment

ID: 39784536
Can you provide a sample?
0

Author Comment

ID: 39784781
Correction, the primary key is column 1 while column 5 is
the secondary key

Attachment 1 is a sample csv file that has duplicate entries.

a) I converted attachment 1 into an xls file: the colored records
of blue are duplicate of each other, green are duplicate of each
other & red are duplicate of each other ==> attachment 2
b) After running the filtering script, four records will be removed,
leaving what's shown in attachment 3 (in xls format for ease
Samplepassreset.csv
Samplepassreset1.xls
Samplepassreset2.xls
0

LVL 34

Assisted Solution

Dan Craciun earned 300 total points
ID: 39785150
Try this, in powershell:

Import-Csv X:\path\to\input.csv -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv X:\path\to\output.csv -NoTypeInformation

Of course, change paths accordingly.

HTH,
Dan
0

LVL 52

Assisted Solution

Bill Prew earned 200 total points
ID: 39785187
Here is a small BAT script that should do the job.  Save as a BAT and run passing it two file names, the existing input CSV file to be processed, and then the new output filename to write the de-duplicated results to, for example:

EE28339972.bat Samplepassreset.csv out.csv

@echo off
setlocal EnableDelayedExpansion

if "%~2" EQU "" (
echo USAGE %~nx0 : [input-file] [output-file]
exit /b
)

set InFile=%~1
set OutFile=%~2

(
for /f "usebackq tokens=*" %%A in ("%InFile%") do (
for /f "tokens=1,5 delims=," %%B in ("%%~A") do (
set Key=%%~B_%%~C
set Key=!Key: =_!
if not defined !Key! (
set !Key!=Y
echo %%~A
)
)
)
) > "%OutFIle%"

~bp
0

Author Comment

ID: 39788365
Thanks chaps.

Btw, how do I run the 1-liner Powershell?  Just issue that
1 line exactly as shown?
0

LVL 34

Accepted Solution

Dan Craciun earned 300 total points
ID: 39788376
In Powershell, yes.

Or, you can use it as a script:

Param (
[string]$inputPath = "input.csv", [string]$outputPath = "output.csv"
)

Import-Csv $inputPath -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv$outputPath -NoTypeInformation


Save it as whatever.ps1 and then run it as
whatever.ps1 X:\path\to\input.csv X:\path\to\output.csv
0

Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Introduction: Recently, I got a requirement to zip all files individually with batch file script in Windows OS. I don't know much about scripting, but I searched Google and found a lot of examples and websites to complete my task. Finally, I was ab…
Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…