Solved

Windows batch or Powershell script to filter out duplicate records

Posted on 2014-01-15
Medium Priority
933 Views
I have a daily csv (comma separated value) text file :
need  a Windows batch (preferred) or Powershell
script to sort by 2nd & 3rd columns (which are the
user date+time & userid values) : if there's duplicate
(by the primary+secondary keys ie col 2 & 3), only
extract out the 1st record & output (ie append >> )
to a file & continue with the rest of the records
& output (ie append) to the output file.

If necessary, a Shell script is welcome too, just
that I hv to hunt for a Linux box to do this filtering
job
0
Question by:sunhux
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points
• 3
• 2

LVL 35

Expert Comment

ID: 39784536
Can you provide a sample?
0

Author Comment

ID: 39784781
Correction, the primary key is column 1 while column 5 is
the secondary key

Attachment 1 is a sample csv file that has duplicate entries.

a) I converted attachment 1 into an xls file: the colored records
of blue are duplicate of each other, green are duplicate of each
other & red are duplicate of each other ==> attachment 2
b) After running the filtering script, four records will be removed,
leaving what's shown in attachment 3 (in xls format for ease
Samplepassreset.csv
Samplepassreset1.xls
Samplepassreset2.xls
0

LVL 35

Assisted Solution

Dan Craciun earned 1200 total points
ID: 39785150
Try this, in powershell:

Import-Csv X:\path\to\input.csv -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv X:\path\to\output.csv -NoTypeInformation

Of course, change paths accordingly.

HTH,
Dan
0

LVL 58

Assisted Solution

Bill Prew earned 800 total points
ID: 39785187
Here is a small BAT script that should do the job.  Save as a BAT and run passing it two file names, the existing input CSV file to be processed, and then the new output filename to write the de-duplicated results to, for example:

EE28339972.bat Samplepassreset.csv out.csv

@echo off
setlocal EnableDelayedExpansion

if "%~2" EQU "" (
echo USAGE %~nx0 : [input-file] [output-file]
exit /b
)

set InFile=%~1
set OutFile=%~2

(
for /f "usebackq tokens=*" %%A in ("%InFile%") do (
for /f "tokens=1,5 delims=," %%B in ("%%~A") do (
set Key=%%~B_%%~C
set Key=!Key: =_!
if not defined !Key! (
set !Key!=Y
echo %%~A
)
)
)
) > "%OutFIle%"

~bp
0

Author Comment

ID: 39788365
Thanks chaps.

Btw, how do I run the 1-liner Powershell?  Just issue that
1 line exactly as shown?
0

LVL 35

Accepted Solution

Dan Craciun earned 1200 total points
ID: 39788376
In Powershell, yes.

Or, you can use it as a script:

Param (
[string]$inputPath = "input.csv", [string]$outputPath = "output.csv"
)

Import-Csv $inputPath -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv$outputPath -NoTypeInformation


Save it as whatever.ps1 and then run it as
whatever.ps1 X:\path\to\input.csv X:\path\to\output.csv
0

Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

AutoHotkey is an excellent, free, open source programming/scripting language for Windows. It started out as a keyboard/mouse macros product, but has expanded into a robust language. This article provides an introduction to it, with links to addition…
In threads here at EE, each comment has a unique Identifier (ID). It is easy to get the full path for an ID via the right-click context menu. However, we often want to post a short link within a thread rather than the full link. This article shows a…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
This lesson goes over how to construct ordered and unordered lists and how to create hyperlinks.
Suggested Courses
Course of the Month12 days, 9 hours left to enroll