• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 960
  • Last Modified:

Windows batch or Powershell script to filter out duplicate records

I have a daily csv (comma separated value) text file :
need  a Windows batch (preferred) or Powershell
script to sort by 2nd & 3rd columns (which are the
user date+time & userid values) : if there's duplicate
(by the primary+secondary keys ie col 2 & 3), only
extract out the 1st record & output (ie append >> )
to a file & continue with the rest of the records
& output (ie append) to the output file.

If necessary, a Shell script is welcome too, just
that I hv to hunt for a Linux box to do this filtering
  • 3
  • 2
3 Solutions
Dan CraciunIT ConsultantCommented:
Can you provide a sample?
sunhuxAuthor Commented:
Correction, the primary key is column 1 while column 5 is
the secondary key

Attachment 1 is a sample csv file that has duplicate entries.

To make it more readable,
a) I converted attachment 1 into an xls file: the colored records
   of blue are duplicate of each other, green are duplicate of each
   other & red are duplicate of each other ==> attachment 2
b) After running the filtering script, four records will be removed,
    leaving what's shown in attachment 3 (in xls format for ease
    of reading)
Dan CraciunIT ConsultantCommented:
Try this, in powershell:

Import-Csv X:\path\to\input.csv -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv X:\path\to\output.csv -NoTypeInformation

Open in new window

Of course, change paths accordingly.

The 14th Annual Expert Award Winners

The results are in! Meet the top members of our 2017 Expert Awards. Congratulations to all who qualified!

Bill PrewCommented:
Here is a small BAT script that should do the job.  Save as a BAT and run passing it two file names, the existing input CSV file to be processed, and then the new output filename to write the de-duplicated results to, for example:

EE28339972.bat Samplepassreset.csv out.csv

@echo off
setlocal EnableDelayedExpansion

if "%~2" EQU "" (
  echo USAGE %~nx0 : [input-file] [output-file]
  exit /b

set InFile=%~1
set OutFile=%~2

  for /f "usebackq tokens=*" %%A in ("%InFile%") do (
    for /f "tokens=1,5 delims=," %%B in ("%%~A") do (
      set Key=%%~B_%%~C
      set Key=!Key: =_!
      if not defined !Key! (
        set !Key!=Y
        echo %%~A
) > "%OutFIle%"

Open in new window

sunhuxAuthor Commented:
Thanks chaps.

Btw, how do I run the 1-liner Powershell?  Just issue that
1 line exactly as shown?
Dan CraciunIT ConsultantCommented:
In Powershell, yes.

Or, you can use it as a script:

Param (
[string]$inputPath = "input.csv",
[string]$outputPath = "output.csv"

Import-Csv $inputPath -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv $outputPath -NoTypeInformation

Open in new window

Save it as whatever.ps1 and then run it as
whatever.ps1 X:\path\to\input.csv X:\path\to\output.csv

Featured Post

2018 Annual Membership Survey

Here at Experts Exchange, we strive to give members the best experience. Help us improve the site by taking this survey today! (Bonus: Be entered to win a great tech prize for participating!)

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now