Improve company productivity with a Business Account.Sign Up

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 967
  • Last Modified:

Windows batch or Powershell script to filter out duplicate records

I have a daily csv (comma separated value) text file :
need  a Windows batch (preferred) or Powershell
script to sort by 2nd & 3rd columns (which are the
user date+time & userid values) : if there's duplicate
(by the primary+secondary keys ie col 2 & 3), only
extract out the 1st record & output (ie append >> )
to a file & continue with the rest of the records
& output (ie append) to the output file.


If necessary, a Shell script is welcome too, just
that I hv to hunt for a Linux box to do this filtering
job
0
sunhux
Asked:
sunhux
  • 3
  • 2
3 Solutions
 
Dan CraciunIT ConsultantCommented:
Can you provide a sample?
0
 
sunhuxAuthor Commented:
Correction, the primary key is column 1 while column 5 is
the secondary key

Attachment 1 is a sample csv file that has duplicate entries.

To make it more readable,
a) I converted attachment 1 into an xls file: the colored records
   of blue are duplicate of each other, green are duplicate of each
   other & red are duplicate of each other ==> attachment 2
b) After running the filtering script, four records will be removed,
    leaving what's shown in attachment 3 (in xls format for ease
    of reading)
Samplepassreset.csv
Samplepassreset1.xls
Samplepassreset2.xls
0
 
Dan CraciunIT ConsultantCommented:
Try this, in powershell:

Import-Csv X:\path\to\input.csv -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv X:\path\to\output.csv -NoTypeInformation

Open in new window

Of course, change paths accordingly.

HTH,
Dan
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 
Bill PrewCommented:
Here is a small BAT script that should do the job.  Save as a BAT and run passing it two file names, the existing input CSV file to be processed, and then the new output filename to write the de-duplicated results to, for example:

EE28339972.bat Samplepassreset.csv out.csv

@echo off
setlocal EnableDelayedExpansion

if "%~2" EQU "" (
  echo USAGE %~nx0 : [input-file] [output-file]
  exit /b
)

set InFile=%~1
set OutFile=%~2

(
  for /f "usebackq tokens=*" %%A in ("%InFile%") do (
    for /f "tokens=1,5 delims=," %%B in ("%%~A") do (
      set Key=%%~B_%%~C
      set Key=!Key: =_!
      if not defined !Key! (
        set !Key!=Y
        echo %%~A
      )
    )
  )
) > "%OutFIle%"

Open in new window

~bp
0
 
sunhuxAuthor Commented:
Thanks chaps.

Btw, how do I run the 1-liner Powershell?  Just issue that
1 line exactly as shown?
0
 
Dan CraciunIT ConsultantCommented:
In Powershell, yes.

Or, you can use it as a script:

Param (
[string]$inputPath = "input.csv",
[string]$outputPath = "output.csv"
)

Import-Csv $inputPath -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv $outputPath -NoTypeInformation

Open in new window


Save it as whatever.ps1 and then run it as
whatever.ps1 X:\path\to\input.csv X:\path\to\output.csv
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now