Solved

Windows batch or Powershell script to filter out duplicate records

Posted on 2014-01-15
6
839 Views
Last Modified: 2014-01-19
I have a daily csv (comma separated value) text file :
need  a Windows batch (preferred) or Powershell
script to sort by 2nd & 3rd columns (which are the
user date+time & userid values) : if there's duplicate
(by the primary+secondary keys ie col 2 & 3), only
extract out the 1st record & output (ie append >> )
to a file & continue with the rest of the records
& output (ie append) to the output file.


If necessary, a Shell script is welcome too, just
that I hv to hunt for a Linux box to do this filtering
job
0
Comment
Question by:sunhux
  • 3
  • 2
6 Comments
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39784536
Can you provide a sample?
0
 

Author Comment

by:sunhux
ID: 39784781
Correction, the primary key is column 1 while column 5 is
the secondary key

Attachment 1 is a sample csv file that has duplicate entries.

To make it more readable,
a) I converted attachment 1 into an xls file: the colored records
   of blue are duplicate of each other, green are duplicate of each
   other & red are duplicate of each other ==> attachment 2
b) After running the filtering script, four records will be removed,
    leaving what's shown in attachment 3 (in xls format for ease
    of reading)
Samplepassreset.csv
Samplepassreset1.xls
Samplepassreset2.xls
0
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 300 total points
ID: 39785150
Try this, in powershell:

Import-Csv X:\path\to\input.csv -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv X:\path\to\output.csv -NoTypeInformation

Open in new window

Of course, change paths accordingly.

HTH,
Dan
0
DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

 
LVL 52

Assisted Solution

by:Bill Prew
Bill Prew earned 200 total points
ID: 39785187
Here is a small BAT script that should do the job.  Save as a BAT and run passing it two file names, the existing input CSV file to be processed, and then the new output filename to write the de-duplicated results to, for example:

EE28339972.bat Samplepassreset.csv out.csv

@echo off
setlocal EnableDelayedExpansion

if "%~2" EQU "" (
  echo USAGE %~nx0 : [input-file] [output-file]
  exit /b
)

set InFile=%~1
set OutFile=%~2

(
  for /f "usebackq tokens=*" %%A in ("%InFile%") do (
    for /f "tokens=1,5 delims=," %%B in ("%%~A") do (
      set Key=%%~B_%%~C
      set Key=!Key: =_!
      if not defined !Key! (
        set !Key!=Y
        echo %%~A
      )
    )
  )
) > "%OutFIle%"

Open in new window

~bp
0
 

Author Comment

by:sunhux
ID: 39788365
Thanks chaps.

Btw, how do I run the 1-liner Powershell?  Just issue that
1 line exactly as shown?
0
 
LVL 34

Accepted Solution

by:
Dan Craciun earned 300 total points
ID: 39788376
In Powershell, yes.

Or, you can use it as a script:

Param (
[string]$inputPath = "input.csv",
[string]$outputPath = "output.csv"
)

Import-Csv $inputPath -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv $outputPath -NoTypeInformation

Open in new window


Save it as whatever.ps1 and then run it as
whatever.ps1 X:\path\to\input.csv X:\path\to\output.csv
0

Featured Post

3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Introduction: Recently, I got a requirement to zip all files individually with batch file script in Windows OS. I don't know much about scripting, but I searched Google and found a lot of examples and websites to complete my task. Finally, I was ab…
Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…

920 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now