Solved

Windows batch or Powershell script to filter out duplicate records

Posted on 2014-01-15
6
889 Views
Last Modified: 2014-01-19
I have a daily csv (comma separated value) text file :
need  a Windows batch (preferred) or Powershell
script to sort by 2nd & 3rd columns (which are the
user date+time & userid values) : if there's duplicate
(by the primary+secondary keys ie col 2 & 3), only
extract out the 1st record & output (ie append >> )
to a file & continue with the rest of the records
& output (ie append) to the output file.


If necessary, a Shell script is welcome too, just
that I hv to hunt for a Linux box to do this filtering
job
0
Comment
Question by:sunhux
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
6 Comments
 
LVL 35

Expert Comment

by:Dan Craciun
ID: 39784536
Can you provide a sample?
0
 

Author Comment

by:sunhux
ID: 39784781
Correction, the primary key is column 1 while column 5 is
the secondary key

Attachment 1 is a sample csv file that has duplicate entries.

To make it more readable,
a) I converted attachment 1 into an xls file: the colored records
   of blue are duplicate of each other, green are duplicate of each
   other & red are duplicate of each other ==> attachment 2
b) After running the filtering script, four records will be removed,
    leaving what's shown in attachment 3 (in xls format for ease
    of reading)
Samplepassreset.csv
Samplepassreset1.xls
Samplepassreset2.xls
0
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 300 total points
ID: 39785150
Try this, in powershell:

Import-Csv X:\path\to\input.csv -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv X:\path\to\output.csv -NoTypeInformation

Open in new window

Of course, change paths accordingly.

HTH,
Dan
0
Forrester Webinar: xMatters Delivers 261% ROI

Guest speaker Dean Davison, Forrester Principal Consultant, explains how a Fortune 500 communication company using xMatters found these results: Achieved a 261% ROI, Experienced $753,280 in net present value benefits over 3 years and Reduced MTTR by 91% for tier 1 incidents.

 
LVL 55

Assisted Solution

by:Bill Prew
Bill Prew earned 200 total points
ID: 39785187
Here is a small BAT script that should do the job.  Save as a BAT and run passing it two file names, the existing input CSV file to be processed, and then the new output filename to write the de-duplicated results to, for example:

EE28339972.bat Samplepassreset.csv out.csv

@echo off
setlocal EnableDelayedExpansion

if "%~2" EQU "" (
  echo USAGE %~nx0 : [input-file] [output-file]
  exit /b
)

set InFile=%~1
set OutFile=%~2

(
  for /f "usebackq tokens=*" %%A in ("%InFile%") do (
    for /f "tokens=1,5 delims=," %%B in ("%%~A") do (
      set Key=%%~B_%%~C
      set Key=!Key: =_!
      if not defined !Key! (
        set !Key!=Y
        echo %%~A
      )
    )
  )
) > "%OutFIle%"

Open in new window

~bp
0
 

Author Comment

by:sunhux
ID: 39788365
Thanks chaps.

Btw, how do I run the 1-liner Powershell?  Just issue that
1 line exactly as shown?
0
 
LVL 35

Accepted Solution

by:
Dan Craciun earned 300 total points
ID: 39788376
In Powershell, yes.

Or, you can use it as a script:

Param (
[string]$inputPath = "input.csv",
[string]$outputPath = "output.csv"
)

Import-Csv $inputPath -Header h1, h2, h3, h4, h5, h6, h7, h8 | Sort-Object h1, h5 -Unique | Export-Csv $outputPath -NoTypeInformation

Open in new window


Save it as whatever.ps1 and then run it as
whatever.ps1 X:\path\to\input.csv X:\path\to\output.csv
0

Featured Post

What Is Transaction Monitoring and who needs it?

Synthetic Transaction Monitoring that you need for the day to day, which ensures your business website keeps running optimally, and that there is no downtime to impact your customer experience.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently, an awarded photographer, Selina De Maeyer (http://www.selinademaeyer.com/), completed a photo shoot of a beautiful event (http://www.sintjacobantwerpen.be/verslag-en-fotoreportage-van-de-sacramentsprocessie-door-antwerpen#thumbnails) in An…
Introduction: Recently, I got a requirement to zip all files individually with batch file script in Windows OS. I don't know much about scripting, but I searched Google and found a lot of examples and websites to complete my task. Finally, I was ab…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

728 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question