Solved

VBScript to remove duplicate rows from a CSV file

Posted on 2011-09-15
Last Modified: 2012-05-12
Greetings Experts,

In an attempt to keep my procedure completely automated, I'm wondering if anyone has a VBScript that will scan a CSV file and remove the duplicate rows.

For example, the CSV has 10,000 rows, 1,000 of which are duplicates.
The script removes the 1,000 duplicates, leaving 9,000 rows.
Question by:plusone3055
8 Comments
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 36544128
By duplicate, do you mean duplicating the entire row, or just one or a few of the fields?
 
LVL 22

Author Comment

by:plusone3055
ID: 36544831
Entire row.
 
LVL 51

Expert Comment

by:Bill Prew
ID: 36544863
Can they be sorted first, or do they have to stay in the order in which they occur?

~bp
 
LVL 92

Accepted Solution

by:Patrick Matthews
Patrick Matthews earned 400 total points
ID: 36545408
Going by entire row...



Dim fso, tsIn, tsOut, dic, TheLine, Repeat

Set fso = CreateObject("Scripting.FileSystemObject")
Set dic = CreateObject("Scripting.Dictionary")
dic.CompareMode = vbTextCompare 'NOT case sensitive. omit for case sensitive
Set tsIn = fso.OpenTextFile("c:\Input.csv")
Set tsOut = fso.OpenTextFile("c:\Output.csv")

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    If TheLine <> "" Then
        If dic.Exists(TheLine) Then
            Repeat = True
        Else
            Repeat = False
            dic.Add TheLine, TheLine
        End If
    Else
        Repeat = False
    End If
    If Not Repeat Then tsOut.WriteLine TheLine
Loop

tsIn.Close
tsOut.Close
Set tsIn = Nothing
Set tsOut = Nothing
Set fso = Nothing
Set dic = Nothing


For more about the Dictionary, please see http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html (although it is written from a VBA and not a VBScript perspective)
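
As a rough illustration of the Dictionary behaviour the script relies on, here is a minimal sketch. The sample keys are made up for the example, and it assumes the script is run with cscript or wscript so that WScript.Echo is available.

```vbscript
Option Explicit

Dim dic
Set dic = CreateObject("Scripting.Dictionary")

' CompareMode has to be set while the Dictionary is still empty.
dic.CompareMode = vbTextCompare   ' keys are compared without regard to case

' Hypothetical sample row used as the key.
dic.Add "a,b,c", True

WScript.Echo "Exact match found:          " & dic.Exists("a,b,c")
WScript.Echo "Different case found:       " & dic.Exists("A,B,C")
WScript.Echo "Different content found:    " & dic.Exists("a,b,d")
WScript.Echo "Number of keys stored:      " & dic.Count
```

With vbTextCompare, the second lookup is treated as the same key as the first, so rows that differ only in letter case count as duplicates. Leaving CompareMode at its default (binary) makes the comparison case sensitive.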
 
LVL 22

Author Comment

by:plusone3055
ID: 36550091
mathewsPatrick

When running that, I get:

Line: 21
Char: 24
Error: bad File Name
Code :800A0036
Source: VbScript Runtime error
 
LVL 51

Assisted Solution

by:Bill Prew
Bill Prew earned 100 total points
ID: 36550571
Try changing this line:

Set tsOut = fso.OpenTextFile("c:\Output.csv")

to

Set tsOut = fso.OpenTextFile("c:\Output.csv", 2, True)

I think the output file is being opened in read mode currently, and needs to be opened in write mode.
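
For reference, error code 800A0036 is VBScript runtime error 54, "Bad file mode", which fits an attempt to write to a stream opened for reading. Below is a sketch of the whole script with that one change applied. The ForReading and ForWriting constants are declared here because VBScript does not predefine them, and the paths are the same placeholders used in the original post.

```vbscript
Option Explicit

' FileSystemObject I/O modes; VBScript does not define these itself.
Const ForReading = 1
Const ForWriting = 2

Dim fso, tsIn, tsOut, dic, TheLine

Set fso = CreateObject("Scripting.FileSystemObject")
Set dic = CreateObject("Scripting.Dictionary")
dic.CompareMode = vbTextCompare   ' NOT case sensitive; omit for case sensitive

Set tsIn = fso.OpenTextFile("c:\Input.csv", ForReading)
' Third argument True: create the output file if it does not exist.
' ForWriting overwrites any existing content.
Set tsOut = fso.OpenTextFile("c:\Output.csv", ForWriting, True)

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    If TheLine = "" Then
        ' Blank lines are always written out, as in the original script.
        tsOut.WriteLine TheLine
    ElseIf Not dic.Exists(TheLine) Then
        ' First time this row has been seen: remember it and keep it.
        dic.Add TheLine, True
        tsOut.WriteLine TheLine
    End If
Loop

tsIn.Close
tsOut.Close
```

The first occurrence of each row is kept and the original row order is preserved, so no sorting is needed.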

~bp
 
LVL 22

Author Comment

by:plusone3055
ID: 36550763
That worked.

One interesting thing, though:
if I remove duplicates manually from the CSV file, it leaves me with 125,694 records.

Using the script to remove duplicates and place the results in a new CSV leaves me with 125,204 records.



 
 
 
LVL 51

Expert Comment

by:Bill Prew
ID: 36550916
Thanks for the assist.

What is the "manual process" that you use? The script seems to be set up to find only the true duplicates and remove them, with the exception of blank lines, which are always written out.
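
One possible cause worth ruling out (an assumption, not something confirmed in this thread) is the vbTextCompare setting: rows that differ only in letter case are treated as duplicates by the script, while a case-sensitive manual process would keep them. A small diagnostic sketch that counts distinct non-blank rows both ways, using the same placeholder input path:

```vbscript
Option Explicit

Const ForReading = 1   ' not predefined in VBScript

Dim fso, ts, dicExact, dicNoCase, TheLine

Set fso = CreateObject("Scripting.FileSystemObject")

Set dicExact = CreateObject("Scripting.Dictionary")
dicExact.CompareMode = vbBinaryCompare   ' case sensitive

Set dicNoCase = CreateObject("Scripting.Dictionary")
dicNoCase.CompareMode = vbTextCompare    ' case insensitive

Set ts = fso.OpenTextFile("c:\Input.csv", ForReading)
Do Until ts.AtEndOfStream
    TheLine = ts.ReadLine
    If TheLine <> "" Then
        ' Assigning through the default Item property adds the key if it is new.
        dicExact(TheLine) = True
        dicNoCase(TheLine) = True
    End If
Loop
ts.Close

WScript.Echo "Distinct rows, case sensitive:   " & dicExact.Count
WScript.Echo "Distinct rows, case insensitive: " & dicNoCase.Count
```

If the two counts differ, case differences account for at least part of the gap. Blank lines are not counted here, so the totals will not match the output file exactly if the input contains any.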

~bp
