Solved

VbScript to remove Duplicate rows from CSV File

Posted on 2011-09-15
Last Modified: 2012-05-12
Greetings Experts,

In an attempt to keep my procedure completely automated, I'm wondering if anyone out there has a VBScript that will scan a CSV file and remove the duplicate rows.

For example, the CSV has 10,000 rows, 1,000 of which are duplicates.
The script removes the 1,000 duplicates, leaving 9,000 rows.
Question by:plusone3055
8 Comments
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 36544128
By duplicate, do you mean duplicating the entire row, or just one or a few of the fields?
 
LVL 22

Author Comment

by:plusone3055
ID: 36544831
The entire row.
 
LVL 53

Expert Comment

by:Bill Prew
ID: 36544863
Can the rows be sorted first, or do they have to stay in the order in which they occur?

~bp
LVL 92

Accepted Solution

by:
Patrick Matthews earned 400 total points
ID: 36545408
Going by entire row...



Dim fso, tsIn, tsOut, dic, TheLine, Repeat

Set fso = CreateObject("Scripting.FileSystemObject")
Set dic = CreateObject("Scripting.Dictionary")
dic.CompareMode = vbTextCompare 'NOT case sensitive. omit for case sensitive
Set tsIn = fso.OpenTextFile("c:\Input.csv")
Set tsOut = fso.OpenTextFile("c:\Output.csv")

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    If TheLine <> "" Then
        If dic.Exists(TheLine) Then
            Repeat = True
        Else
            Repeat = False
            dic.Add TheLine, TheLine
        End If
    Else
        Repeat = False
    End If
    If Not Repeat Then tsOut.WriteLine TheLine
Loop

tsIn.Close
tsOut.Close
Set tsIn = Nothing
Set tsOut = Nothing
Set fso = Nothing
Set dic = Nothing

For more about the Dictionary, please see http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html (although it is written from a VBA and not a VBScript perspective)
 
LVL 22

Author Comment

by:plusone3055
ID: 36550091
Patrick Matthews,

When running that, I get:

Line: 21
Char: 24
Error: Bad file mode
Code :800A0036
Source: VbScript Runtime error
 
LVL 53

Assisted Solution

by:Bill Prew
Bill Prew earned 100 total points
ID: 36550571
Try changing this line:

Set tsOut = fso.OpenTextFile("c:\Output.csv")

to

Set tsOut = fso.OpenTextFile("c:\Output.csv", 2, True)

I think the output file is being opened in read mode currently, and needs to be opened in write mode.

~bp
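
For readers following along, here is a minimal sketch of what the extra arguments mean. The constant names are declared locally because VBScript does not predefine them, and the file paths are simply the ones used in the script above. It is only an illustration of the two open modes, not a replacement for the accepted script.

```vbscript
Option Explicit

' IOMode values accepted by FileSystemObject.OpenTextFile.
' These names are not built into VBScript, so they are declared here.
Const ForReading = 1
Const ForWriting = 2

Dim fso, tsIn, tsOut
Set fso = CreateObject("Scripting.FileSystemObject")

' Input: read mode (this is also the default when the argument is omitted).
Set tsIn = fso.OpenTextFile("c:\Input.csv", ForReading)

' Output: write mode. The third argument, True, creates the file
' if it does not already exist. An existing file is overwritten.
Set tsOut = fso.OpenTextFile("c:\Output.csv", ForWriting, True)

' Copy the first line only, as a quick check that writing now works.
If Not tsIn.AtEndOfStream Then
    tsOut.WriteLine tsIn.ReadLine
End If

tsIn.Close
tsOut.Close
```

Calling WriteLine on a stream opened in read mode is what raises runtime error 800A0036 (error 54, "Bad file mode").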
 
LVL 22

Author Comment

by:plusone3055
ID: 36550763
That worked.

One interesting thing, though:
if I remove duplicates manually from the CSV file, I am left with 125,694 records.

Using the script to remove duplicates and write the result to a new CSV leaves me with 125,204 records.
 
LVL 53

Expert Comment

by:Bill Prew
ID: 36550916
Thanks for the assist.

What is the "manual process" that you use? The script seems to be set up to find and remove only true duplicates, with the exception of blank lines, which are always written out.

~bp
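
One way to investigate a gap like this is to count the rows under both comparison modes. The sketch below is only a diagnostic and assumes the same input path as the script above; the variable names are made up for illustration. The accepted script sets CompareMode to vbTextCompare, so rows that differ only in letter case are treated as duplicates, which is one possible (unconfirmed) reason it keeps fewer rows than a case-sensitive manual process would.

```vbscript
Option Explicit

Const ForReading = 1

Dim fso, ts, dicExact, dicNoCase, theLine, total, blanks

Set fso = CreateObject("Scripting.FileSystemObject")

' Case-sensitive set of distinct rows.
Set dicExact = CreateObject("Scripting.Dictionary")
dicExact.CompareMode = vbBinaryCompare

' Case-insensitive set of distinct rows (what the accepted script uses).
Set dicNoCase = CreateObject("Scripting.Dictionary")
dicNoCase.CompareMode = vbTextCompare

total = 0
blanks = 0

Set ts = fso.OpenTextFile("c:\Input.csv", ForReading)
Do Until ts.AtEndOfStream
    theLine = ts.ReadLine
    total = total + 1
    If theLine = "" Then
        ' The accepted script always writes blank lines, so count them separately.
        blanks = blanks + 1
    Else
        If Not dicExact.Exists(theLine) Then dicExact.Add theLine, True
        If Not dicNoCase.Exists(theLine) Then dicNoCase.Add theLine, True
    End If
Loop
ts.Close

WScript.Echo "Total lines read: " & total
WScript.Echo "Blank lines: " & blanks
WScript.Echo "Distinct rows, case-sensitive: " & dicExact.Count
WScript.Echo "Distinct rows, case-insensitive: " & dicNoCase.Count
WScript.Echo "Lines the script would write: " & (dicNoCase.Count + blanks)
```

If the two distinct-row counts differ, the difference is the number of rows that match only when case is ignored. If they are equal, the discrepancy comes from somewhere else, such as how the manual process treats blank lines or which columns it compares.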
