Solved

VbScript to remove Duplicate rows from CSV File

Posted on 2011-09-15
8
2,397 Views
Last Modified: 2012-05-12
greetings Experts,

in the attempt to keep my procedure completely automated I'm wondering if anyone out there has a Vbscript that will scan a CSV file and  remove the duplicate rows

IE the CSV has 10000 rows  1000 duplicates
The script removes the 1000 duplicates and is left with 9000
0
Comment
Question by:plusone3055
  • 3
  • 3
  • 2
8 Comments
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 36544128
By duplicate, do you mean duplicating the entire row, or just one or a few of the fields?
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36544831
entire row
0
 
LVL 53

Expert Comment

by:Bill Prew
ID: 36544863
Can they be sorted first, or do they have to stay in a random order that they occur?

~bp
0
Gigs: Get Your Project Delivered by an Expert

Select from freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely and get projects done right.

 
LVL 92

Accepted Solution

by:
Patrick Matthews earned 400 total points
ID: 36545408
Going by entire row...



Dim fso, tsIn, tsOut, dic, TheLine, Repeat

Set fso = CreateObject("Scripting.FileSystemObject")
Set dic = CreateObject("Scripting.Dictionary")
dic.CompareMode = vbTextCompare 'NOT case sensitive. omit for case sensitive
Set tsIn = fso.OpenTextFile("c:\Input.csv")
Set tsOut = fso.OpenTextFile("c:\Output.csv")

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    If TheLine <> "" Then
        If dic.Exists(TheLine) Then
            Repeat = True
        Else
            Repeat = False
            dic.Add TheLine, TheLine
        End If
    Else
        Repeat = False
    End If
    If Not Repeat Then tsOut.WriteLine TheLine
Loop

tsIn.Close
tsOut.Close
Set tsIn = Nothing
Set tsOut = Nothing
Set fso = Nothing
Set dic = Nothing

Open in new window




For more about the Dictionary, please see http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html (although it is written from a VBA and not a VBScript perspective)
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36550091
mathewsPatrick

when running that i get

Line: 21
Char: 24
Error: bad File Name
Code :800A0036
Source: VbScript Runtime error
0
 
LVL 53

Assisted Solution

by:Bill Prew
Bill Prew earned 100 total points
ID: 36550571
Try changing this line:

Set tsOut = fso.OpenTextFile("c:\Output.csv")

to

Set tsOut = fso.OpenTextFile("c:\Output.csv", 2, True)

I think the output file is being opened in read mode currently, and needs to be opened in write mode.

~bp
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36550763
that worked

interesting thing though
if i remove duplicates manually from the CSV file it leaves me with 125,694 records

using the script to remove scripts and place in a new CSV leaves me with 125,204 records



 
 
0
 
LVL 53

Expert Comment

by:Bill Prew
ID: 36550916
Thanks for the assist.

What is the "manual process" that you use, the script seems to be setup to only find the true duplicates and remove them, with the exception of blank lines which are always written out.

~bp
0

Featured Post

Gigs: Get Your Project Delivered by an Expert

Select from freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

When it comes to writing scripts for a Client/Server computing environment it is essential to consider some way of enabling the authentication functionality within a script. This sort of consideration mainly comes into the picture when we are dealin…
This script will sweep a range of IP addresses (class c only, 255.255.255.0) and report to a log the version of office installed. What it does: 1.)      Creates log file in the directory the script is run from (if it doesn't already exist) 2.)      Sweep…
This Micro Tutorial hows how you can integrate  Mac OSX to a Windows Active Directory Domain. Apple has made it easy to allow users to bind their macs to a windows domain with relative ease. The following video show how to bind OSX Mavericks to …
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question