Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

VbScript to remove Duplicate rows from CSV File

Posted on 2011-09-15
8
Medium Priority
?
2,750 Views
Last Modified: 2012-05-12
greetings Experts,

in the attempt to keep my procedure completely automated I'm wondering if anyone out there has a Vbscript that will scan a CSV file and  remove the duplicate rows

IE the CSV has 10000 rows  1000 duplicates
The script removes the 1000 duplicates and is left with 9000
0
Comment
Question by:plusone3055
  • 3
  • 3
  • 2
8 Comments
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 36544128
By duplicate, do you mean duplicating the entire row, or just one or a few of the fields?
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36544831
entire row
0
 
LVL 59

Expert Comment

by:Bill Prew
ID: 36544863
Can they be sorted first, or do they have to stay in a random order that they occur?

~bp
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 93

Accepted Solution

by:
Patrick Matthews earned 1600 total points
ID: 36545408
Going by entire row...



Dim fso, tsIn, tsOut, dic, TheLine, Repeat

Set fso = CreateObject("Scripting.FileSystemObject")
Set dic = CreateObject("Scripting.Dictionary")
dic.CompareMode = vbTextCompare 'NOT case sensitive. omit for case sensitive
Set tsIn = fso.OpenTextFile("c:\Input.csv")
Set tsOut = fso.OpenTextFile("c:\Output.csv")

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    If TheLine <> "" Then
        If dic.Exists(TheLine) Then
            Repeat = True
        Else
            Repeat = False
            dic.Add TheLine, TheLine
        End If
    Else
        Repeat = False
    End If
    If Not Repeat Then tsOut.WriteLine TheLine
Loop

tsIn.Close
tsOut.Close
Set tsIn = Nothing
Set tsOut = Nothing
Set fso = Nothing
Set dic = Nothing

Open in new window




For more about the Dictionary, please see http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html (although it is written from a VBA and not a VBScript perspective)
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36550091
mathewsPatrick

when running that i get

Line: 21
Char: 24
Error: bad File Name
Code :800A0036
Source: VbScript Runtime error
0
 
LVL 59

Assisted Solution

by:Bill Prew
Bill Prew earned 400 total points
ID: 36550571
Try changing this line:

Set tsOut = fso.OpenTextFile("c:\Output.csv")

to

Set tsOut = fso.OpenTextFile("c:\Output.csv", 2, True)

I think the output file is being opened in read mode currently, and needs to be opened in write mode.

~bp
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36550763
that worked

interesting thing though
if i remove duplicates manually from the CSV file it leaves me with 125,694 records

using the script to remove scripts and place in a new CSV leaves me with 125,204 records



 
 
0
 
LVL 59

Expert Comment

by:Bill Prew
ID: 36550916
Thanks for the assist.

What is the "manual process" that you use, the script seems to be setup to only find the true duplicates and remove them, with the exception of blank lines which are always written out.

~bp
0

Featured Post

Vote for the Most Valuable Expert

It’s time to recognize experts that go above and beyond with helpful solutions and engagement on site. Choose from the top experts in the Hall of Fame or on the right rail of your favorite topic page. Look for the blue “Nominate” button on their profile to vote.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Welcome back!  My apologies for taking so long to write part two of this series; it's been a long time coming!  As I promised in Part 1, this article will focus on how to locate those elusive AD properties that you are searching for.  Why is this us…
This script will sweep a range of IP addresses (class c only, 255.255.255.0) and report to a log the version of office installed. What it does: 1.)      Creates log file in the directory the script is run from (if it doesn't already exist) 2.)      Sweep…
How to fix incompatible JVM issue while installing Eclipse While installing Eclipse in windows, got one error like above and unable to proceed with the installation. This video describes how to successfully install Eclipse. How to solve incompa…
In a question here at Experts Exchange (https://www.experts-exchange.com/questions/29062564/Adobe-acrobat-reader-DC.html), a member asked how to create a signature in Adobe Acrobat Reader DC (the free Reader product, not the paid, full Acrobat produ…
Suggested Courses

885 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question