Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

VbScript to remove Duplicate rows from CSV File

Posted on 2011-09-15
8
Medium Priority
?
2,686 Views
Last Modified: 2012-05-12
greetings Experts,

in the attempt to keep my procedure completely automated I'm wondering if anyone out there has a Vbscript that will scan a CSV file and  remove the duplicate rows

IE the CSV has 10000 rows  1000 duplicates
The script removes the 1000 duplicates and is left with 9000
0
Comment
Question by:plusone3055
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
  • 2
8 Comments
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 36544128
By duplicate, do you mean duplicating the entire row, or just one or a few of the fields?
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36544831
entire row
0
 
LVL 57

Expert Comment

by:Bill Prew
ID: 36544863
Can they be sorted first, or do they have to stay in a random order that they occur?

~bp
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 93

Accepted Solution

by:
Patrick Matthews earned 1600 total points
ID: 36545408
Going by entire row...



Dim fso, tsIn, tsOut, dic, TheLine, Repeat

Set fso = CreateObject("Scripting.FileSystemObject")
Set dic = CreateObject("Scripting.Dictionary")
dic.CompareMode = vbTextCompare 'NOT case sensitive. omit for case sensitive
Set tsIn = fso.OpenTextFile("c:\Input.csv")
Set tsOut = fso.OpenTextFile("c:\Output.csv")

Do Until tsIn.AtEndOfStream
    TheLine = tsIn.ReadLine
    If TheLine <> "" Then
        If dic.Exists(TheLine) Then
            Repeat = True
        Else
            Repeat = False
            dic.Add TheLine, TheLine
        End If
    Else
        Repeat = False
    End If
    If Not Repeat Then tsOut.WriteLine TheLine
Loop

tsIn.Close
tsOut.Close
Set tsIn = Nothing
Set tsOut = Nothing
Set fso = Nothing
Set dic = Nothing

Open in new window




For more about the Dictionary, please see http://www.experts-exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/A_3391-Using-the-Dictionary-Class-in-VBA.html (although it is written from a VBA and not a VBScript perspective)
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36550091
mathewsPatrick

when running that i get

Line: 21
Char: 24
Error: bad File Name
Code :800A0036
Source: VbScript Runtime error
0
 
LVL 57

Assisted Solution

by:Bill Prew
Bill Prew earned 400 total points
ID: 36550571
Try changing this line:

Set tsOut = fso.OpenTextFile("c:\Output.csv")

to

Set tsOut = fso.OpenTextFile("c:\Output.csv", 2, True)

I think the output file is being opened in read mode currently, and needs to be opened in write mode.

~bp
0
 
LVL 22

Author Comment

by:plusone3055
ID: 36550763
that worked

interesting thing though
if i remove duplicates manually from the CSV file it leaves me with 125,694 records

using the script to remove scripts and place in a new CSV leaves me with 125,204 records



 
 
0
 
LVL 57

Expert Comment

by:Bill Prew
ID: 36550916
Thanks for the assist.

What is the "manual process" that you use, the script seems to be setup to only find the true duplicates and remove them, with the exception of blank lines which are always written out.

~bp
0

Featured Post

Enroll in September's Course of the Month

This month’s featured course covers 16 hours of training in installation, management, and deployment of VMware vSphere virtualization environments. It's free for Premium Members, Team Accounts, and Qualified Experts!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Well hello again!  Glad to see you've made it this far without giving up.  In this, the fourth installment of my popular series, I'm going to cover functions and subroutines, what they are, and why they are useful.  Just in case you stumbled onto th…
Over the years I have built up my own little library of code snippets that I refer to when programming or writing a script.  Many of these have come from the web or adaptations from snippets I find on the Web.  Periodically I add to them when I come…
Sometimes it takes a new vantage point, apart from our everyday security practices, to truly see our Active Directory (AD) vulnerabilities. We get used to implementing the same techniques and checking the same areas for a breach. This pattern can re…
Have you created a query with information for a calendar? ... and then, abra-cadabra, the calendar is done?! I am going to show you how to make that happen. Visualize your data!  ... really see it To use the code to create a calendar from a q…
Suggested Courses

721 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question