Solved

Remove duplicate words from list

Posted on 2006-06-23
Last Modified: 2010-04-30
I need help with my syntax. My attempt is below; the lines in question are commented out and marked with '<-------------
(language=vbscript)

Result needed: Remove duplicate words from a text file.

Current script:

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test3.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test4.txt", ForWriting)
     Word1 = objFile.Readline
        'Wscript.Echo word1
  Do Until objFile.AtEndOfStream 'Check to see if EOF
     word2 = objFile.Readline
        'Wscript.Echo word2
     'If word1 = word2 then   '<----------------------------------------------
       ' objtest.writeline         '<---------------------------------------------------
     'End if       '<-----------------------------------------------------------------
    objtest.writeline (word1)
    Word1 = Word2
  loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Wscript.Echo "Completed"
Wscript.quit

Question by:a23m2000
 
LVL 9

Expert Comment

by:justchat_1
ID: 16971993
Can you clarify the question:
are you trying to remove consecutive duplicates or all duplicates?

The code you gave looks correct to remove consecutive duplicates
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 16972268
Is it only one word per line?
 
LVL 9

Expert Comment

by:justchat_1
ID: 16972320
yes but are you trying to remove all duplicates:
a
b
c
b

would be:
a
b
c

or just consecutive:
a
b
b
c
b

would be:
a
b
c
b

...because your code only does the second option
 
LVL 9

Expert Comment

by:justchat_1
ID: 16972374
to do the first one you need to read all the items in text1 into an array

then filter it (http://www.devx.com/vb2themax/Tip/18977)

finally write it back to text2
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 16972455
Here is one way to remove all duplicate lines in the file:

Const ForReading = 1
Const ForWriting = 2

inputFile = "D:\Temp\sort\test3.txt"
outputFile = "D:\Temp\sort\test4.txt"

Set dict = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInput = objFSO.OpenTextFile(inputFile, ForReading)
Set objOutput = objFSO.OpenTextFile(outputFile, ForWriting, True)

While Not objInput.AtEndOfStream
    line = objInput.Readline
    If Not dict.Exists(line) Then
        dict.Add line, Nothing          
        objOutput.WriteLine line
    End If
Wend

objInput.Close
objOutput.Close

Wscript.Echo "Completed"
 
LVL 1

Expert Comment

by:Brownhead
ID: 16981640
'//Code\\
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, " ")
For iCount = 0 To UBound(aHold)
    For iCount2 = 0 To UBound(aHold)
        If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
    Next iCount2
Next iCount
DelRepeats = Join(aHold, " ")
Do Until (InStr(1, DelRepeats, "  ") <= 0)
    DelRepeats = Replace(DelRepeats, "  ", " ")
Loop
End Function
'\\Code//

Try that out :D, worked fine for me. It uses a space as the delimiter between the items, and is case sensitive. I can change either of those or make them variable. But is this what you want?
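
On the case-sensitivity point, here is a minimal VBScript sketch, assuming the Scripting.Dictionary approach posted earlier in this thread. The names UniqueLines and ignoreCase are hypothetical helpers made up for illustration. It relies on the Dictionary's CompareMode property, which has to be set while the dictionary is still empty.

```vbscript
Option Explicit

' Returns the distinct entries of the array "lines", keeping the first
' occurrence of each in its original order.
' UniqueLines and ignoreCase are hypothetical names, not built-ins.
Function UniqueLines(lines, ignoreCase)
    Dim dict, entry, result, count
    Set dict = CreateObject("Scripting.Dictionary")
    ' CompareMode can only be changed while the dictionary is empty.
    If ignoreCase Then dict.CompareMode = vbTextCompare
    result = Array()
    count = 0
    For Each entry In lines
        If Not dict.Exists(entry) Then
            dict.Add entry, True            ' the stored value is unused
            ReDim Preserve result(count)
            result(count) = entry
            count = count + 1
        End If
    Next
    UniqueLines = result
End Function
```

With ignoreCase set to True, "Apple" and "apple" count as the same word and only the first spelling seen is kept. With False, the comparison is case sensitive, like the code above.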
 

Author Comment

by:a23m2000
ID: 17004151
I am trying to remove all duplicates from the file, even if they are not consecutive. Also, it is one (1) word per line. Example as shown above.

File1
a
b
c
b

would be: (File2)
a
b
c


Thanks
 
LVL 1

Expert Comment

by:Brownhead
ID: 17004899
'//Code\\
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, vbNewLine)
For iCount = 0 To UBound(aHold)
    For iCount2 = 0 To UBound(aHold)
        If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
    Next iCount2
Next iCount
DelRepeats = Join(aHold, vbNewLine)
Do Until (InStr(1, DelRepeats, vbNewLine & vbNewLine) <= 0)
    DelRepeats = Replace(DelRepeats, vbNewLine & vbNewLine, vbNewLine)
Loop
End Function
'\\Code//

So you could, for example, say:

'//Code\\
Dim sHold as String
Open "C:/File1.txt" For Input As #1
    sHold = DelRepeats(Input(LOF(1), 1))
Close #1
Open "C:/File2.txt" For Output as #1
    Print #1, sHold
Close #1
'\\Code//

The above code would open File1.txt in the C drive, clear the repeats and save the modified file to File2.txt in the C drive.
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 17022070
Did you try my code a23m2000?...it does what you asked for...   =)
 
LVL 9

Expert Comment

by:justchat_1
ID: 17022117
If you're a little confused, idle_mind did have the best working method...
 
LVL 1

Accepted Solution

by:
tguez earned 190 total points
ID: 17040095
I am sorry, but the solutions you received are only good for small files. If your word list is large, they will take a long time.

Your code is basically correct.  The only problem is that it works only if your input file is sorted.  If it is sorted, all duplicates will be removed.  If it is not sorted, only consecutive duplicates will be removed.

So if you want to remove all duplicates, you need to make sure your input file is sorted.

There are a few ways to sort the file.  Sort it first, then run the algorithm you wrote, and the result will be correct.

Now, if you sort the file the easiest way, it will take just as long as the code the other guys wrote, which is not good for large files.  At something like 10,000 words or more, you will take a big performance hit.

To avoid writing the sort yourself, you can use a small dirty trick.  Load all your words into a list box placed on your form but hidden from the user.  Set the Sorted property of the list box to True.  The list box will then automatically sort all items added.

So, just write a first routine that loads all the words and adds them to the list box.

When you are done, all the words are sorted.  Now just run your first program on the sorted list, and you'll be all set.

Of course, the dirty trick here is that I used the list box to do the sorting.  If you want to be a professional programmer, load the whole file into an array and write an algorithm to sort the array.  This is a bit more difficult and requires careful debugging to make sure your sort works.

Of course, this is all good for you if your files are 10K words or more.  For files under 1,000 words, you can use the code the guys gave you above.

Tomer
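
The "load the file into an array, sort it, then drop equal neighbours" idea can be sketched in plain VBScript as follows. This is an illustration under assumptions, not code from the thread. VBScript has no built-in array sort, so the sketch uses a hand-written quicksort. QuickSort and SortedUnique are hypothetical names, and comparisons are case sensitive.

```vbscript
Option Explicit

' Sorts arr(lo..hi) in place with quicksort, using binary (case-sensitive)
' string comparison. QuickSort is a hypothetical helper, not a built-in.
Sub QuickSort(arr, lo, hi)
    Dim i, j, pivot, tmp
    If lo >= hi Then Exit Sub
    i = lo
    j = hi
    pivot = arr((lo + hi) \ 2)
    Do While i <= j
        Do While StrComp(arr(i), pivot, vbBinaryCompare) < 0
            i = i + 1
        Loop
        Do While StrComp(arr(j), pivot, vbBinaryCompare) > 0
            j = j - 1
        Loop
        If i <= j Then
            tmp = arr(i)
            arr(i) = arr(j)
            arr(j) = tmp
            i = i + 1
            j = j - 1
        End If
    Loop
    QuickSort arr, lo, j
    QuickSort arr, i, hi
End Sub

' Returns a sorted copy of "words" with duplicates removed. Once the array
' is sorted, equal words sit next to each other, so a single pass that
' compares each word with its predecessor is enough.
Function SortedUnique(words)
    Dim sorted, result, count, i
    sorted = words                  ' assigning an array makes a copy
    result = Array()
    If UBound(sorted) >= 0 Then
        QuickSort sorted, 0, UBound(sorted)
        ReDim result(UBound(sorted))
        result(0) = sorted(0)
        count = 1
        For i = 1 To UBound(sorted)
            If sorted(i) <> sorted(i - 1) Then
                result(count) = sorted(i)
                count = count + 1
            End If
        Next
        ReDim Preserve result(count - 1)
    End If
    SortedUnique = result
End Function
```

To use it on a file, the text could be read with ReadAll, split on vbCrLf, passed through SortedUnique, and written back with Join. Note that the output comes back in sorted order, not in the order the words appeared in the input.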
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 17091239
So how did you end up solving this problem?

You stated you were using VBScript...which does not have a ListBox!

???
 

Author Comment

by:a23m2000
ID: 17108210
I sorted the file and then used this code below to Remove Duplicates.

Const ForReading = 1
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test3.txt", ForReading)
    Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test4.txt", ForWriting)
    Set objcopy = objfso.getfile ("D:\Temp\sort\test4.txt")
    Word1 = objFile.Readline
    Do Until objFile.AtEndOfStream 'Check to see if EOF
        word2 = objFile.Readline
        If word1 <> word2 then 'Write word1 only when the next word differs
            objtest.writeline (word1)
        end if
        word1=word2
    loop
    objtest.writeline (word1) 'Last pending word; also covers a one-line file
    objfile.Close
    objtest.Close
    objcopy.copy("D:\Temp\sort\test3.txt")
Wscript.Echo "Completed sort"
Wscript.quit
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 17111153
Out of curiosity...how did you sort the file?
 

Author Comment

by:a23m2000
ID: 17146701
Const ForReading = 1
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2
Dim Word3 'Tempholder variable
SW=1

Do until SW=0  
SW=0

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test2.txt", ForWriting)
Set objcopy = objfso.getfile ("D:\Temp\sort\test2.txt")

     Word1 = objFile.Readline
        'Wscript.Echo word1
  Do Until objFile.AtEndOfStream 'Check to see if EOF
     word2 = objFile.Readline
        'Wscript.Echo word2
     If word1 > word2 then 'Swaps the two words into alphabetical order
        word3=word1
        word1=word2
        word2=word3
        SW=1
     end if
     'If word1 = word2 then
       ' objtest.writeline
     'End if
    objtest.writeline (word1)
    Word1 = Word2
  loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Objcopy.copy("D:\Temp\sort\test.txt")
Loop
Wscript.Echo "Completed sort"