Link to home
Start Free TrialLog in
Avatar of a23m2000
a23m2000Flag for United States of America

asked on

Remove duplicate words from list

Need help with my syntax, my attempt is below, commented out with '<-------------
(language=vbscript)

Result needed: Remove duplicate words from a text file.

Current script:

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test3.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test4.txt", ForWriting)
     Word1 = objFile.Readline
        'Wscript.Echo word1
  Do Until objFile.AtEndOfStream 'Check to see if EOF
     word2 = objFile.Readline
        'Wscript.Echo word2
     'If word1 = word2 then   '<----------------------------------------------
       ' objtest.writeline         '<---------------------------------------------------
     'End if       '<-----------------------------------------------------------------
    objtest.writeline (word1)
    Word1 = Word2
  loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Wscript.Echo "Completed"
Wscript.quit

Avatar of justchat_1
justchat_1

Can you clarify the question:
are you trying to remove consecutive duplicates or all duplicates?

The code you gave looks correct to remove consecutive duplicates
Avatar of Mike Tomlinson
Is it only one word per line?
yes but are you trying to remove all duplicates:
a
b
c
b

would be:
a
b
c

or just consecutive:
a
b
b
c
b

would be:
a
b
c
b

...because your code only does the second option
to do the first one you need to read all the items in text1 into an array

then filter it (http://www.devx.com/vb2themax/Tip/18977)

finally write it back to text2
Here is one way to remove all duplicate lines in the file:

Const ForReading = 1
Const ForWriting = 2

inputFile = "D:\Temp\sort\test3.txt"
outputFile = "D:\Temp\sort\test4.txt"

Set dict = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInput = objFSO.OpenTextFile(inputFile, ForReading)
Set objOutput = objFSO.OpenTextFile(outputFile, ForWriting, True)

While Not objInput.AtEndOfStream
    line = objInput.Readline
    If Not dict.Exists(line) Then
        dict.Add line, Nothing          
        objOutput.WriteLine line
    End If
Wend

objInput.Close
objOutput.Close

Wscript.Echo "Completed"
'//Code\\
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, " ")
For iCount = 0 To UBound(aHold)
    For iCount2 = 0 To UBound(aHold)
        If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
    Next iCount2
Next iCount
DelRepeats = Join(aHold, " ")
Do Until (InStr(1, DelRepeats, "  ") <= 0)
    DelRepeats = Replace(DelRepeats, "  ", " ")
Loop
End Function
'\\Code//

Try that out :D, worked fine for me. It uses a space as the delimeter between the items, and is case sensitive. I can change or make variable either of those. But is this what you want?
Avatar of a23m2000

ASKER

I am trying to remove all duplicates from the file, even if they are not consecutive. Also, it is one (1) word per line. Example as shown above.

File1
a
b
c
b

would be: (File2)
a
b
c


Thanks
'//Code\\
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, vbNewLine)
For iCount = 0 To UBound(aHold)
    For iCount2 = 0 To UBound(aHold)
        If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
    Next iCount2
Next iCount
DelRepeats = Join(aHold, vbNewLine)
Do Until (InStr(1, DelRepeats, vbNewLine & vbNewLine) <= 0)
    DelRepeats = Replace(DelRepeats, vbNewLine & vbNewLine, vbNewLine)
Loop
End Function
'\\Code//

So you could, for example, say:

'//Code\\
Dim sHold as String
Open "C:/File1.txt" For Input As #1
    sHold = DelRepeats(Input(LOF(1), 1))
Close #1
Open "C:/File2.txt" For Output as #1
    Print #1, sHold
Close #1
'\\Code//

The above code would open File1.txt in the C drive, clear the repeats and save the modified file to File2.txt in the C drive.
Did you try my code a23m2000?...it does what you asked for...   =)
If your a little confused, idle_mind did have the best working method...
ASKER CERTIFIED SOLUTION
Avatar of tguez
tguez

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
So how did you end up solving this problem?

You stated you were using VBScript...which does not have a ListBox!

???
I sorted the file and then used this code below to Remove Duplicates.

Const ForReading = 1
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test3.txt", ForReading)
    Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test4.txt", ForWriting)
    Set objcopy = objfso.getfile ("D:\Temp\sort\test4.txt")
    Word1 = objFile.Readline
    Do Until objFile.AtEndOfStream 'Check to see if EOF
        word2 = objFile.Readline
        If word1 = word2 then
        else    
            objtest.writeline (word1)
        end if
        word1=word2
    loop
    objtest.writeline (word2)
    objfile.Close
    objtest.Close
    objcopy.copy("D:\Temp\sort\test3.txt")
Wscript.Echo "Completed sort"
Wscript.quit
Out of curiosity...how did you sort the file?
Const ForReading = 1
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2
Dim Word3 'Tempholder variable
SW=1

Do until SW=0  
SW=0

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test2.txt", ForWriting)
Set objcopy = objfso.getfile ("D:\Temp\sort\test2.txt")

     Word1 = objFile.Readline
        'Wscript.Echo word1
  Do Until objFile.AtEndOfStream 'Check to see if EOF
     word2 = objFile.Readline
        'Wscript.Echo word2
     If word1 > word2 then 'Swaps alpebetical order of words
        word3=word1
        word1=word2
        word2=word3
        SW=1
     end if
     'If word1 = word2 then
       ' objtest.writeline
     'End if
    objtest.writeline (word1)
    Word1 = Word2
  loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Objcopy.copy("D:\Temp\sort\test.txt")
Loop
Wscript.Echo "Completed sort"