a23m2000
asked on
Remove duplicate words from list
Need help with my syntax, my attempt is below, commented out with '<-------------
(language=vbscript)
Result needed: Remove duplicate words from a text file.
Current script:
Const ForReading = 1
Const ForWriting = 2
Set objFSO = CreateObject("Scripting.Fi leSystemOb ject")
Set objfile = objFSO.OpenTextFile("D:\Te mp\sort\te st3.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Te mp\sort\te st4.txt", ForWriting)
Word1 = objFile.Readline
'Wscript.Echo word1
Do Until objFile.AtEndOfStream 'Check to see if EOF
word2 = objFile.Readline
'Wscript.Echo word2
'If word1 = word2 then '<------------------------ ---------- ---------- --
' objtest.writeline '<------------------------ ---------- ---------- -------
'End if '<------------------------ ---------- ---------- ---------- ---------- -
objtest.writeline (word1)
Word1 = Word2
loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Wscript.Echo "Completed"
Wscript.quit
(language=vbscript)
Result needed: Remove duplicate words from a text file.
Current script:
Const ForReading = 1
Const ForWriting = 2
Set objFSO = CreateObject("Scripting.Fi
Set objfile = objFSO.OpenTextFile("D:\Te
Set objtest = objFSO.OpenTextFile("D:\Te
Word1 = objFile.Readline
'Wscript.Echo word1
Do Until objFile.AtEndOfStream 'Check to see if EOF
word2 = objFile.Readline
'Wscript.Echo word2
'If word1 = word2 then '<------------------------
' objtest.writeline '<------------------------
'End if '<------------------------
objtest.writeline (word1)
Word1 = Word2
loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Wscript.Echo "Completed"
Wscript.quit
Is it only one word per line?
yes but are you trying to remove all duplicates:
a
b
c
b
would be:
a
b
c
or just consecutive:
a
b
b
c
b
would be:
a
b
c
b
...because your code only does the second option
a
b
c
b
would be:
a
b
c
or just consecutive:
a
b
b
c
b
would be:
a
b
c
b
...because your code only does the second option
to do the first one you need to read all the items in text1 into an array
then filter it (http://www.devx.com/vb2themax/Tip/18977)
finally write it back to text2
then filter it (http://www.devx.com/vb2themax/Tip/18977)
finally write it back to text2
Here is one way to remove all duplicate lines in the file:
Const ForReading = 1
Const ForWriting = 2
inputFile = "D:\Temp\sort\test3.txt"
outputFile = "D:\Temp\sort\test4.txt"
Set dict = CreateObject("Scripting.Di ctionary")
Set objFSO = CreateObject("Scripting.Fi leSystemOb ject")
Set objInput = objFSO.OpenTextFile(inputF ile, ForReading)
Set objOutput = objFSO.OpenTextFile(output File, ForWriting, True)
While Not objInput.AtEndOfStream
line = objInput.Readline
If Not dict.Exists(line) Then
dict.Add line, Nothing
objOutput.WriteLine line
End If
Wend
objInput.Close
objOutput.Close
Wscript.Echo "Completed"
Const ForReading = 1
Const ForWriting = 2
inputFile = "D:\Temp\sort\test3.txt"
outputFile = "D:\Temp\sort\test4.txt"
Set dict = CreateObject("Scripting.Di
Set objFSO = CreateObject("Scripting.Fi
Set objInput = objFSO.OpenTextFile(inputF
Set objOutput = objFSO.OpenTextFile(output
While Not objInput.AtEndOfStream
line = objInput.Readline
If Not dict.Exists(line) Then
dict.Add line, Nothing
objOutput.WriteLine line
End If
Wend
objInput.Close
objOutput.Close
Wscript.Echo "Completed"
'//Code\\
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, " ")
For iCount = 0 To UBound(aHold)
For iCount2 = 0 To UBound(aHold)
If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
Next iCount2
Next iCount
DelRepeats = Join(aHold, " ")
Do Until (InStr(1, DelRepeats, " ") <= 0)
DelRepeats = Replace(DelRepeats, " ", " ")
Loop
End Function
'\\Code//
Try that out :D, worked fine for me. It uses a space as the delimeter between the items, and is case sensitive. I can change or make variable either of those. But is this what you want?
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, " ")
For iCount = 0 To UBound(aHold)
For iCount2 = 0 To UBound(aHold)
If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
Next iCount2
Next iCount
DelRepeats = Join(aHold, " ")
Do Until (InStr(1, DelRepeats, " ") <= 0)
DelRepeats = Replace(DelRepeats, " ", " ")
Loop
End Function
'\\Code//
Try that out :D, worked fine for me. It uses a space as the delimeter between the items, and is case sensitive. I can change or make variable either of those. But is this what you want?
ASKER
I am trying to remove all duplicates from the file, even if they are not consecutive. Also, it is one (1) word per line. Example as shown above.
File1
a
b
c
b
would be: (File2)
a
b
c
Thanks
File1
a
b
c
b
would be: (File2)
a
b
c
Thanks
'//Code\\
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, vbNewLine)
For iCount = 0 To UBound(aHold)
For iCount2 = 0 To UBound(aHold)
If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
Next iCount2
Next iCount
DelRepeats = Join(aHold, vbNewLine)
Do Until (InStr(1, DelRepeats, vbNewLine & vbNewLine) <= 0)
DelRepeats = Replace(DelRepeats, vbNewLine & vbNewLine, vbNewLine)
Loop
End Function
'\\Code//
So you could, for example, say:
'//Code\\
Dim sHold as String
Open "C:/File1.txt" For Input As #1
sHold = DelRepeats(Input(LOF(1), 1))
Close #1
Open "C:/File2.txt" For Output as #1
Print #1, sHold
Close #1
'\\Code//
The above code would open File1.txt in the C drive, clear the repeats and save the modified file to File2.txt in the C drive.
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, vbNewLine)
For iCount = 0 To UBound(aHold)
For iCount2 = 0 To UBound(aHold)
If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
Next iCount2
Next iCount
DelRepeats = Join(aHold, vbNewLine)
Do Until (InStr(1, DelRepeats, vbNewLine & vbNewLine) <= 0)
DelRepeats = Replace(DelRepeats, vbNewLine & vbNewLine, vbNewLine)
Loop
End Function
'\\Code//
So you could, for example, say:
'//Code\\
Dim sHold as String
Open "C:/File1.txt" For Input As #1
sHold = DelRepeats(Input(LOF(1), 1))
Close #1
Open "C:/File2.txt" For Output as #1
Print #1, sHold
Close #1
'\\Code//
The above code would open File1.txt in the C drive, clear the repeats and save the modified file to File2.txt in the C drive.
Did you try my code a23m2000?...it does what you asked for... =)
If your a little confused, idle_mind did have the best working method...
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
So how did you end up solving this problem?
You stated you were using VBScript...which does not have a ListBox!
???
You stated you were using VBScript...which does not have a ListBox!
???
ASKER
I sorted the file and then used this code below to Remove Duplicates.
Const ForReading = 1
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2
Set objFSO = CreateObject("Scripting.Fi leSystemOb ject")
Set objfile = objFSO.OpenTextFile("D:\Te mp\sort\te st3.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Te mp\sort\te st4.txt", ForWriting)
Set objcopy = objfso.getfile ("D:\Temp\sort\test4.txt")
Word1 = objFile.Readline
Do Until objFile.AtEndOfStream 'Check to see if EOF
word2 = objFile.Readline
If word1 = word2 then
else
objtest.writeline (word1)
end if
word1=word2
loop
objtest.writeline (word2)
objfile.Close
objtest.Close
objcopy.copy("D:\Temp\sort \test3.txt ")
Wscript.Echo "Completed sort"
Wscript.quit
Const ForReading = 1
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2
Set objFSO = CreateObject("Scripting.Fi
Set objfile = objFSO.OpenTextFile("D:\Te
Set objtest = objFSO.OpenTextFile("D:\Te
Set objcopy = objfso.getfile ("D:\Temp\sort\test4.txt")
Word1 = objFile.Readline
Do Until objFile.AtEndOfStream 'Check to see if EOF
word2 = objFile.Readline
If word1 = word2 then
else
objtest.writeline (word1)
end if
word1=word2
loop
objtest.writeline (word2)
objfile.Close
objtest.Close
objcopy.copy("D:\Temp\sort
Wscript.Echo "Completed sort"
Wscript.quit
Out of curiosity...how did you sort the file?
ASKER
Const ForReading = 1
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2
Dim Word3 'Tempholder variable
SW=1
Do until SW=0
SW=0
Set objFSO = CreateObject("Scripting.Fi leSystemOb ject")
Set objfile = objFSO.OpenTextFile("D:\Te mp\sort\te st.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Te mp\sort\te st2.txt", ForWriting)
Set objcopy = objfso.getfile ("D:\Temp\sort\test2.txt")
Word1 = objFile.Readline
'Wscript.Echo word1
Do Until objFile.AtEndOfStream 'Check to see if EOF
word2 = objFile.Readline
'Wscript.Echo word2
If word1 > word2 then 'Swaps alpebetical order of words
word3=word1
word1=word2
word2=word3
SW=1
end if
'If word1 = word2 then
' objtest.writeline
'End if
objtest.writeline (word1)
Word1 = Word2
loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Objcopy.copy("D:\Temp\sort \test.txt" )
Loop
Wscript.Echo "Completed sort"
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2
Dim Word3 'Tempholder variable
SW=1
Do until SW=0
SW=0
Set objFSO = CreateObject("Scripting.Fi
Set objfile = objFSO.OpenTextFile("D:\Te
Set objtest = objFSO.OpenTextFile("D:\Te
Set objcopy = objfso.getfile ("D:\Temp\sort\test2.txt")
Word1 = objFile.Readline
'Wscript.Echo word1
Do Until objFile.AtEndOfStream 'Check to see if EOF
word2 = objFile.Readline
'Wscript.Echo word2
If word1 > word2 then 'Swaps alpebetical order of words
word3=word1
word1=word2
word2=word3
SW=1
end if
'If word1 = word2 then
' objtest.writeline
'End if
objtest.writeline (word1)
Word1 = Word2
loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Objcopy.copy("D:\Temp\sort
Loop
Wscript.Echo "Completed sort"
are you trying to remove consecutive duplicates or all duplicates?
The code you gave looks correct to remove consecutive duplicates