Solved

Remove duplicate words from list

Posted on 2006-06-23
Last Modified: 2010-04-30
I need help with my syntax. My attempt is below; the lines in question are commented out and marked with '<-------------
(language=vbscript)

Result needed: Remove duplicate words from a text file.

Current script:

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test3.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test4.txt", ForWriting)
     Word1 = objFile.Readline
        'Wscript.Echo word1
  Do Until objFile.AtEndOfStream 'Check to see if EOF
     word2 = objFile.Readline
        'Wscript.Echo word2
     'If word1 = word2 then   '<----------------------------------------------
       ' objtest.writeline         '<---------------------------------------------------
     'End if       '<-----------------------------------------------------------------
    objtest.writeline (word1)
    Word1 = Word2
  loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Wscript.Echo "Completed"
Wscript.quit

Question by:a23m2000
 
LVL 9

Expert Comment

by:justchat_1
ID: 16971993
Can you clarify the question:
are you trying to remove consecutive duplicates or all duplicates?

The code you gave looks correct for removing consecutive duplicates.
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 16972268
Is it only one word per line?
 
LVL 9

Expert Comment

by:justchat_1
ID: 16972320
yes but are you trying to remove all duplicates:
a
b
c
b

would be:
a
b
c

or just consecutive:
a
b
b
c
b

would be:
a
b
c
b

...because your code only does the second option
 
LVL 9

Expert Comment

by:justchat_1
ID: 16972374
To do the first one, you need to read all the items in text1 into an array,

then filter out the duplicates (http://www.devx.com/vb2themax/Tip/18977),

and finally write the result to text2.
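
The filtering step described above could be sketched like this in VBScript. This is only an illustration under stated assumptions, not the code from the linked tip: RemoveDuplicates is a hypothetical helper name, the input is assumed to be a zero-based array of strings, and the comparison is case sensitive.

```vbscript
Option Explicit

' Hypothetical helper: returns a new array holding the first occurrence
' of each item in "items", in the original order.
Function RemoveDuplicates(items)
    Dim result(), count, i, j, seen
    count = 0
    ReDim result(UBound(items))
    For i = 0 To UBound(items)
        ' Look for the current item among those already kept
        seen = False
        For j = 0 To count - 1
            If result(j) = items(i) Then
                seen = True
                Exit For
            End If
        Next
        If Not seen Then
            result(count) = items(i)
            count = count + 1
        End If
    Next
    ' Trim the unused slots at the end of the result
    ReDim Preserve result(count - 1)
    RemoveDuplicates = result
End Function
```

For example, Join(RemoveDuplicates(Array("a", "b", "c", "b")), ",") gives "a,b,c". Each item is checked against every item already kept, so the work grows quickly as the list gets longer.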
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 16972455
Here is one way to remove all duplicate lines in the file:

Const ForReading = 1
Const ForWriting = 2

inputFile = "D:\Temp\sort\test3.txt"
outputFile = "D:\Temp\sort\test4.txt"

Set dict = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objInput = objFSO.OpenTextFile(inputFile, ForReading)
Set objOutput = objFSO.OpenTextFile(outputFile, ForWriting, True)

While Not objInput.AtEndOfStream
    line = objInput.Readline
    If Not dict.Exists(line) Then
        dict.Add line, Nothing          
        objOutput.WriteLine line
    End If
Wend

objInput.Close
objOutput.Close

Wscript.Echo "Completed"
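
The Dictionary in the script above compares keys case sensitively by default, so "Apple" and "apple" count as different words. If they should be treated as the same word, the CompareMode property can be set before any keys are added. A minimal sketch, assuming that behavior is wanted; UniqueIgnoreCase is a hypothetical helper name and it works on an array rather than on the files:

```vbscript
Option Explicit

' Hypothetical helper: de-duplicates "lines" without regard to case,
' keeping the first spelling of each word that appears.
Function UniqueIgnoreCase(lines)
    Dim dict, result(), count, item
    Set dict = CreateObject("Scripting.Dictionary")
    ' CompareMode can only be changed while the dictionary is empty
    dict.CompareMode = vbTextCompare
    ReDim result(UBound(lines))
    count = 0
    For Each item In lines
        If Not dict.Exists(item) Then
            dict.Add item, True
            result(count) = item
            count = count + 1
        End If
    Next
    ' Trim the unused slots at the end of the result
    ReDim Preserve result(count - 1)
    UniqueIgnoreCase = result
End Function

' Echoes: Apple,pear
WScript.Echo Join(UniqueIgnoreCase(Array("Apple", "pear", "apple", "PEAR")), ",")
```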
 
LVL 1

Expert Comment

by:Brownhead
ID: 16981640
'//Code\\
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, " ")
For iCount = 0 To UBound(aHold)
    For iCount2 = 0 To UBound(aHold)
        If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
    Next iCount2
Next iCount
DelRepeats = Join(aHold, " ")
Do Until (InStr(1, DelRepeats, "  ") <= 0)
    DelRepeats = Replace(DelRepeats, "  ", " ")
Loop
End Function
'\\Code//

Try that out :D, it worked fine for me. It uses a space as the delimiter between the items, and is case sensitive. I can change either of those or make them variable. But is this what you want?
 

Author Comment

by:a23m2000
ID: 17004151
I am trying to remove all duplicates from the file, even if they are not consecutive. Also, it is one (1) word per line. Example as shown above.

File1
a
b
c
b

would be: (File2)
a
b
c


Thanks
 
LVL 1

Expert Comment

by:Brownhead
ID: 17004899
'//Code\\
'Nothing is needed to use this code
Private Function DelRepeats(ByVal Str As String) As String
Dim aHold() As String, iCount As Integer, iCount2 As Integer
aHold = Split(Str, vbNewLine)
For iCount = 0 To UBound(aHold)
    For iCount2 = 0 To UBound(aHold)
        If (aHold(iCount) = aHold(iCount2) And iCount <> iCount2) Then aHold(iCount2) = ""
    Next iCount2
Next iCount
DelRepeats = Join(aHold, vbNewLine)
Do Until (InStr(1, DelRepeats, vbNewLine & vbNewLine) <= 0)
    DelRepeats = Replace(DelRepeats, vbNewLine & vbNewLine, vbNewLine)
Loop
End Function
'\\Code//

So you could, for example, say:

'//Code\\
Dim sHold as String
Open "C:/File1.txt" For Input As #1
    sHold = DelRepeats(Input(LOF(1), 1))
Close #1
Open "C:/File2.txt" For Output as #1
    Print #1, sHold
Close #1
'\\Code//

The above code would open File1.txt in the C drive, clear the repeats and save the modified file to File2.txt in the C drive.
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 17022070
Did you try my code a23m2000?...it does what you asked for...   =)
 
LVL 9

Expert Comment

by:justchat_1
ID: 17022117
If you're a little confused, idle_mind did have the best working method...
 
LVL 1

Accepted Solution

by:
tguez earned 190 total points
ID: 17040095
I am sorry, but the solutions you received are only good for small files. If your word list is large, these solutions will take a long time.

Your code is basically correct.  The only problem is that it will work only if your input file is sorted.  If it is sorted, then all duplicates will be removed.  If it is not sorted, then only consecutive duplicates will be removed.

So if you want to remove all duplicates, you need to make sure your input file is sorted.

There are a few ways to sort the file. Once it is sorted, run the algorithm you wrote, and this will be OK.

Now, if you sort the file the easiest way, then it will take exactly as long as what the other guys wrote, which is not good for large files.  With something like 10,000 words or more, you will take a big performance hit.

What you can do to avoid sorting yourself is use a small dirty trick.  Load all your words into a list box that is placed on your form but hidden from the user.  Set the Sorted property of the list box to True.  Then the list box will automatically sort all items added.

So, then just write a first routine to load all the words and add them to the list box.

When you are done, all the words are sorted.  Now just run your first program on the sorted list, and you'll be all set.

Of course, the dirty trick here is that I used the list box to do the sorting.  If you want to be a professional programmer, then load the whole file into an array and write an algorithm to sort the array.  But this is a bit more difficult and requires careful debugging to make sure your sort works.

Of course, this is all good for you if your files are 10K words or more.  For files under 1,000 words, you can use the code the guys gave you above.

Tomer
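
The suggestion above to load the words into an array, sort it, and then drop adjacent duplicates could be sketched as follows in VBScript. This is a hedged illustration, not the poster's code: QuickSort and SortedUnique are hypothetical helper names, quicksort is just one possible choice of sorting algorithm, and the comparison is case sensitive. Note that the result comes back in sorted order, not in the original file order.

```vbscript
Option Explicit

' Hypothetical helper: sorts arr(lo..hi) in place with a recursive quicksort.
Sub QuickSort(arr, lo, hi)
    Dim i, j, pivot, tmp
    If lo >= hi Then Exit Sub
    i = lo
    j = hi
    pivot = arr((lo + hi) \ 2)
    Do While i <= j
        ' Skip items that are already on the correct side of the pivot
        Do While StrComp(arr(i), pivot, vbBinaryCompare) < 0
            i = i + 1
        Loop
        Do While StrComp(arr(j), pivot, vbBinaryCompare) > 0
            j = j - 1
        Loop
        If i <= j Then
            tmp = arr(i) : arr(i) = arr(j) : arr(j) = tmp
            i = i + 1
            j = j - 1
        End If
    Loop
    QuickSort arr, lo, j
    QuickSort arr, i, hi
End Sub

' Hypothetical helper: returns the sorted, duplicate-free contents of "words".
Function SortedUnique(words)
    Dim sorted, result(), count, i, isNew
    sorted = words                      ' assignment copies the array
    QuickSort sorted, 0, UBound(sorted)
    ReDim result(UBound(sorted))
    count = 0
    For i = 0 To UBound(sorted)
        ' After sorting, duplicates are adjacent, so only the previous item matters
        isNew = True
        If i > 0 Then isNew = (sorted(i) <> sorted(i - 1))
        If isNew Then
            result(count) = sorted(i)
            count = count + 1
        End If
    Next
    ReDim Preserve result(count - 1)
    SortedUnique = result
End Function

' Echoes: a,b,c
WScript.Echo Join(SortedUnique(Array("b", "a", "c", "b")), ",")
```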
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 17091239
So how did you end up solving this problem?

You stated you were using VBScript...which does not have a ListBox!

???
 

Author Comment

by:a23m2000
ID: 17108210
I sorted the file and then used this code below to Remove Duplicates.

Const ForReading = 1
Const ForWriting = 2
Dim Word1
Dim Word2

    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test3.txt", ForReading)
    'True creates test4.txt if it does not exist yet
    Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test4.txt", ForWriting, True)
    Set objcopy = objfso.getfile ("D:\Temp\sort\test4.txt")
    Word1 = objFile.Readline
    Do Until objFile.AtEndOfStream 'Check to see if EOF
        word2 = objFile.Readline
        'Write word1 only when the next word differs (the input must be sorted)
        If word1 <> word2 then
            objtest.writeline (word1)
        end if
        word1=word2
    loop
    'word1 holds the last word read, even when the file has only one line
    objtest.writeline (word1)
    objfile.Close
    objtest.Close
    'Copy the de-duplicated result back over the input file
    objcopy.copy("D:\Temp\sort\test3.txt")
Wscript.Echo "Completed sort"
Wscript.quit
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 17111153
Out of curiosity...how did you sort the file?
 

Author Comment

by:a23m2000
ID: 17146701
Const ForReading = 1
Const ForWriting = 2
Dim SW
Dim Word1
Dim Word2
Dim Word3 'Tempholder variable
SW=1

Do until SW=0  
SW=0

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objfile = objFSO.OpenTextFile("D:\Temp\sort\test.txt", ForReading)
Set objtest = objFSO.OpenTextFile("D:\Temp\sort\test2.txt", ForWriting)
Set objcopy = objfso.getfile ("D:\Temp\sort\test2.txt")

     Word1 = objFile.Readline
        'Wscript.Echo word1
  Do Until objFile.AtEndOfStream 'Check to see if EOF
     word2 = objFile.Readline
        'Wscript.Echo word2
     If word1 > word2 then 'Swaps the words into alphabetical order
        word3=word1
        word1=word2
        word2=word3
        SW=1
     end if
     'If word1 = word2 then
       ' objtest.writeline
     'End if
    objtest.writeline (word1)
    Word1 = Word2
  loop
objtest.writeline (word2)
objfile.Close
objtest.Close
Objcopy.copy("D:\Temp\sort\test.txt")
Loop
Wscript.Echo "Completed sort"