Link to home
Start Free TrialLog in
Avatar of Invader_RM
Invader_RM

asked on

Word List Gen

Hello.

I am trying to make my own Spell Checker, and I need some code (a fast code) that can generate every possible combination of a string for a certain length.

EG:

Characters - abc
length - 3

It would then output every possible combination in to a text document. So in the text doucment it would look something like this.

aaa aab aac aba abb abc aca acb acc

And so on.

I need a fast code. With my old program that I made, it did do it fast, but when you came to do a length of above 3 it became very slow. To do every possible combination with characters of a-z + A-Z + 0-9 it would take for a length of 4 about 35 mins.

I did mange to download a very fast Word List gen but it did not output the strings in the way I wanted it too But it was Very, Very fast it could do characters of  a-z + A-Z + 0-9  with a length of 4 in less than 1 min.

Is there any code that you can provide me with so I can make a very fast one?

Thank You

Avatar of JR2003
JR2003

a-z + A-Z + 0-9 give 62 values. The number permutations of selecting 3 values from 62 is 62^3 or 238,328.
When you are concatenating this many items in a string the string reallocates memory for itself every time you append an item on the end, this is what will take the time in VB.
To make this as fast as possible you should calculate the length of the string before you start and make the string that long full of spaces.
Then use the 'Mid' function to write each word to the required psosition within the string.

This function takes about 1 second to complete on my pc.

JR

Option Explicit

Private Sub Command1_Click()

    Dim sAlphaNumerics As String
    Dim sResult As String
    Dim iLenResult As Long
    Dim iLenWord As Long
    iLenWord = 3 '<< This is the length of the word
    Dim i As Long
    Dim j As Long
    Dim k As Long
    Dim iPos As Long
    sAlphaNumerics = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    iLenResult = (Len(sAlphaNumerics) ^ iLenWord) * (iLenWord + 1) - 1
    sResult = Space(iLenResult)
    iPos = 1
    For i = 1 To Len(sAlphaNumerics)
        For j = 1 To Len(sAlphaNumerics)
            For k = 1 To Len(sAlphaNumerics)
                Mid$(sResult, iPos, iLenWord) = Mid(sAlphaNumerics, i, 1) & Mid(sAlphaNumerics, j, 1) & Mid(sAlphaNumerics, k, 1)
                iPos = iPos + iLenWord + 1
            Next k
        Next j
    Next i
    MsgBox sResult '<< sResult contains the required result string

End Sub
Hi Invader_RM,

I can write generic code that can handle any word length and doesn't have a fixed number of loops in it as JR2003 has done.

But...

How does generating every possible combination of a series of letters tie in with a spell checker?  What is it exactly you are going to do with this list?  Your word list is going to get exponentially bigger as the word length increases.  The speed at which you generate these values will be irrevelant if you have to walk a file with billions of lines in it.

Usually you start with a known list of words and check your input against that.  There are various ways to represent a known word list in memory that make checking if a word is valid very fast.

Idle_Mind
Invader_RM,

As Idle Mind says, you would need to hard code another loop in the function I gave above if you want to run it for 4 character words. The code below is more generic and allows you just to alter jus the variable 'iLenWord' to determine the word length. However I don't think it will work with more than 4 characters due to the length of string you would need to store the result. At 4 characters the string needs to be (62^4) * 5 characters long or about 70Mb. If it were 5 characters long the result would be (62^5) * 6  characters long or over 5Gb. It's doubtful there is enough memory on a computer to store such a string. The code could easily be modufied to output the result in partial stages.

Also the use of this would not be at all useful for a spell checking program.

I compiled the following code with all the compilation optimisation settings set and it ran on 4 a character word in about 20 seconds on my P4 2.66 Ghz machine.

JR

Option Explicit

Private Sub Command1_Click()

    Dim sAlphaNumerics As String
    Dim sResult As String
    Dim sWord As String
    Dim iLenResult As Long
    Dim iLenWord As Long
    iLenWord = 4 '<< This is the length of the word
    Dim iCurrIndex As Long
    Dim iIterators() As Long
    Dim i As Long
    Dim iPos As Long
    sAlphaNumerics = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
   
    iLenResult = (Len(sAlphaNumerics) ^ iLenWord) * (iLenWord + 1) - 1
    ReDim iIterators(iLenWord - 1)
   
    sResult = Space(iLenResult)
    MsgBox "Allocated memory to string about to start"
    iPos = 1
    iCurrIndex = iLenWord - 1
    Do While iIterators(0) < Len(sAlphaNumerics)
        sWord = Space(iLenWord) 'Initialise
        For i = 0 To iLenWord - 1
            Mid$(sWord, i + 1, 1) = Mid$(sAlphaNumerics, iIterators(i) + 1, 1)
        Next i
        Mid$(sResult, iPos, iLenWord) = sWord 'Add the next word
        iPos = iPos + iLenWord + 1 'Change the offset
        iIterators(iCurrIndex) = iIterators(iCurrIndex) + 1
        If iIterators(iCurrIndex) >= Len(sAlphaNumerics) Then
            While iIterators(iCurrIndex) >= Len(sAlphaNumerics)
                iIterators(iCurrIndex) = 0
                iCurrIndex = iCurrIndex - 1
                If iCurrIndex < 0 Then
                    Exit Do
                End If
                iIterators(iCurrIndex) = iIterators(iCurrIndex) + 1
            Wend
            Debug.Print sWord
            iCurrIndex = iLenWord - 1
        End If
    Loop
    MsgBox "finished", vbInformation '<< sResult contains the required result string
    RichTextBox1.Text = sResult
End Sub

Avatar of Invader_RM

ASKER

Hi, Thanks for all your help.

It's Kinda of a Spell Checker, But i just need all the combinations, it's kinda werid, so I wont go in to much detail.

Anyway JR2003:

I think you have me wrong, I would like code that Doesn't output every combination all at one time in one string, I want it so that:

result = aaa

then the next time around getting another combiantion

result = aab

The result varible needs to be overwritten for each combination, and then for each combination print it in to a word document. So it doesn't end up as one big string. Therefore not alot of memory will be taken up.

Thanks for your code anyway, I am bout to try it, but would it be possible for what i said above?
Edit - Forgot to say, Can you also Add Notes on to the code please, because I've tried reading it and seeing what it does, but, I cant find out how it does it. Thansk again, (just tested code, It's nice and fast)
Invader_RM,

You can add code to the code below to make it do what you want with the variable sWord each time it generates the next word in the series, it may slow the process down though as literally millions of words are generated. Comments have been added.

JR



Option Explicit

Private Sub Command1_Click()

    Dim sAlphaNumerics As String
    Dim sResult As String
    Dim sWord As String
    Dim iLenResult As Long
    Dim iLenWord As Long
    iLenWord = 4 '<< This is the length of the word
    Dim iCurrIndex As Long
    Dim iIterators() As Long '<< This is a list of counters that act like indexes inside nested for loops
    Dim i As Long
    Dim iPos As Long
   
    sAlphaNumerics = "abcdefghijklmnopqrstuvwxyz" & _
                     "ABCDEFGHIJKLMNOPQRSTUVWXYZ" & _
                     "0123456789"                       '<< This is the list of letters the word will be made up of
   
    iLenResult = (Len(sAlphaNumerics) ^ iLenWord) * (iLenWord + 1) - 1  '<<Calculate the length of the string required to store the entire result
   
    ReDim iIterators(iLenWord - 1)  'Set up an array with a counter for each letter in the word
   
    sResult = Space(iLenResult)  'Allocate space for the entire result in a string
    MsgBox "Allocated memory to string about to start"
   
    iPos = 1  '<< Initialsie
    iCurrIndex = iLenWord - 1   'Set the current index to the innermost counter
    sWord = Space(iLenWord) 'Initialise the word length to the length of the required word
    Do While iIterators(0) < Len(sAlphaNumerics)
   
        'Create the next word in the series
        For i = 0 To iLenWord - 1
            Mid$(sWord, i + 1, 1) = Mid$(sAlphaNumerics, iIterators(i) + 1, 1)
        Next i
       
        '
        '***  sWord now contains the next word in the series
        '***  You can put code here to do something other than write the word to the string
        '
       
        Mid$(sResult, iPos, iLenWord) = sWord 'Add the next word to the big string
        iPos = iPos + iLenWord + 1 'Change the offset
       
        iIterators(iCurrIndex) = iIterators(iCurrIndex) + 1      '<< Add 1 to the inner most loop
        If iIterators(iCurrIndex) >= Len(sAlphaNumerics) Then    '<< the inner most loop counter has reached its limit so need to carry 1 to the next outer loop etc...
            While iIterators(iCurrIndex) >= Len(sAlphaNumerics)
                iIterators(iCurrIndex) = 0
                iCurrIndex = iCurrIndex - 1
                If iCurrIndex < 0 Then
                    Exit Do
                End If
                iIterators(iCurrIndex) = iIterators(iCurrIndex) + 1   '<< Add 1 to the loop counter
            Wend
            iCurrIndex = iLenWord - 1
        End If
    Loop
    MsgBox "finished", vbInformation '<< sResult contains the required result string
End Sub

Thanks, I manged to get it working, But, every enrty it prints to the file, it has " Marks around it. Also you can't make them over 4 :( Thanks for your help anyway, it was much apprecated, but i would like to make them up to like 10 - 12 at a max. anyone else?
Invader_RM,

I don't think you are grasping how many combinations you are tyring to calculate:

 62^1 = 62
 62^2 = 3844
 62^3 = 238328
 62^4 = 14776336
 62^5 = 91613282
 62^6 = 5.680023558e10
 62^7 = 3.521614606e12
 62^8 = 2.183401056e14
 62^9 = 1.353708655e16
62^10 = 8.392993659e17
62^11 = 5.203656068e19
62^12 = 3.226266762e21

That is more than 322,626,676,200,000,000,000 combinations.

I don't care how fast your algorithm is...you will be dead before your program churns out that many combinations!

>> It's Kinda of a Spell Checker, But i just need all the combinations, it's kinda werid, so I wont go in to much detail.

You recently posted a question on MD5 encryption.  Are you trying to write a password cracker?

Idle_Mind
lol, I do know how many it is yes. And no i'm not making a password cracker, I say it's kinda like a spell checker, because its for a word search game, but nm looks like I will have to use my old program :(
Perhaps if you explained what you are doing with your word search game, we could help you come up with a different approach to solve the problem.

~IM
Its just your standard Word Search game, does it really matter? all I asked for was some code, and now im getting battered for u saying im making a passowrd cracker. Nice people we have here.

Forget it i will use my old program.
I'm sorry, I didn't mean to offend you.  =)

>> does it really matter? all I asked for was some code,

The spirit of EE is not to only provide code, but to also solve problems.  If an approach taken by an author is "computationally unfeasible" then the experts will often post entirely different algorithms that can solve the problem in a different way.

I hope I haven't made your experience at EE an unpleasant one.

Regards,

Idle_Mind
Here, use this code it's not the fastes but it is quite fast, if anone can improve on it, Please do

Private Sub Form_Load()
 
    txtchar.Text = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" '/// Sets the characters to be used
    txtlength.text = 1   '/// Sets the string length
 
End Sub

Private Sub cmdcalc_Click()

    open "my_text.text" for output as #1 '/// Open file my_text.text for output
    call engine(txtchar.text, txtlength.text) '/// Call function engine

End Sub

Public Sub Engine(char As String, length As Byte)

   ReDim ary(length)
   Dim depth As Byte
   Dim Result As String
   Dim md5_Result As String

   depth = 1

   While Not (flag)
         k = DoEvents()
         ary(depth) = ary(depth) + 1

         If depth = length Then
            Result = Left(Result, length - 1) + Mid(char, ary(depth), 1)
         Else
            Result = Result + Mid(char, ary(depth), 1)
         End If

         If ary(depth) <> Len(char) + 1 Then
            If depth <> length Then
               depth = depth + 1
            Else
               
               
                print #1, result  '/// Result is the combination, add code here to do extra info
       
                                     
             End If              
               
               
              Else
            If depth = 1 Then
               msgbox "Done!", vbInformation + vbOkOnly, "Combinations completed"
               flag = true
               exit sub
            Else
               ary(depth) = 0
               depth = depth - 1
               Result = Left(Result, depth - 1)
            End If
         End If
   Wend
 
End Sub


No Idel_Mind, you didn't offend me or anything, and you didn't make it unpleasent, Just i don't like to be blamed for making password crackers. Anyway, yeh A-L-E-X I've seen that code before, and I think thats the one im using. But, it's a bit slow, can anyone see if they can improve that code, because it does everything that i want it do do, just its a bit slow.


Thanks
Invader_RM,

The reason for you getting quotation marks around the words in the file is something to do with the way you are writing to the file. I usually use the file system object. The code below uses the file system object for writing to the file. You will need to set a reference to "Microsoft Scripting Runtime" in your project references.

The following code writes to a file instead of a string. The runtime has gone from about 15 seconds to a couple of minutes because of this. If speed is really a key issue then you would be better populating a big string of a couple of Mb in size in a similar method to my previous example and then periodically writing it to a file every time it fills up.

To change the number of letters in a word alter the 'iLenWord' variable.
To change the characters in the words alter the 'sAlphaNumerics' variable.

JR


Option Explicit

Private Sub Command1_Click()

    Dim sAlphaNumerics As String
    Dim sWord As String
    Dim iLenWord As Long
    Dim iCurrIndex As Long
    Dim iIterators() As Long '<< This is a list of counters that act like indexes inside nested for loops
    Dim i As Long
    Dim iPos As Long
    Dim iWordCount As Long
   
    'File writing stuff
    Dim fso As Scripting.FileSystemObject
    Dim MyTextFile As Scripting.TextStream
    Set fso = New Scripting.FileSystemObject
    Set MyTextFile = fso.CreateTextFile("c:\GenWords.txt")
   
    'You can adjust the value of these 2 variables to you liking
    iLenWord = 4 '<< This is the length of the word
    sAlphaNumerics = "abcdefghijklmnopqrstuvwxyz" & _
                     "ABCDEFGHIJKLMNOPQRSTUVWXYZ" & _
                     "0123456789"                       '<< This is the list of letters the word will be made up of
   
   
    If MsgBox(Format(Len(sAlphaNumerics) ^ iLenWord, "#,##0") & " words will be written to the file do you wish to continue?", vbQuestion Or vbOKCancel) = vbCancel Then
        Exit Sub
    End If
   
    ReDim iIterators(iLenWord - 1)  'Set up an array with a counter for each letter in the word
   
    iCurrIndex = iLenWord - 1   'Set the current index to the innermost counter
    sWord = Space(iLenWord) 'Initialise the word length to the length of the required word
    Do While iIterators(0) < Len(sAlphaNumerics)
   
        'Create the next word in the series
        For i = 0 To iLenWord - 1
            Mid$(sWord, i + 1, 1) = Mid$(sAlphaNumerics, iIterators(i) + 1, 1)
        Next i
       
        MyTextFile.WriteLine sWord  '<<< Write the word as a line to a text file
       
        iIterators(iCurrIndex) = iIterators(iCurrIndex) + 1      '<< Add 1 to the inner most loop
        If iIterators(iCurrIndex) >= Len(sAlphaNumerics) Then    '<< the inner most loop counter has reached its limit so need to carry 1 to the next outer loop etc...
            While iIterators(iCurrIndex) >= Len(sAlphaNumerics)
                iIterators(iCurrIndex) = 0
                iCurrIndex = iCurrIndex - 1
                If iCurrIndex < 0 Then
                    Exit Do
                End If
                iIterators(iCurrIndex) = iIterators(iCurrIndex) + 1   '<< Add 1 to the loop counter
            Wend
            iCurrIndex = iLenWord - 1
        End If
    Loop
    MyTextFile.Close
    MsgBox "finished", vbInformation '<< sResult contains the required result string
End Sub


Thanks, But, i get an error with that code

Compile Error:
User-Defined type is not defined

and then it highlights  

Dim fso As Scripting.FileSystemObject

I shall also try the other method you said, wait untill it fills to one MB, then releae it in to the document, But how to i use VB to see how bgi a file size will be in a string? if that makes sense ^^ lol
<<<User-Defined type is not defined
You will need to set a reference to "Microsoft Scripting Runtime" in your project/references.


ASKER CERTIFIED SOLUTION
Avatar of JR2003
JR2003

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Grade B? Doesn't it do everything you wanted in the question? :-(
Yeh sorry about that, My younger bro (who is 1 year younger than me) came on here, and I was still logged in, and just went on here and gave you a grade B, sorry about that. Is there anyway to change it?

 Also, one more thing, but you dont have to if you dont want to, is to put more detailed notes on about what each lines does and how it does it, because i still don't quite understand it.


Thanks very much though, the code is perfect :)
Invader_RM.

It doesn't matter about the grade now you've put the record straight.
The best way to understand the code is to step through the code as it runs and look at the various variables.
I've pasted in a better version of the code that has a progress bar and cancel button and an additional optimisation (variable iCS).

JR

Option Explicit

Private bCancel As Boolean
Private bRunning As Boolean


Private Sub cmdGo_Click()
   
    GenerateWords txtAlphaNumerics.Text, Val(txtWordLength.Text), txtFileName.Text

End Sub

Private Sub cmdCancel_Click()

    bCancel = True
    If Not bRunning Then
        Unload Me
    End If
   
End Sub

Private Sub Form_Load()

    txtAlphaNumerics.Text = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    txtFileName.Text = "c:\GenWords.txt"
    txtWordLength.Text = 4

End Sub


Private Function GenerateWords(sAlphaNumerics As String, iLenWord As Long, sFileName As String)
   
    On Error GoTo Trap
   
    Dim sWord As String
    Dim iCurrIndex As Long
    Dim iIterators() As Long '<< This is a list of counters that act like indexes inside nested for loops
    Dim i As Long
    Dim iPos As Long
    Dim sIntResult As String
    Dim iIntLength
    Dim iCS As Long
    Dim dblTotalWords As Double
    Dim dblWords As Double

    bCancel = False
    bRunning = True
    With ProgressBar1
        .Min = 0
        .Max = 100
    End With
   
    'File writing stuff
    Dim fso As Scripting.FileSystemObject
    Dim MyTextFile As Scripting.TextStream
    Set fso = New Scripting.FileSystemObject
    Set MyTextFile = fso.CreateTextFile(sFileName)
   
    dblTotalWords = Len(sAlphaNumerics) ^ iLenWord
    If MsgBox(Format(dblTotalWords, "#,##0") & " words will be written to the file do you wish to continue?", vbQuestion Or vbOKCancel) = vbCancel Then
        Exit Function
    End If
    Screen.MousePointer = vbHourglass
    iIntLength = (iLenWord + 1) * 10000
    sIntResult = Space(iIntLength)
    ReDim iIterators(iLenWord - 1)  'Set up an array with a counter for each letter in the word
    iPos = 1
    iCurrIndex = iLenWord - 1       'Set the current index to the innermost counter
    sWord = Space(iLenWord)         'Initialise the word length to the length of the required word
    iCS = 0
    dblWords = 0
    Do While iIterators(0) < Len(sAlphaNumerics)
        'Create the next word in the series
        For i = iCS To iLenWord - 1
            Mid$(sWord, i + 1, 1) = Mid$(sAlphaNumerics, iIterators(i) + 1, 1)
        Next i
        Mid$(sIntResult, iPos, iLenWord) = sWord  'Add the next word to the string
        dblWords = dblWords + 1
        iCS = iLenWord - 1
        iPos = iPos + iLenWord + 1                'Change the offset
        If iPos >= iIntLength Then
            MyTextFile.Write sIntResult           '<<< Write the words to a text file
            iPos = 1
            If dblWords Mod 100000 = 0 Then
                DoEvents
                If Fix((dblWords / dblTotalWords) * 100) > ProgressBar1.Value Then
                    ProgressBar1.Value = Fix(dblWords / dblTotalWords * 100)
                End If
                If bCancel Then
                    If MsgBox("Are you sure you want to stop?", vbQuestion Or vbYesNo) = vbYes Then
                        ProgressBar1.Value = 0
                        bCancel = False
                        Exit Do
                    Else
                        bCancel = False
                    End If
                End If
            End If
        End If
        iIterators(iCurrIndex) = iIterators(iCurrIndex) + 1      '<< Add 1 to the inner most loop
        If iIterators(iCurrIndex) >= Len(sAlphaNumerics) Then    '<< the inner most loop counter has reached its limit so need to carry 1 to the next outer loop etc...
            While iIterators(iCurrIndex) >= Len(sAlphaNumerics)
                iIterators(iCurrIndex) = 0
                iCurrIndex = iCurrIndex - 1
                iCS = iCurrIndex
                If iCurrIndex < 0 Then
                    Exit Do
                End If
                iIterators(iCurrIndex) = iIterators(iCurrIndex) + 1   '<< Add 1 to the loop counter
            Wend
            iCurrIndex = iLenWord - 1
        End If
    Loop
    If iPos > 1 Then
        MyTextFile.Write Left$(sIntResult, iPos - 2) '<<< Write the words as a line to a text file
    End If
    MyTextFile.Close
    Set MyTextFile = Nothing
    MsgBox "Finished!", vbInformation '<< sResult contains the required result string
    bRunning = False

    On Error GoTo 0

    GoTo SkipTrap
   
Trap:

    bCancel = False
    Screen.MousePointer = vbDefault

    Dim iResult As VbMsgBoxResult
    With Err
        iResult = MsgBox("Error: " & .Number & "GenerateWords of Form Form1" & vbNewLine & _
                "Description: " & .Description & vbNewLine & _
                "Source: " & .Source, vbAbortRetryIgnore Or vbExclamation, "Error")
        If iResult = vbRetry Then
            Resume
        ElseIf iResult = vbIgnore Then
            Resume Next
        End If
        bRunning = False
    End With

SkipTrap:

    Screen.MousePointer = vbDefault

End Function