Solved

# Need an algorithm to rearrange an arry in all possible combinations in VB.NET

Posted on 2011-10-09
410 Views
I am trying to figure out a way in VB.NET to take an array of say 7 items and arrange it in all possible combinations.  So if I have a Char array or A,B,C,D,E,F,G my goal is to get it to A,C,B,D,E,F,G, A,C,D,BE,F,G etc.  Then I would like to store each of these in a new array that lists each new combination.
0
Question by:monica73174

LVL 85

Expert Comment

It looks like you want them with no repetitions, but does order matter?  So you want both "BE" and "EB"?
0

LVL 85

Expert Comment

One way is to use the libaray here written by Adrian Akison:
http://www.codeproject.com/KB/recipes/Combinatorics.aspx

You would want "variations without repetition".
Click on Project --> Add Reference --> Browse --> Facet.Combinatorics.dll
Next, you create instances of Variations, passing in the set of letters, and the number of characters to choose.

Here is what it could look like:
Public Class Form1

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
ListBox1.DataSource = Variations(TextBox1.Text)
End Sub

Private Function Variations(ByVal input As String) As List(Of String)
Dim values As New List(Of String)
For i As Integer = 0 To input.Length - 1
Next

Dim results As New List(Of String)
For i As Integer = 1 To input.Length
Dim vars As New Facet.Combinatorics.Variations(Of String)(values.AsReadOnly(), i)
For Each variation As IList(Of String) In vars
Next
Next
Return results
End Function

End Class

Example using just "ABCD" (as ABCDEFG is A LOT of variations!!!):  Idle-Mind-510645.flv
0

LVL 85

Expert Comment

If you want to see them sorted then:
Public Class Form1

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim results As List(Of String) = Variations(TextBox1.Text)
ListBox1.DataSource = results
End Sub

Private Function VariationComparer(ByVal var1 As String, ByVal var2 As String) As Integer
Dim result As Integer = var1.Length.CompareTo(var2.Length)
If result = 0 Then
result = var1.CompareTo(var2)
End If
Return result
End Function

Private Function Variations(ByVal input As String) As List(Of String)
Dim values As New List(Of String)
For i As Integer = 0 To input.Length - 1
Next

Dim results As New List(Of String)
For i As Integer = 1 To input.Length
Dim vars As New Facet.Combinatorics.Variations(Of String)(values.AsReadOnly(), i)
For Each variation As IList(Of String) In vars
Next
Next
Return results
End Function

End Class

0

Author Comment

I believe that is something similar to what I am looking for.  Basically I need all the combinations because I am looking for valid words that can be compared against a dictionary.
0

LVL 85

Expert Comment

That'll produce them all...give it a shot.

There are other ways to approach the problem if you are looking for words in a dictionary though.

The brute force approach produces all possible combos and checks every single one to see if it is a valid word.

A different approach would produce a DAWG (directed acyclic word graph) from the valid words dictionary and then use a recursive graph traversal based on a connected graph of the letters of the input to build the words.  By checking the DAWG to see if any valid words begin with the current "prefix" you can cut a large number of permutations that can't possibly make a valid word...thus cutting down the generation time significantly.  This approach is often used in "Boggle" type word games.

0

Author Comment

I was kind of afraid of getting every permutation because that could cause a great deal of processor usage and time to complete the task.

So using the DAWG method I would take the list of words and create a graph and then search the graph?  There are about ~270,000 words in the acceptable words list I am using.  This approach seems to have the least overhead.
0

LVL 85

Expert Comment

For a DAWG, basically you take your dictionary of valid words and build up a graph that shows which letters connect to other letters in a valid word. *I won't lie, this is a complicated approach!*

Without optimization, though, a DAWG consumes a huge amount of memory.
We trade memory for speed with this approach...

Simplified, consider this small list of valid words: ace, act, are, art

Starting with "ace"...
Build a node with "a".
Connected to that is a node with "c".
Connected to "c" is a node with "e":

a ---> c ---> e

Next we process "act".
We already have a node with "a".
We already have a node with "c" connected to "a".
Add a node connected to "c" that has "t":

a ---> c +--> e
|
\--> t

Following the same procedure for "are" and "art", we end up with a DAWG that looks like this:

a +--> c +--> e
|         |
|         \--> t
|
\--> r +--> e
\--> t

Now, given the input "acert", we build up another graph that connects every letter to every other letter and start recursively following the paths (starting with each letter as a start point).  At each node, we look at the word built up thus far and ask the DAWG if any valid words start with that "prefix".  Since only words beginning with the letter "a" are in the DAWG, everything beginning with any other letter would immediately be rejected and those combinations wouldn't even be generated.

*Suffixes are a huge source of inefficiency.  Consider suffixes such as "ed", "er" and "ing" which can be attached to a huge number of words!  Without optimization, those suffix combinations would be duplicated for every single word that it was valid for.  So one optimization is to build common suffixes once and then make a prefix point to the existing suffix so that it isn't duplicated.

I have a huge personal project that can show valid words in a Boggle type board.  Let me know if you would like to see some code...
0

LVL 85

Expert Comment

Here's a quick demo of what that app does...it basically animates the recursive tree traversal and changes the colors to show when it rejects a prefix, finds a valid prefix, and finds a valid word (adding it to the list at the right).  Without the animation part enabled, the valid words populate near instantaneously.  This can literally be seen as I type the board in at the beginning:
Idle-Mind-510654.flv
0

LVL 85

Expert Comment

How's this project going monica?  Let me know if you need help tweaking the code I already posted, or if you want to see some code dealing with the DAWG approach.
0

Author Comment

Hello, I was trying out some different things but the DAWG approach sounds like the way I think i would like to go.  So yes I would love to see some code for DAWG.
0

LVL 85

Expert Comment

Alrighty...I'll try to post a stripped down example tomorrow.
0

LVL 85

Accepted Solution

Sorry for the delay.  First things first...results of a word look-up are only as good as the word list.  I'm using the 4th version of the official players scrabble dictionary "ospd4-lwl.txt", the long word list (lwl) version.  I think I originally got it from the Zyzzyva site here: ospd4-lwl.txt
http://zyzzyva.net/wordlists.shtml

Basically I take each line from the file and look for the first word on that line using space as the delimiter.  If your word list file is formatted differently then adjust the code accordingly.

Not sure how much reading you've done on Tries/DAWGs yet.  A DAWG is basically an optimized Trie.
http://en.wikipedia.org/wiki/Trie
http://en.wikipedia.org/wiki/Directed_acyclic_word_graph

To optimize a Trie, and thus produce a DAWG, you remove redundant suffixes.  (*There are other optimizations that can be done but I'll just ignore those.)  For full optimization, you'd first build the Trie and then analyze it, removing suffixes that you find to be duplicates.  This is a time-consuming process, though, so I've taken the middle ground with my DAWG.

Instead of analyzing for suffixes actually present in the Trie, I instead simply use a pre-defined list of the most common suffixes; such as "ed", "er", and "ing".  This approach does miss things like the common suffix between "hello" and "jello".  I decided to waste some memory to gain speed in loading the dictionary and also keep the complexity of the code down.

My nodes in the Trie are called TrieEntry:
Public Class TrieEntry

Public Letter As String = ""
Public EndOfWord As Boolean = False
Public IsSuffix As Boolean = False
Public NextLetter As New List(Of TrieEntry)

Public Sub New(ByVal Letter As String, Optional ByVal IsSuffix As Boolean = False)
Me.Letter = Letter
Me.IsSuffix = IsSuffix
End Sub

Public Function Ending() As String
If Me.IsSuffix Then
If Me.NextLetter.Count = 0 Then
Return Me.Letter
Else
Return Me.Letter & Me.NextLetter(0).Ending
End If
Else
Return ""
End If
End Function

End Class

Here is the Trie class:
Public Class Trie

Private _LongestWord As Integer
Private _WordCount As Integer
Private Letters As New List(Of TrieEntry)
Private Suffixes As New List(Of SuffixData)

Public Event ImportProgress(ByVal percentage As Integer)
Public Event FinishedProcessingFile()
Public WithEvents bgw As New System.ComponentModel.BackgroundWorker

Private Class SuffixData

Public Sub New(ByVal suffix As String)
Me.SuffixEntry = New TrieEntry(suffix.Substring(0, 1), True)
Dim te As TrieEntry = Me.SuffixEntry
Dim c As String
Dim newTE As TrieEntry
For i As Integer = 1 To suffix.Length - 1
c = suffix.Substring(i, 1)
newTE = New TrieEntry(c, True)
te = newTE
Next
te.EndOfWord = True
End Sub

Public Count As Integer = 0
Public SuffixEntry As TrieEntry = Nothing

Public Shared Function ByLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
Dim ret As Integer = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
If ret = 0 Then
ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
End If
Return ret
End Function

Public Shared Function ByCountThenLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
Dim ret As Integer = suffixB.Count.CompareTo(suffixA.Count)
If ret = 0 Then
ret = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
If ret = 0 Then
ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
End If
End If
Return ret
End Function

End Class

Public Sub New()
Me.bgw.WorkerReportsProgress = True
Dim CommonSuffixes As String = ""
CommonSuffixes &= "es,ed,ing" ' > 10,000
CommonSuffixes &= ",er,ly" ' > 5,000
CommonSuffixes &= ",ic,ess,al,est,ation,ate,able,ity,ist,en,or,ion" ' > 1,000
For Each suffix As String In CommonSuffixes.Split(",".ToCharArray, StringSplitOptions.RemoveEmptyEntries)
Next
End Sub

Public ReadOnly Property LongestWord() As Integer
Get
Return _LongestWord
End Get
End Property

Public ReadOnly Property WordCount() As Integer
Get
Return _WordCount
End Get
End Property

Public Sub ProcessFile(ByVal filename As String)
If Not Me.bgw.IsBusy Then
Me.bgw.RunWorkerAsync(filename)
End If
End Sub

Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
Dim filename As String = e.Argument
Try
If System.IO.File.Exists(filename) Then
For Each sd As SuffixData In Suffixes
sd.Count = 0
Next
Me._WordCount = 0
Me.Letters.Clear()
Dim word As String
Dim lastP As Integer = 0
While sr.Peek <> -1
If Not IsNothing(word) Then
word = word.Trim().ToLower.Split(" ")(0)
If word.Length > 0 Then
For i As Integer = 0 To Suffixes.Count - 1
If word.EndsWith(Suffixes(i).SuffixEntry.Ending) Then
Suffixes(i).Count = Suffixes(i).Count + 1
Exit For
End If
Next
End If
End If
Dim p As Integer = Math.Floor(sr.BaseStream.Position / sr.BaseStream.Length * 100)
If p <> lastP Then
lastP = p
Me.bgw.ReportProgress(lastP)
End If
End While
End Using
Else
End If
Catch ex As Exception
MessageBox.Show("FileName: " & filename & vbCrLf & vbCrLf & ex.ToString, "Error Processing File", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
End Sub

Private Sub bgw_ProgressChanged(ByVal sender As Object, ByVal e As System.ComponentModel.ProgressChangedEventArgs) Handles bgw.ProgressChanged
RaiseEvent ImportProgress(e.ProgressPercentage)
End Sub

Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
RaiseEvent FinishedProcessingFile()
End Sub

Public Sub AddWord(ByVal word As String)
If word.Trim.Length = 0 Then
Exit Sub
End If

Dim c As String
Dim te As TrieEntry = Nothing
Dim subTE As TrieEntry = Nothing

word = word.Trim.ToLower
If Not Me.HasWord(word) Then
c = word.Substring(0, 1)
For Each subTE In Me.Letters
If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
te = subTE
Exit For
End If
Next
If IsNothing(te) Then
te = New TrieEntry(c)
End If

Dim found As Boolean = False
For i As Integer = 1 To word.Length - 1
For Each sd As SuffixData In Me.Suffixes
If word.Substring(i) = sd.SuffixEntry.Ending Then

Me._WordCount = Me._WordCount + 1
If word.Length > Me._LongestWord Then
Me._LongestWord = word.Length
End If
Exit Sub
End If
Next

found = False
c = word.Substring(i, 1)
For Each subTE In te.NextLetter
If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
te = subTE
found = True
Exit For
End If
Next
subTE = New TrieEntry(c)
te = subTE
End If
Next i
te.EndOfWord = True

Me._WordCount = Me._WordCount + 1
If word.Length > Me._LongestWord Then
Me._LongestWord = word.Length
End If
End If
End Sub

Private Function HasEnding(ByVal te As TrieEntry, ByVal ending As String, Optional ByVal CheckForWord As Boolean = False) As Boolean
If ending = "" Then
If CheckForWord Then
Return te.EndOfWord
Else
Return True
End If
End If

Dim ret As Boolean
ending = ending.ToLower
For Each subTE As TrieEntry In te.NextLetter
If subTE.Letter = ending.Substring(0, 1) Then
ret = HasEnding(subTE, ending.Substring(1), CheckForWord)
If ret Then
Return True
End If
End If
Next
Return False
End Function

Public Function HasBeginning(ByVal beginning As String) As Boolean
beginning = beginning.ToLower
Dim ret As Boolean
For Each subTE As TrieEntry In Me.Letters
If subTE.Letter = beginning.Substring(0, 1) Then
ret = HasEnding(subTE, beginning.Substring(1))
If ret Then
Return True
End If
End If
Next
Return False
End Function

Public Function HasWord(ByVal word As String) As Boolean
word = word.ToLower
Dim ret As Boolean
For Each subTE As TrieEntry In Me.Letters
If subTE.Letter = word.Substring(0, 1) Then
ret = HasEnding(subTE, word.Substring(1), True)
If ret Then
Return True
End If
End If
Next
Return False
End Function

End Class

To load it up, create an instance and call the ProcessFile() method.  If you want progress and completion status then wire up the ImportProgress() and FinishedProcessingFile() events:
Public Class Form1

Private WithEvents T As New Trie

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
Button1.Enabled = False

Using ofd As New OpenFileDialog
If ofd.ShowDialog = Windows.Forms.DialogResult.OK Then
T.ProcessFile(ofd.FileName)
End If
End Using
End Sub

Private Sub T_ImportProgress(ByVal percentage As Integer) Handles T.ImportProgress
End Sub

Private Sub T_FinishedProcessingFile() Handles T.FinishedProcessingFile
Button1.Enabled = True
End Sub

End Class

Now you can ask the Trie if it has a word or if any words begin with a specific prefix:
If T.HasWord("Thingamabob") Then

End If

If T.HasBeginning("qyzjw") Then

End If

I'll post a quick example showing how to find all words that can be built from a set of letters.
0

LVL 85

Expert Comment

Try this out with a blank form:
Public Class Form1

Private WithEvents W As Word = Nothing
Private WithEvents T As New Trie

Private LB As New ListBox
Private WithEvents TB As New TextBox

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
TB.Enabled = False
TB.Location = New Point(TB.Margin.Left, TB.Margin.Top)
TB.Width = Me.ClientRectangle.Width - (TB.Location.X * 2)
TB.Anchor = AnchorStyles.Left Or AnchorStyles.Top Or AnchorStyles.Right

LB.Location = New Point(TB.Location.X, TB.Bounds.Bottom + TB.Margin.Bottom)
LB.Size = New Size(TB.Width, Me.ClientRectangle.Height - LB.Margin.Bottom - LB.Location.Y)
LB.Anchor = AnchorStyles.Left Or AnchorStyles.Top Or AnchorStyles.Right Or AnchorStyles.Bottom
LB.IntegralHeight = False
End Sub

Private Sub Form1_Shown(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Shown
Using ofd As New OpenFileDialog
If ofd.ShowDialog = Windows.Forms.DialogResult.OK Then
T.ProcessFile(ofd.FileName)
Else
Me.Close()
End If
End Using
End Sub

Private Sub T_ImportProgress(ByVal percentage As Integer) Handles T.ImportProgress
End Sub

Private Sub T_FinishedProcessingFile() Handles T.FinishedProcessingFile
Me.Text = "Enter a Word below!"
TB.Enabled = True
TB.Focus()
End Sub

Private Sub TB_TextChanged(ByVal sender As Object, ByVal e As System.EventArgs) Handles TB.TextChanged
If TB.Text.Trim <> "" Then
If Not IsNothing(W) Then
W.CancelWordSearch()
End If

W = New Word(TB.Text.Trim)
W.FindWords(Me.T)
End If
End Sub

Private Sub W_WordList(ByVal sender As Word, ByVal WordsFound As System.Collections.Generic.List(Of String)) Handles W.WordList
LB.DataSource = WordsFound
End Sub

End Class

Public Class Word

Private Words As New List(Of String)
Private Letters As New List(Of Letter)
Private WithEvents bgw As New System.ComponentModel.BackgroundWorker

Public Event WordList(ByVal sender As Word, ByVal WordsFound As List(Of String))

Public Sub New(ByVal word As String)
For i As Integer = 1 To word.Length
Dim L As New Letter(word.Substring(i - 1, 1))
Next
For Each L1 As Letter In Letters
For Each L2 As Letter In Letters
If Not L2 Is L1 Then
End If
Next
Next

Me.bgw.WorkerSupportsCancellation = True
End Sub

Private Class Letter

Public Value As String = ""
Public Visited As Boolean = False
Public OtherLetters As New List(Of Letter)

Public Sub New(ByVal letter As String)
Me.Value = letter
End Sub

Public Sub Visit(ByVal Word As String, ByVal T As Trie, ByVal Words As List(Of String), ByVal MinLength As Integer)
Me.Visited = True
Word = Word & Me.Value
If Word.Length >= MinLength AndAlso T.HasWord(Word) Then
End If
For Each OtherLetter As Letter In OtherLetters
If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
OtherLetter.Visit(Word, T, Words, MinLength)
End If
Next
Me.Visited = False
End Sub

End Class

Public Sub FindWords(ByVal T As Trie)
If Not bgw.IsBusy Then
bgw.RunWorkerAsync(T)
End If
End Sub

Public Sub CancelWordSearch()
If bgw.IsBusy AndAlso Not bgw.CancellationPending Then
bgw.CancelAsync()
End If
End Sub

Private Function SortWords(ByVal Word1 As String, ByVal Word2 As String) As Integer
Dim result As Integer = Word2.Length.CompareTo(Word1.Length)
If result = 0 Then
result = Word1.CompareTo(Word2)
End If
Return result
End Function

Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
Dim T As Trie = DirectCast(e.Argument, Trie)

Words.Clear()
For Each StartingLetter As Letter In Letters
For Each L As Letter In Letters
L.Visited = False
Next
StartingLetter.Visit("", T, Words, 2)

If bgw.CancellationPending Then
Exit Sub
End If
Next

Words = (From uniqWord In Words Distinct Select uniqWord).ToList
End Sub

Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
If Not bgw.CancellationPending Then
RaiseEvent WordList(Me, Words)
End If
End Sub

End Class

Public Class TrieEntry

Public Letter As String = ""
Public EndOfWord As Boolean = False
Public IsSuffix As Boolean = False
Public NextLetter As New List(Of TrieEntry)

Public Sub New(ByVal Letter As String, Optional ByVal IsSuffix As Boolean = False)
Me.Letter = Letter
Me.IsSuffix = IsSuffix
End Sub

Public Function Ending() As String
If Me.IsSuffix Then
If Me.NextLetter.Count = 0 Then
Return Me.Letter
Else
Return Me.Letter & Me.NextLetter(0).Ending
End If
Else
Return ""
End If
End Function

End Class

Public Class Trie

Private _LongestWord As Integer
Private _WordCount As Integer
Private Letters As New List(Of TrieEntry)
Private Suffixes As New List(Of SuffixData)

Public Event ImportProgress(ByVal percentage As Integer)
Public Event FinishedProcessingFile()
Private WithEvents bgw As New System.ComponentModel.BackgroundWorker

Private Class SuffixData

Public Sub New(ByVal suffix As String)
Me.SuffixEntry = New TrieEntry(suffix.Substring(0, 1), True)
Dim te As TrieEntry = Me.SuffixEntry
Dim c As String
Dim newTE As TrieEntry
For i As Integer = 1 To suffix.Length - 1
c = suffix.Substring(i, 1)
newTE = New TrieEntry(c, True)
te = newTE
Next
te.EndOfWord = True
End Sub

Public Count As Integer = 0
Public SuffixEntry As TrieEntry = Nothing

Public Shared Function ByLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
Dim ret As Integer = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
If ret = 0 Then
ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
End If
Return ret
End Function

Public Shared Function ByCountThenLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
Dim ret As Integer = suffixB.Count.CompareTo(suffixA.Count)
If ret = 0 Then
ret = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
If ret = 0 Then
ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
End If
End If
Return ret
End Function

End Class

Public Sub New()
Me.bgw.WorkerReportsProgress = True
Dim CommonSuffixes As String = ""
CommonSuffixes &= "es,ed,ing" ' > 10,000
CommonSuffixes &= ",er,ly" ' > 5,000
CommonSuffixes &= ",ic,ess,al,est,ation,ate,able,ity,ist,en,or,ion" ' > 1,000
For Each suffix As String In CommonSuffixes.Split(",".ToCharArray, StringSplitOptions.RemoveEmptyEntries)
Next
End Sub

Public ReadOnly Property LongestWord() As Integer
Get
Return _LongestWord
End Get
End Property

Public ReadOnly Property WordCount() As Integer
Get
Return _WordCount
End Get
End Property

Public Sub ProcessFile(ByVal filename As String)
If Not Me.bgw.IsBusy Then
Me.bgw.RunWorkerAsync(filename)
End If
End Sub

Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
Dim filename As String = e.Argument
Try
If System.IO.File.Exists(filename) Then
For Each sd As SuffixData In Suffixes
sd.Count = 0
Next
Me._WordCount = 0
Me.Letters.Clear()
Dim word As String
Dim lastP As Integer = 0
While sr.Peek <> -1
If Not IsNothing(word) Then
word = word.Trim().ToLower.Split(" ")(0)
If word.Length > 0 Then
For i As Integer = 0 To Suffixes.Count - 1
If word.EndsWith(Suffixes(i).SuffixEntry.Ending) Then
Suffixes(i).Count = Suffixes(i).Count + 1
Exit For
End If
Next
End If
End If
Dim p As Integer = Math.Floor(sr.BaseStream.Position / sr.BaseStream.Length * 100)
If p <> lastP Then
lastP = p
Me.bgw.ReportProgress(lastP)
End If
End While
End Using
Else
End If
Catch ex As Exception
MessageBox.Show("FileName: " & filename & vbCrLf & vbCrLf & ex.ToString, "Error Processing File", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
End Sub

Private Sub bgw_ProgressChanged(ByVal sender As Object, ByVal e As System.ComponentModel.ProgressChangedEventArgs) Handles bgw.ProgressChanged
RaiseEvent ImportProgress(e.ProgressPercentage)
End Sub

Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
RaiseEvent FinishedProcessingFile()
End Sub

Public Sub AddWord(ByVal word As String)
If word.Trim.Length = 0 Then
Exit Sub
End If

Dim c As String
Dim te As TrieEntry = Nothing
Dim subTE As TrieEntry = Nothing

word = word.Trim.ToLower
If Not Me.HasWord(word) Then
c = word.Substring(0, 1)
For Each subTE In Me.Letters
If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
te = subTE
Exit For
End If
Next
If IsNothing(te) Then
te = New TrieEntry(c)
End If

Dim found As Boolean = False
For i As Integer = 1 To word.Length - 1
For Each sd As SuffixData In Me.Suffixes
If word.Substring(i) = sd.SuffixEntry.Ending Then

Me._WordCount = Me._WordCount + 1
If word.Length > Me._LongestWord Then
Me._LongestWord = word.Length
End If
Exit Sub
End If
Next

found = False
c = word.Substring(i, 1)
For Each subTE In te.NextLetter
If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
te = subTE
found = True
Exit For
End If
Next
subTE = New TrieEntry(c)
te = subTE
End If
Next i
te.EndOfWord = True

Me._WordCount = Me._WordCount + 1
If word.Length > Me._LongestWord Then
Me._LongestWord = word.Length
End If
End If
End Sub

Private Function HasEnding(ByVal te As TrieEntry, ByVal ending As String, Optional ByVal CheckForWord As Boolean = False) As Boolean
If ending = "" Then
If CheckForWord Then
Return te.EndOfWord
Else
Return True
End If
End If

Dim ret As Boolean
ending = ending.ToLower
For Each subTE As TrieEntry In te.NextLetter
If subTE.Letter = ending.Substring(0, 1) Then
ret = HasEnding(subTE, ending.Substring(1), CheckForWord)
If ret Then
Return True
End If
End If
Next
Return False
End Function

Public Function HasBeginning(ByVal beginning As String) As Boolean
beginning = beginning.ToLower
Dim ret As Boolean
For Each subTE As TrieEntry In Me.Letters
If subTE.Letter = beginning.Substring(0, 1) Then
ret = HasEnding(subTE, beginning.Substring(1))
If ret Then
Return True
End If
End If
Next
Return False
End Function

Public Function HasWord(ByVal word As String) As Boolean
word = word.ToLower
Dim ret As Boolean
For Each subTE As TrieEntry In Me.Letters
If subTE.Letter = word.Substring(0, 1) Then
ret = HasEnding(subTE, word.Substring(1), True)
If ret Then
Return True
End If
End If
Next
Return False
End Function

End Class

0

Author Comment

Thanks your example is excellent and easy to follow.  It has been very difficult to find examples of DAWGs that I understand.   I want to understand how the algorithm works not just copy someone else's work.
0

LVL 85

Expert Comment

Ask any questions you might have about the code and I'll do my best to explain it.  =)
0

Author Comment

One last question, Using this method with the DAWG I still need to have the permutation of all the letters that I select and then pass them in one by one in the hasword function?
0

LVL 85

Expert Comment

The Word class makes a Graph from the input letters and connects each letter to every other letter.  Then it uses a simple recursive algorithm to produce all the permutations.  Without the DAWG, we would produce all combinations and check every single one for a valid word combination.  In that scenario you could use a HashTable instead of a DAWG for word validation.

With the DAWG, though, we do NOT produce all possible permutations.  The magic happens in the Visit() method at lines #101 thru #103:

If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
OtherLetter.Visit(Word, T, Words, MinLength)
End If

Here, using the HasBeginning() function, we only recurse to the next available letter if adding it to the current permutation produces a valid word beginning.  This eliminates entire branches of the possible permutations from being generated.  For instance, if the word "quartz" was passed in, we wouldn't generate any word permutations that start with "qz" or "zq" because the DAWG would report back that no words begin with those combinations.  Thus the DAWG reduces the time needed to find all the possible words using a set of letters because the number of permutations actually produced and checked is significantly fewer than simply brute force checking every single permutation.
0

## Featured Post

### Suggested Solutions

This tutorial demonstrates one way to create an application that runs without any Forms but still has a GUI presence via an Icon in the System Tray. The magic lies in Inheriting from the ApplicationContext Class and passing that to Application.Ruâ€¦
Since .Net 2.0, Visual Basic has made it easy to create a splash screen and set it via the "Splash Screen" drop down in the Project Properties.  A splash screen set in this manner is automatically created, displayed and closed by the framework itselâ€¦
how to add IIS SMTP to handle application/Scanner relays into office 365.
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.