monica73174
asked on
Need an algorithm to rearrange an arry in all possible combinations in VB.NET
I am trying to figure out a way in VB.NET to take an array of say 7 items and arrange it in all possible combinations. So if I have a Char array or A,B,C,D,E,F,G my goal is to get it to A,C,B,D,E,F,G, A,C,D,BE,F,G etc. Then I would like to store each of these in a new array that lists each new combination.
It looks like you want them with no repetitions, but does order matter? So you want both "BE" and "EB"?
One way is to use the libaray here written by Adrian Akison:
http://www.codeproject.com/KB/recipes/Combinatorics.aspx
You would want "variations without repetition".
Click on Project --> Add Reference --> Browse --> Facet.Combinatorics.dll
Next, you create instances of Variations, passing in the set of letters, and the number of characters to choose.
Here is what it could look like:
Example using just "ABCD" (as ABCDEFG is A LOT of variations!!!): Idle-Mind-510645.flv
http://www.codeproject.com/KB/recipes/Combinatorics.aspx
You would want "variations without repetition".
Click on Project --> Add Reference --> Browse --> Facet.Combinatorics.dll
Next, you create instances of Variations, passing in the set of letters, and the number of characters to choose.
Here is what it could look like:
Public Class Form1
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
ListBox1.DataSource = Variations(TextBox1.Text)
End Sub
Private Function Variations(ByVal input As String) As List(Of String)
Dim values As New List(Of String)
For i As Integer = 0 To input.Length - 1
values.Add(input.Substring(i, 1))
Next
Dim results As New List(Of String)
For i As Integer = 1 To input.Length
Dim vars As New Facet.Combinatorics.Variations(Of String)(values.AsReadOnly(), i)
For Each variation As IList(Of String) In vars
results.Add(String.Join("", variation))
Next
Next
Return results
End Function
End Class
Example using just "ABCD" (as ABCDEFG is A LOT of variations!!!): Idle-Mind-510645.flv
If you want to see them sorted then:
Public Class Form1
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim results As List(Of String) = Variations(TextBox1.Text)
results.Sort(AddressOf VariationComparer)
ListBox1.DataSource = results
End Sub
Private Function VariationComparer(ByVal var1 As String, ByVal var2 As String) As Integer
Dim result As Integer = var1.Length.CompareTo(var2.Length)
If result = 0 Then
result = var1.CompareTo(var2)
End If
Return result
End Function
Private Function Variations(ByVal input As String) As List(Of String)
Dim values As New List(Of String)
For i As Integer = 0 To input.Length - 1
values.Add(input.Substring(i, 1))
Next
Dim results As New List(Of String)
For i As Integer = 1 To input.Length
Dim vars As New Facet.Combinatorics.Variations(Of String)(values.AsReadOnly(), i)
For Each variation As IList(Of String) In vars
results.Add(String.Join("", variation))
Next
Next
Return results
End Function
End Class
ASKER
I believe that is something similar to what I am looking for. Basically I need all the combinations because I am looking for valid words that can be compared against a dictionary.
That'll produce them all...give it a shot.
There are other ways to approach the problem if you are looking for words in a dictionary though.
The brute force approach produces all possible combos and checks every single one to see if it is a valid word.
...but if you already know that no words start with "zq", then why produce more combos that start with that? zqa, zqb, zqc, etc...
A different approach would produce a DAWG (directed acyclic word graph) from the valid words dictionary and then use a recursive graph traversal based on a connected graph of the letters of the input to build the words. By checking the DAWG to see if any valid words begin with the current "prefix" you can cut a large number of permutations that can't possibly make a valid word...thus cutting down the generation time significantly. This approach is often used in "Boggle" type word games.
There are other ways to approach the problem if you are looking for words in a dictionary though.
The brute force approach produces all possible combos and checks every single one to see if it is a valid word.
...but if you already know that no words start with "zq", then why produce more combos that start with that? zqa, zqb, zqc, etc...
A different approach would produce a DAWG (directed acyclic word graph) from the valid words dictionary and then use a recursive graph traversal based on a connected graph of the letters of the input to build the words. By checking the DAWG to see if any valid words begin with the current "prefix" you can cut a large number of permutations that can't possibly make a valid word...thus cutting down the generation time significantly. This approach is often used in "Boggle" type word games.
ASKER
I was kind of afraid of getting every permutation because that could cause a great deal of processor usage and time to complete the task.
So using the DAWG method I would take the list of words and create a graph and then search the graph? There are about ~270,000 words in the acceptable words list I am using. This approach seems to have the least overhead.
So using the DAWG method I would take the list of words and create a graph and then search the graph? There are about ~270,000 words in the acceptable words list I am using. This approach seems to have the least overhead.
For a DAWG, basically you take your dictionary of valid words and build up a graph that shows which letters connect to other letters in a valid word. *I won't lie, this is a complicated approach!*
Without optimization, though, a DAWG consumes a huge amount of memory.
We trade memory for speed with this approach...
Simplified, consider this small list of valid words: ace, act, are, art
Starting with "ace"...
Build a node with "a".
Connected to that is a node with "c".
Connected to "c" is a node with "e":
a ---> c ---> e
Next we process "act".
We already have a node with "a".
We already have a node with "c" connected to "a".
Add a node connected to "c" that has "t":
a ---> c +--> e
|
\--> t
Following the same procedure for "are" and "art", we end up with a DAWG that looks like this:
a +--> c +--> e
| |
| \--> t
|
\--> r +--> e
\--> t
Now, given the input "acert", we build up another graph that connects every letter to every other letter and start recursively following the paths (starting with each letter as a start point). At each node, we look at the word built up thus far and ask the DAWG if any valid words start with that "prefix". Since only words beginning with the letter "a" are in the DAWG, everything beginning with any other letter would immediately be rejected and those combinations wouldn't even be generated.
*Suffixes are a huge source of inefficiency. Consider suffixes such as "ed", "er" and "ing" which can be attached to a huge number of words! Without optimization, those suffix combinations would be duplicated for every single word that it was valid for. So one optimization is to build common suffixes once and then make a prefix point to the existing suffix so that it isn't duplicated.
I have a huge personal project that can show valid words in a Boggle type board. Let me know if you would like to see some code...
Without optimization, though, a DAWG consumes a huge amount of memory.
We trade memory for speed with this approach...
Simplified, consider this small list of valid words: ace, act, are, art
Starting with "ace"...
Build a node with "a".
Connected to that is a node with "c".
Connected to "c" is a node with "e":
a ---> c ---> e
Next we process "act".
We already have a node with "a".
We already have a node with "c" connected to "a".
Add a node connected to "c" that has "t":
a ---> c +--> e
|
\--> t
Following the same procedure for "are" and "art", we end up with a DAWG that looks like this:
a +--> c +--> e
| |
| \--> t
|
\--> r +--> e
\--> t
Now, given the input "acert", we build up another graph that connects every letter to every other letter and start recursively following the paths (starting with each letter as a start point). At each node, we look at the word built up thus far and ask the DAWG if any valid words start with that "prefix". Since only words beginning with the letter "a" are in the DAWG, everything beginning with any other letter would immediately be rejected and those combinations wouldn't even be generated.
*Suffixes are a huge source of inefficiency. Consider suffixes such as "ed", "er" and "ing" which can be attached to a huge number of words! Without optimization, those suffix combinations would be duplicated for every single word that it was valid for. So one optimization is to build common suffixes once and then make a prefix point to the existing suffix so that it isn't duplicated.
I have a huge personal project that can show valid words in a Boggle type board. Let me know if you would like to see some code...
Here's a quick demo of what that app does...it basically animates the recursive tree traversal and changes the colors to show when it rejects a prefix, finds a valid prefix, and finds a valid word (adding it to the list at the right). Without the animation part enabled, the valid words populate near instantaneously. This can literally be seen as I type the board in at the beginning:
Idle-Mind-510654.flv
Idle-Mind-510654.flv
How's this project going monica? Let me know if you need help tweaking the code I already posted, or if you want to see some code dealing with the DAWG approach.
ASKER
Hello, I was trying out some different things but the DAWG approach sounds like the way I think i would like to go. So yes I would love to see some code for DAWG.
Alrighty...I'll try to post a stripped down example tomorrow.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Try this out with a blank form:
Public Class Form1
Private WithEvents W As Word = Nothing
Private WithEvents T As New Trie
Private LB As New ListBox
Private WithEvents TB As New TextBox
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
TB.Enabled = False
TB.Location = New Point(TB.Margin.Left, TB.Margin.Top)
TB.Width = Me.ClientRectangle.Width - (TB.Location.X * 2)
TB.Anchor = AnchorStyles.Left Or AnchorStyles.Top Or AnchorStyles.Right
Me.Controls.Add(TB)
LB.Location = New Point(TB.Location.X, TB.Bounds.Bottom + TB.Margin.Bottom)
LB.Size = New Size(TB.Width, Me.ClientRectangle.Height - LB.Margin.Bottom - LB.Location.Y)
LB.Anchor = AnchorStyles.Left Or AnchorStyles.Top Or AnchorStyles.Right Or AnchorStyles.Bottom
LB.IntegralHeight = False
Me.Controls.Add(LB)
End Sub
Private Sub Form1_Shown(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Shown
Using ofd As New OpenFileDialog
If ofd.ShowDialog = Windows.Forms.DialogResult.OK Then
T.ProcessFile(ofd.FileName)
Else
Me.Close()
End If
End Using
End Sub
Private Sub T_ImportProgress(ByVal percentage As Integer) Handles T.ImportProgress
Me.Text = "Loading... " & percentage & "%"
End Sub
Private Sub T_FinishedProcessingFile() Handles T.FinishedProcessingFile
Me.Text = "Enter a Word below!"
TB.Enabled = True
TB.Focus()
End Sub
Private Sub TB_TextChanged(ByVal sender As Object, ByVal e As System.EventArgs) Handles TB.TextChanged
If TB.Text.Trim <> "" Then
If Not IsNothing(W) Then
W.CancelWordSearch()
End If
W = New Word(TB.Text.Trim)
W.FindWords(Me.T)
End If
End Sub
Private Sub W_WordList(ByVal sender As Word, ByVal WordsFound As System.Collections.Generic.List(Of String)) Handles W.WordList
LB.DataSource = WordsFound
End Sub
End Class
Public Class Word
Private Words As New List(Of String)
Private Letters As New List(Of Letter)
Private WithEvents bgw As New System.ComponentModel.BackgroundWorker
Public Event WordList(ByVal sender As Word, ByVal WordsFound As List(Of String))
Public Sub New(ByVal word As String)
For i As Integer = 1 To word.Length
Dim L As New Letter(word.Substring(i - 1, 1))
Letters.Add(L)
Next
For Each L1 As Letter In Letters
For Each L2 As Letter In Letters
If Not L2 Is L1 Then
L1.OtherLetters.Add(L2)
End If
Next
Next
Me.bgw.WorkerSupportsCancellation = True
End Sub
Private Class Letter
Public Value As String = ""
Public Visited As Boolean = False
Public OtherLetters As New List(Of Letter)
Public Sub New(ByVal letter As String)
Me.Value = letter
End Sub
Public Sub Visit(ByVal Word As String, ByVal T As Trie, ByVal Words As List(Of String), ByVal MinLength As Integer)
Me.Visited = True
Word = Word & Me.Value
If Word.Length >= MinLength AndAlso T.HasWord(Word) Then
Words.Add(Word)
End If
For Each OtherLetter As Letter In OtherLetters
If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
OtherLetter.Visit(Word, T, Words, MinLength)
End If
Next
Me.Visited = False
End Sub
End Class
Public Sub FindWords(ByVal T As Trie)
If Not bgw.IsBusy Then
bgw.RunWorkerAsync(T)
End If
End Sub
Public Sub CancelWordSearch()
If bgw.IsBusy AndAlso Not bgw.CancellationPending Then
bgw.CancelAsync()
End If
End Sub
Private Function SortWords(ByVal Word1 As String, ByVal Word2 As String) As Integer
Dim result As Integer = Word2.Length.CompareTo(Word1.Length)
If result = 0 Then
result = Word1.CompareTo(Word2)
End If
Return result
End Function
Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
Dim T As Trie = DirectCast(e.Argument, Trie)
Words.Clear()
For Each StartingLetter As Letter In Letters
For Each L As Letter In Letters
L.Visited = False
Next
StartingLetter.Visit("", T, Words, 2)
If bgw.CancellationPending Then
Exit Sub
End If
Next
Words = (From uniqWord In Words Distinct Select uniqWord).ToList
Words.Sort(AddressOf SortWords)
End Sub
Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
If Not bgw.CancellationPending Then
RaiseEvent WordList(Me, Words)
End If
End Sub
End Class
Public Class TrieEntry
Public Letter As String = ""
Public EndOfWord As Boolean = False
Public IsSuffix As Boolean = False
Public NextLetter As New List(Of TrieEntry)
Public Sub New(ByVal Letter As String, Optional ByVal IsSuffix As Boolean = False)
Me.Letter = Letter
Me.IsSuffix = IsSuffix
End Sub
Public Function Ending() As String
If Me.IsSuffix Then
If Me.NextLetter.Count = 0 Then
Return Me.Letter
Else
Return Me.Letter & Me.NextLetter(0).Ending
End If
Else
Return ""
End If
End Function
End Class
Public Class Trie
Private _LongestWord As Integer
Private _WordCount As Integer
Private Letters As New List(Of TrieEntry)
Private Suffixes As New List(Of SuffixData)
Public Event ImportProgress(ByVal percentage As Integer)
Public Event FinishedProcessingFile()
Private WithEvents bgw As New System.ComponentModel.BackgroundWorker
Private Loading As Boolean = False
Private Class SuffixData
Public Sub New(ByVal suffix As String)
Me.SuffixEntry = New TrieEntry(suffix.Substring(0, 1), True)
Dim te As TrieEntry = Me.SuffixEntry
Dim c As String
Dim newTE As TrieEntry
For i As Integer = 1 To suffix.Length - 1
c = suffix.Substring(i, 1)
newTE = New TrieEntry(c, True)
te.NextLetter.Add(newTE)
te = newTE
Next
te.EndOfWord = True
End Sub
Public Count As Integer = 0
Public SuffixEntry As TrieEntry = Nothing
Public Shared Function ByLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
Dim ret As Integer = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
If ret = 0 Then
ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
End If
Return ret
End Function
Public Shared Function ByCountThenLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
Dim ret As Integer = suffixB.Count.CompareTo(suffixA.Count)
If ret = 0 Then
ret = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
If ret = 0 Then
ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
End If
End If
Return ret
End Function
End Class
Public Sub New()
Me.bgw.WorkerReportsProgress = True
Dim CommonSuffixes As String = ""
CommonSuffixes &= "es,ed,ing" ' > 10,000
CommonSuffixes &= ",er,ly" ' > 5,000
CommonSuffixes &= ",ic,ess,al,est,ation,ate,able,ity,ist,en,or,ion" ' > 1,000
For Each suffix As String In CommonSuffixes.Split(",".ToCharArray, StringSplitOptions.RemoveEmptyEntries)
Suffixes.Add(New SuffixData(suffix))
Next
Suffixes.Sort(AddressOf SuffixData.ByLengthThenAlpha)
End Sub
Public ReadOnly Property LongestWord() As Integer
Get
Return _LongestWord
End Get
End Property
Public ReadOnly Property WordCount() As Integer
Get
Return _WordCount
End Get
End Property
Public Sub ProcessFile(ByVal filename As String)
If Not Me.bgw.IsBusy Then
Loading = True
Me.bgw.RunWorkerAsync(filename)
End If
End Sub
Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
Dim filename As String = e.Argument
Try
If System.IO.File.Exists(filename) Then
For Each sd As SuffixData In Suffixes
sd.Count = 0
Next
Me._WordCount = 0
Me.Letters.Clear()
Dim word As String
Dim lastP As Integer = 0
Using sr As New System.IO.StreamReader(filename)
While sr.Peek <> -1
word = sr.ReadLine()
If Not IsNothing(word) Then
word = word.Trim().ToLower.Split(" ")(0)
If word.Length > 0 Then
For i As Integer = 0 To Suffixes.Count - 1
If word.EndsWith(Suffixes(i).SuffixEntry.Ending) Then
Suffixes(i).Count = Suffixes(i).Count + 1
Exit For
End If
Next
Me.AddWord(word)
End If
End If
Dim p As Integer = Math.Floor(sr.BaseStream.Position / sr.BaseStream.Length * 100)
If p <> lastP Then
lastP = p
Me.bgw.ReportProgress(lastP)
End If
End While
End Using
Suffixes.Sort(AddressOf SuffixData.ByCountThenLengthThenAlpha)
Loading = False
Else
MessageBox.Show("Unable to locate " & filename, "File Not Found", MessageBoxButtons.OK, MessageBoxIcon.Error)
End If
Catch ex As Exception
MessageBox.Show("FileName: " & filename & vbCrLf & vbCrLf & ex.ToString, "Error Processing File", MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
End Sub
Private Sub bgw_ProgressChanged(ByVal sender As Object, ByVal e As System.ComponentModel.ProgressChangedEventArgs) Handles bgw.ProgressChanged
RaiseEvent ImportProgress(e.ProgressPercentage)
End Sub
Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
RaiseEvent FinishedProcessingFile()
End Sub
Public Sub AddWord(ByVal word As String)
If word.Trim.Length = 0 Then
Exit Sub
End If
Dim c As String
Dim te As TrieEntry = Nothing
Dim subTE As TrieEntry = Nothing
word = word.Trim.ToLower
If Not Me.HasWord(word) Then
c = word.Substring(0, 1)
For Each subTE In Me.Letters
If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
te = subTE
Exit For
End If
Next
If IsNothing(te) Then
te = New TrieEntry(c)
Me.Letters.Add(te)
End If
Dim found As Boolean = False
For i As Integer = 1 To word.Length - 1
For Each sd As SuffixData In Me.Suffixes
If word.Substring(i) = sd.SuffixEntry.Ending Then
te.NextLetter.Add(sd.SuffixEntry)
Me._WordCount = Me._WordCount + 1
If word.Length > Me._LongestWord Then
Me._LongestWord = word.Length
End If
Exit Sub
End If
Next
found = False
c = word.Substring(i, 1)
For Each subTE In te.NextLetter
If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
te = subTE
found = True
Exit For
End If
Next
If Not found Then
subTE = New TrieEntry(c)
te.NextLetter.Add(subTE)
te = subTE
End If
Next i
te.EndOfWord = True
Me._WordCount = Me._WordCount + 1
If word.Length > Me._LongestWord Then
Me._LongestWord = word.Length
End If
End If
End Sub
Private Function HasEnding(ByVal te As TrieEntry, ByVal ending As String, Optional ByVal CheckForWord As Boolean = False) As Boolean
If ending = "" Then
If CheckForWord Then
Return te.EndOfWord
Else
Return True
End If
End If
Dim ret As Boolean
ending = ending.ToLower
For Each subTE As TrieEntry In te.NextLetter
If subTE.Letter = ending.Substring(0, 1) Then
ret = HasEnding(subTE, ending.Substring(1), CheckForWord)
If ret Then
Return True
End If
End If
Next
Return False
End Function
Public Function HasBeginning(ByVal beginning As String) As Boolean
beginning = beginning.ToLower
Dim ret As Boolean
For Each subTE As TrieEntry In Me.Letters
If subTE.Letter = beginning.Substring(0, 1) Then
ret = HasEnding(subTE, beginning.Substring(1))
If ret Then
Return True
End If
End If
Next
Return False
End Function
Public Function HasWord(ByVal word As String) As Boolean
word = word.ToLower
Dim ret As Boolean
For Each subTE As TrieEntry In Me.Letters
If subTE.Letter = word.Substring(0, 1) Then
ret = HasEnding(subTE, word.Substring(1), True)
If ret Then
Return True
End If
End If
Next
Return False
End Function
End Class
ASKER
Thanks your example is excellent and easy to follow. It has been very difficult to find examples of DAWGs that I understand. I want to understand how the algorithm works not just copy someone else's work.
Ask any questions you might have about the code and I'll do my best to explain it. =)
ASKER
One last question, Using this method with the DAWG I still need to have the permutation of all the letters that I select and then pass them in one by one in the hasword function?
The Word class makes a Graph from the input letters and connects each letter to every other letter. Then it uses a simple recursive algorithm to produce all the permutations. Without the DAWG, we would produce all combinations and check every single one for a valid word combination. In that scenario you could use a HashTable instead of a DAWG for word validation.
With the DAWG, though, we do NOT produce all possible permutations. The magic happens in the Visit() method at lines #101 thru #103:
If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
OtherLetter.Visit(Word, T, Words, MinLength)
End If
Here, using the HasBeginning() function, we only recurse to the next available letter if adding it to the current permutation produces a valid word beginning. This eliminates entire branches of the possible permutations from being generated. For instance, if the word "quartz" was passed in, we wouldn't generate any word permutations that start with "qz" or "zq" because the DAWG would report back that no words begin with those combinations. Thus the DAWG reduces the time needed to find all the possible words using a set of letters because the number of permutations actually produced and checked is significantly fewer than simply brute force checking every single permutation.
With the DAWG, though, we do NOT produce all possible permutations. The magic happens in the Visit() method at lines #101 thru #103:
If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
OtherLetter.Visit(Word, T, Words, MinLength)
End If
Here, using the HasBeginning() function, we only recurse to the next available letter if adding it to the current permutation produces a valid word beginning. This eliminates entire branches of the possible permutations from being generated. For instance, if the word "quartz" was passed in, we wouldn't generate any word permutations that start with "qz" or "zq" because the DAWG would report back that no words begin with those combinations. Thus the DAWG reduces the time needed to find all the possible words using a set of letters because the number of permutations actually produced and checked is significantly fewer than simply brute force checking every single permutation.