Link to home
Start Free TrialLog in
Avatar of monica73174
monica73174

asked on

Need an algorithm to rearrange an arry in all possible combinations in VB.NET

I am trying to figure out a way in VB.NET to take an array of say 7 items and arrange it in all possible combinations.  So if I have a Char array or A,B,C,D,E,F,G my goal is to get it to A,C,B,D,E,F,G, A,C,D,BE,F,G etc.  Then I would like to store each of these in a new array that lists each new combination.  
Avatar of Mike Tomlinson
Mike Tomlinson
Flag of United States of America image

It looks like you want them with no repetitions, but does order matter?  So you want both "BE" and "EB"?
One way is to use the libaray here written by Adrian Akison:
http://www.codeproject.com/KB/recipes/Combinatorics.aspx

You would want "variations without repetition".
Click on Project --> Add Reference --> Browse --> Facet.Combinatorics.dll
Next, you create instances of Variations, passing in the set of letters, and the number of characters to choose.

Here is what it could look like:  
Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ListBox1.DataSource = Variations(TextBox1.Text)
    End Sub

    Private Function Variations(ByVal input As String) As List(Of String)
        Dim values As New List(Of String)
        For i As Integer = 0 To input.Length - 1
            values.Add(input.Substring(i, 1))
        Next

        Dim results As New List(Of String)
        For i As Integer = 1 To input.Length
            Dim vars As New Facet.Combinatorics.Variations(Of String)(values.AsReadOnly(), i)
            For Each variation As IList(Of String) In vars
                results.Add(String.Join("", variation))
            Next
        Next
            Return results
    End Function

End Class

Open in new window


Example using just "ABCD" (as ABCDEFG is A LOT of variations!!!):  Idle-Mind-510645.flv
If you want to see them sorted then:
Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim results As List(Of String) = Variations(TextBox1.Text)
        results.Sort(AddressOf VariationComparer)
        ListBox1.DataSource = results
    End Sub

    Private Function VariationComparer(ByVal var1 As String, ByVal var2 As String) As Integer
        Dim result As Integer = var1.Length.CompareTo(var2.Length)
        If result = 0 Then
            result = var1.CompareTo(var2)
        End If
        Return result
    End Function

    Private Function Variations(ByVal input As String) As List(Of String)
        Dim values As New List(Of String)
        For i As Integer = 0 To input.Length - 1
            values.Add(input.Substring(i, 1))
        Next

        Dim results As New List(Of String)
        For i As Integer = 1 To input.Length
            Dim vars As New Facet.Combinatorics.Variations(Of String)(values.AsReadOnly(), i)
            For Each variation As IList(Of String) In vars
                results.Add(String.Join("", variation))
            Next
        Next
            Return results
    End Function

End Class

Open in new window

Avatar of monica73174
monica73174

ASKER

I believe that is something similar to what I am looking for.  Basically I need all the combinations because I am looking for valid words that can be compared against a dictionary.  
That'll produce them all...give it a shot.

There are other ways to approach the problem if you are looking for words in a dictionary though.

The brute force approach produces all possible combos and checks every single one to see if it is a valid word.

...but if you already know that no words start with "zq", then why produce more combos that start with that? zqa, zqb, zqc, etc...

A different approach would produce a DAWG (directed acyclic word graph) from the valid words dictionary and then use a recursive graph traversal based on a connected graph of the letters of the input to build the words.  By checking the DAWG to see if any valid words begin with the current "prefix" you can cut a large number of permutations that can't possibly make a valid word...thus cutting down the generation time significantly.  This approach is often used in "Boggle" type word games.

I was kind of afraid of getting every permutation because that could cause a great deal of processor usage and time to complete the task.  

So using the DAWG method I would take the list of words and create a graph and then search the graph?  There are about ~270,000 words in the acceptable words list I am using.  This approach seems to have the least overhead.  
For a DAWG, basically you take your dictionary of valid words and build up a graph that shows which letters connect to other letters in a valid word. *I won't lie, this is a complicated approach!*

Without optimization, though, a DAWG consumes a huge amount of memory.
We trade memory for speed with this approach...

Simplified, consider this small list of valid words: ace, act, are, art

Starting with "ace"...
Build a node with "a".
Connected to that is a node with "c".
Connected to "c" is a node with "e":

    a ---> c ---> e

Next we process "act".
We already have a node with "a".
We already have a node with "c" connected to "a".
Add a node connected to "c" that has "t":

    a ---> c +--> e
                 |
                 \--> t

Following the same procedure for "are" and "art", we end up with a DAWG that looks like this:


    a +--> c +--> e
        |         |
        |         \--> t
        |
        \--> r +--> e
                  \--> t

Now, given the input "acert", we build up another graph that connects every letter to every other letter and start recursively following the paths (starting with each letter as a start point).  At each node, we look at the word built up thus far and ask the DAWG if any valid words start with that "prefix".  Since only words beginning with the letter "a" are in the DAWG, everything beginning with any other letter would immediately be rejected and those combinations wouldn't even be generated.

*Suffixes are a huge source of inefficiency.  Consider suffixes such as "ed", "er" and "ing" which can be attached to a huge number of words!  Without optimization, those suffix combinations would be duplicated for every single word that it was valid for.  So one optimization is to build common suffixes once and then make a prefix point to the existing suffix so that it isn't duplicated.

I have a huge personal project that can show valid words in a Boggle type board.  Let me know if you would like to see some code...
Here's a quick demo of what that app does...it basically animates the recursive tree traversal and changes the colors to show when it rejects a prefix, finds a valid prefix, and finds a valid word (adding it to the list at the right).  Without the animation part enabled, the valid words populate near instantaneously.  This can literally be seen as I type the board in at the beginning:
Idle-Mind-510654.flv
How's this project going monica?  Let me know if you need help tweaking the code I already posted, or if you want to see some code dealing with the DAWG approach.
Hello, I was trying out some different things but the DAWG approach sounds like the way I think i would like to go.  So yes I would love to see some code for DAWG.  
Alrighty...I'll try to post a stripped down example tomorrow.
ASKER CERTIFIED SOLUTION
Avatar of Mike Tomlinson
Mike Tomlinson
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Try this out with a blank form:
Public Class Form1

    Private WithEvents W As Word = Nothing
    Private WithEvents T As New Trie

    Private LB As New ListBox
    Private WithEvents TB As New TextBox

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        TB.Enabled = False
        TB.Location = New Point(TB.Margin.Left, TB.Margin.Top)
        TB.Width = Me.ClientRectangle.Width - (TB.Location.X * 2)
        TB.Anchor = AnchorStyles.Left Or AnchorStyles.Top Or AnchorStyles.Right
        Me.Controls.Add(TB)

        LB.Location = New Point(TB.Location.X, TB.Bounds.Bottom + TB.Margin.Bottom)
        LB.Size = New Size(TB.Width, Me.ClientRectangle.Height - LB.Margin.Bottom - LB.Location.Y)
        LB.Anchor = AnchorStyles.Left Or AnchorStyles.Top Or AnchorStyles.Right Or AnchorStyles.Bottom
        LB.IntegralHeight = False
        Me.Controls.Add(LB)
    End Sub

    Private Sub Form1_Shown(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Shown
        Using ofd As New OpenFileDialog
            If ofd.ShowDialog = Windows.Forms.DialogResult.OK Then
                T.ProcessFile(ofd.FileName)
            Else
                Me.Close()
            End If
        End Using
    End Sub

    Private Sub T_ImportProgress(ByVal percentage As Integer) Handles T.ImportProgress
        Me.Text = "Loading... " & percentage & "%"
    End Sub

    Private Sub T_FinishedProcessingFile() Handles T.FinishedProcessingFile
        Me.Text = "Enter a Word below!"
        TB.Enabled = True
        TB.Focus()
    End Sub

    Private Sub TB_TextChanged(ByVal sender As Object, ByVal e As System.EventArgs) Handles TB.TextChanged
        If TB.Text.Trim <> "" Then
            If Not IsNothing(W) Then
                W.CancelWordSearch()
            End If

            W = New Word(TB.Text.Trim)
            W.FindWords(Me.T)
        End If
    End Sub

    Private Sub W_WordList(ByVal sender As Word, ByVal WordsFound As System.Collections.Generic.List(Of String)) Handles W.WordList
        LB.DataSource = WordsFound
    End Sub

End Class

Public Class Word

    Private Words As New List(Of String)
    Private Letters As New List(Of Letter)
    Private WithEvents bgw As New System.ComponentModel.BackgroundWorker

    Public Event WordList(ByVal sender As Word, ByVal WordsFound As List(Of String))

    Public Sub New(ByVal word As String)
        For i As Integer = 1 To word.Length
            Dim L As New Letter(word.Substring(i - 1, 1))
            Letters.Add(L)
        Next
        For Each L1 As Letter In Letters
            For Each L2 As Letter In Letters
                If Not L2 Is L1 Then
                    L1.OtherLetters.Add(L2)
                End If
            Next
        Next

        Me.bgw.WorkerSupportsCancellation = True
    End Sub

    Private Class Letter

        Public Value As String = ""
        Public Visited As Boolean = False
        Public OtherLetters As New List(Of Letter)

        Public Sub New(ByVal letter As String)
            Me.Value = letter
        End Sub

        Public Sub Visit(ByVal Word As String, ByVal T As Trie, ByVal Words As List(Of String), ByVal MinLength As Integer)
            Me.Visited = True
            Word = Word & Me.Value
            If Word.Length >= MinLength AndAlso T.HasWord(Word) Then
                Words.Add(Word)
            End If
            For Each OtherLetter As Letter In OtherLetters
                If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
                    OtherLetter.Visit(Word, T, Words, MinLength)
                End If
            Next
            Me.Visited = False
        End Sub

    End Class

    Public Sub FindWords(ByVal T As Trie)
        If Not bgw.IsBusy Then
            bgw.RunWorkerAsync(T)
        End If
    End Sub

    Public Sub CancelWordSearch()
        If bgw.IsBusy AndAlso Not bgw.CancellationPending Then
            bgw.CancelAsync()
        End If
    End Sub

    Private Function SortWords(ByVal Word1 As String, ByVal Word2 As String) As Integer
        Dim result As Integer = Word2.Length.CompareTo(Word1.Length)
        If result = 0 Then
            result = Word1.CompareTo(Word2)
        End If
        Return result
    End Function

    Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
        Dim T As Trie = DirectCast(e.Argument, Trie)

        Words.Clear()
        For Each StartingLetter As Letter In Letters
            For Each L As Letter In Letters
                L.Visited = False
            Next
            StartingLetter.Visit("", T, Words, 2)

            If bgw.CancellationPending Then
                Exit Sub
            End If
        Next

        Words = (From uniqWord In Words Distinct Select uniqWord).ToList
        Words.Sort(AddressOf SortWords)
    End Sub

    Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
        If Not bgw.CancellationPending Then
            RaiseEvent WordList(Me, Words)
        End If
    End Sub

End Class

Public Class TrieEntry

    Public Letter As String = ""
    Public EndOfWord As Boolean = False
    Public IsSuffix As Boolean = False
    Public NextLetter As New List(Of TrieEntry)

    Public Sub New(ByVal Letter As String, Optional ByVal IsSuffix As Boolean = False)
        Me.Letter = Letter
        Me.IsSuffix = IsSuffix
    End Sub

    Public Function Ending() As String
        If Me.IsSuffix Then
            If Me.NextLetter.Count = 0 Then
                Return Me.Letter
            Else
                Return Me.Letter & Me.NextLetter(0).Ending
            End If
        Else
            Return ""
        End If
    End Function

End Class

Public Class Trie

    Private _LongestWord As Integer
    Private _WordCount As Integer
    Private Letters As New List(Of TrieEntry)
    Private Suffixes As New List(Of SuffixData)

    Public Event ImportProgress(ByVal percentage As Integer)
    Public Event FinishedProcessingFile()
    Private WithEvents bgw As New System.ComponentModel.BackgroundWorker

    Private Loading As Boolean = False

    Private Class SuffixData

        Public Sub New(ByVal suffix As String)
            Me.SuffixEntry = New TrieEntry(suffix.Substring(0, 1), True)
            Dim te As TrieEntry = Me.SuffixEntry
            Dim c As String
            Dim newTE As TrieEntry
            For i As Integer = 1 To suffix.Length - 1
                c = suffix.Substring(i, 1)
                newTE = New TrieEntry(c, True)
                te.NextLetter.Add(newTE)
                te = newTE
            Next
            te.EndOfWord = True
        End Sub

        Public Count As Integer = 0
        Public SuffixEntry As TrieEntry = Nothing

        Public Shared Function ByLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
            Dim ret As Integer = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
            If ret = 0 Then
                ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
            End If
            Return ret
        End Function

        Public Shared Function ByCountThenLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
            Dim ret As Integer = suffixB.Count.CompareTo(suffixA.Count)
            If ret = 0 Then
                ret = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
                If ret = 0 Then
                    ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
                End If
            End If
            Return ret
        End Function

    End Class

    Public Sub New()
        Me.bgw.WorkerReportsProgress = True
        Dim CommonSuffixes As String = ""
        CommonSuffixes &= "es,ed,ing" ' > 10,000
        CommonSuffixes &= ",er,ly" ' > 5,000
        CommonSuffixes &= ",ic,ess,al,est,ation,ate,able,ity,ist,en,or,ion" ' > 1,000
        For Each suffix As String In CommonSuffixes.Split(",".ToCharArray, StringSplitOptions.RemoveEmptyEntries)
            Suffixes.Add(New SuffixData(suffix))
        Next
        Suffixes.Sort(AddressOf SuffixData.ByLengthThenAlpha)
    End Sub

    Public ReadOnly Property LongestWord() As Integer
        Get
            Return _LongestWord
        End Get
    End Property

    Public ReadOnly Property WordCount() As Integer
        Get
            Return _WordCount
        End Get
    End Property

    Public Sub ProcessFile(ByVal filename As String)
        If Not Me.bgw.IsBusy Then
            Loading = True
            Me.bgw.RunWorkerAsync(filename)
        End If
    End Sub

    Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
        Dim filename As String = e.Argument
        Try
            If System.IO.File.Exists(filename) Then
                For Each sd As SuffixData In Suffixes
                    sd.Count = 0
                Next
                Me._WordCount = 0
                Me.Letters.Clear()
                Dim word As String
                Dim lastP As Integer = 0
                Using sr As New System.IO.StreamReader(filename)
                    While sr.Peek <> -1
                        word = sr.ReadLine()
                        If Not IsNothing(word) Then
                            word = word.Trim().ToLower.Split(" ")(0)
                            If word.Length > 0 Then
                                For i As Integer = 0 To Suffixes.Count - 1
                                    If word.EndsWith(Suffixes(i).SuffixEntry.Ending) Then
                                        Suffixes(i).Count = Suffixes(i).Count + 1
                                        Exit For
                                    End If
                                Next
                                Me.AddWord(word)
                            End If
                        End If
                        Dim p As Integer = Math.Floor(sr.BaseStream.Position / sr.BaseStream.Length * 100)
                        If p <> lastP Then
                            lastP = p
                            Me.bgw.ReportProgress(lastP)
                        End If
                    End While
                End Using
                Suffixes.Sort(AddressOf SuffixData.ByCountThenLengthThenAlpha)
                Loading = False
            Else
                MessageBox.Show("Unable to locate " & filename, "File Not Found", MessageBoxButtons.OK, MessageBoxIcon.Error)
            End If
        Catch ex As Exception
            MessageBox.Show("FileName: " & filename & vbCrLf & vbCrLf & ex.ToString, "Error Processing File", MessageBoxButtons.OK, MessageBoxIcon.Error)
        End Try
    End Sub

    Private Sub bgw_ProgressChanged(ByVal sender As Object, ByVal e As System.ComponentModel.ProgressChangedEventArgs) Handles bgw.ProgressChanged
        RaiseEvent ImportProgress(e.ProgressPercentage)
    End Sub

    Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
        RaiseEvent FinishedProcessingFile()
    End Sub

    Public Sub AddWord(ByVal word As String)
        If word.Trim.Length = 0 Then
            Exit Sub
        End If

        Dim c As String
        Dim te As TrieEntry = Nothing
        Dim subTE As TrieEntry = Nothing

        word = word.Trim.ToLower
        If Not Me.HasWord(word) Then
            c = word.Substring(0, 1)
            For Each subTE In Me.Letters
                If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
                    te = subTE
                    Exit For
                End If
            Next
            If IsNothing(te) Then
                te = New TrieEntry(c)
                Me.Letters.Add(te)
            End If

            Dim found As Boolean = False
            For i As Integer = 1 To word.Length - 1
                For Each sd As SuffixData In Me.Suffixes
                    If word.Substring(i) = sd.SuffixEntry.Ending Then
                        te.NextLetter.Add(sd.SuffixEntry)

                        Me._WordCount = Me._WordCount + 1
                        If word.Length > Me._LongestWord Then
                            Me._LongestWord = word.Length
                        End If
                        Exit Sub
                    End If
                Next

                found = False
                c = word.Substring(i, 1)
                For Each subTE In te.NextLetter
                    If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
                        te = subTE
                        found = True
                        Exit For
                    End If
                Next
                If Not found Then
                    subTE = New TrieEntry(c)
                    te.NextLetter.Add(subTE)
                    te = subTE
                End If
            Next i
            te.EndOfWord = True

            Me._WordCount = Me._WordCount + 1
            If word.Length > Me._LongestWord Then
                Me._LongestWord = word.Length
            End If
        End If
    End Sub

    Private Function HasEnding(ByVal te As TrieEntry, ByVal ending As String, Optional ByVal CheckForWord As Boolean = False) As Boolean
        If ending = "" Then
            If CheckForWord Then
                Return te.EndOfWord
            Else
                Return True
            End If
        End If

        Dim ret As Boolean
        ending = ending.ToLower
        For Each subTE As TrieEntry In te.NextLetter
            If subTE.Letter = ending.Substring(0, 1) Then
                ret = HasEnding(subTE, ending.Substring(1), CheckForWord)
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

    Public Function HasBeginning(ByVal beginning As String) As Boolean
        beginning = beginning.ToLower
        Dim ret As Boolean
        For Each subTE As TrieEntry In Me.Letters
            If subTE.Letter = beginning.Substring(0, 1) Then
                ret = HasEnding(subTE, beginning.Substring(1))
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

    Public Function HasWord(ByVal word As String) As Boolean
        word = word.ToLower
        Dim ret As Boolean
        For Each subTE As TrieEntry In Me.Letters
            If subTE.Letter = word.Substring(0, 1) Then
                ret = HasEnding(subTE, word.Substring(1), True)
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

End Class

Open in new window

Thanks your example is excellent and easy to follow.  It has been very difficult to find examples of DAWGs that I understand.   I want to understand how the algorithm works not just copy someone else's work.  
Ask any questions you might have about the code and I'll do my best to explain it.  =)
One last question, Using this method with the DAWG I still need to have the permutation of all the letters that I select and then pass them in one by one in the hasword function?  
The Word class makes a Graph from the input letters and connects each letter to every other letter.  Then it uses a simple recursive algorithm to produce all the permutations.  Without the DAWG, we would produce all combinations and check every single one for a valid word combination.  In that scenario you could use a HashTable instead of a DAWG for word validation.

With the DAWG, though, we do NOT produce all possible permutations.  The magic happens in the Visit() method at lines #101 thru #103:

                If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
                    OtherLetter.Visit(Word, T, Words, MinLength)
                End If

Here, using the HasBeginning() function, we only recurse to the next available letter if adding it to the current permutation produces a valid word beginning.  This eliminates entire branches of the possible permutations from being generated.  For instance, if the word "quartz" was passed in, we wouldn't generate any word permutations that start with "qz" or "zq" because the DAWG would report back that no words begin with those combinations.  Thus the DAWG reduces the time needed to find all the possible words using a set of letters because the number of permutations actually produced and checked is significantly fewer than simply brute force checking every single permutation.