Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 459
  • Last Modified:

Need an algorithm to rearrange an arry in all possible combinations in VB.NET

I am trying to figure out a way in VB.NET to take an array of say 7 items and arrange it in all possible combinations.  So if I have a Char array or A,B,C,D,E,F,G my goal is to get it to A,C,B,D,E,F,G, A,C,D,BE,F,G etc.  Then I would like to store each of these in a new array that lists each new combination.  
0
monica73174
Asked:
monica73174
  • 12
  • 5
1 Solution
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
It looks like you want them with no repetitions, but does order matter?  So you want both "BE" and "EB"?
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
One way is to use the libaray here written by Adrian Akison:
http://www.codeproject.com/KB/recipes/Combinatorics.aspx

You would want "variations without repetition".
Click on Project --> Add Reference --> Browse --> Facet.Combinatorics.dll
Next, you create instances of Variations, passing in the set of letters, and the number of characters to choose.

Here is what it could look like:  
Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        ListBox1.DataSource = Variations(TextBox1.Text)
    End Sub

    Private Function Variations(ByVal input As String) As List(Of String)
        Dim values As New List(Of String)
        For i As Integer = 0 To input.Length - 1
            values.Add(input.Substring(i, 1))
        Next

        Dim results As New List(Of String)
        For i As Integer = 1 To input.Length
            Dim vars As New Facet.Combinatorics.Variations(Of String)(values.AsReadOnly(), i)
            For Each variation As IList(Of String) In vars
                results.Add(String.Join("", variation))
            Next
        Next
            Return results
    End Function

End Class

Open in new window


Example using just "ABCD" (as ABCDEFG is A LOT of variations!!!):  Idle-Mind-510645.flv
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
If you want to see them sorted then:
Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim results As List(Of String) = Variations(TextBox1.Text)
        results.Sort(AddressOf VariationComparer)
        ListBox1.DataSource = results
    End Sub

    Private Function VariationComparer(ByVal var1 As String, ByVal var2 As String) As Integer
        Dim result As Integer = var1.Length.CompareTo(var2.Length)
        If result = 0 Then
            result = var1.CompareTo(var2)
        End If
        Return result
    End Function

    Private Function Variations(ByVal input As String) As List(Of String)
        Dim values As New List(Of String)
        For i As Integer = 0 To input.Length - 1
            values.Add(input.Substring(i, 1))
        Next

        Dim results As New List(Of String)
        For i As Integer = 1 To input.Length
            Dim vars As New Facet.Combinatorics.Variations(Of String)(values.AsReadOnly(), i)
            For Each variation As IList(Of String) In vars
                results.Add(String.Join("", variation))
            Next
        Next
            Return results
    End Function

End Class

Open in new window

0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
monica73174Author Commented:
I believe that is something similar to what I am looking for.  Basically I need all the combinations because I am looking for valid words that can be compared against a dictionary.  
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
That'll produce them all...give it a shot.

There are other ways to approach the problem if you are looking for words in a dictionary though.

The brute force approach produces all possible combos and checks every single one to see if it is a valid word.

...but if you already know that no words start with "zq", then why produce more combos that start with that? zqa, zqb, zqc, etc...

A different approach would produce a DAWG (directed acyclic word graph) from the valid words dictionary and then use a recursive graph traversal based on a connected graph of the letters of the input to build the words.  By checking the DAWG to see if any valid words begin with the current "prefix" you can cut a large number of permutations that can't possibly make a valid word...thus cutting down the generation time significantly.  This approach is often used in "Boggle" type word games.

0
 
monica73174Author Commented:
I was kind of afraid of getting every permutation because that could cause a great deal of processor usage and time to complete the task.  

So using the DAWG method I would take the list of words and create a graph and then search the graph?  There are about ~270,000 words in the acceptable words list I am using.  This approach seems to have the least overhead.  
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
For a DAWG, basically you take your dictionary of valid words and build up a graph that shows which letters connect to other letters in a valid word. *I won't lie, this is a complicated approach!*

Without optimization, though, a DAWG consumes a huge amount of memory.
We trade memory for speed with this approach...

Simplified, consider this small list of valid words: ace, act, are, art

Starting with "ace"...
Build a node with "a".
Connected to that is a node with "c".
Connected to "c" is a node with "e":

    a ---> c ---> e

Next we process "act".
We already have a node with "a".
We already have a node with "c" connected to "a".
Add a node connected to "c" that has "t":

    a ---> c +--> e
                 |
                 \--> t

Following the same procedure for "are" and "art", we end up with a DAWG that looks like this:


    a +--> c +--> e
        |         |
        |         \--> t
        |
        \--> r +--> e
                  \--> t

Now, given the input "acert", we build up another graph that connects every letter to every other letter and start recursively following the paths (starting with each letter as a start point).  At each node, we look at the word built up thus far and ask the DAWG if any valid words start with that "prefix".  Since only words beginning with the letter "a" are in the DAWG, everything beginning with any other letter would immediately be rejected and those combinations wouldn't even be generated.

*Suffixes are a huge source of inefficiency.  Consider suffixes such as "ed", "er" and "ing" which can be attached to a huge number of words!  Without optimization, those suffix combinations would be duplicated for every single word that it was valid for.  So one optimization is to build common suffixes once and then make a prefix point to the existing suffix so that it isn't duplicated.

I have a huge personal project that can show valid words in a Boggle type board.  Let me know if you would like to see some code...
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Here's a quick demo of what that app does...it basically animates the recursive tree traversal and changes the colors to show when it rejects a prefix, finds a valid prefix, and finds a valid word (adding it to the list at the right).  Without the animation part enabled, the valid words populate near instantaneously.  This can literally be seen as I type the board in at the beginning:
Idle-Mind-510654.flv
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
How's this project going monica?  Let me know if you need help tweaking the code I already posted, or if you want to see some code dealing with the DAWG approach.
0
 
monica73174Author Commented:
Hello, I was trying out some different things but the DAWG approach sounds like the way I think i would like to go.  So yes I would love to see some code for DAWG.  
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Alrighty...I'll try to post a stripped down example tomorrow.
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Sorry for the delay.  First things first...results of a word look-up are only as good as the word list.  I'm using the 4th version of the official players scrabble dictionary "ospd4-lwl.txt", the long word list (lwl) version.  I think I originally got it from the Zyzzyva site here: ospd4-lwl.txt
http://zyzzyva.net/wordlists.shtml

Basically I take each line from the file and look for the first word on that line using space as the delimiter.  If your word list file is formatted differently then adjust the code accordingly.

Not sure how much reading you've done on Tries/DAWGs yet.  A DAWG is basically an optimized Trie.
http://en.wikipedia.org/wiki/Trie
http://en.wikipedia.org/wiki/Directed_acyclic_word_graph

To optimize a Trie, and thus produce a DAWG, you remove redundant suffixes.  (*There are other optimizations that can be done but I'll just ignore those.)  For full optimization, you'd first build the Trie and then analyze it, removing suffixes that you find to be duplicates.  This is a time-consuming process, though, so I've taken the middle ground with my DAWG.

Instead of analyzing for suffixes actually present in the Trie, I instead simply use a pre-defined list of the most common suffixes; such as "ed", "er", and "ing".  This approach does miss things like the common suffix between "hello" and "jello".  I decided to waste some memory to gain speed in loading the dictionary and also keep the complexity of the code down.

My nodes in the Trie are called TrieEntry:
Public Class TrieEntry

    Public Letter As String = ""
    Public EndOfWord As Boolean = False
    Public IsSuffix As Boolean = False
    Public NextLetter As New List(Of TrieEntry)

    Public Sub New(ByVal Letter As String, Optional ByVal IsSuffix As Boolean = False)
        Me.Letter = Letter
        Me.IsSuffix = IsSuffix
    End Sub

    Public Function Ending() As String
        If Me.IsSuffix Then
            If Me.NextLetter.Count = 0 Then
                Return Me.Letter
            Else
                Return Me.Letter & Me.NextLetter(0).Ending
            End If
        Else
            Return ""
        End If
    End Function

End Class

Open in new window


Here is the Trie class:
Public Class Trie

    Private _LongestWord As Integer
    Private _WordCount As Integer
    Private Letters As New List(Of TrieEntry)
    Private Suffixes As New List(Of SuffixData)

    Public Event ImportProgress(ByVal percentage As Integer)
    Public Event FinishedProcessingFile()
    Public WithEvents bgw As New System.ComponentModel.BackgroundWorker

    Private Loading As Boolean = False

    Private Class SuffixData

        Public Sub New(ByVal suffix As String)
            Me.SuffixEntry = New TrieEntry(suffix.Substring(0, 1), True)
            Dim te As TrieEntry = Me.SuffixEntry
            Dim c As String
            Dim newTE As TrieEntry
            For i As Integer = 1 To suffix.Length - 1
                c = suffix.Substring(i, 1)
                newTE = New TrieEntry(c, True)
                te.NextLetter.Add(newTE)
                te = newTE
            Next
            te.EndOfWord = True
        End Sub

        Public Count As Integer = 0
        Public SuffixEntry As TrieEntry = Nothing

        Public Shared Function ByLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
            Dim ret As Integer = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
            If ret = 0 Then
                ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
            End If
            Return ret
        End Function

        Public Shared Function ByCountThenLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
            Dim ret As Integer = suffixB.Count.CompareTo(suffixA.Count)
            If ret = 0 Then
                ret = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
                If ret = 0 Then
                    ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
                End If
            End If
            Return ret
        End Function

    End Class

    Public Sub New()
        Me.bgw.WorkerReportsProgress = True
        Dim CommonSuffixes As String = ""
        CommonSuffixes &= "es,ed,ing" ' > 10,000
        CommonSuffixes &= ",er,ly" ' > 5,000
        CommonSuffixes &= ",ic,ess,al,est,ation,ate,able,ity,ist,en,or,ion" ' > 1,000
        For Each suffix As String In CommonSuffixes.Split(",".ToCharArray, StringSplitOptions.RemoveEmptyEntries)
            Suffixes.Add(New SuffixData(suffix))
        Next
        Suffixes.Sort(AddressOf SuffixData.ByLengthThenAlpha)
    End Sub

    Public ReadOnly Property LongestWord() As Integer
        Get
            Return _LongestWord
        End Get
    End Property

    Public ReadOnly Property WordCount() As Integer
        Get
            Return _WordCount
        End Get
    End Property

    Public Sub ProcessFile(ByVal filename As String)
        If Not Me.bgw.IsBusy Then
            Loading = True
            Me.bgw.RunWorkerAsync(filename)
        End If
    End Sub

    Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
        Dim filename As String = e.Argument
        Try
            If System.IO.File.Exists(filename) Then
                For Each sd As SuffixData In Suffixes
                    sd.Count = 0
                Next
                Me._WordCount = 0
                Me.Letters.Clear()
                Dim word As String
                Dim lastP As Integer = 0
                Using sr As New System.IO.StreamReader(filename)
                    While sr.Peek <> -1
                        word = sr.ReadLine()
                        If Not IsNothing(word) Then
                            word = word.Trim().ToLower.Split(" ")(0)
                            If word.Length > 0 Then
                                For i As Integer = 0 To Suffixes.Count - 1
                                    If word.EndsWith(Suffixes(i).SuffixEntry.Ending) Then
                                        Suffixes(i).Count = Suffixes(i).Count + 1
                                        Exit For
                                    End If
                                Next
                                Me.AddWord(word)
                            End If
                        End If
                        Dim p As Integer = Math.Floor(sr.BaseStream.Position / sr.BaseStream.Length * 100)
                        If p <> lastP Then
                            lastP = p
                            Me.bgw.ReportProgress(lastP)
                        End If
                    End While
                End Using
                Suffixes.Sort(AddressOf SuffixData.ByCountThenLengthThenAlpha)
                Loading = False
            Else
                MessageBox.Show("Unable to locate " & filename, "File Not Found", MessageBoxButtons.OK, MessageBoxIcon.Error)
            End If
        Catch ex As Exception
            MessageBox.Show("FileName: " & filename & vbCrLf & vbCrLf & ex.ToString, "Error Processing File", MessageBoxButtons.OK, MessageBoxIcon.Error)
        End Try
    End Sub

    Private Sub bgw_ProgressChanged(ByVal sender As Object, ByVal e As System.ComponentModel.ProgressChangedEventArgs) Handles bgw.ProgressChanged
        RaiseEvent ImportProgress(e.ProgressPercentage)
    End Sub

    Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
        RaiseEvent FinishedProcessingFile()
    End Sub

    Public Sub AddWord(ByVal word As String)
        If word.Trim.Length = 0 Then
            Exit Sub
        End If

        Dim c As String
        Dim te As TrieEntry = Nothing
        Dim subTE As TrieEntry = Nothing

        word = word.Trim.ToLower
        If Not Me.HasWord(word) Then
            c = word.Substring(0, 1)
            For Each subTE In Me.Letters
                If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
                    te = subTE
                    Exit For
                End If
            Next
            If IsNothing(te) Then
                te = New TrieEntry(c)
                Me.Letters.Add(te)
            End If

            Dim found As Boolean = False
            For i As Integer = 1 To word.Length - 1
                For Each sd As SuffixData In Me.Suffixes
                    If word.Substring(i) = sd.SuffixEntry.Ending Then
                        te.NextLetter.Add(sd.SuffixEntry)

                        Me._WordCount = Me._WordCount + 1
                        If word.Length > Me._LongestWord Then
                            Me._LongestWord = word.Length
                        End If
                        Exit Sub
                    End If
                Next

                found = False
                c = word.Substring(i, 1)
                For Each subTE In te.NextLetter
                    If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
                        te = subTE
                        found = True
                        Exit For
                    End If
                Next
                If Not found Then
                    subTE = New TrieEntry(c)
                    te.NextLetter.Add(subTE)
                    te = subTE
                End If
            Next i
            te.EndOfWord = True

            Me._WordCount = Me._WordCount + 1
            If word.Length > Me._LongestWord Then
                Me._LongestWord = word.Length
            End If
        End If
    End Sub

    Private Function HasEnding(ByVal te As TrieEntry, ByVal ending As String, Optional ByVal CheckForWord As Boolean = False) As Boolean
        If ending = "" Then
            If CheckForWord Then
                Return te.EndOfWord
            Else
                Return True
            End If
        End If

        Dim ret As Boolean
        ending = ending.ToLower
        For Each subTE As TrieEntry In te.NextLetter
            If subTE.Letter = ending.Substring(0, 1) Then
                ret = HasEnding(subTE, ending.Substring(1), CheckForWord)
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

    Public Function HasBeginning(ByVal beginning As String) As Boolean
        beginning = beginning.ToLower
        Dim ret As Boolean
        For Each subTE As TrieEntry In Me.Letters
            If subTE.Letter = beginning.Substring(0, 1) Then
                ret = HasEnding(subTE, beginning.Substring(1))
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

    Public Function HasWord(ByVal word As String) As Boolean
        word = word.ToLower
        Dim ret As Boolean
        For Each subTE As TrieEntry In Me.Letters
            If subTE.Letter = word.Substring(0, 1) Then
                ret = HasEnding(subTE, word.Substring(1), True)
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

End Class

Open in new window


To load it up, create an instance and call the ProcessFile() method.  If you want progress and completion status then wire up the ImportProgress() and FinishedProcessingFile() events:
Public Class Form1

    Private WithEvents T As New Trie

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Button1.Enabled = False

        Using ofd As New OpenFileDialog
            If ofd.ShowDialog = Windows.Forms.DialogResult.OK Then
                T.ProcessFile(ofd.FileName)
            End If
        End Using
    End Sub

    Private Sub T_ImportProgress(ByVal percentage As Integer) Handles T.ImportProgress
        Me.Text = "Loading... " & percentage & "%"
    End Sub

    Private Sub T_FinishedProcessingFile() Handles T.FinishedProcessingFile
        Button1.Enabled = True
        Me.Text = "Ready!"
    End Sub

End Class

Open in new window


Now you can ask the Trie if it has a word or if any words begin with a specific prefix:
If T.HasWord("Thingamabob") Then

        End If

        If T.HasBeginning("qyzjw") Then

        End If

Open in new window


I'll post a quick example showing how to find all words that can be built from a set of letters.
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Try this out with a blank form:
Public Class Form1

    Private WithEvents W As Word = Nothing
    Private WithEvents T As New Trie

    Private LB As New ListBox
    Private WithEvents TB As New TextBox

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        TB.Enabled = False
        TB.Location = New Point(TB.Margin.Left, TB.Margin.Top)
        TB.Width = Me.ClientRectangle.Width - (TB.Location.X * 2)
        TB.Anchor = AnchorStyles.Left Or AnchorStyles.Top Or AnchorStyles.Right
        Me.Controls.Add(TB)

        LB.Location = New Point(TB.Location.X, TB.Bounds.Bottom + TB.Margin.Bottom)
        LB.Size = New Size(TB.Width, Me.ClientRectangle.Height - LB.Margin.Bottom - LB.Location.Y)
        LB.Anchor = AnchorStyles.Left Or AnchorStyles.Top Or AnchorStyles.Right Or AnchorStyles.Bottom
        LB.IntegralHeight = False
        Me.Controls.Add(LB)
    End Sub

    Private Sub Form1_Shown(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Shown
        Using ofd As New OpenFileDialog
            If ofd.ShowDialog = Windows.Forms.DialogResult.OK Then
                T.ProcessFile(ofd.FileName)
            Else
                Me.Close()
            End If
        End Using
    End Sub

    Private Sub T_ImportProgress(ByVal percentage As Integer) Handles T.ImportProgress
        Me.Text = "Loading... " & percentage & "%"
    End Sub

    Private Sub T_FinishedProcessingFile() Handles T.FinishedProcessingFile
        Me.Text = "Enter a Word below!"
        TB.Enabled = True
        TB.Focus()
    End Sub

    Private Sub TB_TextChanged(ByVal sender As Object, ByVal e As System.EventArgs) Handles TB.TextChanged
        If TB.Text.Trim <> "" Then
            If Not IsNothing(W) Then
                W.CancelWordSearch()
            End If

            W = New Word(TB.Text.Trim)
            W.FindWords(Me.T)
        End If
    End Sub

    Private Sub W_WordList(ByVal sender As Word, ByVal WordsFound As System.Collections.Generic.List(Of String)) Handles W.WordList
        LB.DataSource = WordsFound
    End Sub

End Class

Public Class Word

    Private Words As New List(Of String)
    Private Letters As New List(Of Letter)
    Private WithEvents bgw As New System.ComponentModel.BackgroundWorker

    Public Event WordList(ByVal sender As Word, ByVal WordsFound As List(Of String))

    Public Sub New(ByVal word As String)
        For i As Integer = 1 To word.Length
            Dim L As New Letter(word.Substring(i - 1, 1))
            Letters.Add(L)
        Next
        For Each L1 As Letter In Letters
            For Each L2 As Letter In Letters
                If Not L2 Is L1 Then
                    L1.OtherLetters.Add(L2)
                End If
            Next
        Next

        Me.bgw.WorkerSupportsCancellation = True
    End Sub

    Private Class Letter

        Public Value As String = ""
        Public Visited As Boolean = False
        Public OtherLetters As New List(Of Letter)

        Public Sub New(ByVal letter As String)
            Me.Value = letter
        End Sub

        Public Sub Visit(ByVal Word As String, ByVal T As Trie, ByVal Words As List(Of String), ByVal MinLength As Integer)
            Me.Visited = True
            Word = Word & Me.Value
            If Word.Length >= MinLength AndAlso T.HasWord(Word) Then
                Words.Add(Word)
            End If
            For Each OtherLetter As Letter In OtherLetters
                If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
                    OtherLetter.Visit(Word, T, Words, MinLength)
                End If
            Next
            Me.Visited = False
        End Sub

    End Class

    Public Sub FindWords(ByVal T As Trie)
        If Not bgw.IsBusy Then
            bgw.RunWorkerAsync(T)
        End If
    End Sub

    Public Sub CancelWordSearch()
        If bgw.IsBusy AndAlso Not bgw.CancellationPending Then
            bgw.CancelAsync()
        End If
    End Sub

    Private Function SortWords(ByVal Word1 As String, ByVal Word2 As String) As Integer
        Dim result As Integer = Word2.Length.CompareTo(Word1.Length)
        If result = 0 Then
            result = Word1.CompareTo(Word2)
        End If
        Return result
    End Function

    Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
        Dim T As Trie = DirectCast(e.Argument, Trie)

        Words.Clear()
        For Each StartingLetter As Letter In Letters
            For Each L As Letter In Letters
                L.Visited = False
            Next
            StartingLetter.Visit("", T, Words, 2)

            If bgw.CancellationPending Then
                Exit Sub
            End If
        Next

        Words = (From uniqWord In Words Distinct Select uniqWord).ToList
        Words.Sort(AddressOf SortWords)
    End Sub

    Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
        If Not bgw.CancellationPending Then
            RaiseEvent WordList(Me, Words)
        End If
    End Sub

End Class

Public Class TrieEntry

    Public Letter As String = ""
    Public EndOfWord As Boolean = False
    Public IsSuffix As Boolean = False
    Public NextLetter As New List(Of TrieEntry)

    Public Sub New(ByVal Letter As String, Optional ByVal IsSuffix As Boolean = False)
        Me.Letter = Letter
        Me.IsSuffix = IsSuffix
    End Sub

    Public Function Ending() As String
        If Me.IsSuffix Then
            If Me.NextLetter.Count = 0 Then
                Return Me.Letter
            Else
                Return Me.Letter & Me.NextLetter(0).Ending
            End If
        Else
            Return ""
        End If
    End Function

End Class

Public Class Trie

    Private _LongestWord As Integer
    Private _WordCount As Integer
    Private Letters As New List(Of TrieEntry)
    Private Suffixes As New List(Of SuffixData)

    Public Event ImportProgress(ByVal percentage As Integer)
    Public Event FinishedProcessingFile()
    Private WithEvents bgw As New System.ComponentModel.BackgroundWorker

    Private Loading As Boolean = False

    Private Class SuffixData

        Public Sub New(ByVal suffix As String)
            Me.SuffixEntry = New TrieEntry(suffix.Substring(0, 1), True)
            Dim te As TrieEntry = Me.SuffixEntry
            Dim c As String
            Dim newTE As TrieEntry
            For i As Integer = 1 To suffix.Length - 1
                c = suffix.Substring(i, 1)
                newTE = New TrieEntry(c, True)
                te.NextLetter.Add(newTE)
                te = newTE
            Next
            te.EndOfWord = True
        End Sub

        Public Count As Integer = 0
        Public SuffixEntry As TrieEntry = Nothing

        Public Shared Function ByLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
            Dim ret As Integer = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
            If ret = 0 Then
                ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
            End If
            Return ret
        End Function

        Public Shared Function ByCountThenLengthThenAlpha(ByVal suffixA As SuffixData, ByVal suffixB As SuffixData) As Integer
            Dim ret As Integer = suffixB.Count.CompareTo(suffixA.Count)
            If ret = 0 Then
                ret = suffixB.SuffixEntry.Letter.Length.CompareTo(suffixA.SuffixEntry.Letter.Length)
                If ret = 0 Then
                    ret = suffixA.SuffixEntry.Letter.CompareTo(suffixB.SuffixEntry.Letter)
                End If
            End If
            Return ret
        End Function

    End Class

    Public Sub New()
        Me.bgw.WorkerReportsProgress = True
        Dim CommonSuffixes As String = ""
        CommonSuffixes &= "es,ed,ing" ' > 10,000
        CommonSuffixes &= ",er,ly" ' > 5,000
        CommonSuffixes &= ",ic,ess,al,est,ation,ate,able,ity,ist,en,or,ion" ' > 1,000
        For Each suffix As String In CommonSuffixes.Split(",".ToCharArray, StringSplitOptions.RemoveEmptyEntries)
            Suffixes.Add(New SuffixData(suffix))
        Next
        Suffixes.Sort(AddressOf SuffixData.ByLengthThenAlpha)
    End Sub

    Public ReadOnly Property LongestWord() As Integer
        Get
            Return _LongestWord
        End Get
    End Property

    Public ReadOnly Property WordCount() As Integer
        Get
            Return _WordCount
        End Get
    End Property

    Public Sub ProcessFile(ByVal filename As String)
        If Not Me.bgw.IsBusy Then
            Loading = True
            Me.bgw.RunWorkerAsync(filename)
        End If
    End Sub

    Private Sub bgw_DoWork(ByVal sender As Object, ByVal e As System.ComponentModel.DoWorkEventArgs) Handles bgw.DoWork
        Dim filename As String = e.Argument
        Try
            If System.IO.File.Exists(filename) Then
                For Each sd As SuffixData In Suffixes
                    sd.Count = 0
                Next
                Me._WordCount = 0
                Me.Letters.Clear()
                Dim word As String
                Dim lastP As Integer = 0
                Using sr As New System.IO.StreamReader(filename)
                    While sr.Peek <> -1
                        word = sr.ReadLine()
                        If Not IsNothing(word) Then
                            word = word.Trim().ToLower.Split(" ")(0)
                            If word.Length > 0 Then
                                For i As Integer = 0 To Suffixes.Count - 1
                                    If word.EndsWith(Suffixes(i).SuffixEntry.Ending) Then
                                        Suffixes(i).Count = Suffixes(i).Count + 1
                                        Exit For
                                    End If
                                Next
                                Me.AddWord(word)
                            End If
                        End If
                        Dim p As Integer = Math.Floor(sr.BaseStream.Position / sr.BaseStream.Length * 100)
                        If p <> lastP Then
                            lastP = p
                            Me.bgw.ReportProgress(lastP)
                        End If
                    End While
                End Using
                Suffixes.Sort(AddressOf SuffixData.ByCountThenLengthThenAlpha)
                Loading = False
            Else
                MessageBox.Show("Unable to locate " & filename, "File Not Found", MessageBoxButtons.OK, MessageBoxIcon.Error)
            End If
        Catch ex As Exception
            MessageBox.Show("FileName: " & filename & vbCrLf & vbCrLf & ex.ToString, "Error Processing File", MessageBoxButtons.OK, MessageBoxIcon.Error)
        End Try
    End Sub

    Private Sub bgw_ProgressChanged(ByVal sender As Object, ByVal e As System.ComponentModel.ProgressChangedEventArgs) Handles bgw.ProgressChanged
        RaiseEvent ImportProgress(e.ProgressPercentage)
    End Sub

    Private Sub bgw_RunWorkerCompleted(ByVal sender As Object, ByVal e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles bgw.RunWorkerCompleted
        RaiseEvent FinishedProcessingFile()
    End Sub

    Public Sub AddWord(ByVal word As String)
        If word.Trim.Length = 0 Then
            Exit Sub
        End If

        Dim c As String
        Dim te As TrieEntry = Nothing
        Dim subTE As TrieEntry = Nothing

        word = word.Trim.ToLower
        If Not Me.HasWord(word) Then
            c = word.Substring(0, 1)
            For Each subTE In Me.Letters
                If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
                    te = subTE
                    Exit For
                End If
            Next
            If IsNothing(te) Then
                te = New TrieEntry(c)
                Me.Letters.Add(te)
            End If

            Dim found As Boolean = False
            For i As Integer = 1 To word.Length - 1
                For Each sd As SuffixData In Me.Suffixes
                    If word.Substring(i) = sd.SuffixEntry.Ending Then
                        te.NextLetter.Add(sd.SuffixEntry)

                        Me._WordCount = Me._WordCount + 1
                        If word.Length > Me._LongestWord Then
                            Me._LongestWord = word.Length
                        End If
                        Exit Sub
                    End If
                Next

                found = False
                c = word.Substring(i, 1)
                For Each subTE In te.NextLetter
                    If subTE.Letter = c AndAlso Not subTE.IsSuffix Then
                        te = subTE
                        found = True
                        Exit For
                    End If
                Next
                If Not found Then
                    subTE = New TrieEntry(c)
                    te.NextLetter.Add(subTE)
                    te = subTE
                End If
            Next i
            te.EndOfWord = True

            Me._WordCount = Me._WordCount + 1
            If word.Length > Me._LongestWord Then
                Me._LongestWord = word.Length
            End If
        End If
    End Sub

    Private Function HasEnding(ByVal te As TrieEntry, ByVal ending As String, Optional ByVal CheckForWord As Boolean = False) As Boolean
        If ending = "" Then
            If CheckForWord Then
                Return te.EndOfWord
            Else
                Return True
            End If
        End If

        Dim ret As Boolean
        ending = ending.ToLower
        For Each subTE As TrieEntry In te.NextLetter
            If subTE.Letter = ending.Substring(0, 1) Then
                ret = HasEnding(subTE, ending.Substring(1), CheckForWord)
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

    Public Function HasBeginning(ByVal beginning As String) As Boolean
        beginning = beginning.ToLower
        Dim ret As Boolean
        For Each subTE As TrieEntry In Me.Letters
            If subTE.Letter = beginning.Substring(0, 1) Then
                ret = HasEnding(subTE, beginning.Substring(1))
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

    Public Function HasWord(ByVal word As String) As Boolean
        word = word.ToLower
        Dim ret As Boolean
        For Each subTE As TrieEntry In Me.Letters
            If subTE.Letter = word.Substring(0, 1) Then
                ret = HasEnding(subTE, word.Substring(1), True)
                If ret Then
                    Return True
                End If
            End If
        Next
        Return False
    End Function

End Class

Open in new window

0
 
monica73174Author Commented:
Thanks your example is excellent and easy to follow.  It has been very difficult to find examples of DAWGs that I understand.   I want to understand how the algorithm works not just copy someone else's work.  
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
Ask any questions you might have about the code and I'll do my best to explain it.  =)
0
 
monica73174Author Commented:
One last question, Using this method with the DAWG I still need to have the permutation of all the letters that I select and then pass them in one by one in the hasword function?  
0
 
Mike TomlinsonMiddle School Assistant TeacherCommented:
The Word class makes a Graph from the input letters and connects each letter to every other letter.  Then it uses a simple recursive algorithm to produce all the permutations.  Without the DAWG, we would produce all combinations and check every single one for a valid word combination.  In that scenario you could use a HashTable instead of a DAWG for word validation.

With the DAWG, though, we do NOT produce all possible permutations.  The magic happens in the Visit() method at lines #101 thru #103:

                If Not OtherLetter.Visited AndAlso T.HasBeginning(Word & OtherLetter.Value) Then
                    OtherLetter.Visit(Word, T, Words, MinLength)
                End If

Here, using the HasBeginning() function, we only recurse to the next available letter if adding it to the current permutation produces a valid word beginning.  This eliminates entire branches of the possible permutations from being generated.  For instance, if the word "quartz" was passed in, we wouldn't generate any word permutations that start with "qz" or "zq" because the DAWG would report back that no words begin with those combinations.  Thus the DAWG reduces the time needed to find all the possible words using a set of letters because the number of permutations actually produced and checked is significantly fewer than simply brute force checking every single permutation.
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 12
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now