[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
• Status: Solved
• Priority: Medium
• Security: Public
• Views: 736

# Scrabble Match

Hi, I'm trying to write a regular expression that will search a word list for available words based on letters given (think scrabble).

Suppose my letters are "scuton", how would I match the word/words that contain as many of these letters are possible? Remember a letter can be used only once. Also, what if I have two s, such as "scutons", how would I match that case too, now s can be used twice.

Does that make sense?

Thanks,
Brian
0
BrianGEFF719
• 3
• 2
1 Solution

Commented:
you won't know how many are possible until you search the entire list, so the best you can do with a regular expression is to look for a word containing a certain number of letters.  when you succeed with one number and fail at another number, that is the mist possible.
If you want to match a words that contain no other letters, that would be
\b[scuton]+\b
If you want to contain at least <number> of those letters, where it can also contain other letters, that would be
([scuton].*){<number>}
If you want to exclude words containing one of those letters twice that would be
(?!.*([scuton]).*\1)
If you want  exclude words containing 3 or more s
(?!(.*s){3})
0

Commented:
This feels more like a permutations problem than a regex problem.  There's no way to represent permutations in regex that I know of.  You can certainly generate a regex that handles all permutations, for example if you had the letters abc then this would match words in a dictionary using those letters.

^(a*b*c*|a*c*b*|b*a*c*|b*c*a*|c*a*b*|c*b*a*)\$

But the length of the expression will increase exponentially with the number of characters.

~bp
0

Commented:
^(a*b*c*|a*c*b*|b*a*c*|b*c*a*|c*a*b*|c*b*a*)\$
is equivalent to
^(?!.*(a[^a]+a|b[^b]+b|c[^c]+c|[^abc]))
which increases linearly with the number of characters.
0

Commented:
although strictly speaking, (?!...) is not a regular expression,
even if it is recognized by many programs that process regular expressions.
0

Commented:
Yes, I was trying to use the most basic regex syntax for my example, but do appreciate that things like Perl etc offer extensions to that.

~bp
0

Author Commented:
Thank you for your help thus far. Here is what I'm doing using grep:

brian\$ cat MASTER | grep -E '\b[nsfroeak]+\b' | grep -Ev '\b.*(.).*(\1).*\b'

In this case my letters are nsfroeak, and I can have no repeated letters, if I can repeat an e, I am doing:

brian\$ cat MASTER | grep -E '\b[nsfroeak]+\b' | grep -Ev '\b.*([^e]).*(\1).*\b'

Can this be improved in anyway? I'm having troubles with the wildcard, meaning a tile that can be used for any letter. Is there anyway to integrate one "wildcard" letter into this regular expressions?

Thanks.
Brian

0

## Featured Post

• 3
• 2
Tackle projects and never again get stuck behind a technical roadblock.