

Program to evaluate quality of words...

Posted on 2005-02-24
Medium Priority
Last Modified: 2008-02-26
Hi, I have an application I would like built and I'm just starting my research...

More or less I need a program that can take a list of "words" and evaluate each word for its “quality” as a word, or maybe evaluate its pronounceability.  I'm not just talking about words in the Oxford dictionary, but anything.  Examples might be "fdasioxc" or "soothie".  Obviously in this case “soothie” would rate quite well and “fdasioxc” would rate poorly.  (Even though neither is an actual word.)

So my questions are:

1.) How possible might this be?  Is the English language predictable enough to write an application to do this – or something similar?

2.) How much might it cost to pay a good programmer to accomplish such a task?

3.) What language should I start my focus in?  I’m assuming some languages would be much better for this than others.

I do want to note that I am not looking for a program that:

1.) Has to speak.

2.) Has to be NASA-certified.  More or less I want good guesses.  If the program is wrong, it’s not that important.  Just as long as it works to some degree.

Thanks for your opinions.  I’ll award points to those that I feel can help get me started with this the best.  Right now I’m feeling quite lost and don’t have any idea where to start!

Question by:rebies

Expert Comment

ID: 13399968
I would write some code (in C because that's what I know, but any language would do) that:

1/ Reads in the English dictionary and works out the probability that letter X is followed by letter Y (for example, Q followed by U is common, but J followed by D is rare).

2/ Stores all this info away in a file or database

3/ Scans any new word to determine the probability that the letters will appear in the order they appear in. If you get a high probability the word is pronounceable.

This assumes that the words already in the dictionary are all reasonably pronounceable.
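A minimal sketch of the three steps above, written in Python rather than C to keep it short. The names `train_bigrams`, `score` and `SAMPLE_WORDS` are made up for this illustration, and the tiny word list only stands in for a real dictionary file. The add-one smoothing and the averaging over word length are my own assumptions, so that unseen letter pairs and long words don't wreck the score; they are not part of the description above.

```python
import math
from collections import Counter

# Stand-in for a real dictionary; a real run would read a full word list.
SAMPLE_WORDS = [
    "soothe", "smooth", "tooth", "cookie", "movie", "this", "that", "other",
    "south", "tie", "pie", "hoodie", "so", "the", "thin", "root",
]

def train_bigrams(words):
    """Step 1: count how often each letter is followed by each other letter."""
    pair_counts = Counter()
    first_counts = Counter()
    alphabet = set()
    for word in words:
        # "^" and "$" mark the start and end of a word.
        padded = "^" + word.lower() + "$"
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    # Step 2 would write these counts to a file or database.
    return pair_counts, first_counts, len(alphabet)

def score(word, model):
    """Step 3: average log-probability of the letter pairs in `word`.

    Closer to zero means the letter order looks more like the training words.
    """
    pair_counts, first_counts, vocab_size = model
    padded = "^" + word.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        # Add-one smoothing: unseen pairs get a small, non-zero probability.
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + vocab_size)
        total += math.log(p)
    return total / (len(padded) - 1)

model = train_bigrams(SAMPLE_WORDS)
for candidate in ("soothie", "fdasioxc"):
    print(candidate, round(score(candidate, model), 2))
```

With this sample list "soothie" scores higher than "fdasioxc". Where to draw the line between "good" and "bad" is something you would have to tune against words you have rated by hand.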


Accepted Solution

mtglotzbach earned 2000 total points
ID: 13399990
Hmmm.  Interesting problem.  I would imagine that the first step would be some sort of function that decomposes a word into its phonetic representation (not necessarily literally).  This could be an array of numbers, each standing for a phonetic symbol.  There are rules for combining phonemes that must be followed in English.  Each language has its own set of phonemes, and its own set of rules for combining them.

1) All phonological words must contain at least one syllable, and hence must contain at least one vowel.
2) Sequences of repeated consonants are not possible.
3) The velar nasal /ng/ never occurs in the onset of a syllable.
4) The glottal fricative /h/ never occurs in the coda of a syllable.
5) The affricates /ts/ and /dz/, and the glottal fricative /h/ do not occur in complex onsets.
6) The first consonant in a two-consonant onset must be an obstruent (e.g. p, t, k, d, f, g).
7) The second consonant in a two-consonant onset must not be a voiced obstruent.
8) If the first consonant of a two-consonant onset is not an /s/, the second consonant must be a liquid or a glide – the second consonant must be /l/, /r/, /w/, or /j/
9) Every subsequence contained within a sequence of consonants must obey all the relevant phonotactic rules.
10) No glides in syllable codas.
11) The second consonant in a two-consonant coda cannot be /ng/, /d/, /r/, /ʒ/.
12) If the second consonant in a complex coda is voiced, the first consonant in the coda must also be voiced.
13) When a non-alveolar nasal is in a coda together with a non-alveolar obstruent, they must have the same place of articulation, and the obstruent must be a voiceless stop.
14) Two obstruents in a coda together must have the same voicing.

Here is a basic set of these rules taken from the following site:

Some of the words used in these rules may not make sense unless you are a speech expert.  However, each of the phonemes belongs to a category (fricative, voiced, non-alveolar nasal), and you can find lists of phonemes with their associated classifications.  Many speech pathology textbooks will have similar information.

Fortunately all of this can be written as a very strict set of rules, and a programmer would simply use logic tests to determine whether the decomposed word matches them.  Given clear specifications and details, it doesn't seem like an overwhelming programming task.  I would guess this could be done for under $1000 (at least I would do it for that).  This could include some sort of user interface and maybe a few extras.  To keep the cost of software development down, the specification should be clear and complete.
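To give a rough idea of what those logic tests might look like, here is a minimal Python sketch covering only rules 1, 3, 4, 6, 8 and 10. It assumes the word has already been decomposed and split into syllables, each given as an onset, a nucleus and a coda; that decomposition is the hard part and is not shown. The function name `syllable_violations`, the ASCII phoneme labels ("ng", "sh", "zh" and so on) and the category sets are all assumptions made for this example; a real project would take them from a proper phoneme classification.

```python
# Hypothetical ASCII labels for phonemes; real code would use a full inventory.
OBSTRUENTS = {"p", "b", "t", "d", "k", "g", "f", "v", "th", "dh",
              "s", "z", "sh", "zh", "h", "ch", "jh"}
GLIDES = {"w", "j"}
LIQUIDS_AND_GLIDES = {"l", "r"} | GLIDES

def syllable_violations(onset, nucleus, coda):
    """Return a list of the rules (from the list above) that a syllable breaks.

    onset, nucleus and coda are lists of phoneme labels, e.g.
    (["p", "l"], ["ei"], []) for "play".
    """
    problems = []
    if not nucleus:
        problems.append("rule 1: syllable has no vowel")
    if "ng" in onset:
        problems.append("rule 3: /ng/ in the onset")
    if "h" in coda:
        problems.append("rule 4: /h/ in the coda")
    if len(onset) == 2:
        first, second = onset
        if first not in OBSTRUENTS:
            problems.append("rule 6: first onset consonant is not an obstruent")
        if first != "s" and second not in LIQUIDS_AND_GLIDES:
            problems.append("rule 8: second onset consonant is not l, r, w or j")
    if any(c in GLIDES for c in coda):
        problems.append("rule 10: glide in the coda")
    return problems

print(syllable_violations(["p", "l"], ["ei"], []))   # onset "pl" as in "play"
print(syllable_violations(["l", "p"], ["a"], ["h"]))  # onset "lp", coda "h"
```

An empty list means none of the checked rules were broken. Counting the violations per word, rather than just passing or failing it, would give the kind of rough "quality" rating the question asks for.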

I believe most languages would boil down to this same structure, which makes programming convenient.  However, I am sure there are exceptions in more complex languages such as Chinese, where tone means as much to a word as groupings of phonemes (I think; I'm no expert).

Hope this helps.

Author Comment

ID: 13412478
Wow, really good information mtglotzbach!  Sorry I did not respond earlier, but I was hoping to have time to do more research into what you said.  (Yes, a lot of what you had there is, simply put, way over my head.  As of right now anyway!)

But thanks for the link and pointer.  It looks like this would be possible.  I'm happy to hear that.


Author Comment

ID: 14300847

You still around here?  If so I might want to consult with you on this project, or at least get some further help identifying how this algorithm would be set up.

Thanks for replying to me if you get this.  You can email me at:

andrew [underscore] re [nospace] berry [at] hotmail.com



Question has a verified solution.
