Program to evaluate quality of words...

Posted on 2005-02-24
Medium Priority
Last Modified: 2008-02-26
Hi, I have an application I would like built and I'm just starting my research...

More or less I need a program that can take a list of "words" and evaluate each word for its “quality” as a word, or maybe evaluate its pronounceability.  I'm not just talking about words in the Oxford dictionary, but anything.  Examples might be "fdasioxc" or "soothie".  Obviously in this case “soothie” would rate quite well and “fdasioxc” would rate poorly.  (Even though neither is an actual word.)

So my questions are:

1.) How possible might this be?  Is the English language predictable enough to write an application to do this – or something similar?

2.) How much might it cost to pay a good programmer to accomplish such a task?

3.) What language should I start my focus in?  I’m assuming some languages would be much better for this than others.

I do want to note that I am not looking for a program that:

1.) Has to speak.

2.) Has to be NASA-certified.  More or less I want good guesses.  If the program is wrong, it’s not that important.  Just as long as it works to some degree.

Thanks for your opinions.  I’ll award points to those that I feel can help get me started with this the best.  Right now I’m feeling quite lost and don’t have any idea where to start!

Question by:rebies

Expert Comment

ID: 13399968
I would write some code (in C because that's what I know, but any language would do) that

1/ Reads in the English dictionary and works out the probability that letter X is followed by letter Y (for example, Q followed by U is common, but J followed by D is rare).

2/ Stores all this info away in a file or database

3/ Scans any new word to determine the probability that its letters would appear in the order they do. If you get a high probability, the word is likely pronounceable.

This assumes that words already in the dictionary are all reasonably pronounceable.
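
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the commenter's actual code: the tiny `SAMPLE_WORDS` list stands in for a real dictionary file, the function names are made up for this example, and the add-one smoothing and `^`/`$` word-boundary markers are assumptions added so that unseen letter pairs get a small nonzero probability instead of zero.

```python
import math
from collections import Counter

# Hypothetical stand-in for a full English dictionary file.
SAMPLE_WORDS = ["smooth", "soothe", "tooth", "booth", "smoothie", "movie",
                "cookie", "south", "soon", "this", "other"]

def train_bigrams(words):
    """Step 1: count how often each letter is followed by each other letter."""
    pair_counts = Counter()   # (x, y) -> times x was followed by y
    first_counts = Counter()  # x -> times x was followed by anything
    for word in words:
        padded = "^" + word.lower() + "$"  # mark word start and end
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts

def score(word, pair_counts, first_counts, alphabet_size=27):
    """Step 3: average log-probability per letter transition.

    Values closer to 0 mean the letter order looks more like the training
    words. alphabet_size=27 (26 letters plus the end marker) is an
    assumption used for add-one smoothing.
    """
    padded = "^" + word.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + alphabet_size)
        total += math.log(p)
    return total / (len(padded) - 1)

# Step 2 (storing the counts in a file or database) is skipped here;
# the counts are simply kept in memory.
PAIRS, FIRSTS = train_bigrams(SAMPLE_WORDS)

if __name__ == "__main__":
    for candidate in ["soothie", "fdasioxc"]:
        print(candidate, round(score(candidate, PAIRS, FIRSTS), 2))
```

With this sample list, "soothie" scores closer to zero than "fdasioxc". A real dictionary would give much more reliable counts, and dividing by the number of transitions keeps long words from being penalized just for their length.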


Accepted Solution

mtglotzbach earned 2000 total points
ID: 13399990
Hmmm.  Interesting problem.  I would imagine that the first step would be some sort of function that decomposes a word into its phonetic representation (not necessarily literally).  This could be an array of numbers, each standing for a phonetic symbol.  There are rules for combining phonemes that must be followed in English.  Each language has its own set of phonemes, and its own set of rules for combination.

1) all phonological words must contain at least one syllable, and hence must contain at least one vowel.
2) Sequences of repeated consonants are not possible.
3) The velar nasal /ng/ never occurs in the onset of a syllable.
4) The glottal fricative /h/ never occurs in the coda of a syllable.
5) The affricates /ts/ and /dz/, and the glottal fricative /h/ do not occur in complex onsets.
6) The first consonant in a two-consonant onset must be an obstruent (p, t, k, d, f, g).
7) The second consonant in a two-consonant onset must not be a voiced obstruent.
8) If the first consonant of a two-consonant onset is not an /s/, the second consonant must be a liquid or a glide – the second consonant must be /l/, /r/, /w/, or /j/
9) Every subsequence contained within a sequence of consonants must obey all the relevant phonotactic rules.
10) No glides in syllable codas.
11) The second consonant in a two-consonant coda cannot be /ng/, /d/, /r/, /3/.
12) If the second consonant in a complex coda is voiced, the first consonant in the coda must also be voiced.
13) When a non-alveolar nasal is in a coda together with a non-alveolar obstruent, they must have the same place of articulation, and the obstruent must be a voiceless stop.
14) Two obstruents in a coda together must have the same voicing.
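
As a rough illustration of how rules like these turn into logic tests, here is a small Python sketch that checks a handful of them (rules 1, 2, 3, 4, 8 and 10) against a single syllable. It assumes the hard parts are already done: the word has been converted to phonemes and split into onset, nucleus and coda. The ASCII symbols, the `VOWELS` set and the function name are stand-ins invented for this example, not a real phoneme inventory.

```python
# Hypothetical ASCII stand-ins for phoneme symbols.
VOWELS = {"a", "e", "i", "o", "u"}
GLIDES = {"w", "j"}
LIQUIDS_AND_GLIDES = {"l", "r", "w", "j"}

def check_syllable(onset, nucleus, coda):
    """Return the sorted rule numbers this syllable violates (empty if none).

    onset, nucleus and coda are lists of phoneme symbols. Only rules
    1, 2, 3, 4, 8 and 10 from the list above are checked.
    """
    violations = set()

    # Rule 1: a syllable must contain at least one vowel.
    if not any(p in VOWELS for p in nucleus):
        violations.add(1)

    # Rule 2: no repeated consonants next to each other.
    for cluster in (onset, coda):
        if any(a == b for a, b in zip(cluster, cluster[1:])):
            violations.add(2)

    # Rule 3: /ng/ never occurs in the onset.
    if "ng" in onset:
        violations.add(3)

    # Rule 4: /h/ never occurs in the coda.
    if "h" in coda:
        violations.add(4)

    # Rule 8: in a two-consonant onset not starting with /s/,
    # the second consonant must be /l/, /r/, /w/ or /j/.
    if len(onset) == 2 and onset[0] != "s" and onset[1] not in LIQUIDS_AND_GLIDES:
        violations.add(8)

    # Rule 10: no glides in the coda.
    if any(p in GLIDES for p in coda):
        violations.add(10)

    return sorted(violations)

if __name__ == "__main__":
    print(check_syllable(["s", "t"], ["o"], ["p"]))  # like "stop"
    print(check_syllable(["f", "d"], ["a"], ["h"]))  # breaks rules 4 and 8
```

The remaining rules need each phoneme tagged with its category (voiced, obstruent, place of articulation, and so on), which would be a lookup table built from a phoneme classification list.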

Here is a basic set of these rules taken from the following site:

Some of the terms used in these rules may not make sense unless you are a speech expert.  However, each phoneme belongs to one or more categories (fricative, voiced, non-alveolar nasal, and so on), and you can find lists of phonemes with their associated classifications.  Many speech pathology textbooks will have similar information.

Fortunately, all of this can be written as a very strict set of rules, and a programmer would simply use logic tests to determine whether the decomposed word matches them.  Given clear specifications and details, it doesn't seem like an overwhelming programming task.  I would guess this could be done for under $1000 (at least I would).  This could include some sort of user interface and maybe a few extras.  To keep the cost of software development down, the specification should be clear and complete.

I believe most languages would boil down to this same structure, which makes programming convenient.  However, I am sure there are exceptions in more complex languages such as Chinese, where tone means as much to a word as groupings of phonemes (I think.  I'm no expert).

Hope this helps.

Author Comment

ID: 13412478
Wow, really good information mtglotzbach!  Sorry I did not respond earlier - I was hoping to have time to do more research into what you said.   (Yes, a lot of what you had there is, simply put, way over my head - as of right now anyway!)

But thanks for the link and pointer.  It looks like this would be possible.  I'm happy to hear that.


Author Comment

ID: 14300847

You still around here?  If so I might want to consult with you on this project, or at least get some further help identifying how this algorithm would be set up.

Thanks for replying to me if you get this.  You can email me at:

andrew [underscore] re [nospace] berry [at] hotmail.com

