Remove Non-Words

Hi,

I'm looking for a way to remove words that are not in an English dictionary from a file that has 3 fields:

Example:

word      xxword      x      xa      worlds

I'm looking to output only dictionary words:

word      hello        worlds

I'm pretty sure this could be done with the dictionary that comes with Unix: compare the file against the dictionary, output only the words that match, and keep the formatting.


Thank you
faithless1 Asked:
shanikawm Commented:
You can use PHP's Pspell functions.

e.g.:

cat file.txt

penn pencil eraser
black bleu red
monitor key muose

php spell.php

pencil eraser
black red
monitor key

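<?php
// Load the English dictionary (requires the pspell extension).
$pspell_link = pspell_new("en");
$lines = file('file.txt');
foreach ($lines as $line)
{
        // Split on whitespace, dropping the empty strings a blank line would produce.
        $words = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word)
        {
                // Print only the words the dictionary recognizes.
                if (pspell_check($pspell_link, $word))
                {
                        echo $word, ' ';
                }
        }
        echo "\n";
}
?>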
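
If Pspell is not available, the dictionary-file idea from the question can be sketched with awk instead. This is only a rough sketch under a few assumptions: the word list lives at /usr/share/dict/words (the path varies between systems and the file may not be installed at all), fields are separated by whitespace, and `filter_words` is a name made up for this example. Matching is exact and case-insensitive, so any word missing from the list is dropped.

```shell
#!/bin/sh
# filter_words DICT FILE: print FILE with non-dictionary words dropped.
# DICT holds one word per line; matching is exact and case-insensitive.
# (filter_words is a hypothetical helper name, not a standard command.)
filter_words() {
    awk '
        NR == FNR { dict[tolower($0)] = 1; next }   # first file: load the word list
        {
            out = ""
            for (i = 1; i <= NF; i++)
                if (tolower($i) in dict)
                    out = out (out == "" ? "" : " ") $i
            print out
        }
    ' "$1" "$2"
}

# Assumed word-list location; override with DICT=/path/to/words if needed.
DICT=${DICT:-/usr/share/dict/words}
if [ -r "$DICT" ] && [ -r file.txt ]; then
    filter_words "$DICT" file.txt
fi
```

Kept words are joined with a single space, so the original column spacing is not preserved. System word lists may also leave out plurals and other inflected forms, which a spell checker such as Pspell tends to handle better.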

Ray Paseur Commented:
See the notes here:
http://us.php.net/manual/en/pspell.installation.php

You can run this script to find out whether you have Pspell installed:
<?php phpinfo(); ?>
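
From a shell, a quicker check is to list the modules loaded by the command-line PHP with `php -m`. Treat this as a rough sketch: `check_pspell` is a name made up for this example, and the CLI configuration can differ from the one your web server uses, so the phpinfo() page remains the more reliable check for web scripts.

```shell
#!/bin/sh
# Report whether the command-line PHP has the pspell extension loaded.
# (check_pspell is a hypothetical helper name.)
check_pspell() {
    if ! command -v php >/dev/null 2>&1; then
        echo "php not found"
    elif php -m | grep -qix 'pspell'; then
        # php -m prints one module name per line; -x matches the whole line.
        echo "pspell loaded"
    else
        echo "pspell missing"
    fi
}

check_pspell
```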

This search may have some good examples if you do not have the extension installed.
http://lmgtfy.com?q=PHP+spell+checking