Remove Non-Words

Posted on 2010-09-16
Last Modified: 2012-05-10
Hi,

I'm looking for a way to remove words that are not in the English dictionary from a file whose lines contain whitespace-separated fields:

Example input:

word      xxword      x      xa      worlds

I'd like to output only the dictionary words:

word      worlds

I'm fairly sure this could be done with the word list that ships with most Unix systems (often /usr/share/dict/words): compare the two files, keep only the words that appear in the word list, and format the output.
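A minimal sketch of the word-list approach described above, in PHP to match the rest of the thread. It assumes the system word list lives at /usr/share/dict/words (the path varies between systems) and that the input file is called file.txt; the function name filterDictionaryWords is hypothetical. Matching is case-insensitive, so "penn" matches a capitalized "Penn" entry in the list.

```php
<?php
// Hypothetical helper: keep only the tokens on each line that appear in $dictionary.
function filterDictionaryWords(array $lines, array $dictionary): array
{
    // Build a lookup set keyed by lowercase word for fast membership tests.
    $known = array();
    foreach ($dictionary as $entry) {
        $entry = strtolower(trim($entry));
        if ($entry !== '') {
            $known[$entry] = true;
        }
    }

    $result = array();
    foreach ($lines as $line) {
        // Split on runs of whitespace, ignoring empty tokens.
        $words = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        $kept = array();
        foreach ($words as $word) {
            if (isset($known[strtolower($word)])) {
                $kept[] = $word;
            }
        }
        $result[] = implode(' ', $kept);
    }
    return $result;
}

// Assumed location of the system word list; adjust for your system.
$dictFile = '/usr/share/dict/words';
if (is_readable($dictFile) && is_readable('file.txt')) {
    $dictionary = file($dictFile, FILE_IGNORE_NEW_LINES);
    $lines = file('file.txt', FILE_IGNORE_NEW_LINES);
    foreach (filterDictionaryWords($lines, $dictionary) as $out) {
        echo $out, "\n";
    }
}
```

A plain word list only recognizes the exact forms it contains, so whether plurals such as "worlds" survive depends on the list installed.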


Thank you
Question by: faithless1

Accepted Solution

by: shanikawm, earned 450 total points
ID: 33699184
You can use PHP's Pspell functions, which check words against a GNU Aspell dictionary.

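For example, given this input:

$ cat file.txt
penn pencil eraser
black bleu red
monitor key muose

running the script as spell.php prints only the recognized words:

$ php spell.php
pencil eraser
black red
monitor key

Contents of spell.php: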

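<?php
// spell.php: print only the words pspell recognizes, one output line per input line.
// Requires the pspell extension and an installed English Aspell dictionary.
$pspell_link = pspell_new("en");
if ($pspell_link === false)
{
        exit("Could not open the English pspell dictionary\n");
}
$lines = file('file.txt', FILE_IGNORE_NEW_LINES);
foreach ($lines as $line)
{
        // Split on runs of whitespace and drop empty tokens (e.g. from blank lines).
        $words = preg_split('/\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);
        $kept = array();
        foreach ($words as $word)
        {
                if (pspell_check($pspell_link, $word))
                {
                        $kept[] = $word;
                }
        }
        // Join the surviving words with single spaces.
        echo implode(' ', $kept), "\n";
}
?>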

Assisted Solution

by: Ray Paseur, earned 50 total points
ID: 33701333
See the installation notes in the PHP manual:
http://us.php.net/manual/en/pspell.installation.php

To find out whether the pspell extension is installed, run this script and look for a pspell section in its output:
<?php phpinfo(); ?>
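As a more targeted alternative to scanning the phpinfo() page, a small sketch like the one below checks for the extension with extension_loaded() and then tries to open a dictionary. The function name pspellStatus is hypothetical, and the "en" language code is an assumption; use whichever dictionary your Aspell installation provides. From the command line, `php -m` also lists the loaded modules.

```php
<?php
// Hypothetical helper: report whether pspell is available and whether a
// dictionary for $language can be opened.
function pspellStatus(string $language = 'en'): string
{
    if (!extension_loaded('pspell')) {
        return 'pspell extension not loaded';
    }
    // pspell_new() returns false (and emits a warning, suppressed here)
    // when the requested dictionary cannot be opened.
    $link = @pspell_new($language);
    return $link === false
        ? "pspell loaded, but no '$language' dictionary found"
        : "pspell loaded with a '$language' dictionary";
}

echo pspellStatus(), "\n";
```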

If you do not have the extension installed, this search may turn up some useful alternative examples:
http://lmgtfy.com?q=PHP+spell+checking