Help with fixing a script

Hi,

Can someone please provide a working version of this script? I can't seem to figure out how to make this work. Also, if there is a better way to do this with another language that will return the same output, that would be great as well.

The main function of the script is to take a file as input (keyword.txt) with keywords on each line (space delimited), compare it against the built-in Unix dictionary, and output only the keywords that are found in the dictionary. (At least that's my understanding of it.)

Thanks

main.sh
#!/bin/sh
cat $1 | tr A-Z a-z| grep keyword | cat -v|\
sed  -e 's/\([a-z]\+\.\)\+\(com\|org\|net\)[^ ]*//g' -e 's/[^a-z ]/ /g' -e 's/[         ]\+/ /g' |\
awk '{j=-1;for (i=1;i<=NF;i++)if(length($i) < 3 || match($i,"^with$|^from$|^txt$|^and$|^for$|^the$|^com$") ) $i="";print }'|\
sed -e 's/^[        ]\+//g'|\
awk '(NF>2) {j=-1;for (i=1;i<NF;i++) if($i=="keyword"){ j=i;i=1000;};if(j>0) {if(j==1) j++;if(j==NF) j--;k=j-1;l=j+1; print $k "\t" $j "\t" $l;}}'|\
sort -u |sh ll.sh |sort -k +3|sh llc.sh |sort


llc.sh

#!/bin/sh
olda=""
while read c b a
do
if [ "$a" != "$olda" ]
 then
grep -q "^$a" /usr/share/dict/words >/dell/null
  valid=$?
 fi
 olda=$a
 [ $valid  -eq 0 ] && echo -e "$c\t$b\t$a"
 done
 

ll.sh

#!/bin/sh
olda=""
while read a b c
do
if [ "$a" != "$olda" ]
 then
grep -q "^$a" /usr/share/dict/words >/dell/null
  valid=$?
 fi
 olda=$a
 [ $valid  -eq 0 ] && echo -e "$a\t$b\t$c"
 done
 
faithless1 Asked:

simon3270 Commented:
What this script appears to do is take the input file, convert it to lower case, keep only the lines that contain "keyword", and then remove any web addresses (text followed by a dot followed by com, net or org), any non-alphabetic characters, some common words (with, from, txt, and, for, the and com) and any 1- or 2-character words.

It then looks for the first word "keyword" on the line, and prints out the word before and after it (or the two words after it if it is the first word on the line), so three words.  It was supposed to print out the last three words if "keyword" was last on the line, but the "for (i" loop stops before the last field.

It then prints out the three words if the first and third word (usually either side of "keyword") are in the dictionary.  To save effort, it doesn't check the dictionary if a word is repeated, just uses the previous result.
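
To make the cleaning stage described above concrete, here is a minimal sketch that applies the same rules to one sample line. It is not the original pipeline: the function name clean_line and the sample text are made up for illustration, and the rules are folded into a single awk program rather than the tr/sed/awk chain. It also skips the "grep keyword" filter.

```shell
#!/bin/sh
# clean_line (hypothetical helper): reads lines on stdin and applies the
# cleaning rules: lower-case, drop web addresses, drop non-letters,
# drop words shorter than 3 characters, drop the listed common words.
clean_line() {
  awk '{
    line = tolower($0)
    # remove things like www.example.com/path
    gsub(/([a-z0-9]+\.)+(com|org|net)[^ ]*/, "", line)
    # anything that is not a letter or a space becomes a space
    gsub(/[^a-z ]/, " ", line)
    n = split(line, w, " ")
    out = ""
    for (i = 1; i <= n; i++) {
      if (length(w[i]) < 3) continue
      if (w[i] ~ /^(with|from|txt|and|for|the|com)$/) continue
      out = (out == "" ? w[i] : out " " w[i])
    }
    print out
  }'
}

printf '%s\n' 'Buy the BEST keyword tool at www.example.com/tools for $5 now' | clean_line
# prints: buy best keyword tool now
```

Running a few of your own lines from keyword.txt through something like this is an easy way to see which words survive before the dictionary check.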

There are some scripting changes I would make to the provided scripts:
- "/dell/null" should be /dev/null, and is anyway not required since you are using "grep -q" which doesn't produce any output.
- the pattern you are looking for in the dictionary needs a trailing "$" to mark the end of the word - otherwise it will treat a prefix as a valid word (e.g. "produc" will appear to be a valid word because it is a prefix of "produce")
- In main.sh, the second awk should be "for (i=1;i<=NF;i++) {", not "for (i=1;i<NF;i++) {", so that "keyword" at the end of the line is matched.
- The way of stopping that "for (i=1;" loop (setting i to 1000) is untidy - it would fail if there were more than 1000 fields on the line, and is just a bit obscure.  Just put "next;" after you have printed out that first keyword.
- When checking for .com, .net and .org, you should include 0-9 in your pattern, in case the domain name ends with a digit.
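
The prefix problem from the second point can be reproduced with a throwaway word list, so it does not depend on what is installed in /usr/share/dict/words. This is only a sketch: the helper names in_dict_prefix and in_dict_exact and the three-word list are made up, and "grep -x -F" (whole line, literal string) is used here as an equivalent of anchoring the pattern with "^" and "$".

```shell
#!/bin/sh
# Hypothetical helpers. Usage: in_dict_prefix WORD DICTFILE / in_dict_exact WORD DICTFILE
# Prefix match: succeeds if any dictionary line merely starts with WORD.
in_dict_prefix() { grep -q "^$1" "$2"; }
# Exact match: -x requires the whole line to match, -F treats WORD literally.
in_dict_exact() { grep -qxF "$1" "$2"; }

# Tiny stand-in dictionary for the demonstration.
dict=$(mktemp)
printf '%s\n' produce product tool > "$dict"

for w in produc produce; do
  if in_dict_prefix "$w" "$dict"; then p=yes; else p=no; fi
  if in_dict_exact "$w" "$dict"; then e=yes; else e=no; fi
  printf '%s\tprefix=%s\texact=%s\n' "$w" "$p" "$e"
done
# prints:
# produc	prefix=yes	exact=no
# produce	prefix=yes	exact=yes

rm -f "$dict"
```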

Modified scripts (with a little reformatting to make the logic more obvious) are:
main.sh
#!/bin/sh
cat $1 | tr A-Z a-z| grep keyword | cat -v|\
sed  -e 's/\([a-z0-9]\+\.\)\+\(com\|org\|net\)[^ ]*//g' -e 's/[^a-z ]/ /g' -e 's/[         ]\+/ /g' |\
awk '{j=-1;for (i=1;i<=NF;i++)if(length($i) < 3 || match($i,"^with$|^from$|^txt$|^and$|^for$|^the$|^com$") ) $i="";print }'|\
sed -e 's/^[        ]\+//g'|\
awk '(NF>2) {j=-1;
             for (i=1;i<=NF;i++) {
               if($i=="keyword") {
                 j=i;
                 if(j==1) j++;
                 if(j==NF) j--;
                 k=j-1;
                 l=j+1;
                 print $k "\t" $j "\t" $l;
                 next;
               }
             }
            }'|\
sort -u |sh ll.sh |sort -k +3|sh llc.sh |sort 



llc.sh
#!/bin/sh
olda=""
while read c b a
do
  if [ "$a" != "$olda" ]
  then
    grep -q "^$a$" /usr/share/dict/words
    valid=$?
  fi
  olda=$a
  [ "$valid" -eq 0 ] && printf '%s\t%s\t%s\n' "$c" "$b" "$a"
done



ll.sh
#!/bin/sh
olda=""
while read a b c
do
  if [ "$a" != "$olda" ]
  then
    grep -q "^$a$" /usr/share/dict/words
    valid=$?
  fi
  olda=$a
  [ "$valid" -eq 0 ] && printf '%s\t%s\t%s\n' "$a" "$b" "$c"
done
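
ll.sh and llc.sh start one grep per distinct word, with a sort in between. If that ever becomes slow, one alternative (a different technique from the scripts above, offered only as a sketch) is to load the dictionary into an awk array once and test both outer words in a single pass. filter_triples is a hypothetical name; it expects the tab-separated triples on standard input and the dictionary path as its argument, and it does an exact, case-sensitive whole-word match.

```shell
#!/bin/sh
# filter_triples DICTFILE (hypothetical helper)
# Keeps only the triples whose first and third words are both in DICTFILE.
# Assumes DICTFILE is non-empty and holds one word per line.
filter_triples() {
  awk -F '\t' '
    NR == FNR { words[$0] = 1; next }   # first input: the dictionary
    ($1 in words) && ($3 in words)      # second input (stdin): the triples
  ' "$1" -
}

# Demonstration with a tiny stand-in dictionary.
dict=$(mktemp)
printf '%s\n' best tool research > "$dict"
printf 'best\tkeyword\ttool\nbest\tkeyword\txyzzy\n' | filter_triples "$dict"
# prints: best	keyword	tool
rm -f "$dict"
```

In main.sh this would stand in for the "sh ll.sh |sort -k +3|sh llc.sh" part, called as "filter_triples /usr/share/dict/words", assuming the function is defined in the same script.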

simon3270 Commented:
One drawback of the above is that if "keyword" is the first or last word on the line, you end up checking that "keyword" itself is in the dictionary, and never check the word that ends up in the middle of the three output words.  If you use the main.sh below, it will always put "keyword" as the middle output word, so it will always check both of the other words on the output line.

main.sh
#!/bin/sh
cat $1 | tr A-Z a-z| grep keyword | cat -v|\
sed  -e 's/\([a-z0-9]\+\.\)\+\(com\|org\|net\)[^ ]*//g' -e 's/[^a-z ]/ /g' -e 's/[         ]\+/ /g' |\
awk '{j=-1;for (i=1;i<=NF;i++)if(length($i) < 3 || match($i,"^with$|^from$|^txt$|^and$|^for$|^the$|^com$") ) $i="";print }'|\
sed -e 's/^[        ]\+//g'|\
awk '(NF>2) {j=-1;
             for (i=1;i<=NF;i++) {
               if($i=="keyword") {
                 j=i;k=j-1;l=j+1;
                 if(i==1) {k=2;l=3;}
                 if(i==NF) {k=NF-2;l=NF-1;}
                 print $k "\t" $j "\t" $l;
                 next;
               }
             }
            }'|\
sort -u |sh ll.sh |sort -k +3|sh llc.sh |sort
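
A quick way to check the index logic is to feed just that last awk stage three sample lines, with "keyword" first, in the middle, and last. This is a sketch: the wrapper name triple and the sample lines are made up, and the awk body follows the same logic as the script above.

```shell
#!/bin/sh
# triple (hypothetical wrapper): for each line with more than two words,
# print the first "keyword" in the middle of two of its neighbours.
triple() {
  awk '(NF > 2) {
    for (i = 1; i <= NF; i++) {
      if ($i == "keyword") {
        k = i - 1; l = i + 1
        if (i == 1)  { k = 2; l = 3 }            # keyword first: use the next two words
        if (i == NF) { k = NF - 2; l = NF - 1 }  # keyword last: use the previous two words
        print $k "\t" $i "\t" $l
        next
      }
    }
  }'
}

printf '%s\n' 'keyword research tool' 'best keyword tool' 'free online keyword' | triple
# prints:
# research	keyword	tool
# best	keyword	tool
# free	keyword	online
```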

faithless1 (Author) Commented:
Superb, thank you very much!!!!!! It took me a while to understand everything above and I think it now makes perfect sense. Thanks again for your help