Help with fixing a script

Hi,

Can someone please provide a working version of this script? I can't seem to figure out how to make this work. Also, if there is a better way to do this with another language that will return the same output, that would be great as well.

The main function of the script is to take a file (keyword.txt) as input, with space-delimited keywords on each line, compare it against the Unix built-in dictionary, and output only the keywords that are found in the dictionary. (At least that's my understanding of it.)

Thanks

main.sh
#!/bin/sh
cat $1 | tr A-Z a-z| grep keyword | cat -v|\
sed  -e 's/\([a-z]\+\.\)\+\(com\|org\|net\)[^ ]*//g' -e 's/[^a-z ]/ /g' -e 's/[         ]\+/ /g' |\
awk '{j=-1;for (i=1;i<=NF;i++)if(length($i) < 3 || match($i,"^with$|^from$|^txt$|^and$|^for$|^the$|^com$") ) $i="";print }'|\
sed -e 's/^[        ]\+//g'|\
awk '(NF>2) {j=-1;for (i=1;i<NF;i++) if($i=="keyword"){ j=i;i=1000;};if(j>0) {if(j==1) j++;if(j==NF) j--;k=j-1;l=j+1; print $k "\t" $j "\t" $l;}}'|\
sort -u |sh ll.sh |sort -k +3|sh llc.sh |sort


llc.sh

#!/bin/sh
olda=""
while read c b a
do
if [ "$a" != "$olda" ]
 then
grep -q "^$a" /usr/share/dict/words >/dell/null
  valid=$?
 fi
 olda=$a
 [ $valid  -eq 0 ] && echo -e "$c\t$b\t$a"
 done
 

ll.sh

#!/bin/sh
olda=""
while read a b c
do
if [ "$a" != "$olda" ]
 then
grep -q "^$a" /usr/share/dict/words >/dell/null
  valid=$?
 fi
 olda=$a
 [ $valid  -eq 0 ] && echo -e "$a\t$b\t$c"
 done
 
Asked by: faithless1
 
simon3270 Commented:
What this script appears to do is take the input file and remove any web addresses (text followed by a dot followed by com, net or org), any non-alphabetic characters, some common words (with, from, txt, and, for, the and com) and any 1- or 2-character words.
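
To make that clean-up stage easier to follow, here is a minimal sketch of it on its own. The function name clean_line and the sample sentence are made up for illustration, and it uses sed -E (extended regex) rather than the \+ and \| forms in the original, so treat it as an approximation of the pipeline rather than a drop-in replacement.

```shell
#!/bin/sh
# Sketch of the clean-up stage only; clean_line is a hypothetical helper name.
clean_line() {
  tr 'A-Z' 'a-z' |
  sed -E -e 's/([a-z0-9]+\.)+(com|org|net)[^ ]*//g' \
         -e 's/[^a-z ]/ /g' \
         -e 's/ +/ /g' |
  awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
      # Drop 1- and 2-character words and the listed common words.
      if (length($i) < 3) continue
      if ($i ~ /^(with|from|txt|and|for|the|com)$/) continue
      out = (out == "" ? $i : out " " $i)
    }
    print out
  }'
}

# Made-up sample input; prints: buy best keyword tools today
echo "Buy the BEST keyword tools from www.example.com today!" | clean_line
```

Unlike the original awk, which blanks the unwanted fields and then strips leading spaces with a second sed, this sketch rebuilds the line from the words it keeps.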

It then looks for the first word "keyword" on the line, and prints out the word before and after it (or the two words after it if it is the first word on the line), so three words.  It was supposed to print out the last three words if "keyword" was last on the line, but the "for (i=1;i<NF;i++)" loop stops before the last field.
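
The three-word selection can be tried in isolation with something like the sketch below. pick_three is a hypothetical name and the input lines are invented; the loop runs up to and including NF so that a trailing "keyword" is matched.

```shell
#!/bin/sh
# pick_three is a hypothetical helper that mirrors the behaviour described above.
pick_three() {
  awk 'NF > 2 {
    for (i = 1; i <= NF; i++) {
      if ($i == "keyword") {
        j = i
        if (j == 1)  j++   # keyword is first: print words 1 to 3
        if (j == NF) j--   # keyword is last: print the last three words
        print $(j-1) "\t" $j "\t" $(j+1)
        next               # only the first "keyword" on a line is used
      }
    }
  }'
}

# Made-up sample lines covering keyword in the middle, first and last.
printf '%s\n' "cheap keyword tools" "keyword research guide" "best free keyword" | pick_three
```

Each sample line has exactly three words, so each one comes back unchanged apart from being tab-separated.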

It then prints out the three words if the first and third word (usually either side of "keyword") are in the dictionary.  To save effort, it doesn't check the dictionary if a word is repeated, just uses the previous result.
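
Since the question also asked about other ways to do this, one possible alternative to the read loop is to load the word list into an awk array once and filter on a chosen column. This is only a sketch: filter_by_dict and the tiny stand-in word list are assumptions, it matches whole words exactly, and it assumes the word-list file is not empty.

```shell
#!/bin/sh
# filter_by_dict is a hypothetical helper, not part of the original scripts.
# $1 = word-list file, $2 = number of the tab-separated column to check.
filter_by_dict() {
  awk -F '\t' -v col="$2" '
    NR == FNR { dict[$0] = 1; next }   # first file: remember every listed word
    ($col in dict)                     # piped input: print rows whose word is listed
  ' "$1" -
}

# Tiny stand-in word list (an assumption for the demo).
words=$(mktemp)
trap 'rm -f "$words"' EXIT
printf '%s\n' cheap tools guide > "$words"

printf 'cheap\tkeyword\ttools\nzzzq\tkeyword\tguide\n' |
  filter_by_dict "$words" 1 | filter_by_dict "$words" 3
```

With the real dictionary you would pass /usr/share/dict/words as the first argument. Only the "cheap keyword tools" row survives here, because "zzzq" is not in the stand-in list.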

There are some scripting changes I would make to the provided scripts:
- "/dell/null" should be /dev/null, and is anyway not required since you are using "grep -q" which doesn't produce any output.
- the pattern you are looking for in the dictionary needs a trailing "$" to mark the end of the word - otherwise it will treat a prefix as a valid word (e.g. "produc" will appear to be a valid word because it is a prefix of "produce")
- In main.sh, the second awk should be "for (i=1;i<=NF;i++) {", not "for (i=1;i<NF;i++) {", so that "keyword" at the end of the line is matched.
- The way of stopping that "for (i=1;" loop (setting i to 1000) is untidy - it would fail if there were more than 1000 fields on the line, and is just a bit obscure.  Just put "next;" after you have printed out that first keyword.
- When checking for .com, .net and .org, you should include 0-9 in your pattern, in case the domain name ends with a digit.
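
The prefix problem in the second point is easy to reproduce. The sketch below uses a three-word temporary file as a stand-in for /usr/share/dict/words (an assumption, so it runs anywhere), and the two function names are made up for the comparison.

```shell
#!/bin/sh
# Hypothetical three-word list standing in for /usr/share/dict/words.
words=$(mktemp)
trap 'rm -f "$words"' EXIT
printf '%s\n' produce product apple > "$words"

in_dict_prefix() { grep -q "^$1" "$words"; }    # original pattern: matches any prefix
in_dict_exact()  { grep -q "^$1\$" "$words"; }  # anchored at both ends: whole word only

for w in produc produce; do
  in_dict_prefix "$w" && p=yes || p=no
  in_dict_exact  "$w" && e=yes || e=no
  printf '%s\tprefix=%s\texact=%s\n' "$w" "$p" "$e"
done
```

"produc" is accepted by the prefix pattern but rejected by the anchored one, while "produce" passes both.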

Modified scripts (with a little reformatting to make the logic more obvious) are:
main.sh
#!/bin/sh
cat "$1" | tr A-Z a-z | grep keyword | cat -v |\
sed  -e 's/\([a-z0-9]\+\.\)\+\(com\|org\|net\)[^ ]*//g' -e 's/[^a-z ]/ /g' -e 's/[         ]\+/ /g' |\
awk '{j=-1;for (i=1;i<=NF;i++)if(length($i) < 3 || match($i,"^with$|^from$|^txt$|^and$|^for$|^the$|^com$") ) $i="";print }'|\
sed -e 's/^[        ]\+//g'|\
awk '(NF>2) {j=-1;
             for (i=1;i<=NF;i++) {
               if($i=="keyword") {
                 j=i;
                 if(j==1) j++;
                 if(j==NF) j--;
                 k=j-1;
                 l=j+1;
                 print $k "\t" $j "\t" $l;
                 next;
               }
             }
            }'|\
sort -u | sh ll.sh | sort -k 3 | sh llc.sh | sort



llc.sh
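#!/bin/sh
# Keep only the lines whose third word is in the dictionary.
olda=""
valid=1
while read -r c b a
do
  # The input is sorted on this word, so reuse the previous result for repeats.
  if [ "$a" != "$olda" ]
  then
    grep -q "^$a$" /usr/share/dict/words
    valid=$?
  fi
  olda=$a
  # printf is used because "echo -e" is not portable under /bin/sh.
  [ "$valid" -eq 0 ] && printf '%s\t%s\t%s\n' "$c" "$b" "$a"
done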



ll.sh
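#!/bin/sh
# Keep only the lines whose first word is in the dictionary.
olda=""
valid=1
while read -r a b c
do
  # The input is sorted, so reuse the previous result for repeated words.
  if [ "$a" != "$olda" ]
  then
    # Trailing "$" added so that a prefix is not accepted as a whole word.
    grep -q "^$a$" /usr/share/dict/words
    valid=$?
  fi
  olda=$a
  # printf is used because "echo -e" is not portable under /bin/sh.
  [ "$valid" -eq 0 ] && printf '%s\t%s\t%s\n' "$a" "$b" "$c"
done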


 
simon3270 Commented:
One drawback of the above is that if keyword is the first or last word on the line, then you end up checking that "keyword" is in the dictionary, and never check the middle word of the three output.  If you use the main.sh below, it will always put keyword as the middle output word, so will always check both other words on the output line.

main.sh
#!/bin/sh
cat "$1" | tr A-Z a-z | grep keyword | cat -v |\
sed  -e 's/\([a-z0-9]\+\.\)\+\(com\|org\|net\)[^ ]*//g' -e 's/[^a-z ]/ /g' -e 's/[         ]\+/ /g' |\
awk '{j=-1;for (i=1;i<=NF;i++)if(length($i) < 3 || match($i,"^with$|^from$|^txt$|^and$|^for$|^the$|^com$") ) $i="";print }'|\
sed -e 's/^[        ]\+//g'|\
awk '(NF>2) {j=-1;
             for (i=1;i<=NF;i++) {
               if($i=="keyword") {
                 j=i;k=j-1;l=j+1;
                 if(i==1) {k=2;l=3;}
                 if(i==NF) {k=NF-2;l=NF-1;}
                 print $k "\t" $j "\t" $l;
                 next;
               }
             }
            }'|\
sort -u | sh ll.sh | sort -k 3 | sh llc.sh | sort


 
faithless1 (Author) Commented:
Superb, thank you very much!!!!!! It took me a while to understand everything above and I think it now makes perfect sense. Thanks again for your help
