• Status: Solved

Grep for pattern(s) through directories

Hi,
I have a script that uses "find" and "grep" together to search all directories for files matching multiple patterns. It currently outputs only the filenames, but I want the output to show the actual matching lines in each file.

For example:
TMP_DIR=/tmp_dir

if [[ $# -eq 0 ]]
then
    echo "<Enter filename & search pattern(s) in quotes>"
    exit
fi

if [[ $# -eq 2 ]]
then
    > ${TMP_DIR}/filelist.dat
    find . -name "$1" | xargs egrep -li "$2" | while read FILENAME
    do
        if [ -f $FILENAME ]
        then
            ls -lg $FILENAME >> ${TMP_DIR}/filelist.dat
        fi
    done

    cat ${TMP_DIR}/filelist.dat | cut -c33-
    rm ${TMP_DIR}/filelist.dat
fi

if [[ $# -eq 3 ]]
then
    > ${TMP_DIR}/filelist.dat
    find . -name "$1" | xargs egrep -li "$2" | xargs egrep -li "$3" | while read FILENAME
    do
        if [ -f $FILENAME ]
        then
            ls -lg $FILENAME >> ${TMP_DIR}/filelist.dat
        fi
    done

    cat ${TMP_DIR}/filelist.dat | cut -c33-
    rm ${TMP_DIR}/filelist.dat
fi

How can I tweak this so instead of seeing the filename containing the matching pattern it shows the actual pattern in the file?
troublesome
Asked:
 
glassd Commented:
Do you need both the filename and the matched lines?

For only the matched lines (-h suppresses the filename prefix):

find . -name "$FILENAME" | xargs grep -hi "$PATTERN"

For both:

for FILE in $(find . -name "$FILENAME" | xargs grep -li "$PATTERN")
do
  echo "$FILE ----------------------------"
  grep -i "$PATTERN" "$FILE"
done
 
troublesome (Author) Commented:
I tried the logic above and could not get it to work. The script I am using requires a minimum of 2 parameters: the first is the filename, e.g. "file*ksh", and the 2nd and subsequent parameters are the pattern(s) ("pattern1" and "pattern2" in the example below).

At the moment the script above displays only the matching files, including the path, on screen using the cat command. What I would like is the filename with full path PLUS the pattern(s) as well.
e.g. searching for pattern "pattern1" AND "pattern2"

Current:
multigrep "*.ksh" "pattern1" "pattern2"
Aug 14  2003 ./dir1/dir2/subdir1/file1.ksh
Nov 14  2002 ./dir1/dir2/subdir2/file2.ksh

Should get:
Aug 14  2003 ./dir1/dir2/subdir1/file1.ksh:    pattern1  
Aug 14  2003 ./dir1/dir2/subdir1/file1.ksh:    pattern2
Nov 14  2002 ./dir1/dir2/subdir2/file2.ksh:       pattern1
Nov 14  2002 ./dir1/dir2/subdir2/file2.ksh:       pattern2
 
neteducation Commented:
#!/bin/ksh

filename=$1
shift 1

find . -name $filename | while read thisfile
do
  for pattern
  do
    grep $pattern $thisfile
  done
done


I just typed it from memory, so I don't know if it works exactly like this.
 
Tintin Commented:
#!/bin/ksh

if [ $# -lt 2 ]
then
   echo "Usage: `basename $0` [file] [pattern1] [pattern2] ...." >&2
   exit 1
fi

file_pattern=$1
shift
pattern=`echo $*|sed 's/ /|/g'`

find . -type f -name "$file_pattern" -exec egrep "$pattern" {} /dev/null \;


Don't forget that when you run it, you will need to quote any metacharacters, otherwise shell globbing will expand them, e.g.:

multigrep '*.ksh' pattern1 pattern2

 
neteducation Commented:
Tintin: brilliant idea....

Two things to think about:

What if there is a space in the (quoted) pattern?
What if there is a pipe in the (quoted) pattern?

(My solution would also fail the first case... my 'grep $pattern $thisfile' line should read 'grep "$pattern" $thisfile' instead.)

If we solve these two cases, I suppose your solution is much faster than mine.
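
One way to sidestep both cases, sketched here on the assumption that the grep in use accepts repeated -e options (POSIX grep does; on Solaris that would be /usr/xpg4/bin/grep): pass each pattern as its own -e argument instead of joining them with '|'. The helper name or_grep is hypothetical, and this is still an OR match.

```shell
# or_grep FILE PATTERN...  (hypothetical helper name)
# Prints lines of FILE matching ANY pattern. Each pattern becomes a separate
# -e argument, so embedded spaces and '|' reach grep unchanged.
or_grep() {
  file=$1
  shift
  n=$#
  for p in "$@"; do
    set -- "$@" -e "$p"   # append "-e pattern" after the original arguments
  done
  shift "$n"              # drop the originals, keeping only the -e pairs
  grep "$@" "$file"
}
```

For example, or_grep notes.txt 'sleep 30' 'a|b' searches for the two literal-looking strings without the shell splitting the first one or sed turning the second into an alternation.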
 
troublesome (Author) Commented:
multigrep '*.ksh' pattern1 pattern2

Almost there. Typing the above results in a match when pattern1 OR pattern2 exists. I need it to find files where pattern1 AND pattern2 both exist.

Also, I noticed that it allows an unlimited number of patterns for matching, which is brilliant as it avoids having to repeat logic for a variable number of pattern parameters.

e.g. if I do multigrep '*.ksh' pattern1 pattern2 pattern3 pattern4 it should find all files that contain all 4 patterns and display the path/filename and pattern.
 
neteducation Commented:
Taking up my idea (which may work more often but slower)....

#!/bin/ksh

filename=$1
shift 1

find . -name $filename | while read thisfile
do
  rm /tmp/output$$
  for pattern
  do
    if ! grep $pattern $thisfile >>/tmp/output$$
    then
      break 2
    fi
  done
  cat /tmp/output$$
  rm /tmp/output$$
done
rm /tmp/output$$

 
troublesome (Author) Commented:
I am executing the script below, but it now fails with the errors shown. I have added some diagnostics to confirm that the parameters are being picked up correctly.
Help !!!!!

multigrep "*.ksh" "sleep" "if"
Result:
$0 = multigrep
$1 = *.ksh
$2 = sleep 30
$3 = if
find: bad option a2.ksh
find: path-list predicate-list
/tmp/output7520: No such file or directory


#!/bin/ksh

##########
#  multigrep  #
##########

if [ $# -lt 2 ]
then
   echo "Usage: `basename $0` [file] [pattern1] [pattern2] ...." >&2
   exit 1
fi

echo "\$0 = $0"
echo "\$1 = $1"
echo "\$2 = $2"
echo "\$3 = $3"

filename=$1
shift 1

find . -name $filename | while read thisfile
do
  rm /tmp/output$$
  for pattern
  do
    if ! grep $pattern $thisfile >> /tmp/output$$
    then
      break 2
    fi
  done
  cat /tmp/output$$
   rm /tmp/output$$
done
  rm /tmp/output$$

 
neteducation Commented:
OK, modified a little (as I said, I was typing from memory :-))


#!/bin/ksh

##########
#  multigrep  #
##########

if [ $# -lt 2 ]
then
   echo "Usage: `basename $0` [file] [pattern1] [pattern2] ...." >&2
   exit 1
fi

echo "\$0 = $0"
echo "\$1 = $1"
echo "\$2 = $2"
echo "\$3 = $3"

filename=$1
shift 1

find . -name "$filename" | while read thisfile
do
  rm /tmp/output$$ 2>/dev/null
  for pattern
  do
    if ! grep "$pattern" "$thisfile" >> /tmp/output$$
    then
      break 2
    fi
  done
  cat /tmp/output$$
   rm /tmp/output$$ 2>/dev/null
done
rm /tmp/output$$ 2>/dev/null

 
troublesome (Author) Commented:
Hi neteducation,

Thanks, the error has cleared, but there are still 2 problems:
1. The output consists of the pattern only, without the path and filename.
2. It only works with 1 pattern. As soon as a 2nd one is included as parameter 3, there is no output. Hmm.

Any ideas?
 
Tintin Commented:
Here's my updated version that matches *all* specified patterns

#!/bin/ksh

function agrep
{
  FILE=$1
  shift

  for pattern in $*
  do
    grep "$pattern" $FILE >>/tmp/$$
  done

  if [ -s /tmp/$$ ]
  then
     echo "$FILE"
     cat /tmp/$$
     echo
  fi

  rm -f /tmp/$$
}

if [ $# -lt 2 ]
then
   echo "Usage: `basename $0` [file] [pattern1] [pattern2] ...." >&2
   exit 1
fi

file_pattern=$1
shift

for file in `find . -type f -name "$file_pattern"`
do
  agrep $file $*
done
 
Mike R. Commented:
Maybe I'm over-simplifying this, but what about ...

find /dir -name "filenames" -exec /usr/xpg4/bin/grep -si -e "pattern1" -e "pattern2" {} \;

The syntax is:
find /dir = start a find in "/dir"
-name "filenames" = add any find parameters you want (see the find man page)
-exec = tell find to run the following command against each file found
/usr/xpg4/bin/grep = the command to be run. In this instance, the xpg4 version has to be used because it supports "-e"
-si = grep options: "-s" suppresses error messages and "-i" ignores case
-e = grep switch that introduces a pattern; repeat it to supply multiple patterns
{} = find placeholder that is replaced by the path of each file found
\; = marks the end of the command given to -exec
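
A possible variation on the command above, offered as a sketch and not a drop-in replacement: adding /dev/null as an extra file operand makes grep prefix each match with the filename, and ending -exec with + instead of \; lets find hand many files to each grep call. The wrapper name find_either and its argument order are made up for illustration; it assumes a find that supports -exec ... {} + and a grep that accepts -e (on Solaris, /usr/xpg4/bin/grep).

```shell
# find_either DIR NAME_GLOB PATTERN1 PATTERN2  (hypothetical wrapper)
# Prints "path:line" for every line matching EITHER pattern, ignoring case.
find_either() {
  # /dev/null guarantees grep sees at least two files, so it always
  # prints the filename; {} must come last, directly before the +.
  find "$1" -type f -name "$2" -exec grep -si -e "$3" -e "$4" /dev/null {} +
}
```

Called as find_either . '*.ksh' sleep if, it still performs an OR match, exactly like the original one-liner.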

Maybe I did not meet all the things you are trying to accomplish though :-)
Best of luck!
M
 
Tintin Commented:
Excellent suggestion rightmirem.

Taking your example, my script can now read:

#!/bin/ksh

if [ $# -lt 2 ]
then
   echo "Usage: `basename $0` [file] [pattern1] [pattern2] ...." >&2
   exit 1
fi

file_pattern=$1
shift

patterns=`echo "$*"|sed 's/ / -e /g'`
find . -type f -name "$file_pattern" -exec /usr/xpg4/bin/grep -e $patterns {} /dev/null \;
 
troublesome (Author) Commented:
Hi rightmirem, Tintin

Thanks for the above suggestions. I've been asking colleagues at work for a while now how I could develop the multigrep script I currently use to report at line level instead of file level, and most of them are stumped.

I thought I'd increase the points value, as both of the above suggestions produce the same results, but they are still grepping for pattern1 OR pattern2 OR pattern3 ... etc.

When I tested it on 3 patterns it should have matched only one file, as that one contained all 3 patterns, but the output included that file plus another 20 or so that had any one of the 3 patterns instead of all 3.

The format of the output is definitely better from Tintin's last-but-one script. Although it takes slightly longer to execute, the output is easier to read.

1. Is it possible to add the datestamp, timestamp and filesize to the left of the path/filename?
2. Can the efficiency of rightmirem's script be combined with the output style of Tintin's?
3. Most importantly, can the grep be corrected so it's an "AND" instead of an "OR"?

Cheers
 
troublesome (Author) Commented:
Another problem is that if a pattern contains a special character such as . or , it is not read as a literal part of the string, so the output is limited. Also, it looks like it is not ignoring case: I searched for tony and it ignored all files with Tony.

Thanks
 
Mike R. Commented:
More simple thoughts...

- Add "-i" to have grep ignore case. That way "grep -i test" will match regardless of whether it finds "test", "TEST", "Test", "tesT", etc.

- Make the secondary patterns pipes.
I.e. "grep -i -e test1 -e test2" will return anything with EITHER "test1" or "test2"
...however "grep -i test1 | grep -i test2" will grep for "test1", and then re-grep all the lines returned by the first grep for "test2". The only thing output will be lines with BOTH "test1" AND "test2" in them.

So... for your script it could be ...
find . -type f -name "$file_pattern" -exec /usr/xpg4/bin/grep -i -e "$pattern1" {} /dev/null \; | grep -i "$pattern2" | grep -i "$pattern3"
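
One thing worth noting about the pipe approach: each later grep only sees the lines printed by the previous one, so it reports lines that contain every pattern on the same line. A minimal sketch contrasting that with a per-file check, where the patterns may sit on different lines (both helper names are hypothetical):

```shell
# same_line_both FILE P1 P2  (hypothetical): the pipe approach.
# Prints only lines that contain P1 and P2 on the SAME line.
same_line_both() {
  grep -i -e "$2" "$1" | grep -i -e "$3"
}

# has_all FILE PATTERN...  (hypothetical): per-file check.
# Succeeds if every pattern occurs somewhere in FILE, on any line.
has_all() {
  file=$1
  shift
  for p in "$@"; do
    grep -i -e "$p" "$file" >/dev/null || return 1
  done
  return 0
}
```

For a file with "test1" on one line and "test2" on another, has_all succeeds while same_line_both prints nothing.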

BoL!
M
 
Tintin Commented:
OK, I believe this version should do everything you need.  It will:

1.  Ignore case in the search.
2.  Accept any number of patterns.
3.  Do an AND match.
4.  Display the date of the file and any matches.

#!/bin/ksh

TEMP=/tmp/$$

if [ $# -lt 2 ]
then
   echo "Usage: `basename $0` [file] [pattern1] [pattern2] ...." >&2
   exit 1
fi

file_pattern=$1
shift

for file in `find . -type f -name "$file_pattern"`
do
  rm -f $TEMP

  for pattern in $*
  do
    grep -i $pattern $file >>$TEMP || continue 2
  done

  ls -lg $file | cut -c33-
  cat $TEMP
  echo
done
 
troublesome (Author) Commented:
Sorry to be a pain, but I have discovered a couple more issues.

1st issue: it has a problem with certain metacharacters, e.g.
.
?
/
\

It interprets my.name@anywhere.com the same as my name@anywhere.com. I would like the pattern matching to be exact, apart from the case issue which you have now resolved, so that "my.name" and "my name" are different patterns.

2nd issue I can explain via an example of 2 test files:

#/!/bin/ksh
# test_file_1
hello baby its me

#/!/bin/ksh
# test_file_2
hello baby
baby its
its me
me
========================
multigrep4 "*ksh" "hello"

      43 Apr 22 11:47 ./test2.ksh
hello baby

      31 Apr 22 11:54 ./test1.ksh
hello baby its me
========================
multigrep4 "*ksh" "hello baby"

Actual:
      43 Apr 22 11:47 ./test2.ksh
hello baby
hello baby
baby its

      31 Apr 22 11:54 ./test1.ksh
hello baby its me
hello baby its me

Should be:
      31 Apr 22 11:54 ./test1.ksh
hello baby its me

      43 Apr 22 11:47 ./test2.ksh
hello baby
========================
multigrep4 "*ksh" "hello baby its"

Actual:
      43 Apr 22 11:47 ./test2.ksh
hello baby
hello baby
baby its
baby its
its me

      31 Apr 22 11:54 ./test1.ksh
hello baby its me
hello baby its me
hello baby its me

Should be:
      31 Apr 22 11:54 ./test1.ksh
hello baby its me
========================
multigrep4 "*ksh" "hello baby its me"

Actual:
      43 Apr 22 11:47 ./test2.ksh
hello baby
hello baby
baby its
baby its
its me
its me
me

      31 Apr 22 11:54 ./test1.ksh
hello baby its me
hello baby its me
hello baby its me
hello baby its me

Should be:
      31 Apr 22 11:54 ./test1.ksh
hello baby its me
========================

As you can see above, the only scenario where it worked the way it should was when the pattern string was text without spaces. As soon as there are spaces the results are weird: it repeats everything. I have included what the output should be under "Should be:" after each actual output.
I think this may also be related to the special handling of metacharacters, as I believe the spaces are causing the problem even though I am putting quotes around the pattern I'm matching.

Cheers
 
Tintin Commented:
OK, here is the latest version.

This handles exact matches by using fgrep, and handles spaces in patterns by changing the loop to use "$@" instead of $*

#!/bin/ksh

TEMP=/tmp/$$

if [ $# -lt 2 ]
then
   echo "Usage: `basename $0` [file] [pattern1] [pattern2] ...." >&2
   exit 1
fi

file_pattern=$1
shift

for file in `find . -type f -name "$file_pattern"`
do
  rm -f $TEMP

  for pattern in "$@"
  do
    fgrep -i "$pattern" $file >>$TEMP || continue 2
  done

  ls -lg $file | cut -c33-
  cat $TEMP
  echo
done
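
The difference between $* and "$@" that this version relies on can be seen in a small sketch (the function names are made up for illustration, and $((...)) arithmetic is assumed to be available in the shell). Unquoted $* is split on whitespace, so a pattern such as "hello baby" becomes two separate patterns, which is why earlier output showed lines repeated once per word.

```shell
# Count how many loop iterations each form produces (hypothetical helpers).
count_unquoted() {
  n=0
  for p in $*; do n=$((n + 1)); done     # word-splits every argument
  echo "$n"
}

count_quoted() {
  n=0
  for p in "$@"; do n=$((n + 1)); done   # one iteration per original argument
  echo "$n"
}

# count_unquoted "hello baby" "its me"   prints 4
# count_quoted   "hello baby" "its me"   prints 2
```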
 
neteducation Commented:
troublesome: not quite sure what you mean by "pattern"...

I (and probably also Tintin at first) was thinking that you want to be able to use search patterns the way grep supports them, e.g.

multigrep "*ksh" "[5-9]th place" "myname"

to search for 5th up to 9th place and things like this....

If you want this, then you can't use fgrep. If you want to search explicitly for certain metacharacters, then you must escape them, e.g. like this:

multigrep '*ksh' 'my\.name@somewhere'

because the dot is the metacharacter that matches any single character.

If on the other hand you only want to search for fixed strings then fgrep is the solution (and it's even much faster than normal grep)
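
A minimal sketch of the three cases, using grep -F as the POSIX spelling of fgrep (on older systems, plain fgrep or /usr/xpg4/bin/grep may be needed instead). The helper names are hypothetical.

```shell
# Count the lines of FILE ($1) that match PATTERN ($2).
count_regex() { grep -c -e "$2" "$1"; }      # PATTERN is a regular expression
count_fixed() { grep -c -F -e "$2" "$1"; }   # PATTERN is a literal string (fgrep behaviour)
```

Given a file holding the two lines my.name@anywhere.com and my name@anywhere.com, count_regex with 'my.name' reports 2 because the dot matches the space as well, while count_regex with 'my\.name' and count_fixed with 'my.name' each report 1.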

 
ahoffmann Commented:
> eg if  I do multigrep '*.ksh' pattern1 pattern2 pattern3 pattern4 it should find all files that contain all 4 patterns and display the path/filename and pattern.

Besides all the other questions and suggestions, here is just a solution for this one:

awk '/pattern1/{p1=1}/pattern2/{p2=1}/pattern3/{p3=1}/pattern4/{p4=1}END{if(p1==1&&p2==1&&p3==1&&p4==1){print "pattern1 pattern2 pattern3 pattern4"}}' file

This can be used with your favorite find, somewhat like this:
find . -type f -exec awk '.......' {} \;

If your awk is nawk or gawk, then you can also print the filename just by using the following print in the END{} section:
  print FILENAME": pattern1 pattern2 pattern3 pattern4"

Is this what you're looking for?
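
The same idea can be generalised so the patterns are not hard-coded. This is only a sketch under a few assumptions: a POSIX awk (nawk or gawk) with tolower() and a usable FILENAME in END, patterns treated as fixed strings and not as regular expressions, and case ignored. The function name awk_all is hypothetical.

```shell
# awk_all FILE PATTERN...  (hypothetical helper)
# Prints "FILE: line" for each line containing any pattern, but only if
# EVERY pattern occurs somewhere in FILE; otherwise prints nothing and fails.
awk_all() {
  file=$1
  shift
  awk '
    BEGIN {
      # ARGV[1] is the file; ARGV[2..] are patterns. Blank them out so
      # awk does not try to open them as input files.
      npat = ARGC - 2
      for (i = 2; i < ARGC; i++) { pat[i - 1] = tolower(ARGV[i]); ARGV[i] = "" }
    }
    {
      line = tolower($0); hit = 0
      for (i = 1; i <= npat; i++)
        if (index(line, pat[i])) { seen[i] = 1; hit = 1 }
      if (hit) hits[++n] = $0      # remember each matching line once
    }
    END {
      for (i = 1; i <= npat; i++) if (!seen[i]) exit 1
      for (j = 1; j <= n; j++) print FILENAME ": " hits[j]
    }
  ' "$file" "$@"
}
```

It reads each file once and could be driven from find in the same way as the one-liner above.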
 
troublesome (Author) Commented:
Hi,
Sorry for the confusion. What I meant by pattern was being able to search for strings of text even if the string contains metacharacters. For example, there may be an error generated by an application which a user is quoting, but as we know the error text is not always captured accurately, so I require some flexibility when searching for the code that generated the error, as the first step.
As the number of code files runs into the thousands, I needed a tool which allows me to search for patterns which could be a single word or a string of text:
"Unable to update headers"
If the error message is quite long then the chances are the message will be broken up in the code so that it occupies several lines. Putting in the whole message will therefore bring up no matches, so in this case I would break up the message into smaller constituents and use them as separate strings:
MESSAGE 'While these account details have been displayed on your screen, ' +
     'they have been updated by another user. ' +
     'To maintain data integrity, this save has been abandoned.'
In the above example I would search for files with "multiple patterns" ["While these account"] ["maintain data integrity"]

I've tweaked the script a little to avoid duplicate lines when grepping for multiple occurrences of a pattern.
This is the final script, which I've tested, and it's working exactly the way I wanted. Thank you all so much for your patience.

#!/bin/ksh

TEMP=/tmp/file1
TEMP2=/tmp/file2

if [ $# -lt 2 ]
then
   echo "Usage: `basename $0` [file] [pattern1] [pattern2] ...." >&2
   exit 1
fi

file_pattern=$1
shift

for file in `find . -type f -name "$file_pattern"`
do
  rm -f $TEMP

  for pattern in "$@"
  do
    fgrep -i "$pattern" $file >>$TEMP || continue 2
  done

  ls -l $file | cut -c33-
  uniq $TEMP > $TEMP2
  mv $TEMP2 $TEMP
  cat $TEMP
  echo
  echo
  echo
done
  rm -f $TEMP
 
troublesome (Author) Commented:
I almost forgot to ask whether it's possible to type the $file_pattern parameter without having to use quotes, as this will mainly be in the form *.ksh, *.sc, *.osq etc. At the moment it only works with quotes wrapped around it, although patterns that are text without spaces are accepted without quotes.
 
Tintin Commented:
You will only be able to specify parameters with unquoted metacharacters in them if you turn off shell globbing.

In ksh, you do this by typing at the command prompt:

set -o noglob
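
To see what globbing does to the arguments, here is a small sketch (the function names are made up for illustration). It counts the arguments a command receives with globbing on and then off; the subshell keeps the noglob setting from leaking into the calling shell.

```shell
count_args() { echo "$#"; }

# glob_demo DIR  (hypothetical): run in a directory holding several *.ksh files.
glob_demo() {
  (
    cd "$1" || exit 1
    count_args *.ksh     # shell expands the glob: one argument per matching file
    set -o noglob
    count_args *.ksh     # glob left alone: a single literal argument, *.ksh
  )
}
```

Because the expansion happens in the calling shell before the script starts, the option has to be set at the prompt; setting it inside multigrep itself would be too late.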

BTW, you can get rid of your temporary files in your script by changing:

uniq $TEMP > $TEMP2
mv $TEMP2 $TEMP
cat $TEMP

to simply:

uniq $TEMP
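
A caveat worth keeping in mind: uniq only collapses duplicate lines that are adjacent, so a line matched by two different patterns can still appear twice if other lines were written in between. A minimal order-preserving alternative, assuming a POSIX awk (the function name dedup_lines is hypothetical):

```shell
# dedup_lines [FILE...]  (hypothetical): print each distinct line once,
# in order of first appearance, whether or not the duplicates are adjacent.
dedup_lines() {
  awk '!seen[$0]++' "$@"
}
```

In the script it would be called as dedup_lines $TEMP in place of uniq $TEMP.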



