HiT5698

help using grep with a fixed string

heya guys:

I've got a logfile full of lines with varying regular expressions, and I need to grep through the file, looking for specific patterns

this is part of a bash script that searches through an ircd spam filter log, showing various details - here are the lines I'm having trouble with:

pattern="^FREE .+ pics and movies (www.siteA.da.ru|wWw.siteB.oRg)$"
grep -F "$pattern" spamfilter.log

spamfilter.log is filled with lines similar to this:

[Sun Jul  3 18:19:27 2005] - [Spamfilter] [|didosch|]!~wkngfhdvs@49f837.w82-123.abo.wanadoo.fr matches filter '^FREE .+ pics and movies (www\.siteA\.da\.ru|wWw\.siteB\.oRg)$': [PRIVMSG DarkReal: 'FREE Porn pics and movies www.siteA.da.ru'] [Spamming a porn url to users. Scan your pc for viruses.]

the various logfile lines have many different regex patterns, but that's an example of one of them

so, since what I want to search for is itself a regular expression, I can't easily use a regular expression to search for it, so I use the -F switch to tell grep to treat it as a fixed string, but I can't get the correct results.. any ideas?
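For reference, here is a minimal sketch of what -F does with a pattern like this. With -F, grep compares the search string character for character, so the backslashes that appear in the log line have to appear in the search string too. The log line, the `.example` hostnames, and the variable names below are made up for illustration; they are not taken from the real log.

```shell
#!/bin/sh
# Hypothetical one-line log, written to a temp file for the demo.
log=$(mktemp)
cat > "$log" <<'EOF'
[Spamfilter] nick!user@host.example matches filter '^FREE .+ pics (www\.siteA\.example|wWw\.siteB\.example)$': [PRIVMSG someone: 'FREE pics www.siteA.example']
EOF

# Single quotes keep the backslashes intact in the shell variables.
escaped='^FREE .+ pics (www\.siteA\.example|wWw\.siteB\.example)$'
unescaped='^FREE .+ pics (www.siteA.example|wWw.siteB.example)$'

# -F: treat the pattern as a literal string; -c: count matching lines.
# grep exits non-zero when nothing matches, hence the "|| true".
with_escapes=$(grep -Fc -- "$escaped" "$log" || true)
without_escapes=$(grep -Fc -- "$unescaped" "$log" || true)

printf 'with backslashes:    %s\n' "$with_escapes"
printf 'without backslashes: %s\n' "$without_escapes"

rm -f "$log"
```

In this sketch the search string with backslashes matches the one log line and the one without backslashes matches nothing, which is the same symptom as "0 hits" for a filter that is clearly in the log.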



ASKER CERTIFIED SOLUTION
ozo
This solution is only available to members.
HiT5698

ASKER

well I changed the url name in the pattern, so people don't click it here (it has a virus).. and I forgot to put in the escapes..

but you've brought up a great point.. the script is stripping out backslashes on the regex patterns, preventing correct matches in many cases..

here's the code that handles that part: first I grep through the whole spamfilter logfile, and copy only the needed regex patterns into a temp file, and that temp file does have the backslashes preserved.. but then when the script gets to this:

($TEMPF is the temp file with only regex patterns in it, one per line):

{
cat $TEMPF|while read pattern ; do
  PCOUNT=$(grep -Fc "$pattern" $LOGF)
  LDATE=$(grep -F "$pattern" $LOGF|tail -n1|awk '{print $2" "$3" "$5}')
  PACTIVE=$(grep -Fm1 "$pattern" $CONF)
  [ "$PACTIVE" ] && TAG='[*]' || TAG='[!]'
  echo "${n}a${TAG}. pattern: $pattern"
  echo "${n}b${TAG}. hits: ${PCOUNT:-0} [last: ${LDATE:-n/a}]"
  echo
  n=$(($n + 1))
done
} >> $OUTF

now the whole problem with that loop is that it strips the backslashes out of $pattern for some reason, so when I grep for $pattern, it usually is the wrong one (unless the original pattern had no backslashes to begin with, then those will work).. but why is that loop stripping the backslashes from $pattern?

to make things easier, here is the entire script:

#!/bin/sh
# makes a report on spamfilter hits

CONF=spamfilter.conf
LOGF=spamfilter.log
TEMPF=$(basename $0).tmp
URC=unrealircd.conf
OUTF=$(basename $0).log
STAMP=$(date '+%F %r %z')
SNAME=$(grep -Eom1 "name \".+\"" $URC|awk '{print $2}'|awk -F\" '{print $2}')
n=1

if [ ! -e "$LOGF" ] ; then
  echo "ERROR: Cannot find $LOGF" >&2
  exit 1
fi

SDATE=$(grep -m1 "\- \[Spamfilter\]" $LOGF|awk '{print $2" "$3}')

# total hits in logfile (all filters)
HC=$(grep -Ec "\- \[Spamfilter\]" $LOGF)

# dump unique patterns from log into temp file
grep -E "\- \[Spamfilter\]" $LOGF|grep -Eo "matches filter '.*':"|awk -F\' '{print $2}'|sort|uniq > $TEMPF

# unique patterns found in log
UCOUNT=$(sed -n '$=' $TEMPF)

echo "# Created: $STAMP" > $OUTF
echo "# IRCd: $SNAME" >> $OUTF
echo "" >> $OUTF
echo "- $LOGF has $HC total spamfilter hits," >> $OUTF
echo "- from $UCOUNT unique patterns, since $SDATE" >> $OUTF
echo "- Prefix legend: [*] = active, [!] = not active" >> $OUTF
echo "" >> $OUTF

{
cat $TEMPF|while read pattern ; do
  PCOUNT=$(grep -Fc "$pattern" $LOGF)
  LDATE=$(grep -F "$pattern" $LOGF|tail -n1|awk '{print $2" "$3" "$5}')
  PACTIVE=$(grep -Fm1 "$pattern" $CONF)
  [ "$PACTIVE" ] && TAG='[*]' || TAG='[!]'
  echo "${n}a${TAG}. pattern: $pattern"
  echo "${n}b${TAG}. hits: ${PCOUNT:-0} [last: ${LDATE:-n/a}]"
  echo
  n=$(($n + 1))
done
} >> $OUTF

cat $OUTF

rm -f $OUTF
rm -f $TEMPF
exit 0

HiT5698

ASKER

using that code above, I ran that with this exact logfile (spamfilter.log):

[Sun Jul  3 18:19:27 2005] - [Spamfilter] [|didosch|]!~wkngfhdvs@d8475.w82-123.abo.wanadoo.fr matches filter '^FREE .+ pics and movies (www\.pornsites\.da\.ru|wWw\.aLmoRa\.oRg)$': [PRIVMSG DarkReal: 'FREE Porn pics and movies www.pornsites.da.ru'] [Spamming a porn url to users. Scan your pc for viruses.]
[Mon Jul  4 07:49:30 2005] - [Spamfilter] GurBeT!kedoozwrkq@23.135.11.3 matches filter '^For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : (http://)?.+\.almora\.org$': [PRIVMSG SixPacK: 'For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : www.almora.org'] [Spamming a porn url, scan your pc for viruses]
[Mon Jul  4 17:36:14 2005] - [Spamfilter] ATeS!~unicfvukg@12.186.170.89 matches filter '^For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : (http://)?.+\.almora\.org$': [PRIVMSG MERR50: 'For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : www.almora.org'] [Spamming a porn url, scan your pc for viruses]
[Mon Jul  4 20:48:35 2005] - [Spamfilter] ^Linda!~wxenolilr@3qwes-152-1-28-236.w82-123.abo.wanadoo.fr matches filter '^FREE .+ pics and movies (www\.pornsites\.da\.ru|wWw\.aLmoRa\.oRg)$': [PRIVMSG VOH|out: 'FREE Porn pics and movies www.pornsites.da.ru'] [Spamming a porn url to users. Scan your pc for viruses.]

and using that exact logfile, here is the script's output:

ircd@drt:~/urleaf$ ./wnsflist
# Created: 2005-07-05 05:26:28 AM +0200
# IRCd: someserver.testnet.org

- spamfilter.log has 4 total spamfilter hits,
- from 2 unique patterns, since Jul 3
- Prefix legend: [*] = active, [!] = not active

1a[!]. pattern: ^FREE .+ pics and movies (www.pornsites.da.ru|wWw.aLmoRa.oRg)$
1b[!]. hits: 0 [last: n/a]

2a[!]. pattern: ^For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : (http://)?.+.almora.org$
2b[!]. hits: 0 [last: n/a]


it's not supposed to be possible for there to be 0 hits (every pattern has at least 1 hit, of course)

HiT5698

ASKER

never mind, I found the answer.. just had to use read -r instead of read.. but ozo, your observation was very helpful (don't know how I missed it before), so it looks like you get the points ;)
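To show the difference, here is a small sketch comparing the two. Without -r, read treats a backslash as an escape character and drops it; with -r the line comes through untouched. The sample string and the variable names are hypothetical, chosen only for the demo.

```shell
#!/bin/sh
# Hypothetical pattern containing backslashes.
line='www\.siteA\.example'

# Plain read: each backslash escapes the next character and is removed.
plain=$(printf '%s\n' "$line" | { read p; printf '%s' "$p"; })

# read -r: backslashes are kept as-is. IFS= additionally preserves
# any leading or trailing whitespace on the line.
raw=$(printf '%s\n' "$line" | { IFS= read -r p; printf '%s' "$p"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```

Applied to the report script, that would presumably mean changing the loop header to `while IFS= read -r pattern ; do`, so that each pattern reaches grep -F exactly as it appears in the temp file.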