HiT5698
asked on
help using grep with a fixed string
heya guys:
I've got a logfile full of lines with varying regular expressions, and I need to grep through the file, looking for specific patterns
this is part of a bash script that searches through an ircd spam filter log, showing various details - here are the lines I'm having trouble with:
pattern="^FREE .+ pics and movies (www.siteA.da.ru|wWw.siteB.oRg)$"
grep -F "$pattern" spamfilter.log
spamfilter.log is filled with lines similar to this:
[Sun Jul 3 18:19:27 2005] - [Spamfilter] [|didosch|]!~wkngfhdvs@49f837.w82-123.abo.wanadoo.fr matches filter '^FREE .+ pics and movies (www\.siteA\.da\.ru|wWw\.siteB\.oRg)$': [PRIVMSG DarkReal: 'FREE Porn pics and movies www.siteA.da.ru'] [Spamming a porn url to users. Scan your pc for viruses.]
the various logfile lines have many different regex patterns, but that's an example of one of them
so, since what I want to search for is itself a regular expression, I can't easily use a regular expression to search for it, so I use the -F switch to tell grep to treat it as a fixed string, but I can't get the correct results.. any ideas?
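For anyone reading along, here is a minimal sketch of why the snippet above finds nothing. It assumes the filter is stored in the log with its backslashes (as in the sample line), while the pattern variable is typed without them. The scratch file, the shortened nick!user@host, and the variable names are made up for illustration.

```shell
# Hypothetical sample: one log line in the format shown above, in a scratch file.
log=$(mktemp)
cat > "$log" <<'EOF'
[Sun Jul 3 18:19:27 2005] - [Spamfilter] nick!user@host matches filter '^FREE .+ pics and movies (www\.siteA\.da\.ru|wWw\.siteB\.oRg)$': [PRIVMSG DarkReal: 'FREE Porn pics and movies www.siteA.da.ru']
EOF

# Pattern as typed in the question: no backslashes, so it is not a substring of the line.
without='^FREE .+ pics and movies (www.siteA.da.ru|wWw.siteB.oRg)$'
# Pattern exactly as it appears in the log; single quotes keep every character literal.
with='^FREE .+ pics and movies (www\.siteA\.da\.ru|wWw\.siteB\.oRg)$'

# grep -c exits non-zero when the count is 0, hence the "|| true".
hits_without=$(grep -Fc -e "$without" "$log" || true)
hits_with=$(grep -Fc -e "$with" "$log" || true)
echo "without backslashes: $hits_without, with backslashes: $hits_with"
rm -f "$log"
```

With -F every character has to match literally, so the search string needs the same backslashes the log has.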
ASKER CERTIFIED SOLUTION
ASKER
using that code above, I ran that with this exact logfile (spamfilter.log):
[Sun Jul 3 18:19:27 2005] - [Spamfilter] [|didosch|]!~wkngfhdvs@d8475.w82-123.abo.wanadoo.fr matches filter '^FREE .+ pics and movies (www\.pornsites\.da\.ru|wWw\.aLmoRa\.oRg)$': [PRIVMSG DarkReal: 'FREE Porn pics and movies www.pornsites.da.ru'] [Spamming a porn url to users. Scan your pc for viruses.]
[Mon Jul 4 07:49:30 2005] - [Spamfilter] GurBeT!kedoozwrkq@23.135.11.3 matches filter '^For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : (http://)?.+\.almora\.org$': [PRIVMSG SixPacK: 'For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : www.almora.org'] [Spamming a porn url, scan your pc for viruses]
[Mon Jul 4 17:36:14 2005] - [Spamfilter] ATeS!~unicfvukg@12.186.170.89 matches filter '^For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : (http://)?.+\.almora\.org$': [PRIVMSG MERR50: 'For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : www.almora.org'] [Spamming a porn url, scan your pc for viruses]
[Mon Jul 4 20:48:35 2005] - [Spamfilter] ^Linda!~wxenolilr@3qwes-152-1-28-236.w82-123.abo.wanadoo.fr matches filter '^FREE .+ pics and movies (www\.pornsites\.da\.ru|wWw\.aLmoRa\.oRg)$': [PRIVMSG VOH|out: 'FREE Porn pics and movies www.pornsites.da.ru'] [Spamming a porn url to users. Scan your pc for viruses.]
and using that exact logfile, here is the script's output:
ircd@drt:~/urleaf$ ./wnsflist
# Created: 2005-07-05 05:26:28 AM +0200
# IRCd: someserver.testnet.org
- spamfilter.log has 4 total spamfilter hits,
- from 2 unique patterns, since Jul 3
- Prefix legend: [*] = active, [!] = not active
1a[!]. pattern: ^FREE .+ pics and movies (www.pornsites.da.ru|wWw.aLmoRa.oRg)$
1b[!]. hits: 0 [last: n/a]
2a[!]. pattern: ^For Hot Girl & Crazy PØrn Movîes & HardcØre PØrn MØviés Click Red Box : (http://)?.+.almora.org$
2b[!]. hits: 0 [last: n/a]
it's not supposed to be possible for there to be 0 hits (every pattern has at least 1 hit, of course)
ASKER
never mind, I found the answer.. I just had to use read -r instead of read.. but ozo, your observation was very helpful (don't know how I missed it before), so it looks like you get the points ;)
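A quick way to see the difference between read and read -r, as a sketch only: the sample line is the tail of one of the filters from the log, and the variable names are made up for the demo.

```shell
# A sample line like those in the temp file, kept literal by single quotes.
line='(http://)?.+\.almora\.org$'

# Plain read treats backslash as an escape character and removes it.
plain=$(printf '%s\n' "$line" | { read pattern; printf '%s' "$pattern"; })
# read -r keeps backslashes; IFS= also preserves leading and trailing whitespace.
raw=$(printf '%s\n' "$line" | { IFS= read -r pattern; printf '%s' "$pattern"; })

printf 'read:    %s\n' "$plain"
printf 'read -r: %s\n' "$raw"
```

The plain read result has lost its backslashes, which is the same form the report printed (.+.almora.org$), so a fixed-string search for it can never match the log.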
ASKER
but you've brought up a great point.. the script is stripping out backslashes on the regex patterns, preventing correct matches in many cases..
here's the code that handles that part: first I grep through the whole spamfilter logfile and copy only the needed regex patterns into a temp file, and that temp file does have the backslashes preserved.. but then the script gets to this loop
($TEMPF is the temp file with only regex patterns in it, one per line):
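{
    cat "$TEMPF" | while read pattern ; do
        # number of log lines containing this pattern as a fixed string
        PCOUNT=$(grep -Fc "$pattern" "$LOGF")
        # date of the last hit; $5 is "2005]", so LDATE already ends with "]"
        LDATE=$(grep -F "$pattern" "$LOGF" | tail -n1 | awk '{print $2" "$3" "$5}')
        # a pattern counts as active if it still appears in the conf file
        PACTIVE=$(grep -Fm1 "$pattern" "$CONF")
        [ "$PACTIVE" ] && TAG='[*]' || TAG='[!]'
        echo "${n}a${TAG}. pattern: $pattern"
        echo "${n}b${TAG}. hits: ${PCOUNT:-0} [last: ${LDATE:-n/a]}"
        echo
        n=$(($n + 1))
    done
} >> "$OUTF"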
now the whole problem with that loop is that it strips the backslashes out of $pattern for some reason, so when I grep for $pattern, it usually is the wrong one (unless the original pattern had no backslashes to begin with, then those will work).. but why is that loop stripping the backslashes from $pattern?
to make things easier, here is the entire script:
#!/bin/sh
# makes a report on spamfilter hits
CONF=spamfilter.conf
LOGF=spamfilter.log
TEMPF=$(basename $0).tmp
URC=unrealircd.conf
OUTF=$(basename $0).log
STAMP=$(date '+%F %r %z')
SNAME=$(grep -Eom1 "name \".+\"" $URC|awk '{print $2}'|awk -F\" '{print $2}')
n=1
if [ ! -e "$LOGF" ] ; then
    echo "ERROR: Cannot find $LOGF" >&2
    exit 1
fi
SDATE=$(grep -m1 "\- \[Spamfilter\]" $LOGF|awk '{print $2" "$3}')
# total hits in logfile (all filters)
HC=$(grep -Ec "\- \[Spamfilter\]" $LOGF)
# dump unique patterns from log into temp file
grep -E "\- \[Spamfilter\]" $LOGF|grep -Eo "matches filter '.*':"|awk -F\' '{print $2}'|sort|uniq > $TEMPF
# unique patterns found in log
UCOUNT=$(sed -n '$=' $TEMPF)
echo "# Created: $STAMP" > $OUTF
echo "# IRCd: $SNAME" >> $OUTF
echo "" >> $OUTF
echo "- $LOGF has $HC total spamfilter hits," >> $OUTF
echo "- from $UCOUNT unique patterns, since $SDATE" >> $OUTF
echo "- Prefix legend: [*] = active, [!] = not active" >> $OUTF
echo "" >> $OUTF
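{
    cat "$TEMPF" | while read pattern ; do
        # number of log lines containing this pattern as a fixed string
        PCOUNT=$(grep -Fc "$pattern" "$LOGF")
        # date of the last hit; $5 is "2005]", so LDATE already ends with "]"
        LDATE=$(grep -F "$pattern" "$LOGF" | tail -n1 | awk '{print $2" "$3" "$5}')
        # a pattern counts as active if it still appears in the conf file
        PACTIVE=$(grep -Fm1 "$pattern" "$CONF")
        [ "$PACTIVE" ] && TAG='[*]' || TAG='[!]'
        echo "${n}a${TAG}. pattern: $pattern"
        echo "${n}b${TAG}. hits: ${PCOUNT:-0} [last: ${LDATE:-n/a]}"
        echo
        n=$(($n + 1))
    done
} >> "$OUTF"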
cat $OUTF
rm -f $OUTF
rm -f $TEMPF
exit 0
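For reference, a rough sketch of how the counting part of the loop could look with the read -r fix applied. The report_hits function, the scratch directory, and the sample data are hypothetical and only here to make the example runnable on its own; they are not part of the script above.

```shell
# Hypothetical helper: prints "<hits> <pattern>" for each line of a pattern file,
# counted against a log file.
report_hits() {
    patterns_file=$1
    log_file=$2
    # IFS= and -r keep each line exactly as stored, backslashes included.
    while IFS= read -r pattern; do
        # -F: fixed string; -e: safe even if a pattern starts with a dash.
        count=$(grep -Fc -e "$pattern" "$log_file" || true)
        printf '%s %s\n' "$count" "$pattern"
    done < "$patterns_file"
}

# Made-up sample data: two hits for the filter, one unrelated line.
tmpdir=$(mktemp -d)
cat > "$tmpdir/log" <<'EOF'
[Sun Jul 3 18:19:27 2005] - [Spamfilter] a!b@c matches filter '^FREE .+ pics and movies (www\.siteA\.da\.ru|wWw\.siteB\.oRg)$': [PRIVMSG x: 'FREE Porn pics and movies www.siteA.da.ru']
[Mon Jul 4 20:48:35 2005] - [Spamfilter] d!e@f matches filter '^FREE .+ pics and movies (www\.siteA\.da\.ru|wWw\.siteB\.oRg)$': [PRIVMSG y: 'FREE Porn pics and movies www.siteA.da.ru']
[Mon Jul 4 21:00:00 2005] - some unrelated server notice
EOF
printf '%s\n' '^FREE .+ pics and movies (www\.siteA\.da\.ru|wWw\.siteB\.oRg)$' > "$tmpdir/patterns"

result=$(report_hits "$tmpdir/patterns" "$tmpdir/log")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Reading with done < file instead of cat | while also avoids the extra process, though that part is just a style choice.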