enthuguy

shell script to read line by line and search for a string

Hi,

I would like a shell script that reads a huge file line by line and searches for two strings. If it finds either string on a line, it should write that line to another file.

1. The shell script accepts the source file name as an argument.
2. Search for two strings. If string1 is in the current line, write the line to a file. (No need to search for string2 on the same line.)
3. If string2 is in the current line, append the line to the same file.

string1 and string2 cannot be on the same line.

Thanks in advance.
farzanj

Try this:

USAGE: ./scriptname keyword1 keyword2 sourceFile targetFile

#!/bin/bash

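FILE1=$3   # source file
FILE2=$4   # target file
KEY1=$1
KEY2=$2

# Read the source file line by line and append every line that
# matches either keyword to the target file.
while IFS= read -r line
do
     if printf '%s\n' "$line" | grep -Eq -- "$KEY1|$KEY2"
     then
             printf '%s\n' "$line" >> "$FILE2"
     fi
done < "$FILE1"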


grep 'string1\|string2' "$1" >> file
Nice and quick solution Ozo except that it doesn't really read the file line by line :(
grep works line by line
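
As a rough sketch of the grep-only approach: grep takes the pattern first and the file name second, and `-e` can be repeated to match either of two patterns. The sample lines and temporary files below are hypothetical stand-ins for the real input (`$1`) and output file.

```shell
# Hypothetical sample input; in the thread the file name arrives as "$1".
infile=$(mktemp)
outfile=$(mktemp)
printf '%s\n' 'has string1 here' 'nothing to see' 'has string2 here' > "$infile"

# -e may be repeated: a line is printed if it matches either pattern.
grep -e 'string1' -e 'string2' "$infile" >> "$outfile"

cat "$outfile"
```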
Strictly following requirements:

Usage: findstrings.sh Input_file

where findstrings.sh is:
#!/bin/sh

infile=$1
match1="string1_to_match"
match2="2nd string"
outfile="output.log"

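while IFS= read -r line
do
  # Print the line if it contains match1; otherwise print it only if it contains match2.
  if printf '%s\n' "$line" | grep -- "$match1"
  then
    :
  else
    printf '%s\n' "$line" | grep -- "$match2"
  fi
done < "$infile" > "$outfile"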
@Ozo: Here is what I understand. It uses the Boyer-Moore algorithm, which doesn't work line by line.
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

The original writer of GNU grep utility says that it does NOT work line by line.
Here is the post.
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

Do you have a reference that shows that grep actually works line by line?
man grep

DESCRIPTION
       Grep  searches  the named input FILEs (or standard input if no files are named, or the file
       name - is given) for lines containing a match to  the  given  PATTERN.
Yes, but that does not describe the actual algorithm. You are a big expert; I don't have to tell you that grep can find the keywords first and then print the lines that contain them. This statement describes the output, not the algorithm. The references I gave describe the actual algorithms, and one contains a statement from the person who actually programmed GNU grep.
I thought the question was a request for a specified output, not a request for a specified string matching algorithm.
(which none of the other answers has supplied either)
@Ozo: We look up to you. Could you then show what the right solution would be to satisfy the requirements? Many thanks.
It looks to me like all the answers satisfy the requirements.
The requirements do not say that the two strings and output file should be accepted as arguments,
but it does not explicitly forbid it either.

I do see two potential ambiguities in the question.
Are we meant to make a distinction between a "write" when string1 is found, and an "append" when string2 is found?
It is also not entirely clear whether "string1 and string2 cannot be on the same line." is a statement about the source file, or about the desired output.
If it is about the output, then how to deal with string1 and string2 on the same line in the input is unclear.
enthuguy (ASKER)

Thanks all for your suggestions and input.

Will try this tomo at work and update.

Thanks again
Thanks all

to clarify. string1 and string2 cannot be on the same line. sorry if I had confused you guys.

To ease readability, I would like to have 2 separate log files, one for each string found.

Below is what I extended from the script given above by ozo, but for some reason it cannot get past "while read line" and it hangs.

Please help

==================
#!/bin/sh

infile=$1
match1="success for login id"
match2="Incorrect password for login id"
outfile1="success_login.log"
outfile2="Incorrect_pwd.log"

echo $match1
echo $match2

while read line
do
if  echo $line | grep "$match1"
then
echo $line
echo $line >> $outfile1
elif  echo $line | grep "$match2"
then
echo $line
#echo $line >> $outfile2
else
echo "Did not find matching string"
fi
done

echo "Finished"
==================
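
One likely cause of the hang, offered as a sketch and not as the accepted answer: the loop above has no input redirection, so `read` waits for input from the terminal. Later posts in this thread refer to a `done < $infile` line, which suggests the same fix. The sample file, its contents and the `found` counter are hypothetical.

```shell
infile=$(mktemp)   # hypothetical stand-in for "$1"
printf '%s\n' 'success for login id alice' 'unrelated line' > "$infile"
match1="success for login id"
found=0

# Without "< $infile" after "done", read would wait for keyboard input.
while IFS= read -r line
do
    if printf '%s\n' "$line" | grep -q -- "$match1"
    then
        found=$((found + 1))
    fi
done < "$infile"

echo "matched $found line(s)"
```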
ASKER CERTIFIED SOLUTION
simon3270
Apologies for referring to the wrong name.

But thanks so much, I think I'm good for now.

One last request: I would like to add a counter, say how many lines were found for string1 and string2, reported at the end of the run. How can I achieve this?

btw, I will close this question anyway but if you could help on last counter thing....that would be great.

thanks again
You can keep count of lines several ways, but one is to add:
    m1count=0
    m2count=0
before the "while" line, then around the lines where you echo the matching lines to the output files, add this (the spaces round the "+" are important)
    m1count=$(expr $m1count + 1)
immediately after
    echo $line >> $outfile1
(i.e. between that line and the "elif" line)
That's an old-fashioned way of doing maths - later shells (e.g. bash) allow things like:
    ((m2count=m2count+1))
(unlike the "expr" line, you don't need spaces round the symbols).

Then after the "done < $infile" line, have something like:

    echo Found $m1count lines with \"$match1\"
    echo Found $m2count lines with \"$match2\"
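
A small sketch of the increment styles side by side. The `$((...))` form is an addition not mentioned above: it is POSIX arithmetic expansion and should work in plain sh as well as bash. The third counter name is hypothetical.

```shell
m1count=0
m2count=0
m3count=0

m1count=$(expr $m1count + 1)   # external expr: spaces around "+" are required
m2count=$((m2count + 1))       # POSIX arithmetic expansion
m3count=$((m3count+1))         # spaces are optional in this form

echo "Found $m1count, $m2count, $m3count"
```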

One last thing - if your "match" string may start with a hyphen (I just tried searching for the string "-v"), change the grep lines to:

    if  echo $line | grep -q -- "$match1"

The "--" tells grep (and almost all GNU programs) that you have stopped giving options, and everything after the "--" is to be treated as text arguments to the command.
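
To illustrate the "--" point, here is a minimal sketch using the "-v" search string mentioned above. The sample line and the `result` variable are made up for the example.

```shell
line='option -v was passed'
match1='-v'

# Without "--", grep would read "-v" as its invert-match option
# and then complain that no pattern was given.
if printf '%s\n' "$line" | grep -q -- "$match1"
then
    result="matched"
else
    result="no match"
fi
echo "$result"
```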

I've also added "> $outfile1" etc to empty out the files - otherwise they will just get bigger every time you run the script.  If you don't mind that, and want to keep old records too, just omit those lines.  The script here will report the number of lines it has added in this run.

So, the final script looks like:
#!/bin/sh

infile=$1
match1="success for login id"
match2="Incorrect password for login id"
outfile1="success_login.log"
outfile2="Incorrect_pwd.log"

m1count=0
m2count=0

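> "$outfile1"
> "$outfile2"

while IFS= read -r line
do
    if printf '%s\n' "$line" | grep -q -- "$match1"
    then
        printf '%s\n' "$line" >> "$outfile1"
        m1count=$(expr $m1count + 1)
    elif printf '%s\n' "$line" | grep -q -- "$match2"
    then
        printf '%s\n' "$line" >> "$outfile2"
        # POSIX arithmetic; the bash-only ((...)) form can fail under a strict /bin/sh
        m2count=$((m2count + 1))
    fi
done < "$infile"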

echo Found $m1count instances of \"$match1\"
echo Found $m2count instances of \"$match2\"

echo "Finished"



One thing about this code: it will be painfully slow on big files. ozo's version will be *much* quicker! You can still get things like line counts; just do "wc -l < $outfile1".
BTW, I had a quick look at the GNU grep source - it will normally search on large buffers, but some options (e.g. "-i" to ignore case) will force it to search line by line.  I think it uses Boyer-Moore in both cases.

Other greps (BSD, UNIX, Solaris) may well still have line-by-line searches.
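
For comparison, the quicker grep-based variant with two log files and counts might look like the sketch below. It assumes the match strings are literal text (hence `-F`), and the temporary files and sample lines are hypothetical stand-ins for the real input and log files.

```shell
infile=$(mktemp)      # hypothetical stand-in for "$1"
outfile1=$(mktemp)    # stand-in for success_login.log
outfile2=$(mktemp)    # stand-in for Incorrect_pwd.log
printf '%s\n' \
    'success for login id alice' \
    'Incorrect password for login id bob' \
    'unrelated line' \
    'success for login id carol' > "$infile"

match1="success for login id"
match2="Incorrect password for login id"

# One grep pass per string; -F matches the text literally.
grep -F -- "$match1" "$infile" > "$outfile1"
# Skip lines already matched by match1, mirroring the if/elif order.
grep -F -v -- "$match1" "$infile" | grep -F -- "$match2" > "$outfile2"

# grep -c '' counts every line in a file.
m1count=$(grep -c '' "$outfile1")
m2count=$(grep -c '' "$outfile2")
echo "Found $m1count instances of \"$match1\""
echo "Found $m2count instances of \"$match2\""
```

Each grep reads the whole file once, so this avoids starting a new grep process for every input line.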