Solved

shell script to read line by line and search for a string

Posted on 2012-03-11
17
586 Views
Last Modified: 2012-03-12
Hi,

I would like a shell script that reads a huge file line by line and searches for two strings. If it finds either of the strings, it has to write the line to another file.

1. The shell script should accept the source file name as an argument.
2. Search for two strings. If string1 is in the current line, write the line to a file. (No need to search for string2 on the same line.)
3. If string2 is in the current line, append the line to the same file.

string1 and string2 cannot be on the same line.

Thanks in advance.
Question by:enthuguy
17 Comments
 
LVL 31

Expert Comment

by:farzanj
ID: 37706986
Try this:

USAGE: ./scriptname keyword1 keyword2 sourceFile targetFile

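#!/bin/bash

KEY1=$1    # first keyword
KEY2=$2    # second keyword
FILE1=$3   # source file to read
FILE2=$4   # target file that matching lines are appended to

while IFS= read -r line
do
    # grep -q succeeds if the line contains either keyword
    if printf '%s\n' "$line" | grep -Eq -- "$KEY1|$KEY2"
    then
        printf '%s\n' "$line" >> "$FILE2"
    fi
done < "$FILE1"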

 
LVL 84

Expert Comment

by:ozo
ID: 37707343
grep 'string1\|string2' "$1" >> file
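To illustrate ozo's suggestion, a single grep invocation scans the file once and prints every line that matches either string. This sketch uses POSIX `-e`, given once per pattern, in place of the GNU-specific `\|` alternation; the sample lines and temporary files are made up for the demo.

```shell
#!/bin/sh
# Hypothetical sample input: only lines 1 and 3 contain a search string.
infile=$(mktemp)
outfile=$(mktemp)
printf '%s\n' 'alpha string1 here' 'nothing to see' 'beta string2 there' > "$infile"

# One grep call reads the whole file once; -e can be repeated and a line
# is printed if it matches any of the patterns.
grep -e 'string1' -e 'string2' "$infile" >> "$outfile"

cat "$outfile"
```

Because grep runs once for the whole file rather than once per line, this scales much better than a shell read loop.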
 
LVL 31

Expert Comment

by:farzanj
ID: 37707514
Nice and quick solution, Ozo, except that it doesn't really read the file line by line :(
 
LVL 84

Expert Comment

by:ozo
ID: 37707542
grep works line by line
 
LVL 19

Expert Comment

by:simon3270
ID: 37707838
Strictly following requirements:

Usage: findstrings.sh Input_file

where findstrings.sh is:
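#!/bin/sh

infile=$1
match1="string1_to_match"
match2="2nd string"
outfile="output.log"

# grep prints the line itself when it matches, so grep's output is what
# ends up in $outfile; its exit status decides whether string2 is tried.
while IFS= read -r line
do
  if printf '%s\n' "$line" | grep -- "$match1"
  then
    :   # already written by grep; skip the string2 test
  else
    printf '%s\n' "$line" | grep -- "$match2"
  fi
done < "$infile" > "$outfile"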
 
LVL 31

Expert Comment

by:farzanj
ID: 37708163
@Ozo: Here is what I understand. GNU grep uses the Boyer-Moore algorithm, which doesn't work line by line.
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

The original author of GNU grep says that it does NOT work line by line.
Here is the post.
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

Do you have a reference that shows that grep actually works line by line?
 
LVL 84

Expert Comment

by:ozo
ID: 37708177
man grep

DESCRIPTION
       Grep  searches  the named input FILEs (or standard input if no files are named, or the file
       name - is given) for lines containing a match to  the  given  PATTERN.
 
LVL 31

Expert Comment

by:farzanj
ID: 37708185
Yes, but that does not say anything about the actual algorithm. You are a big expert, so I don't have to tell you that grep can find the keywords first and then print the lines that contain them. That statement describes the output, not the algorithm. The references I gave describe the actual algorithm, and one contains a statement from the person who actually programmed GNU grep.
 
LVL 84

Expert Comment

by:ozo
ID: 37708207
I thought the question was a request for a specified output, not a request for a specified string matching algorithm.
(which none of the other answers has supplied either)
 
LVL 31

Expert Comment

by:farzanj
ID: 37708212
@Ozo: We look up to you. Could you then show us what the right solution would be, one that satisfies the requirements? Many thanks.
 
LVL 84

Expert Comment

by:ozo
ID: 37708260
It looks to me like all the answers satisfy the requirements.
The requirements do not say that the two strings and output file should be accepted as arguments,
but it does not explicitly forbid it either.

I do see two potential ambiguities in the question.
Are we meant to make a distinction between a "write" when string1 is found, and an "append" when string2 is found?
It is also not entirely clear whether "string1 and string2 cannot be on the same line." is a statement about the source file, or about the desired output.
If it is about the output, then how to deal with string1 and string2 on the same line in the input is unclear.
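The second ambiguity only matters if both strings can share a line. If they ever did, an if/elif chain gives string1 priority. The sketch below makes that priority explicit with a shell `case` statement instead of grep; this is a swapped-in technique, not one proposed in the thread, and the sample lines and counter names are made up. Because case matching happens inside the shell, it avoids starting a grep process for every line.

```shell
#!/bin/sh
# Hypothetical sample data; the third line contains both strings.
infile=$(mktemp)
printf '%s\n' 'has string1' 'has string2' 'has string1 and string2' 'neither' > "$infile"

match1="string1"
match2="string2"
n1=0
n2=0

while IFS= read -r line
do
    # The first matching pattern wins, so string1 takes priority.
    # Quoting "$matchN" makes it match literally, not as a glob.
    case $line in
        *"$match1"*) n1=$((n1 + 1)) ;;
        *"$match2"*) n2=$((n2 + 1)) ;;
    esac
done < "$infile"

echo "$n1 $n2"
```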
 

Author Comment

by:enthuguy
ID: 37708288
Thanks all for your suggestions and input.

I will try this tomorrow at work and post an update.

Thanks again
 

Author Comment

by:enthuguy
ID: 37709130
Thanks all.

To clarify: string1 and string2 cannot be on the same line. Sorry if I confused you.

For easier reading, I would like to have two separate log files, one for each string found.

Below is what I extended from the script above given by ozo, but for some reason it doesn't get past "while read line" and it hangs.

Please help.

==================
#!/bin/sh

infile=$1
match1="success for login id"
match2="Incorrect password for login id"
outfile1="success_login.log"
outfile2="Incorrect_pwd.log"

echo $match1
echo $match2

while read line
do
    if  echo $line | grep "$match1"
    then
        echo $line
        echo $line >> $outfile1
    elif  echo $line | grep "$match2"
    then
        echo $line
        #echo $line >> $outfile2
    else
        echo "Did not find matching string"
    fi
done

echo "Finished"
==================
 
LVL 19

Accepted Solution

by:
simon3270 earned 300 total points
ID: 37709220
It was my script, rather than ozo's, but never mind.

The near-last line should be

    done < $infile

to read from that file.
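A small demonstration of why the redirection matters, using a throwaway sample file. Without `< "$infile"` after `done`, `read` waits on standard input (normally the keyboard), which is why the script appeared to hang. `IFS= read -r` is a common refinement, not part of the original answer: it keeps leading spaces and backslashes in each line intact.

```shell
#!/bin/sh
# Hypothetical input file with three lines.
infile=$(mktemp)
printf '%s\n' 'first' 'second' 'third' > "$infile"

count=0
# "< $infile" after "done" feeds the file to every read in the loop.
while IFS= read -r line
do
    count=$((count + 1))
    printf 'line %d: %s\n' "$count" "$line"
done < "$infile"
```

Redirecting into the loop, rather than piping `cat` into it, also keeps the loop in the current shell in bash, so variables such as `count` still hold their values after `done`.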

Also, the hidden trick in mine was that each grep line did two things: it returned success if it found the string, and it output the matching line. You should change the if lines to be
    if  echo $line | grep -q "$match1"
The "-q" means that grep does the search, but doesn't output any matching lines.  Your "echo" a couple of lines lower does that.
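To see `-q` on its own, the sketch below wraps the test in a small helper; `contains` is a hypothetical name, not part of the original script. Only grep's exit status is used, so nothing is printed except the final result.

```shell
#!/bin/sh
# contains LINE PATTERN: hypothetical helper, not from the thread.
# grep -q prints nothing; only its exit status (0 = found) is used.
contains() {
    printf '%s\n' "$1" | grep -q -- "$2"
}

line="Incorrect password for login id bob"

if contains "$line" "success for login id"
then
    result=success
elif contains "$line" "Incorrect password for login id"
then
    result=incorrect
else
    result=none
fi

echo "$result"
```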
 

Author Comment

by:enthuguy
ID: 37709271
Apologies for referring to the wrong name.

But thanks so much, I think I'm good for now.

One last request: if I would like to add a counter, say how many lines were found for string1 and string2, shown at the end, how can I achieve this?

BTW, I will close this question anyway, but if you could help with this last counter question, that would be great.

Thanks again
 
LVL 19

Expert Comment

by:simon3270
ID: 37710954
You can keep count of lines in several ways, but one is to add:
    m1count=0
    m2count=0
before the "while" line. Then, where you echo the matching lines to the output files, increment the matching counter. For string1, add this (the spaces round the "+" are important):
    m1count=$(expr $m1count + 1)
immediately after
    echo $line >> $outfile1
(i.e. between that line and the "elif" line), and do the same for m2count after the line that writes to $outfile2.
That's an old-fashioned way of doing maths - later shells (e.g. bash) allow things like:
    ((m2count=m2count+1))
(unlike the "expr" line, you don't need spaces round the symbols).
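The counter styles side by side, as a sketch. The `$((...))` form in the middle is not mentioned in the answer: it is POSIX arithmetic expansion, so unlike `((...))` it also works under plain /bin/sh.

```shell
#!/bin/bash
# Three ways to add 1 to a counter, each starting from 0.
a=0; b=0; c=0

a=$(expr $a + 1)    # old-fashioned: runs the external expr program
b=$((b + 1))        # POSIX arithmetic expansion, works in sh and bash
((c = c + 1))       # bash arithmetic command, not guaranteed in plain sh

echo "$a $b $c"
```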

Then after the "done < $infile" line, have something like:

    echo Found $m1count lines with \"$match1\"
    echo Found $m2count lines with \"$match2\"

One last thing - if your "match" string may start with a hyphen (I just tried searching for the string "-v"), change the grep lines to:

    if  echo $line | grep -q -- "$match1"

The "--" tells grep (and almost all GNU programs) that you have stopped giving options, and everything after the "--" is to be treated as text arguments to the command.
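A minimal check of the "--" behaviour, assuming a made-up log line that contains the literal text "-v". Without "--", this grep would read "-v" as its invert-match option and then complain that no pattern was given.

```shell
#!/bin/sh
# Hypothetical line and match string starting with a hyphen.
line="run the tool with -v for verbose output"
match1="-v"

# "--" ends grep's options, so "$match1" is taken as the pattern.
if printf '%s\n' "$line" | grep -q -- "$match1"
then
    found=yes
else
    found=no
fi

echo "$found"
```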

I've also added "> $outfile1" etc to empty out the files - otherwise they will just get bigger every time you run the script.  If you don't mind that, and want to keep old records too, just omit those lines.  The script here will report the number of lines it has added in this run.
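A quick way to see the effect of emptying the file, using a throwaway temporary file in place of the real logs:

```shell
#!/bin/sh
# Hypothetical log file used to compare plain appending with emptying first.
outfile=$(mktemp)

# Two "runs" that only append: the file keeps growing.
echo "run 1" >> "$outfile"
echo "run 2" >> "$outfile"
appended=$(wc -l < "$outfile")

# A run that first empties the file with "> file", then appends.
> "$outfile"
echo "run 3" >> "$outfile"
truncated=$(wc -l < "$outfile")

echo "$appended $truncated"
```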

So, the final script looks like:
#!/bin/bash
# bash is needed for the ((...)) counter below; POSIX sh does not guarantee it

infile=$1
match1="success for login id"
match2="Incorrect password for login id"
outfile1="success_login.log"
outfile2="Incorrect_pwd.log"

m1count=0
m2count=0

# Empty the output files so each run starts fresh
> "$outfile1"
> "$outfile2"

while IFS= read -r line
do
    if printf '%s\n' "$line" | grep -q -- "$match1"
    then
        printf '%s\n' "$line" >> "$outfile1"
        m1count=$(expr $m1count + 1)    # old-fashioned expr arithmetic
    elif printf '%s\n' "$line" | grep -q -- "$match2"
    then
        printf '%s\n' "$line" >> "$outfile2"
        ((m2count = m2count + 1))       # bash arithmetic
    fi
done < "$infile"

echo Found $m1count lines with \"$match1\"
echo Found $m2count lines with \"$match2\"

echo "Finished"

One thing about this code: it will be painfully slow on big files, because it starts a new grep for every line it reads. ozo's version will be *much* quicker! You can still get things like line counts: just run "wc -l < $outfile1".
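Following that suggestion, a minimal sketch of the same job without a read loop: one grep pass per string, then wc -l for the counts. The file names and match strings come from the script above; the sample input is made up, and `-F` (treat each string as plain text rather than a regular expression) is an added assumption.

```shell
#!/bin/sh
# Work in a scratch directory so the demo's log files don't clutter anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input; the real script would use infile=$1.
infile=sample_input.log
printf '%s\n' \
    'success for login id alice' \
    'Incorrect password for login id bob' \
    'unrelated line' \
    'success for login id carol' > "$infile"

match1="success for login id"
match2="Incorrect password for login id"
outfile1="success_login.log"
outfile2="Incorrect_pwd.log"

# One grep per string over the whole file; ">" empties each log first.
grep -F -- "$match1" "$infile" > "$outfile1"
grep -F -- "$match2" "$infile" > "$outfile2"

# Count the saved lines with wc -l.
m1count=$(wc -l < "$outfile1")
m2count=$(wc -l < "$outfile2")

echo Found $m1count lines with \"$match1\"
echo Found $m2count lines with \"$match2\"
```

Unlike the if/elif loop, this has no precedence between the strings: a line containing both would be written to both logs. The author has said that cannot happen, so both versions should produce the same files.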
 
LVL 19

Expert Comment

by:simon3270
ID: 37711000
BTW, I had a quick look at the GNU grep source - it will normally search on large buffers, but some options (e.g. "-i" to ignore case) will force it to search line by line.  I think it uses Boyer-Moore in both cases.

Other greps (BSD, UNIX, Solaris) may well still have line-by-line searches.