Solved

shell script to read line by line and search for a string

Posted on 2012-03-11
17
547 Views
Last Modified: 2012-03-12
Hi,

I would like a shell script that reads a huge file line by line and searches for two strings. If it finds either string, it should write the matching line to another file.

1. The shell script accepts the source file name as an argument.
2. Search for two strings. If string1 is in the current line, write the line to a file. (No need to search for string2 on the same line.)
3. If string2 is in the current line, append the line to the same file.

string1 and string2 cannot be on the same line.

Thanks in advance.
0
Comment
Question by:enthuguy
17 Comments
 
LVL 31

Expert Comment

by:farzanj
Comment Utility
Try this:

USAGE: ./scriptname keyword1 keyword2 sourceFile targetFile

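#!/bin/bash

# Arguments: keyword1 keyword2 sourceFile targetFile
FILE1=$3
FILE2=$4
KEY1=$1
KEY2=$2

# Read the source file line by line; IFS= and -r keep each line intact
while IFS= read -r line
do
     # grep -E treats the keywords as extended regular expressions;
     # -q reports a match through the exit status only
     if echo "$line" | grep -Eq -- "$KEY1|$KEY2"
     then
             echo "$line" >> "$FILE2"
     fi
done < "$FILE1"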


0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
grep 'string1\|string2' "$1" >> file
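
A minimal sketch of the same idea, assuming the two search strings are literal text rather than regular expressions. The function name `find_either` and the file names in the usage comment are placeholders for illustration, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: append every line of the file in $1 that contains
# either literal string to the file named in $2.
find_either() {
    # -F: treat the patterns as fixed strings, not regular expressions
    # -e: lets grep take more than one pattern
    # --: end of options, so a file name starting with "-" is safe
    grep -F -e "string1" -e "string2" -- "$1" >> "$2"
}

# Usage (file names are placeholders):
#   find_either input.log matches.log
```

Using `-F` with two `-e` patterns avoids the `\|` alternation, which is a GNU extension in basic regular expressions.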
0
 
LVL 31

Expert Comment

by:farzanj
Comment Utility
Nice and quick solution Ozo except that it doesn't really read the file line by line :(
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
grep works line by line
0
 
LVL 19

Expert Comment

by:simon3270
Comment Utility
Strictly following requirements:

Usage: findstrings.sh Input_file

where findstrings.sh is:
#!/bin/sh

infile=$1
match1="string1_to_match"
match2="2nd string"
outfile="output.log"

while read line
do
  # grep prints the line when it matches, so it both tests and outputs
  if echo "$line" | grep "$match1"
  then
    :
  else
    echo "$line" | grep "$match2"
  fi
done < "$infile" > "$outfile"
0
 
LVL 31

Expert Comment

by:farzanj
Comment Utility
@Ozo:  Here is what I understand.  GNU grep is built on the Boyer-Moore algorithm, which doesn't work line by line.
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

The original writer of GNU grep utility says that it does NOT work line by line.
Here is the post.
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

Do you have a reference that shows that grep actually works line by line?
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
man grep

DESCRIPTION
       Grep  searches  the named input FILEs (or standard input if no files are named, or the file
       name - is given) for lines containing a match to  the  given  PATTERN.
0
 
LVL 31

Expert Comment

by:farzanj
Comment Utility
Yes, but that does not tell us about the actual algorithm.  You are a big expert.  I don't have to tell you that you can find the keywords first and then print the lines that contain them.  This statement simply describes the output, not the algorithm.  The references I gave you describe the actual algorithms, and one contains a statement from the person who actually programmed GNU grep.
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
I thought the question was a request for a specified output, not a request for a specified string matching algorithm.
(which none of the other answers has supplied either)
0
 
LVL 31

Expert Comment

by:farzanj
Comment Utility
@Ozo:  We look up to you.  Could you then show what the right solution would be to satisfy the requirements?  Many thanks.
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
It looks to me like all the answers satisfy the requirements.
The requirements do not say that the two strings and output file should be accepted as arguments,
but it does not explicitly forbid it either.

I do see two potential ambiguities in the question.
Are we meant to make a distinction between a "write" when string1 is found, and an "append" when string2 is found?
It is also not entirely clear whether "string1 and string2 cannot be on the same line." is a statement about the source file, or about the desired output.
If it is about the output, then how to deal with string1 and string2 on the same line in the input is unclear.
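
One way to check whether the second ambiguity matters for a given file is to count the input lines that contain both strings. This is only a sketch: the helper name `count_both` and its argument order are assumptions, and the strings are treated as literal text.

```shell
#!/bin/sh
# Hypothetical helper: count_both FILE STRING1 STRING2
# Prints the number of lines in FILE that contain both literal strings.
count_both() {
    # First grep keeps lines with STRING1; second counts those that also have STRING2.
    grep -F -- "$2" "$1" | grep -F -c -- "$3"
}

# Usage (file name is a placeholder):
#   count_both input.log "string1" "string2"
```

If the count is zero, every answer in the thread produces the same output for that file.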
0
 

Author Comment

by:enthuguy
Comment Utility
Thanks all for your suggestions and input.

Will try this tomorrow at work and update.

Thanks again
0
 

Author Comment

by:enthuguy
Comment Utility
Thanks all

To clarify: string1 and string2 cannot be on the same line. Sorry if I confused you guys.

For readability, I would like to have two separate log files, one for each string found.

Below is what I extended from the above script given by ozo, but for some reason it does not get past "while read line" and it hangs.

Please help

==================
#!/bin/sh

infile=$1
match1="success for login id"
match2="Incorrect password for login id"
outfile1="success_login.log"
outfile2="Incorrect_pwd.log"

echo $match1
echo $match2

while read line
do
if  echo $line | grep "$match1"
then
echo $line
echo $line >> $outfile1
elif  echo $line | grep "$match2"
then
echo $line
#echo $line >> $outfile2
else
echo "Did not find matching string"
fi
done

echo "Finished"
==================
0
 
LVL 19

Accepted Solution

by:
simon3270 earned 300 total points
Comment Utility
It was my script, rather than ozo's, but never mind.

The "done" line near the end should be

    done < $infile

to read from that file.  Without the redirect, the loop waits for input on standard input, which is why it appears to hang.

Also, the hidden trick in mine was that each grep line did two things: it returned success if it found the line, and it output the actual line.  You should change the if lines to be
    if  echo $line | grep -q "$match1"
The "-q" means that grep does the search, but doesn't output any matching lines.  Your "echo" a couple of lines lower does that.
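
As a variation on the same loop, the sketch below keeps the `done < file` redirect but tests each line with the shell's own `case` pattern match instead of `echo | grep -q`, which avoids starting new processes for every line. This is a swapped-in technique, not what the answer above uses. The function name `split_matches` and its argument order are assumptions for illustration; the match strings are taken from the question.

```shell
#!/bin/sh
# Hypothetical helper: split_matches INFILE OUTFILE1 OUTFILE2
split_matches() {
    match1="success for login id"
    match2="Incorrect password for login id"

    # IFS= and -r keep leading spaces and backslashes in each line intact.
    while IFS= read -r line
    do
        case $line in
            # Quoting the variable makes the shell match it as literal text.
            *"$match1"*) printf '%s\n' "$line" >> "$2" ;;
            *"$match2"*) printf '%s\n' "$line" >> "$3" ;;
        esac
    done < "$1"   # without this redirect the loop reads standard input
}

# Usage (file names are placeholders):
#   split_matches app.log success_login.log Incorrect_pwd.log
```

Like the if/elif version, a line that matches the first string is never tested against the second.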
0
 

Author Comment

by:enthuguy
Comment Utility
Apologies for using the wrong name.

But thanks so much, I think I'm good for now.

One last request: I would like to add a counter that reports, at the end, how many lines were found for string1 and for string2. How can I achieve this?

By the way, I will close this question anyway, but if you could help with the counter, that would be great.

thanks again
0
 
LVL 19

Expert Comment

by:simon3270
Comment Utility
You can keep count of lines several ways, but one is to add:
    m1count=0
    m2count=0
before the "while" line, then around the lines where you echo the matching lines to the output files, add this (the spaces round the "+" are important)
    m1count=$(expr $m1count + 1)
immediately after
    echo $line >> $outfile1
(i.e. between that line and the "elif" line)
That's an old-fashioned way of doing maths - later shells (e.g. bash) allow things like:
    ((m2count=m2count+1))
(unlike the "expr" line, you don't need spaces round the symbols).

Then after the "done < $infile" line, have something like:

    echo Found $m1count lines with \"$match1\"
    echo Found $m2count lines with \"$match2\"

One last thing - if your "match" string may start with a hyphen (I just tried searching for the string "-v"), change the grep lines to:

    if  echo $line | grep -q -- "$match1"

The "--" tells grep (and almost all GNU programs) that you have stopped giving options, and everything after the "--" is to be treated as text arguments to the command.
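
A small sketch of the "--" behaviour; the helper name `contains` and the sample text are made up for illustration. Without "--", grep would read "-v" as an option rather than as the text to search for.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text in $1 contains the literal string in $2.
contains() {
    # -F: fixed string; -q: no output, exit status only; --: end of options
    printf '%s\n' "$1" | grep -F -q -- "$2"
}

if contains "run with -v for verbose output" "-v"
then
    echo "found"
fi
```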

I've also added "> $outfile1" etc to empty out the files - otherwise they will just get bigger every time you run the script.  If you don't mind that, and want to keep old records too, just omit those lines.  The script here will report the number of lines it has added in this run.

So, the final script looks like:
#!/bin/sh

infile=$1
match1="success for login id"
match2="Incorrect password for login id"
outfile1="success_login.log"
outfile2="Incorrect_pwd.log"

m1count=0
m2count=0

# Empty the output files so each run starts fresh
> "$outfile1"
> "$outfile2"

while read line
do
    if  echo "$line" | grep -q -- "$match1"
    then
        echo "$line" >> "$outfile1"
        # expr needs spaces around the "+"
        m1count=$(expr $m1count + 1)
    elif  echo "$line" | grep -q -- "$match2"
    then
        echo "$line" >> "$outfile2"
        # POSIX arithmetic expansion; the bash-style ((...)) form may fail
        # where /bin/sh is not bash
        m2count=$((m2count + 1))
    fi
done < "$infile"

echo Found $m1count instances of \"$match1\"
echo Found $m2count instances of \"$match2\"

echo "Finished"



One thing about this code: it will be painfully slow on big files, because it starts new processes for every line. ozo's version will be *much* quicker!  You can still get things like line counts; just do "wc -l < $outfile1".
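
For large files, the same two log files and counts could be produced with two grep passes instead of a shell loop. This is a sketch under the question's stated assumption that the two strings never appear on the same line (otherwise such a line would land in both files). The function name `split_with_grep` is hypothetical; the match strings and log file names come from the script above.

```shell
#!/bin/sh
# Hypothetical helper: split_with_grep INFILE
split_with_grep() {
    match1="success for login id"
    match2="Incorrect password for login id"
    outfile1="success_login.log"
    outfile2="Incorrect_pwd.log"

    # ">" truncates each log, so the counts reflect only this run.
    # -F treats the match strings as literal text.
    grep -F -- "$match1" "$1" > "$outfile1"
    grep -F -- "$match2" "$1" > "$outfile2"

    # wc -l counts the lines just written; tr strips padding some wc versions add.
    m1count=$(wc -l < "$outfile1" | tr -d ' ')
    m2count=$(wc -l < "$outfile2" | tr -d ' ')

    echo "Found $m1count lines with \"$match1\""
    echo "Found $m2count lines with \"$match2\""
}

# Usage (file name is a placeholder):
#   split_with_grep app.log
```

Each grep pass reads the whole file once, so the cost is two sequential scans rather than several process launches per line.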
0
 
LVL 19

Expert Comment

by:simon3270
Comment Utility
BTW, I had a quick look at the GNU grep source - it will normally search on large buffers, but some options (e.g. "-i" to ignore case) will force it to search line by line.  I think it uses Boyer-Moore in both cases.

Other greps (BSD, UNIX, Solaris) may well still have line-by-line searches.
0
