Solved

Linux: Count lines in file that begin with

Posted on 2013-11-15
Last Modified: 2013-11-18
My file is in this format:
Nov 10 04:03:00 Consectetur adipiscing elit. Mauris luctus, nulla eu pellentesque interdum, arcu quam elementum magna
Nov 10 04:03:07 Sollicitudin scelerisque magna lacus sit amet magna. Nunc iaculis arcu a egestas rutrum. 
Nov 10 04:04:01 Nulla quis feugiat dolor, vitae ultrices quam. In cursus eu est id luctus. Cras at velit eleifend
Nov 10 04:04:01 Tincidunt felis sed, elementum quam. Donec sit amet nisi vulputate, lobortis arcu ut, mattis tellus. 


It is located here:
/home/xyz/my.log

How can I count the lines whose leading date and time stamp falls within the past hour?
Question by:hankknight
18 Comments
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39651737
Again, a nice task.

input="/home/xyz/my.log"
start=$(date -d "1 hour ago" "+%s"); i=0
while read -r line; do
  # convert the leading "Mon DD HH:MM:SS" stamp to epoch seconds
  logdate=$(date -d "$(echo "$line" | awk '{print $1, $2, $3}')" "+%s")
  [[ $logdate -ge $start ]] && ((i+=1))
done < "$input"
echo "Lines in $input not older than 1 hour: $i"
 
LVL 21

Assisted Solution

by:Mazdajai
Mazdajai earned 200 total points
ID: 39651741
Try:

grep -c "`date | awk '{print $2,$3,$4}' |awk -F: '{print $1}'`" /home/xyz/my.log

 
LVL 16

Author Comment

by:hankknight
ID: 39652002
Thank you both.

I like the simplicity of Mazdajai's solution; however, there is a problem. Instead of counting the lines from the past 60 minutes, it counts the lines whose timestamp falls in the current clock hour.

For example, at 11:59 your solution returns 5000 results, and at 12:00 it returns 0.

I want a solution that counts the lines from the past hour, not the lines stamped with the current hour.
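
To make the difference concrete, here is a small sketch with a frozen clock. The two timestamps and the year 2013 are made-up illustration values, not taken from the real log, and GNU date is assumed (the thread already relies on `date -d`).

```shell
#!/bin/bash
# Illustration only: fixed, made-up timestamps; assumes GNU date.
export TZ=UTC LC_ALL=C

now="2013-11-10 12:00:30"   # pretend current time
stamp="Nov 10 11:59:50"     # stamp of a log line written 40 seconds earlier

# Hour-prefix approach: the pattern is "month day hour" of the current time.
prefix=$(date -d "$now" "+%b %d %H")
case "$stamp" in
  "$prefix"*) prefix_match=yes ;;
  *)          prefix_match=no  ;;
esac

# Sliding-window approach: compare the age of the line in seconds.
now_s=$(date -d "$now" +%s)
stamp_s=$(date -d "$stamp 2013" +%s)
age=$(( now_s - stamp_s ))
if [ "$age" -le 3600 ]; then window_match=yes; else window_match=no; fi

echo "prefix='$prefix' prefix_match=$prefix_match age=${age}s window_match=$window_match"
```

The line is only 40 seconds old, yet the hour prefix has already rolled over to 12, so the prefix test rejects it while the age test accepts it.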

woolmilkporc, your idea gives me a syntax error.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39652125
Please, what is the exact error message?

I can guess a lot, but not everything.
 
LVL 16

Author Comment

by:hankknight
ID: 39652140
syntax error near unexpected token `
line 5: `   [[ $logdate -ge $start ]] && ((i+=1))

 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39652150
I don't have such a backtick  ` in front of "[[ $logdate ... ."

Where does it come from?
 
LVL 16

Author Comment

by:hankknight
ID: 39654678
woolmilkporc, the problem must be on my end.  All the other code you have provided for other questions works perfectly.

That said, I still prefer Mazdajai's approach here:
grep -c "`date | awk '{print $2,$3,$4}' |awk -F: '{print $1}'`" /home/xyz/my.log


How can that code be adjusted to include all records from the past 60 minutes?  Maybe the date needs to be converted into a traditional timestamp on every line before processing it?
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39654683
>> Maybe the date needs to be converted into a traditional timestamp <<

That's what I did. Looking forward to seeing Mazdajai's solution.

Here is my version again, this time in "code" format, which might be better for copy and paste. Again, there is no backtick anywhere:

input="/home/xyz/my.log"
start=$(date -d "1 hour ago" "+%s"); i=0
while read -r line; do
  # convert the leading "Mon DD HH:MM:SS" stamp to epoch seconds
  logdate=$(date -d "$(echo "$line" | awk '{print $1, $2, $3}')" "+%s")
  [[ $logdate -ge $start ]] && ((i+=1))
done < "$input"
echo "Lines in $input not older than 1 hour: $i"

 
LVL 16

Author Comment

by:hankknight
ID: 39654856
This returns (inaccurate) results almost instantly, even for very large files (10+ GB):
grep -c "`date | awk '{print $2,$3,$4}' |awk -F: '{print $1}'`" /home/xyz/my.log


The code you posted can take more than 5 minutes.  Can the performance of your solution be enhanced?  

IDEA 1: Log entries are sequential.  Break the loop as soon as a non-matching line is found.
IDEA 2: Before starting the loop, check every 100th result and cut the file as soon as an older match is found
IDEA 3: Use Mazdajai's code to get all results from this hour and from last hour then use your approach to get the results for this hour only
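
One further option, not proposed in the thread: much of the run time of the loop-based scripts likely comes from starting a new `date` process for every line. A minimal sketch, assuming GNU date (whose `-f -` option parses one date per line from standard input) and lines that begin with "Mon DD HH:MM:SS", converts all timestamps with a single `date` process. The function name `count_recent` and the demo log entries are hypothetical.

```shell
#!/bin/bash
# Sketch: count lines from the past hour using one date process in total.
# Assumes GNU date (for -f) and lines starting with "Mon DD HH:MM:SS".
export TZ=UTC LC_ALL=C

count_recent() {
  file=$1
  cutoff=$(date -d "1 hour ago" +%s)
  # awk keeps the three timestamp fields of lines that start with a letter,
  # date -f - converts each of them to epoch seconds,
  # and the second awk counts the values at or after the cutoff.
  awk '/^[A-Za-z]/ { print $1, $2, $3 }' "$file" |
    date -f - +%s 2>/dev/null |
    awk -v cutoff="$cutoff" '$1 >= cutoff { n++ } END { print n + 0 }'
}

# Demo with a throwaway log (made-up entries relative to the current time).
demo_log=$(mktemp)
{
  date -d "2 hours ago"    "+%b %d %H:%M:%S older entry"
  date -d "30 minutes ago" "+%b %d %H:%M:%S recent entry"
  date -d "5 minutes ago"  "+%b %d %H:%M:%S newest entry"
} > "$demo_log"

recent=$(count_recent "$demo_log")
echo "Lines not older than 1 hour: $recent"
rm -f "$demo_log"
```

This still reads the whole file, so on a very large log it could be combined with IDEA 1 by feeding it only the tail of the file. Because the stamps carry no year, entries written around New Year may be misjudged, as with the other scripts in this thread.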
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39655005
What exactly do you mean by "Log entries are sequential"? Do you mean that new records are added at the top of the file? I strongly doubt this.

If (as usual) new entries are added at the end of the file, you can try the following (based on your IDEA 1):
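input="my.log"
start=$(date -d "1 hour ago" "+%s"); i=0
# read the file bottom-up and stop at the first entry older than one hour
# NOTE: in bash the loop after the pipe runs in a subshell, so $i is still 0 after "done"
tac "$input" | while read -r line; do
  logdate=$(date -d "$(echo "$line" | awk '{print $1, $2, $3}')" "+%s")
  if [[ $logdate -ge $start ]]; then ((i+=1))
   else break
  fi
done
echo "Lines in $input younger than 1 hour: $i"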

 
LVL 21

Expert Comment

by:Mazdajai
ID: 39655011
Try:

awk '
        BEGIN {
                cmd="date -d \"1 hour ago\" +%H:%M:%S"
                cmd | getline onehourago
                close(cmd)
        }
        {
                cmd="date -d \""$3"\" +%H:%M:%S"
                cmd | getline d
                close(cmd)
                if (d > onehourago) print
        }
' /home/xyz/my.log|wc -l 

 
LVL 16

Author Comment

by:hankknight
ID: 39655047
Both your ideas return 0 even when there are entries.  

New entries are always added to the bottom.  So I guess we would need to start at the bottom of the file and work our way up and then break when the time condition is not met.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39655059
That's exactly what I tried to do with "tac". Are you sure that your file's contents are OK?
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39655099
Or is there an empty line at the end? Anyway, please try
input="/home/xyz/my.log"
start=$(date -d "1 hour ago" "+%s"); i=0
echo 0 > /tmp/counter.$$   # make sure the counter file exists even if nothing matches
tac "$input" | while read -r line; do
  if ! echo "$line" | grep -Eq "^[a-zA-Z]" ; then continue; fi   # skip empty or non-log lines
  logdate=$(date -d "$(echo "$line" | awk '{print $1, $2, $3}')" "+%s")
  if [[ $logdate -ge $start ]]; then ((i+=1))
     echo $i > /tmp/counter.$$
   else break
  fi
done
echo "Lines in $input younger than 1 hour:" $(</tmp/counter.$$)
rm /tmp/counter.$$


Please note that I changed how the counter is handled (a counter file instead of a variable) to work around a bash quirk: the loop after a pipe runs in a subshell, so changes to the variable are lost when the loop ends.
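
The effect can be reproduced in isolation. The following is a minimal sketch, assuming bash with default options (in particular, the `lastpipe` option is off); the variable names and the three-line input are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: a counter updated inside "cmd | while ..." is lost after the loop.
# Assumes bash with default options (lastpipe is off).

piped=0
printf 'a\nb\nc\n' | while read -r line; do
  piped=$((piped + 1))            # runs in a subshell; the parent never sees this
done

redirected=0
while read -r line; do
  redirected=$((redirected + 1))  # runs in the current shell
done < <(printf 'a\nb\nc\n')

echo "piped=$piped redirected=$redirected"
```

Feeding the loop through a redirection (process substitution here) instead of a pipe keeps it in the current shell, so the counter survives without a temporary file.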
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39655269
This avoids using an external counter file:
input="/home/xyz/my.log"
start=$(date -d "1 hour ago" "+%s"); i=0
while read -r line; do
  if ! echo "$line" | grep -Eq "^[a-zA-Z]" ; then continue; fi   # skip empty or non-log lines
  logdate=$(date -d "$(echo "$line" | awk '{print $1, $2, $3}')" "+%s")
  if [[ $logdate -ge $start ]]; then ((i+=1))
   else break
  fi
done <<< "$(tac "$input")"   # quoted so the newlines in tac's output are preserved
echo "Lines in $input younger than 1 hour: $i"


Note "<<<" is not a typo!
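
For readers who have not met it, `<<<` is a bash here-string: it passes a string to the command's standard input. A minimal sketch with made-up sample text follows. Quoting the expansion is the safer habit, since some older bash versions reportedly word-split an unquoted expansion here and collapse the lines into one.

```shell
#!/bin/bash
# Sketch: a here-string feeds a multi-line string to a loop, one line per read.
text=$(printf 'first\nsecond\nthird')

lines=0
while read -r line; do
  lines=$((lines + 1))
  last=$line
done <<< "$text"

echo "lines=$lines last=$last"
```

Because the loop is fed by a redirection and not by a pipe, it runs in the current shell and the variables keep their values after `done`.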
 
LVL 14

Accepted Solution

by:comfortjeanius
comfortjeanius earned 1000 total points
ID: 39655318
#!/bin/bash

#read the current date/time in unixtime format
NOW=$( date +"%s" )
#start at 0 so that a count is printed even when no line matches
COUNT=0

while read -r MONTH DAY TIME REST
do
    #change the read date/time to unixtime
    THEN=$( date -d "$MONTH $DAY $TIME" +"%s" )

    #if the difference is less than 1 hour (=3600 seconds), increase the count
    [[ $(( NOW - THEN )) -lt 3600 ]] && (( ++COUNT ))

done <"/home/xyz/my.log"

echo "$COUNT"

 
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 800 total points
ID: 39657391
In light of what your last questions revealed, here is my next version:

#!/bin/bash
input="/home/xyz/my.log"
start=$(date -d "1 hour ago" "+%s"); i=0
while read -r line; do
  if ! echo "$line" | grep -Eq "^[a-zA-Z]" ; then continue; fi   # skip empty or non-log lines
  logdate=$(date -d "$(echo "$line" | awk '{print $1, $2, $3}')" "+%s" 2>/dev/null)
  [[ $? -ne 0 ]] && continue   # skip lines whose timestamp date cannot parse
  if [[ $logdate -ge $start ]]; then ((i+=1))
   else break
  fi
done <<< "$(tac "$input")"
echo "Lines in $input younger than 1 hour: $i"


It should be really fast, by the way, because the loop is terminated as soon as too old a record is encountered.
 
LVL 16

Author Closing Comment

by:hankknight
ID: 39657646
Thank you all. Here is the code I will use. It uses parts from all your comments; however, it executes faster than any single solution provided.
#!/bin/bash
# usage: pass the log file as the first argument
t1=$( date "+%s.%N" )
NOW=$( date +"%s" )
COUNT=0
while read -r MONTH DAY TIME DATA
do
    THEN=$( date -d "$MONTH $DAY $TIME" +"%s" 2>/dev/null )
    [ -z "$THEN" ] && continue   # skip lines whose timestamp cannot be parsed
    if [ $(( NOW - THEN )) -gt 3600 ]
    then
        break   # reading bottom-up, so every remaining entry is older
    else
        COUNT=$((COUNT+1))
    fi
done <<< "$(tac "$1")"
t2=$( date "+%s.%N" )
DIFF=$(echo "scale=3; ($t2 - $t1)/1" | bc)
echo "Execution time: $DIFF seconds"
echo "Lines younger than 1 hour: $COUNT"
