Solved

Could not extract file contents using awk and regex

Posted on 2012-03-13
Last Modified: 2012-03-23
My file is called /exports/tmp/ip789 and its contents are pasted below:

# ipbackup
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
HOME=/var/lib/backup

14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini
*/2 * * * * /exports/tmp/gamer.sh
-----------------------------------------------------------------------
You can see that the file has several lines.
I get the following output when I execute " awk ' $0 ~ /^(14) /' /exports/tmp/ip789":

14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini


But when I execute "awk ' $0 ~ /^(1) /' /exports/tmp/ip789"
the output is empty.

Any idea why?
0
Comment
Question by:pvinodp
29 Comments
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 313 total points
ID: 37713979
Seems that in your second version you're searching for "1 " (i.e. a "1" followed by a space) which is obviously not present in your file (at least not at the start of a line).

"awk ' $0 ~ /^(1)/' /exports/tmp/ip789"

wmp
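
A quick way to see the difference, as a minimal sketch. The temporary file stands in for /exports/tmp/ip789 and holds only two of the lines from the question; the variable names are just for illustration.

```shell
# Temporary sample file (an assumption for this sketch, not the real /exports/tmp/ip789).
sample=$(mktemp)
cat > "$sample" <<'EOF'
14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
*/2 * * * * /exports/tmp/gamer.sh
EOF

# "1" followed by a space at the start of the line: no line begins with "1 ".
with_space=$(awk '$0 ~ /^(1) / {n++} END {print n+0}' "$sample")

# "1" alone at the start of the line: the "14 00 ..." line qualifies.
without_space=$(awk '$0 ~ /^(1)/ {n++} END {print n+0}' "$sample")

echo "with trailing space: $with_space line(s), without: $without_space line(s)"
rm -f "$sample"
```

The regex is anchored at the start of the line, so the character right after the "1" decides the outcome: in the file it is a "4", not a space.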
0
 

Author Comment

by:pvinodp
ID: 37715366
awk ' $0 ~ /^([0-9]{2,2}|[*]).*ini$/ ' /exports/tmp/ip789

is returning nothing.


But the regex seems to be correct when I checked it in a Linux regex editor.
I need it to match lines that start with either 2 digits or a * and end with "ini" (hence the $ at the end of the pattern).
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37715469
The latter statement should indeed work perfectly on the data you posted in your Q.

Could it be that there are whitespace characters following ".ini" ?

In this case the regex would fail.

Better this way then:

awk ' $0 ~ /^([0-9]{2,2}|[*]).*ini[ ]{0,}$/ ' ...
0
 

Author Comment

by:pvinodp
ID: 37715539
I opened the file in the vi editor and used "set list": there is a $ directly after "ini", so there is no space after it.

What would be your command to filter the lines that have a digit or * at the beginning and end with .ini?
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37715613
Strange.

As I already said - your command is perfect for that kind of filtering, so I don't think I could give you a better one.

I just tested it here, and indeed, it works for me.
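
One possible explanation, which this thread does not confirm: some awk implementations do not treat {2,2} as an interval expression. Older gawk releases (before 4.0) needed --re-interval or --posix for that, and older mawk versions lack it too, which would also explain why the same pattern works on one machine and not on another. A sketch of a variant that avoids the braces altogether, run against a temporary copy of the data from the question:

```shell
# Temporary copy of the file from the question (the temp path is an assumption).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# ipbackup
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
HOME=/var/lib/backup

14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini
*/2 * * * * /exports/tmp/gamer.sh
EOF

# [0-9][0-9] spells out "two digits" without relying on {2,2} support.
matches=$(awk '/^([0-9][0-9]|[*]).*ini$/' "$sample")
printf '%s\n' "$matches"
rm -f "$sample"
```

If this prints the two backup lines while the {2,2} version prints nothing, the awk in use most likely lacks interval expressions.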
0
 

Author Comment

by:pvinodp
ID: 37715882
Still,

do you have any other solution?
Any help with grep or sed
would be a great help.
0
 
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 313 total points
ID: 37716380
"grep -E" or "egrep" work just the same way.

grep -E  "^([0-9]{2,2}|[*]).*ini$" /exports/tmp/ip789

With standard "grep" it's a bit more complicated:

grep -e "^[0-9]\{2,2\}" -e "^[*]"  /exports/tmp/ip789 | grep "ini$"

There is no real "and" in grep.

And with "sed" it's almost the same. The problem is always the "and" conjunction:

sed -n "/^[0-9]\{2,2\}/p;/^[*]/p"  /exports/tmp/ip789 | sed -n "/ini$/p"

All the above work (tested).
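
For comparison, awk can combine conditions with && and || in a single expression, so the "and" needs no second process. A minimal sketch (not from the posts above) that runs the plain grep pipeline and an awk equivalent on a temporary copy of the data and compares the results:

```shell
# Temporary copy of the relevant lines (the temp path is an assumption).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# ipbackup
SHELL=/bin/bash
14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini
*/2 * * * * /exports/tmp/gamer.sh
EOF

# grep: "or" via two -e patterns, "and" via a second grep in the pipeline.
grep_out=$(grep -e '^[0-9]\{2,2\}' -e '^[*]' "$sample" | grep 'ini$')

# awk: the same condition written as one boolean expression.
awk_out=$(awk '(/^[0-9][0-9]/ || /^[*]/) && /ini$/' "$sample")

if [ "$grep_out" = "$awk_out" ]; then
    echo "grep pipeline and awk expression select the same lines"
fi
rm -f "$sample"
```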
0
 

Author Comment

by:pvinodp
ID: 37718510
All the commands you sent give the same output:
14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini

but they do not show the line:

*/2 * * * * /exports/tmp/gamer.sh
0
 

Author Comment

by:pvinodp
ID: 37718569
Please disregard my last post.

I need a filter to get only the task definitions from crontab -l.
grep -E  "^([0-9]{2,2}|[*])( )([0-9]{2,2}|[*])" /exports/tmp/ip789

does not select the line starting with */2.
0
 
LVL 84

Expert Comment

by:ozo
ID: 37718706
*/2 * * * * /exports/tmp/gamer.sh
does not end with "ini"
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 187 total points
ID: 37718720
grep -E  "^([0-9]{2,2}|[*])( )([0-9]{2,2}|[*])" /exports/tmp/ip789
*/2 * * * * /exports/tmp/gamer.sh
has a / after the *, not a space
0
 
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 313 total points
ID: 37718831
As I wrote,

my "egrep" version works even with a slash following the asterisk, but of course (as ozo pointed out) only for lines ending with "ini".

grep -E  "^([0-9]{2,2}|[*]).*ini$" /exports/tmp/ip789

In case you want to see all job entries:

grep -E  "^([0-9]{1,2}|[*]).*" /exports/tmp/ip789

Please note that in this latter version the "minute" part can consist of one or two digits (besides the asterisk stuff, of course), which might better reflect the givens in a real crontab.
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 187 total points
ID: 37718887
grep -E  "^([0-9]{1,2}|[*]).*"
matches exactly the same lines as does
grep -E  "^[0-9*]"

If you want to see all cron job entries, some versions of cron also allow strings like
@daily
or
@hourly
for the time and date fields, and entries may have leading spaces and tabs
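
A small sketch to check both points. The @daily line and the indented line are made-up entries added for illustration; the other lines come from the question.

```shell
# Temporary test file (an assumption for this sketch).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# ipbackup
SHELL=/bin/bash
14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
*/2 * * * * /exports/tmp/gamer.sh
@daily /exports/tmp/gamer.sh
  5 4 * * * /exports/tmp/gamer.sh
EOF

long_form=$(grep -E '^([0-9]{1,2}|[*]).*' "$sample")
short_form=$(grep -E '^[0-9*]' "$sample")

if [ "$long_form" = "$short_form" ]; then
    echo "both patterns select the same lines:"
fi
printf '%s\n' "$short_form"
rm -f "$sample"
```

Neither pattern selects the @daily entry or the entry with leading spaces, because both require a digit or an asterisk as the very first character.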
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37718904
Yep,

with this new font here at EE it's a bit hard (for writers as well as for readers) to verify whether there's a space somewhere or not.

grep -E  "^([0-9]{1,2} |[*]).*"

Now it's there.
0

 
LVL 84

Expert Comment

by:ozo
ID: 37719122
But now it fails on
14/2
 or
14,16
or
14-16
0
 
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 313 total points
ID: 37719195
grep -E  "^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*" /exports/tmp/ip789

Please note that inside the first square bracket pair there is a space and a TAB.

This might work as well somewhere, but not in my shell:

grep -E  "^([ \t]{0,}[0-9]{1,2}[ ,/-]|[*@]).*" /exports/tmp/ip789
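
If typing a literal TAB into the command line is awkward, one option is to let printf produce it and put it into the pattern through a variable. This is a sketch under that assumption; the variable names and the sample lines are made up for the test.

```shell
# A real TAB character, produced by printf rather than typed.
tab=$(printf '\t')
pattern="^([ ${tab}]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

# Sample lines: a TAB-indented entry, a list, a range, and two non-job lines.
sample=$(mktemp)
printf '%s\n' \
    "${tab}14 00 * * * /exports/tmp/gamer.sh" \
    "14,16 * * * * /exports/tmp/gamer.sh" \
    "14-16 * * * * /exports/tmp/gamer.sh" \
    "# ipbackup" \
    "SHELL=/bin/bash" > "$sample"

# Pattern with a real TAB inside the bracket expression.
real_tab_hits=$(grep -E -c "$pattern" "$sample" || true)

# Pattern with the two characters \ and t inside the bracket expression.
literal_hits=$(grep -E -c '^([ \t]{0,}[0-9]{1,2}[ ,/-]|[*@]).*' "$sample" || true)

echo "real TAB: $real_tab_hits line(s), literal \\t: $literal_hits line(s)"
rm -f "$sample"
```

In a POSIX bracket expression the backslash is an ordinary character, so [ \t] normally means "space, backslash or t", which is why the second form tends to miss TAB-indented entries.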
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37719203
ozo,

please don't mention fcrontab. Please!
0
 

Author Comment

by:pvinodp
ID: 37719277
So if the user enters @daily or @monthly, then my pattern will fail to get the tasks.
Is there a pattern to fetch all valid tasks from the crontab -l output?
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37719280
Did you try my very last suggestion?

For crontab -l :

crontab -l | grep -E  "^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"
0
 

Author Comment

by:pvinodp
ID: 37719377
Yes, it works.
But I am not yet sure how to check whether it selects all valid task definitions.
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37719391
See man 5 crontab.

Create a test file containing all the formats mentioned there, and check.
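
Such a check could look roughly like this. It is only a sketch: the entries below are a hand-picked subset of what crontab(5) allows, all of them invented for the test, and the two temporary files are assumptions.

```shell
tab=$(printf '\t')
pattern="^([ ${tab}]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

job_lines=$(mktemp)    # entries the filter should select
other_lines=$(mktemp)  # lines the filter should skip

printf '%s\n' \
    "5 0 * * * /exports/tmp/gamer.sh" \
    "0-59/2 * * * * /exports/tmp/gamer.sh" \
    "0,30 * * * * /exports/tmp/gamer.sh" \
    "*/2 * * * * /exports/tmp/gamer.sh" \
    "@daily /exports/tmp/gamer.sh" \
    "${tab}15 3 * * * /exports/tmp/gamer.sh" \
    "  @hourly /exports/tmp/gamer.sh" > "$job_lines"

printf '%s\n' \
    "# ipbackup" \
    "SHELL=/bin/bash" \
    'MAILTO=""' \
    "" > "$other_lines"

# Job entries the pattern does NOT select, and non-job lines it DOES select.
missed=$(grep -E -v -c "$pattern" "$job_lines" || true)
false_hits=$(grep -E -c "$pattern" "$other_lines" || true)

echo "missed job entries: $missed, wrongly selected lines: $false_hits"
grep -E -v "$pattern" "$job_lines" || true   # show what was missed
rm -f "$job_lines" "$other_lines"
```

With these samples the one miss is the indented @hourly entry: the pattern allows leading blanks only in front of digits, not in front of * or @.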
0
 

Author Comment

by:pvinodp
ID: 37719938
My final filter command is :
 crontab -l | grep -E  '^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*' | awk ' $0 !~ / backup \/usr\/lib\/backup / '

I call this inside a perl file
execute_command(crontab -l | grep -E  '^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*' | awk ' $0 !~ / backup \/usr\/lib\/backup / ');

But because of the $0 in the awk filter, the output is not as expected. How do I go ahead?
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37719970
Use grep instead of awk:

crontab -l | grep -E  '^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*' |grep -v "backup /usr/lib/backup"
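
The same pipeline, sketched against a temporary copy of the file from the question instead of crontab -l (the temp file and the tab variable are assumptions for the test). Since the command no longer contains a $, there is nothing left that a calling script could expand before the shell sees it.

```shell
tab=$(printf '\t')

# Temporary stand-in for the crontab -l output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# ipbackup
SHELL=/bin/bash
14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini
*/2 * * * * /exports/tmp/gamer.sh
EOF

# First grep keeps the job entries, second grep drops the backup jobs.
remaining=$(grep -E "^([ ${tab}]{0,}[0-9]{1,2}[ ,/-]|[*@]).*" "$sample" \
    | grep -v "backup /usr/lib/backup")

printf '%s\n' "$remaining"
rm -f "$sample"
```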
0
 

Author Comment

by:pvinodp
ID: 37720039
@wmp
In the command: crontab -l | grep -E  "^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

What is between [ and ]: is it a space or a tab?
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37720086
See comment #37719195 above.

It's a TAB.
0
 

Author Comment

by:pvinodp
ID: 37723546
In your comment you said "Please note that inside the first square bracket pair there is a space and a TAB."

What is the purpose of having both tab and space?
And why are you not advising to use [\s] rather?
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 187 total points
ID: 37723561
your grep may interpret that as matching either the character \ or the character s
but your grep may recognize [[:space:]]
0
 
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 313 total points
ID: 37723771
Correct,

[[:space:]] or [[:blank:]] should work!

crontab -l | grep -E  "^([ [:space:]]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

"[[:space:]]" means all "whitespace" characters, which includes TABs, and other "invisible" characters, like vertical TAB, LF etc. [[:blank:]] means just space and TAB.

>> What is the purpose of having both tab and space? <<

That's because a crontab entry might well have leading tabs, not only leading spaces.

>> why are you not advising to use [\s] rather? <<

That's because it doesn't work with my grep. Try it!
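
A minimal sketch to try it, using one made-up TAB-indented entry. The shortened pattern (blanks, one or two digits, a space) is a simplification for the test, not the full filter.

```shell
tab=$(printf '\t')
line="${tab}14 00 * * * /exports/tmp/gamer.sh"

# [[:blank:]] covers space and TAB, so the indented entry is found.
blank_hits=$(printf '%s\n' "$line" | grep -E -c '^[[:blank:]]{0,}[0-9]{1,2} ' || true)

# [\s] inside a bracket expression: the characters "\" and "s", not whitespace.
bracket_s_hits=$(printf '%s\n' "$line" | grep -E -c '^[\s]{0,}[0-9]{1,2} ' || true)

echo "[[:blank:]]: $blank_hits match(es), [\\s]: $bracket_s_hits match(es)"
```

Outside brackets, some grep versions accept \s as an extension, but the POSIX classes are the portable choice.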
0
 

Author Closing Comment

by:pvinodp
ID: 37760058
thanks a lot for your inputs
0
