
Could not extract file contents using awk and regex

Posted on 2012-03-13
Last Modified: 2012-03-23
My file is called /exports/tmp/ip789 and its contents are pasted below:

# ipbackup
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
HOME=/var/lib/backup

14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini
*/2 * * * * /exports/tmp/gamer.sh
-----------------------------------------------------------------------
You can see that the file has many lines.
I get the following output when I execute "awk ' $0 ~ /^(14) /' /exports/tmp/ip789":

14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini


But when I execute "awk ' $0 ~ /^(1) /' /exports/tmp/ip789"
I get no output at all.

Any idea?
Question by:pvinodp
LVL 68

Accepted Solution

by:
woolmilkporc earned 313 total points
ID: 37713979
It seems that in your second version you're searching for "1 " (i.e. a "1" followed by a space), which is not present in your file, at least not at the start of a line: the lines that start with "1" continue with "4". Without the space it matches:

"awk ' $0 ~ /^(1)/' /exports/tmp/ip789"

wmp
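A minimal sh sketch of the difference, assuming a hypothetical helper function sample that prints two lines from the question in place of reading /exports/tmp/ip789:

```shell
#!/bin/sh
# Hypothetical helper standing in for /exports/tmp/ip789:
# prints two lines taken from the question.
sample() {
  printf '%s\n' \
    '14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini' \
    '*/2 * * * * /exports/tmp/gamer.sh'
}

# /^(1) / needs a "1" immediately followed by a space at the start of the line.
# The line begins "14 ", so the character after "1" is "4": nothing is printed.
sample | awk '$0 ~ /^(1) /'

# /^(1)/ only needs the line to begin with "1", so the "14 00 ..." line is printed.
sample | awk '$0 ~ /^(1)/'
```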

Author Comment

by:pvinodp
ID: 37715366
awk ' $0 ~ /^([0-9]{2,2}|[*]).*ini$/ ' /exports/tmp/ip789

returns nothing.

But the regex seems correct when I check it in a Linux regex editor.
I need it to match lines that start with 2 digits or a *, and that end with "ini" (hence the $ anchor).
LVL 68

Expert Comment

by:woolmilkporc
ID: 37715469
That command should indeed work on the data you posted in your question.

Could it be that there are whitespace characters following ".ini"?

In that case the regex would fail, because $ requires "ini" to be the very last characters of the line.

Better this way then:

awk ' $0 ~ /^([0-9]{2,2}|[*]).*ini[ ]{0,}$/ ' ...
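A sketch of the failure mode suggested here, using two made-up entries with shortened paths, the first carrying an invisible trailing space. As assumptions of this sketch, it writes [0-9][0-9] instead of [0-9]{2,2} so that it also runs on awk versions without interval-expression support, and uses [ \t]* so that trailing TABs are tolerated as well as spaces:

```shell
#!/bin/sh
# Hypothetical sample: two entries with shortened paths; the first one has
# an invisible trailing space after ".ini".
sample() {
  printf '%s\n' \
    '14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/a.ini ' \
    '14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/b.ini'
}

# Strict: "ini" must be the last characters on the line, so only b.ini matches.
sample | awk '/^([0-9][0-9]|[*]).*ini$/'

# Tolerant: allow any run of spaces or TABs between "ini" and the end of
# the line, so both entries match.
sample | awk '/^([0-9][0-9]|[*]).*ini[ \t]*$/'
```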

Author Comment

by:pvinodp
ID: 37715539
I opened the file in vi and ran :set list, which shows the end-of-line $ directly after "ini", so there is no trailing whitespace.

What would your command be to filter the lines that start with digits or a * and end with .ini?
LVL 68

Expert Comment

by:woolmilkporc
ID: 37715613
Strange.

As I already said, your command is correct for that kind of filtering, so I can't offer a better one.

I just tested it here, and indeed, it works for me.
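One possible explanation that the thread does not raise: older awk implementations do not accept interval expressions such as {2,2} by default. GNU awk before version 4.0 did not treat them as intervals unless run with --re-interval (or --posix), and mawk 1.3.3 did not support them at all, so the same command can work on one system and print nothing on another. A sketch that avoids intervals by spelling out both digits, with a hypothetical sample helper holding the file from the question:

```shell
#!/bin/sh
# Hypothetical helper: the full file contents from the question.
sample() {
  printf '%s\n' \
    '# ipbackup' \
    'SHELL=/bin/bash' \
    'PATH=/sbin:/bin:/usr/sbin:/usr/bin' \
    'MAILTO=""' \
    'HOME=/var/lib/backup' \
    '' \
    '14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini' \
    '14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini' \
    '*/2 * * * * /exports/tmp/gamer.sh'
}

# Same filter as in the question, but "[0-9][0-9]" replaces "[0-9]{2,2}",
# so it does not depend on awk supporting interval expressions.
sample | awk '/^([0-9][0-9]|[*]).*ini$/'

# With GNU awk older than 4.0, the original form needs --re-interval:
#   awk --re-interval '/^([0-9]{2,2}|[*]).*ini$/' /exports/tmp/ip789
```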

Author Comment

by:pvinodp
ID: 37715882
Still, do you have any other solution? Any help with grep or sed would be great.
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 313 total points
ID: 37716380
"grep -E" or "egrep" work just the same way.

grep -E  "^([0-9]{2,2}|[*]).*ini$" /exports/tmp/ip789

With standard "grep" it's a bit more complicated:

grep -e "^[0-9]\{2,2\}" -e "^[*]"  /exports/tmp/ip789 | grep "ini$"

There is no real "and" in grep, so the second condition is applied by a second grep in the pipe.

And with "sed" it's almost the same. The problem is always the "and" conjunction:

sed -n "/^[0-9]\{2,2\}/p;/^[*]/p"  /exports/tmp/ip789 | sed -n "/ini$/p"

All the above work (tested).
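Where a single process is preferred, the "and" can be written directly in awk, which does have an && operator, or in sed by nesting one address inside another. A minimal sketch, assuming POSIX awk and sed and a hypothetical sample helper; two digits are written as [0-9][0-9] for portability:

```shell
#!/bin/sh
# Hypothetical helper standing in for /exports/tmp/ip789 (lines from the question).
sample() {
  printf '%s\n' \
    'HOME=/var/lib/backup' \
    '14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini' \
    '14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini' \
    '*/2 * * * * /exports/tmp/gamer.sh'
}

# awk: a line is printed only if both patterns match.
sample | awk '/^([0-9][0-9]|[*])/ && /ini$/'

# sed: the inner /ini$/p runs only on lines selected by the outer address,
# which also behaves like "and" (one -e per alternative of the first test).
sample | sed -n -e '/^[0-9][0-9]/{/ini$/p;}' -e '/^[*]/{/ini$/p;}'
```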

Author Comment

by:pvinodp
ID: 37718510
All the commands you sent give the same output:
14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini
14 11 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_114013.ini

but they do not show the line:

*/2 * * * * /exports/tmp/gamer.sh

Author Comment

by:pvinodp
ID: 37718569
Disregard my last post.

I need a filter that returns only the task definitions from crontab -l.
grep -E  "^([0-9]{2,2}|[*])( )([0-9]{2,2}|[*])" /exports/tmp/ip789

does not match the line starting with */2.
LVL 84

Expert Comment

by:ozo
ID: 37718706
*/2 * * * * /exports/tmp/gamer.sh
does not end with "ini"
LVL 84

Assisted Solution

by:ozo
ozo earned 187 total points
ID: 37718720
grep -E  "^([0-9]{2,2}|[*])( )([0-9]{2,2}|[*])" /exports/tmp/ip789
*/2 * * * * /exports/tmp/gamer.sh
has a / after the *, not a space
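Following ozo's point, a sketch comparing the pattern above with a variant (an illustration, not from the thread) that lets each of the first two fields carry an optional /step suffix such as the /2 in */2; sample is a hypothetical helper holding lines from the question:

```shell
#!/bin/sh
# Hypothetical helper with lines from the question.
sample() {
  printf '%s\n' \
    'MAILTO=""' \
    '14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini' \
    '*/2 * * * * /exports/tmp/gamer.sh'
}

# Original pattern: the first field must be two digits or a lone "*" followed
# by a space, so "*/2" fails because "/" follows the "*".
sample | grep -E '^([0-9]{2,2}|[*])( )([0-9]{2,2}|[*])'

# Variant: each of the first two fields may carry an optional "/step" suffix.
sample | grep -E '^([0-9]{1,2}|[*])(/[0-9]+)? ([0-9]{1,2}|[*])(/[0-9]+)? '
```

The variant still does not cover lists or ranges such as 14,16 or 14-16 in those fields.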
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 313 total points
ID: 37718831
As I wrote,

my "egrep" version works even with a slash following the asterisk, but of course (as ozo pointed out) only for lines ending with "ini".

grep -E  "^([0-9]{2,2}|[*]).*ini$" /exports/tmp/ip789

In case you want to see all job entries:

grep -E  "^([0-9]{1,2}|[*]).*" /exports/tmp/ip789

Please note that in this latter version the "minute" field can consist of one or two digits (besides the asterisk forms, of course), which better reflects what a real crontab can contain.
LVL 84

Assisted Solution

by:ozo
ozo earned 187 total points
ID: 37718887
grep -E  "^([0-9]{1,2}|[*]).*"
matches exactly the same lines as does
grep -E  "^[0-9*]"
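ozo's equivalence can be checked on a sample (a hypothetical helper with lines from the question): since .* also matches the empty string, and a line that starts with one or two digits always starts with one digit, both patterns depend only on the first character of the line.

```shell
#!/bin/sh
# Hypothetical helper with lines from the question.
sample() {
  printf '%s\n' \
    '# ipbackup' \
    'SHELL=/bin/bash' \
    'HOME=/var/lib/backup' \
    '' \
    '14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini' \
    '*/2 * * * * /exports/tmp/gamer.sh'
}

# Both filters select exactly the same lines.
long=$(sample | grep -E '^([0-9]{1,2}|[*]).*')
short=$(sample | grep -E '^[0-9*]')
[ "$long" = "$short" ] && echo "same lines selected"
```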

If you want to see all cron job entries: some versions of cron also allow strings like
@daily
or
@hourly
in place of the time and date fields, and entries may have leading spaces and tabs.
LVL 68

Expert Comment

by:woolmilkporc
ID: 37718904
Yep,

with this new font here at EE it's a bit hard (for writers as well as for readers) to verify whether there's a space somewhere or not.

grep -E  "^([0-9]{1,2} |[*]).*"

Now it's there.
LVL 84

Expert Comment

by:ozo
ID: 37719122
But now it fails on 14/2, 14,16 or 14-16.
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 313 total points
ID: 37719195
grep -E  "^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*" /exports/tmp/ip789

Please note that inside the first square bracket pair there is a space and a TAB.

This might work with other tools, but not with my grep, because inside a POSIX bracket expression \t is just the two characters \ and t:

grep -E  "^([ \t]{0,}[0-9]{1,2}[ ,/-]|[*@]).*" /exports/tmp/ip789
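A portable way to get a real TAB into the pattern without typing an invisible character is to capture one with printf in a shell variable. A minimal sketch, assuming a POSIX sh and grep -E; the tab and pattern variables and the sample entries (with /bin/true as a placeholder command) are illustrative:

```shell
#!/bin/sh
# Capture a literal TAB in a variable instead of typing it into the pattern.
tab=$(printf '\t')

# Same pattern as above, with the TAB expanded inside the bracket expression.
pattern="^([ ${tab}]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

# Hypothetical test entries.
sample() {
  printf '%s\n' \
    '14 00 * * * /bin/true' \
    "${tab}14,16 00 * * * /bin/true" \
    ' 14-16 00 * * * /bin/true' \
    '14/2 00 * * * /bin/true' \
    '@daily /bin/true' \
    'SHELL=/bin/bash'
}

sample | grep -E "$pattern"
```

The sample covers ozo's cases 14/2, 14,16 and 14-16 as well as @daily and a leading TAB; only the SHELL= line is rejected.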
LVL 68

Expert Comment

by:woolmilkporc
ID: 37719203
ozo,

please don't mention fcrontab. Please!

Author Comment

by:pvinodp
ID: 37719277
So if the user enters @daily or @monthly, my pattern will fail to catch those tasks.
Is there a pattern that fetches all valid tasks from the crontab -l output?
LVL 68

Expert Comment

by:woolmilkporc
ID: 37719280
Did you try my very last suggestion?

For crontab -l :

crontab -l | grep -E  "^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

Author Comment

by:pvinodp
ID: 37719377
Yes, it works.
But I am not yet sure how to check whether it catches every valid task definition.
LVL 68

Expert Comment

by:woolmilkporc
ID: 37719391
See man 5 crontab.

Create a test file containing all the formats mentioned there, and check.
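A sketch of such a test, assuming a POSIX sh: should_match and should_not_match are hypothetical helpers whose lines follow the field formats described in crontab(5) (the commands are placeholders), and check prints only the cases the pattern gets wrong.

```shell
#!/bin/sh
# Pattern under test (from the thread), with the TAB built via printf.
tab=$(printf '\t')
pattern="^([ ${tab}]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

# Lines that should be recognised as jobs.
should_match() {
  printf '%s\n' \
    '0 2 * * * /bin/true' \
    '*/15 * * * * /bin/true' \
    '0-30/10 * * * * /bin/true' \
    '5,35 8-17 * * mon-fri /bin/true' \
    '  30 4 1 jan * /bin/true' \
    "${tab}0 0 * * sun /bin/true" \
    '@reboot /bin/true' \
    '@daily /bin/true' \
    '@hourly /bin/true'
}

# Lines that must not be reported as jobs (comments, variables, blank lines).
should_not_match() {
  printf '%s\n' \
    '# 0 2 * * * commented-out job' \
    'SHELL=/bin/bash' \
    'MAILTO=""' \
    ''
}

# Print every line the pattern gets wrong; no output means all cases pass.
check() {
  should_match | grep -vE "$pattern" | sed 's/^/missed: /'
  should_not_match | grep -E "$pattern" | sed 's/^/false match: /'
}

check
```

If your cron accepts leading blanks before * or @ entries (ozo mentions leading spaces and tabs), add a case such as "  */5 * * * * /bin/true" to should_match: check will report it as missed, because the [*@] branch of the pattern does not allow leading blanks.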

Author Comment

by:pvinodp
ID: 37719938
My final filter command is:
crontab -l | grep -E  '^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*' | awk ' $0 !~ / backup \/usr\/lib\/backup / '

I call this from a Perl script:
execute_command(crontab -l | grep -E  '^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*' | awk ' $0 !~ / backup \/usr\/lib\/backup / ');

But because of the $0 in the awk filter, the output is not as expected. How do I proceed?
LVL 68

Expert Comment

by:woolmilkporc
ID: 37719970
Use grep instead of awk:

crontab -l | grep -E  '^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*' |grep -v "backup /usr/lib/backup"
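The pipeline can be tried without a real crontab. A minimal sketch, assuming a hypothetical crontab_l function that stands in for crontab -l, with the TAB built by printf rather than typed; grep -vF (a small variation on the grep -v above) treats the excluded text as a fixed string rather than a regex:

```shell
#!/bin/sh
tab=$(printf '\t')
pattern="^([ ${tab}]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

# Hypothetical stand-in for "crontab -l" output (entries from the question).
crontab_l() {
  printf '%s\n' \
    'HOME=/var/lib/backup' \
    '14 00 * * * backup /usr/lib/backup --batch /var/lib/schedule/backup.SR2.1.0.13Mar12_113854.ini' \
    '*/2 * * * * /exports/tmp/gamer.sh'
}

# Keep the job lines, then drop those that run the backup tool.
crontab_l | grep -E "$pattern" | grep -vF 'backup /usr/lib/backup'
```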

Author Comment

by:pvinodp
ID: 37720039
@wmp
In the command: crontab -l | grep -E  "^([   ]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

what is between [ and ]: a space or a TAB?
LVL 68

Expert Comment

by:woolmilkporc
ID: 37720086
See comment #37719195 above.

It's a TAB.

Author Comment

by:pvinodp
ID: 37723546
In your comment you said "Please note that inside the first square bracket pair there is a space and a TAB."

What is the purpose of having both a TAB and a space?
And why not use [\s] instead?
LVL 84

Assisted Solution

by:ozo
ozo earned 187 total points
ID: 37723561
Your grep may interpret that as matching either the character \ or the character s,
but it will probably recognize [[:space:]].
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 313 total points
ID: 37723771
Correct,

[[:space:]] or [[:blank:]] should work!

crontab -l | grep -E  "^([ [:space:]]{0,}[0-9]{1,2}[ ,/-]|[*@]).*"

"[[:space:]]" means all "whitespace" characters, which includes TABs, and other "invisible" characters, like vertical TAB, LF etc. [[:blank:]] means just space and TAB.

>> What is the purpose of having both tab and space? <<

That's because a crontab entry might well have leading tabs, not only leading spaces.

>> why are you not advising to use [\s] rather? <<

That's because it doesn't work with my grep. Try it!
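A quick way to see the difference between the two classes, assuming a grep -E that understands POSIX character classes: four job lines (with /bin/true as a placeholder command) that differ only in their leading character, where the last one starts with a carriage return.

```shell
#!/bin/sh
# Hypothetical job lines: no indent, a space, a TAB, and a carriage return.
sample() {
  printf '14 00 * * * /bin/true\n'
  printf ' 14 00 * * * /bin/true\n'
  printf '\t14 00 * * * /bin/true\n'
  printf '\r14 00 * * * /bin/true\n'
}

# [[:blank:]] is only space and TAB: the first three lines match (count 3).
sample | grep -cE '^[[:blank:]]*[0-9]{1,2}[ ,/-]'

# [[:space:]] also covers carriage return, vertical TAB, form feed and
# newline, so the fourth line matches as well (count 4).
sample | grep -cE '^[[:space:]]*[0-9]{1,2}[ ,/-]'
```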

Author Closing Comment

by:pvinodp
ID: 37760058
Thanks a lot for your input.