jackstreet
asked on
Pass a variable to awk inside a Korn shell script.
I'm parsing a logfile for referring pages.
I want to structure my script such that I can pass a variable to awk to tell it which URL to use as the referring page. I can't figure out how to pass the variable.
I'm using the Korn shell.
Here's the script:
egrep ' 20[0-9] | 30[0-9] ' $1 | uniq | awk '{ print $7" "$11}' | awk ' $1 ~ /.htm/' | sed -f ~/bin/nl | sed -f ~/bin/am | awk ' $2 ~ /\/inthenews\//' | sort -fd | uniq -c > $2
This script produces a list of all referrals from any page in the "inthenews" directory. I hardcoded "/inthenews/" because I can't figure out how to pass a variable to awk.
Thanks in advance for any advice.
ASKER
NovaDenizen,
I got this when I ran the code and passed 3 variables to the script: $1, $2, $3, which are, respectively, logfile name, directory name (inthenews), name of results file :
**************************
awk: syntax error at source line 1
context is
>>> \ <<< inthenews\~/\/\//
awk: bailing out at source line 1
**************************
Looks like it thinks the $2 in "awk ' $2 ~..." should be the passed $2 variable and not the second field. It should be "awking" the second field ($2) for the $2 variable.
Yes? Clear as mud?
Thanks so much!
[Aside - Perl seems so dense compared to these little UNIX machines!]
ASKER CERTIFIED SOLUTION
Double quotes might work too.
awk " \$2 \~ /\\/$dirname\\//"
ASKER
I will check this Monday. Thanks!
Arguments can be passed to an awk script just like command-line arguments to any other program ...
The number of arguments is held in special variable ARGC and arguments themselves are held in an array ARGV (same names as in C/C++ ... just the case is different)
If you can provide the exact input and output (formats), then maybe we can try to provide/suggest a better solution
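A minimal sketch of that, for the curious (note that outside of BEGIN, awk would try to open any non-assignment argument as an input file, so the loop runs in a BEGIN block):

```shell
# ARGV[0] holds the program name; ARGV[1]..ARGV[ARGC-1] hold the
# operands. Printing from BEGIN avoids awk opening "foo"/"bar" as files.
awk 'BEGIN { for (i = 0; i < ARGC; i++) print i, ARGV[i] }' foo bar
```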
The problem is that he wants awk to see '$2', '~', and the '\'s, but wants ksh to see the '$dirname' parameter. It's just a matter of getting the escapes correct.
ASKER
NovaDenizen,
After trying the double quotes I got this:
**************************
awk: syntax error at source line 1
context is
$2 >>> \ <<< ~ /\/inthenews\//
awk: bailing out at source line 1
**************************
But after trying the first suggestion: awk $' $2 ~ /\\/'$dirname$'\\//'
It worked! I am able to pass a directory name as a variable to awk and it's "awking" the second field.
I have to ask, how are you employing the first, third and fourth dollar signs in the above? All the documentation I've seen refers to the dollar sign as a means of variable substitution and you aren't using them for that.
It works great for directories but I may have to ask a related question if I can't figure out how to pass a partial URL as a variable.
Examples:
WebHome/dirname/dirname/pagename
WebHome/dirname/pagename
WebHome/homepage
Is there a quick solution to that?
ASKER
And one other example:
WebHome/dirname/pagename.html
ASKER
Sorry -- in my question regarding the dollar signs I should have said the first and fourth dollar signs.
The ksh construct $'...' tells ksh to treat the ... the same as a C compiler would treat the double-quoted string "...". C does not treat '$' or '~' characters in any special way, so these pass through unchanged. Then comes $dirname, which is not quoted or escaped, so ksh sees it as a variable and substitutes the variable value for it. Then comes another $'...' sequence for the end.
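To see the splicing concretely, here is a small sketch (using inthenews as an example value):

```shell
dirname=inthenews
# Two $'...' pieces and the unquoted $dirname join into a single shell
# word; awk receives the program:  $2 ~ /\/inthenews\//
prog=$' $2 ~ /\\/'$dirname$'\\//'
printf 'a /inthenews/page.htm\nb /other/x.htm\n' | awk "$prog"
```

Only the first input line, whose second field contains /inthenews/, makes it through.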
As far as I can tell, no other shell has a construct like ksh's $'...'. The designers of ksh had some interesting ideas, but their implementation was kind of flawed.
The second question is difficult because awk is hardcoded to recognize '/' characters as boundaries for regular expressions. If you absolutely must use awk, then you will need a routine that substitutes each "/" with "\/" so awk will know they are to be treated as regular characters. Also, '.' is a special character in regular expressions, which matches any single character.
So, I think awk is inappropriate. Told ya so :). Perl is a winner here. Here is a short script for you
#!/usr/bin/perl
while (<STDIN>) {
    @a = split(' ');
    if (index($a[1], $ARGV[0] != -1) { print $_ ; }
}
This looks at the second column of the input, and checks if the first script argument is a substring of the second column.
If you want an equality check instead, substitute this line:
if ($a[1] eq $ARGV[0]) { print $_; }
Name the script something like 'ff2c' (find filename second column) or whatever, and use it instead of the awk command.
... | ff2c $dirname | ...
Oops, I forgot a parenthesis in the first if statement.
if (index($a[1], $ARGV[0]) != -1)
ASKER
I don't have Perl on my system. I'm running a barebones Unix-under-Windows setup. I'll have to figure out what to do next.
Thanks!
use sed to escape the filename before you run your mega-pipeline.
newname=$(echo $dirname | sed -e 's#/#\\/#g' -e 's/\./\\./g') . I might have the escapes a bit wrong there. The intent is to replace occurrences of '/' with '\/', and replace occurrences of '.' with '\.'.
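A quick sanity check of that substitution (a sketch; the replacement needs a doubled backslash inside single quotes so sed emits a literal backslash):

```shell
dirname='WebHome/dir/page.html'
# Escape '/' as '\/' and '.' as '\.' so the value is safe to embed
# inside an awk /.../ regular expression.
newname=$(echo "$dirname" | sed -e 's#/#\\/#g' -e 's/\./\\./g')
echo "$newname"
```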
using the following will also work:
awk '{ print myVar }' myVar=$aKshVar
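Applied to the problem at hand, a sketch (assuming the directory is in $dirname): the value is injected as an awk variable and tested with index(), a plain substring check, which also sidesteps the regex trouble with '/' and '.' in partial URLs:

```shell
dirname=inthenews
# dir is assigned on the command line; index() does a literal
# substring test on the second field, so no regex escaping is needed.
printf 'a /inthenews/page.htm\nb /other/x.htm\n' |
    awk 'index($2, "/" dir "/") > 0' dir="$dirname"
```

The equivalent `awk -v dir="$dirname" '...'` form assigns the variable before any input is read, which also makes it visible inside a BEGIN block.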
Sermon over. The difficulty with parameterizing the variable in the awk statement is that the argument to awk is in single quotes, which do not permit expansion of variable names. The question is, can we rewrite this argument without using single quotes? The answer is yes, and we can do it by escaping individual characters.
Assuming $dir contains the directory,
awk \ \$2\ \~\ /\\/$dir\\//
should do the trick. Note that all spaces, the '$' in $2, the ~, and the backslashes are all individually escaped because we want awk to see them, and the $ in $dir is not escaped because we want ksh to expand the $dir variable. This way, ksh passes ' $2 ~ /\/dirname\//' as its sole argument.
It may also be possible to do it using double quotes, but I don't know the precise details off the top of my head.
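For what it's worth, a double-quoted form does seem workable (a sketch; the key detail is that ~ must not be escaped inside double quotes, only the $ and the backslashes):

```shell
dir=inthenews
# ksh/bash double quotes: \$ -> $, \\ -> \, and ~ passes through
# untouched, so awk sees:  $2 ~ /\/inthenews\//
printf 'a /inthenews/page.htm\nb /other/x.htm\n' |
    awk " \$2 ~ /\\/$dir\\//"
```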