jackstreet

Pass a variable to awk inside a Korn shell script.

I'm parsing a logfile for referring pages.
I want to structure my script such that I can pass a variable to awk to tell it which URL to use as the referring page. I can't figure out how to pass the variable.
I'm using the Korn shell.

Here's the script:

egrep ' 20[0-9] | 30[0-9] ' $1 | uniq | awk '{ print $7" "$11}' | awk ' $1 ~ /.htm/' | sed -f ~/bin/nl |  sed -f ~/bin/am | awk ' $2 ~ /\/inthenews\//'  | sort -fd | uniq -c > $2

This script produces a list of all referrals from any page in the "inthenews" directory.  I hardcoded "/inthenews/" because I can't figure out how to pass a variable to awk.

Thanks in advance for any advice.
NovaDenizen

I recommend that you switch over to perl.  awk and sed are good for simple tasks, but perl has the power to do this in about a ten line script, and it would be easy to parameterize that.

Sermon over.  The difficulty with parameterizing the variable in the awk statement is that the argument to awk is in single quotes, which do not permit expansion of variable names.  The question is, can we rewrite this argument without using single quotes?  The answer is yes, and we can do it by escaping individual characters.

Assuming $dir contains the directory,
awk \ \$2\ \~\ /\\/$dir\\//
should do the trick.  Note that all spaces, the '$' in $2, the ~, and the backslashes are all individually escaped because we want awk to see them, and the $ in $dir is not escaped because we want ksh to expand the $dir variable.  This way, ksh passes ' $2 ~ /\/dirname\//' as its sole argument.

It may also be possible to do it using double quotes, but I don't know the precise details off the top of my head.
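For the double-quoted variant, here is a minimal sketch. It assumes $dir holds a plain directory name with no regular-expression metacharacters, and the two sample lines are made up for illustration. Inside double quotes the shell only removes the backslash in front of $, \, " and `, so the ~ must be written without a backslash. Closing and reopening the single quotes around the variable is shown as an alternative.

```shell
dir=inthenews

# Two made-up "page referrer" lines standing in for the real pipeline output.
sample='/a.htm /inthenews/story.htm
/b.htm /sports/story.htm'

# Double quotes: \$2 reaches awk as $2, \\/ reaches awk as \/,
# and $dir is expanded by the shell.
dq_result=$(printf '%s\n' "$sample" | awk "\$2 ~ /\\/$dir\\//")

# Alternative: end the single-quoted text, add "$dir", then reopen the quotes.
sq_result=$(printf '%s\n' "$sample" | awk '$2 ~ /\/'"$dir"'\//')

printf '%s\n' "$dq_result"
printf '%s\n' "$sq_result"
```

In both cases awk receives the program text $2 ~ /\/inthenews\// as a single argument.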
jackstreet

ASKER

NovaDenizen,
I got this when I ran the code and passed 3 arguments to the script: $1, $2, $3, which are, respectively, the logfile name, the directory name (inthenews), and the name of the results file:

*****************************************
awk: syntax error at source line 1
context is
       >>> \ <<< inthenews\~/\/\//
awk: bailing out at source line 1

*****************************************

Looks like it thinks the $2 in "awk ' $2 ~..." should be the passed $2 variable and not the second field. It should be "awking" the second field ($2) for the $2 variable.
Yes? Clear as mud?
Thanks so much!
[Aside - Perl seems so dense compared to these little UNIX machines!]
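One way to keep the two meanings of $2 apart is to copy the script's positional parameters into named variables first, so that $2 is only ever written for awk's second field. A minimal sketch follows; the function name filter_by_dir is hypothetical and only stands in for the script body so the example is self-contained, and it assumes the input already has the referring page in the second column.

```shell
# Hypothetical stand-in for the script: arguments are logfile, directory, results file.
filter_by_dir() {
    logfile=$1
    dirname=$2
    outfile=$3
    # The shell's second argument is now $dirname; the $2 inside the
    # single quotes is seen only by awk and means "second field".
    awk '$2 ~ /\/'"$dirname"'\//' "$logfile" > "$outfile"
}
```

It would be called as: filter_by_dir access.log inthenews results.txt (both file names are placeholders).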
ASKER CERTIFIED SOLUTION
NovaDenizen

Double quotes might work too.
awk " \$2 \~ /\\/$dirname\\//"
I will check this Monday. Thanks!
sunnycoder
Arguments to an awk script can be passed just like command-line arguments ...

The number of arguments is held in the special variable ARGC, and the arguments themselves are held in the array ARGV (the same names as in C/C++ ... just the case is different).

If you can provide the exact input and output (formats), then maybe we can try to provide/suggest a better solution
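A minimal sketch of the ARGV approach, assuming the data arrives on standard input and the sample lines are made up. awk normally treats every argument after the program text as an input file name, so the sketch reads ARGV[1] in the BEGIN block and then blanks it; awk skips empty ARGV entries and falls back to standard input. The test uses index(), a plain substring search, rather than a regular expression.

```shell
dirname=inthenews

argv_result=$(printf '%s\n' '/a.htm /inthenews/x.htm' '/b.htm /other/x.htm' |
    awk 'BEGIN { dir = ARGV[1]; ARGV[1] = "" }   # take the argument, then hide it from file processing
         index($2, "/" dir "/")' "$dirname")

printf '%s\n' "$argv_result"
```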
The problem is that he wants awk to see '$2', '~', and the '\'s, but wants ksh to see the '$dirname' parameter. It's just a matter of getting the escapes correct.
NovaDenizen,
After trying the double quotes I got this:

*****************************************
awk: syntax error at source line 1
context is
      $2 >>> \ <<< ~ /\/inthenews\//
awk: bailing out at source line 1

*****************************************
But after trying the first suggestion: awk $' $2 ~ /\\/'$dirname$'\\//'
It worked! I am able to pass a directory name as a variable to awk and it's "awking" the second field.

I have to ask, how are you employing the first, third and fourth dollar signs in the above? All the documentation I've seen refers to the dollar sign as a means of variable substitution and you aren't using them for that.

It works great for directories but I may have to ask a related question if I can't figure out how to pass a partial URL as a variable.
Examples:  
WebHome/dirname/dirname/pagename
WebHome/dirname/pagename
WebHome/homepage

Is there a quick solution to that?
And one other example:
WebHome/dirname/pagename.html
Sorry -- in my question regarding the dollar signs I should have said the first and fourth dollar signs.
The ksh construct $'...' tells ksh to treat the ... the same as a C compiler would treat the double-quoted string "...".  C does not treat '$' or '~' characters in any special way, so these pass through unchanged.  Then comes $dirname, which is not quoted or escaped, so ksh sees it as a variable and substitutes the variable value for it.  Then comes another $'...' sequence for the end.

The $'...' construct comes from ksh93; bash and zsh support it as well, though a strictly POSIX sh may not.  The designers of ksh had some interesting ideas, but their implementation was kind of flawed.

The second question is difficult because awk is hardcoded to recognize '/' characters as boundaries for regular expressions.  If you absolutely must use awk, then you will need a routine that substitutes each "/" with "\/" so awk will know they are to be treated as regular characters.  Also, '.' is a special character in regular expressions, which normally matches any single character.

So, I think awk is inappropriate.  Told ya so :).  Perl is a winner here.  Here is a short script for you

#!/usr/bin/perl
while (<STDIN>) {
    @a = split(' ');
    if (index($a[1], $ARGV[0] != -1) { print $_ ; }
}

This looks at the second column of the input, and checks if the first script argument is a substring of the second column.  
If you want an equality check instead, substitute this line:
if ($a[1] eq $ARGV[0]) { print $_; }

Name the script something like 'ff2c' (find filename second column) or whatever, and use it instead of the awk command.
... | ff2c $dirname | ...


Oops, I forgot a parenthesis in the first if statement.
if (index($a[1], $ARGV[0]) != -1)
I don't have perl on my system. I'm running a barebones Unix in Windows version.  I'll have to figure out what to do next.
Thanks!
Use sed to escape the filename before you run your mega-pipeline.
newname=$(echo "$dirname" | sed -e 's#/#\\/#g' -e 's/\./\\./g')
The intent is to replace occurrences of '/' with '\/', and replace occurrences of '.' with '\.'.
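A minimal sketch of this escaping step in use, written with $(...) command substitution and single-quoted sed expressions so the backslashes reach sed untouched. The URL and the two sample lines are made up; the second line differs only in the character where the '.' should be, to show that the escaped dot no longer matches any character.

```shell
url='WebHome/dirname/pagename.html'

# Prefix every "/" and "." with a backslash so awk treats them literally.
escaped=$(printf '%s\n' "$url" | sed -e 's#/#\\/#g' -e 's/\./\\./g')

# Splice the escaped text into the awk regular expression.
sed_result=$(printf '%s\n' 'x WebHome/dirname/pagename.html' 'x WebHome/dirname/pagenameXhtml' |
    awk '$2 ~ /'"$escaped"'/')

printf '%s\n' "$escaped"
printf '%s\n' "$sed_result"
```

Other regular-expression metacharacters (for example ? or +) are not handled by this sketch and would need their own substitutions.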
Using the following will also work:

awk '{ print myVar }' myVar="$aKshVar"
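Applied to the original problem, the assignment approach might look like the sketch below. The variable name page and the sample lines are made up for illustration. Because index() does a plain substring test, the '/' and '.' characters in a partial URL need no escaping. Two details worth knowing: a -v assignment is already set in a BEGIN block, while a trailing var=value operand only takes effect when awk reaches it in the argument list; and awk interprets backslash escape sequences in the assigned value, which should not matter for ordinary URLs.

```shell
page='WebHome/dirname/pagename.html'

sample='x WebHome/dirname/pagename.html
x WebHome/dirname/pagenameXhtml
x WebHome/homepage'

# -v sets the awk variable before the program starts;
# index() returns nonzero when page occurs anywhere in field 2.
v_result=$(printf '%s\n' "$sample" | awk -v page="$page" 'index($2, page)')

# The trailing var=value form; "-" names standard input explicitly.
operand_result=$(printf '%s\n' "$sample" | awk 'index($2, page)' page="$page" -)

printf '%s\n' "$v_result"
printf '%s\n' "$operand_result"
```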