GAUTAM

How to parse a file in unix tcsh shell script

Hi experts...
I have a file in unix like..
25|aaa|hello|44
33|bbb|goodbye|55
As shown above, the fields are delimited by '|'.
Can anyone help with how to check that there are only four columns, and how to display, say, the first and fourth columns?
simon3270

awk -F\| '{if (NF == 4){print $1 "|" $4}else{printf "Row %d contains %d fields\n",NR,NF;}}' afile
GAUTAM (ASKER)

@simon3270: Can you explain the solution? I am a newbie to shell scripting.
GAUTAM (ASKER)

@simon3270: Sorry if I have not explained it correctly, but I meant that the first and fourth columns should be displayed only if all records have four columns.
simon3270

No problem.

awk is a text manipulation program.  The "-F" parameter sets the delimiter to use between fields - I put an escape (\) before the | to stop the shell seeing it as a pipe between commands.  You can specify a single character to act as the separator between fields (as here), or a set of characters enclosed in square brackets, any of which marks a new field - for example, -F'[|()]' will treat any of |, ( or ) as a field separator.
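
As a quick check of the bracketed form, here is a minimal sketch. The input string 'a|b(c)d' is made up for illustration, and the wrapper is plain sh (not tcsh) purely so the result can be captured in a variable; the awk program itself does not depend on which shell launches it.

```shell
# Split one made-up line on any of |, ( or ) and show the
# field count followed by the third field.
out=$(printf 'a|b(c)d\n' | awk -F'[|()]' '{ print NF; print $3 }')
printf '%s\n' "$out"
# prints:
# 4
# c
```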

The next part (between the ' quotes) is the awk script, which tells awk how to process its input.  The usual format has multiple lines with the format:
   /pattern to match/{commands to run}
   /second pattern/{second set of commands}
where awk processes each input line in turn.  It first looks for "pattern to match" in the input line, and if it finds it, runs the "commands to run" section.  It then looks for "second pattern" in the same line, and if it finds it, runs "second set of commands".
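
To see that both patterns are tried against every line, here is a small sketch with made-up input and made-up patterns (hello, goodbye). It is written for plain sh rather than tcsh so the multi-line script and the captured output are easy to show.

```shell
# Each input line is tested against both patterns in order;
# a line that matches both triggers both actions.
out=$(printf 'hello world\ngoodbye world\nhello and goodbye\nnothing here\n' |
  awk '
    /hello/   { print "greeting: " $0 }
    /goodbye/ { print "farewell: " $0 }
  ')
printf '%s\n' "$out"
# prints:
# greeting: hello world
# farewell: goodbye world
# greeting: hello and goodbye
# farewell: hello and goodbye
```

The line "nothing here" matches neither pattern, so nothing is printed for it.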

In the case here, the "pattern to match" is missing, so the "commands to run" section is run on every input line.
The "commands to run" here, laid out in a more readable way, is:
    if (NF == 4){
       print $1 "|" $4
    }else{
       printf "Row %d contains %d fields\n",NR,NF;
    }
NF is a variable which awk sets on each input line, giving the number of fields in that line.  NR is an awk variable which gives the line number for the input line being processed.

If the number of fields equals 4, then awk runs the "print $1 "|" $4" section, which prints out the first field, then a pipe character, then the 4th field.  In awk, values that sit next to each other in the script, separated only by spaces, are concatenated into a single string.
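
The difference between joining values with a space and separating them with a comma can be sketched as below, using the first sample row. The sh wrapper and the variable names (line, joined, spaced) are only there for illustration.

```shell
line='25|aaa|hello|44'

# Values separated only by spaces are concatenated.
joined=$(printf '%s\n' "$line" | awk -F'|' '{ print $1 "|" $4 }')

# Values separated by a comma are printed with the output
# field separator between them (a single space by default).
spaced=$(printf '%s\n' "$line" | awk -F'|' '{ print $1, $4 }')

printf '%s\n%s\n' "$joined" "$spaced"
# prints:
# 25|44
# 25 44
```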

If the number of fields doesn't equal 4, the "else" part is processed.  In this case I use the printf command; the first argument to printf is a format string which here contains some text ("Row", "contains" and so on) and a couple of value markers - %d shows where a numeric value will appear, alternatives include %s for a text string, and %c for a single character.  printf processes the remaining arguments, and replaces the first "%" marker (here %d, but could be %s, %c etc) with the value of the next argument after the format string (here, NR), the next "%" marker (the second %d) with the argument after that (here, NF) and so on.
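
Putting the pieces together, this sketch runs the same awk script on the two sample rows plus a third, made-up row that has only three fields. The input is piped in from printf instead of being read from afile, and the wrapper is plain sh so the output can be captured.

```shell
# Rows 1 and 2 have four fields; row 3 (invented here) has three.
out=$(printf '25|aaa|hello|44\n33|bbb|goodbye|55\n77|ccc|oops\n' |
  awk -F'|' '{if (NF == 4){print $1 "|" $4}else{printf "Row %d contains %d fields\n",NR,NF;}}')
printf '%s\n' "$out"
# prints:
# 25|44
# 33|55
# Row 3 contains 3 fields
```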

Another common thing to see is a "pattern to match" but no "commands to run", e.g.:
    /third pattern/
In this case, if a line matches "third pattern", the entire line is printed out.
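
For instance, a pattern with no commands behaves much like grep. A minimal sketch on the sample rows, with /bbb/ as a made-up pattern and a plain sh wrapper:

```shell
# No action block: every line matching /bbb/ is printed whole.
out=$(printf '25|aaa|hello|44\n33|bbb|goodbye|55\n' | awk '/bbb/')
printf '%s\n' "$out"
# prints:
# 33|bbb|goodbye|55
```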

You can also limit the pattern to a single field. e.g.
  $2 ~ /want me/{print $1}
which will match the second field ($2) against the pattern "want me", and if it matches, print out the value of the first field (print $1).  "~" here is true if the field matches the pattern, while "!~" is true if the field doesn't match it.  The "pattern to match" can be simple text as here, or can contain regular expressions, such as:
   $3 ~ /^[Ff]re[ed]/
which matches field three so that the start of the field (the ^) matches either "F" or "f", then the text "re", followed by either "e" or "d" - so it matches any lines where field 3 starts with "Fred", "fred", "Free" or "free".
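
Here is a small sketch of that field match on four made-up rows, with -F'|' added so that field 3 is the third pipe-delimited column. The {print $1} action and the plain sh wrapper are additions for illustration.

```shell
# Only rows whose third field starts with Fred, fred, Free or free
# are selected; "alfred" fails because of the ^ anchor, and "Frog"
# fails because "re" does not follow the F.
out=$(printf '1|x|Fred|9\n2|x|alfred|9\n3|x|freedom|9\n4|x|Frog|9\n' |
  awk -F'|' '$3 ~ /^[Ff]re[ed]/ { print $1 }')
printf '%s\n' "$out"
# prints:
# 1
# 3
```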
ASKER CERTIFIED SOLUTION
simon3270
As for the shell part, when a program finishes, the shell sets the special variable $? to its exit code - I make the awk script exit with 1 if the file is bad, or 0 if all rows have 4 fields.

The second line tests the value of $? - if it is 1, it processes the command lines up to the corresponding "endif" line (here, just the second awk command).  If you want one set of commands to run if the test is successful, and another if it isn't, then have an else section, as in:
    if ($my_var == 17) then
       command to run if $my_var variable equals 17
       second command to run if my_var=17
    else
       first command to run if $my_var is not 17
       second command for not 17
       third not-17 command
    endif
Oops, just noticed an error in the shell explanation; the script processes the second awk if the return is 0, not if it is 1.
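
For anyone following along without the script in front of them, the two-pass idea described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the accepted solution itself: it is written for plain sh rather than tcsh (where the test would take the if (...) then ... endif form), and the function name print_if_valid, the temporary files and the error message are all made up for illustration.

```shell
# First pass: awk exits with 1 as soon as a row lacks exactly 4 fields,
# otherwise it reaches the end of the file and exits with 0.
# Second pass: runs only when the first pass returned 0.
print_if_valid() {
    if awk -F'|' 'NF != 4 { exit 1 }' "$1"; then
        awk -F'|' '{ print $1 "|" $4 }' "$1"
    else
        echo "not all rows have 4 fields" >&2
        return 1
    fi
}

good=$(mktemp)
bad=$(mktemp)
printf '25|aaa|hello|44\n33|bbb|goodbye|55\n' > "$good"
printf '25|aaa|hello|44\n33|bbb|goodbye\n' > "$bad"

good_out=$(print_if_valid "$good")
bad_out=$(print_if_valid "$bad" 2>/dev/null) || true
rm -f "$good" "$bad"

printf '%s\n' "$good_out"
# prints:
# 25|44
# 33|55
```

For the bad file nothing is printed on standard output, because the check fails before the second awk runs.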