
How to parse a file in a Unix tcsh shell script

Hi experts...
I have a file in Unix like..
As shown above, the words are delimited by '|'.
Can anyone help with how to check that there are only four columns, and how to display, say, the first and fourth columns?
awk -F\| '{if (NF == 4){print $1 "|" $4}else{printf "Row %d contains %d fields\n",NR,NF;}}' afile
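A quick way to try the one-liner without the real afile (whose contents aren't shown here) is to feed it a few hypothetical lines with printf. The pipeline should behave the same in tcsh or a Bourne-style shell; the second sample line deliberately has only three fields.

```shell
# Hypothetical sample data: rows 1 and 3 have four fields, row 2 has three.
printf 'a|b|c|d\nw|x|y\n1|2|3|4\n' | awk -F\| '{if (NF == 4){print $1 "|" $4}else{printf "Row %d contains %d fields\n",NR,NF;}}'
```

This should print a|d, then Row 2 contains 3 fields, then 1|4.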
gaugeta (Author) commented:
@simon3270: Can you explain the solution? I am a newbie to shell scripting.
gaugeta (Author) commented:
@simon3270: Sorry if I didn't explain it clearly, but I meant: only if all records have four columns, display the first and fourth columns.

No problem.

awk is a text-manipulation program.  The -F option sets the delimiter between fields; I put an escape (\) before the | to stop the shell from seeing it as a pipe between commands.  You can specify a single character to act as the field separator (as here), or a set of characters enclosed in square brackets, any one of which marks a new field.  For example, -F'[|()]' will treat any of |, ( or ) as a field separator.
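A small sketch of the separator options just described, using made-up input: the first two commands pass the same single-character separator in two different ways, and the third uses a bracket expression.

```shell
# Escaping the pipe and quoting it are equivalent ways to pass | to -F.
printf 'a|b|c|d\n' | awk -F\| '{print $2}'
printf 'a|b|c|d\n' | awk -F'|' '{print $2}'
# A bracket expression makes any one of |, ( or ) end a field.
printf 'a|b(c)d\n' | awk -F'[|()]' '{print NF, $3}'
```

The first two lines should each print b; the third should print 4 c (four fields, the third being c).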

The next part (between the single quotes) is the awk script, which tells awk how to process its input.  It usually consists of multiple lines of the form:
   /pattern to match/{commands to run}
   /second pattern/{second set of commands}
where awk processes each input line in turn.  For each line, it first looks for "pattern to match", and if it finds it, runs the "commands to run" section.  It then looks for "second pattern" in the same line, and if it finds it, runs "second set of commands".
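To illustrate that every rule is tried against every line, here is a hypothetical two-rule script run over three sample lines:

```shell
# Two pattern/action rules; each input line is checked against both, in order.
printf 'apple|1\nbanana|2\napple pie|3\n' | awk '/apple/{print "A: " $0} /pie/{print "P: " $0}'
```

A line matching both patterns (apple pie|3) is printed once by each rule, so the output should be A: apple|1, A: apple pie|3, P: apple pie|3.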

In the case here, the "pattern to match" is missing, so the "commands to run" section is run on every input line.
The "commands to run" here, laid out in a more readable way, is:
    if (NF == 4) {
       print $1 "|" $4
    } else {
       printf "Row %d contains %d fields\n", NR, NF
    }
NF is a variable which awk sets on each input line, giving the number of fields in that line.  NR is an awk variable which gives the line number for the input line being processed.
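A sketch that just reports NR and NF for some hypothetical input; the empty third line shows that a blank record has zero fields:

```shell
# Print the line number (NR) and field count (NF) of each input line.
printf 'a|b|c|d\nx|y\n\n' | awk -F'|' '{print "line " NR " has " NF " fields"}'
```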

If the number of fields equals 4, then awk processes the print $1 "|" $4 section, which prints the first field, then a pipe character, then the 4th field.  In awk, values written next to each other (separated only by a space) are concatenated into one string.
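The difference between joining values by juxtaposition and separating them with commas can be seen with one sample line (hypothetical data):

```shell
# Juxtaposition concatenates; a comma inserts OFS, a space by default.
printf 'a|b|c|d\n' | awk -F'|' '{print $1 "|" $4; print $1 $4; print $1, $4}'
```

This should print a|d, then ad, then a d: a comma in print inserts the output field separator OFS, which is a single space unless you change it.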

If the number of fields doesn't equal 4, the "else" part is processed.  Here I use the printf command.  Its first argument is a format string, which here contains some text ("Row", "contains" and so on) and a couple of value markers: %d shows where a whole number will appear; alternatives include %s for a text string and %c for a single character.  printf replaces the first "%" marker (here %d, but it could be %s, %c etc.) with the first argument after the format string (here NR), the second "%" marker (the second %d) with the next argument after that (here NF), and so on.
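A minimal printf sketch using a BEGIN block, so no input file is needed; the numbers 7 and 3 are arbitrary stand-ins for NR and NF:

```shell
# Each % marker is filled by the next argument after the format string.
awk 'BEGIN {printf "Row %d contains %d fields\n", 7, 3; printf "%s then %c\n", "text", "x"}'
```

This should print Row 7 contains 3 fields, then text then x.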

Another common thing to see is a "pattern to match" but no "commands to run", e.g.:
    /third pattern/
In this case, if a line matches "third pattern", the entire line is printed out.
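For example, with some hypothetical lines, a bare pattern prints every whole line containing a t:

```shell
# No action block: matching lines are printed unchanged.
printf 'one|x\ntwo|y\nthree|z\n' | awk '/t/'
```

This should print two|y and three|z.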

You can also limit the pattern to a single field, e.g.:
  $2 ~ /want me/{print $1}
which will match the second field ($2) against the pattern "want me", and if it matches, print out the value of the first field (print $1).  Here "~" is true if the field matches the pattern, while "!~" is true if it doesn't.  The "pattern to match" can be simple text, as here, or can contain regular expressions, such as:
   $3 ~ /^[Ff]re[ed]/
which matches field three if the start of the field (the ^) is either "F" or "f", followed by the text "re", followed by either "e" or "d", so it matches any line where field 3 starts with "Fred", "fred", "Free" or "free".
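Both field-matching examples above can be tried on made-up rows like this:

```shell
# Print field 1 of rows whose second field contains "want me".
printf 'a|want me|c\nb|not this|d\n' | awk -F'|' '$2 ~ /want me/{print $1}'
# Print whole rows whose third field starts with Fred, fred, Free or free.
printf 'x|y|Fred\nx|y|free\nx|y|afred\nx|y|Frex\n' | awk -F'|' '$3 ~ /^[Ff]re[ed]/'
```

The first command should print a; the second should print x|y|Fred and x|y|free, since afred fails the ^ anchor and Frex fails the [ed] part.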
Ah, I've just seen your update.

I think you'd need to do this in two stages.  The first would check that all lines have 4 fields; the second would print fields 1 and 4 if the first succeeded.

awk -F\| '{if (NF != 4){printf "Row %d contains %d fields\n",NR,NF;badfile=1}}END{exit badfile}' afile
if ($? == 0) then
  awk -F\| '{print $1 "|" $4}' afile
endif

In awk I have used a couple of extra items.  I set the badfile variable to 1 if I find a line that doesn't have 4 fields (awk treats badfile as 0 if it has never been set).  Then the special "END" "pattern to match" is run after all of the input lines have been processed, so it is used, as here, to do work once the whole file has been read.  The corresponding "BEGIN" pattern is run before the first input line is processed, so it is often used to initialise variables.
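A sketch of BEGIN and END on hypothetical input: BEGIN sets FS (an alternative to -F) and a counter before any line is read, and END reports the count once, after the last line.

```shell
# BEGIN runs before the first line, END after the last one.
printf 'a|b|c|d\nw|x|y|z\n' | awk 'BEGIN {FS = "|"; n = 0} {n++; print $1 "|" $4} END {print n " lines read"}'
```

This should print a|d, w|z and then 2 lines read.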
As for the shell part: when a command finishes, the shell sets the special variable $? to its exit code.  I make the awk script exit with 1 if the file is bad, or 0 if all rows have 4 fields.
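To see the exit code at work without an if block, the sketch below runs the same kind of check on two hypothetical inputs and uses && and || (available in both tcsh and Bourne-style shells) to report the result:

```shell
# The exit status of awk decides which echo runs.
printf 'a|b|c|d\nw|x|y|z\n' | awk -F'|' 'NF != 4 {badfile = 1} END {exit badfile}' && echo "all rows have 4 fields" || echo "bad file"
printf 'a|b|c|d\nw|x|y\n' | awk -F'|' 'NF != 4 {badfile = 1} END {exit badfile}' && echo "all rows have 4 fields" || echo "bad file"
```

The first line should report all rows have 4 fields and the second bad file.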

The second line tests the value of $?: if it is 0, the shell runs the command lines up to the corresponding "endif" line (here, just the second awk command).  If you want one set of commands to run if the test succeeds and another if it doesn't, add an else section, as in:
    if ($my_var == 17) then
       command to run if $my_var variable equals 17
       second command to run if my_var=17
    else
       first command to run if $my_var is not 17
       second command for not 17
       third not-17 command
    endif
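If you'd rather read the file only once, one possible alternative (my own sketch, not the two-stage approach described in this thread) is to store each row's first and fourth fields in an awk array and print them in END only if no bad row was seen. It keeps all of that output in memory, so for very large files the two-stage script may be preferable. The sample data is hypothetical:

```shell
# Single pass: buffer field 1 and field 4 per row, print them only if every row had 4 fields.
printf 'a|b|c|d\nw|x|y|z\n' | awk -F'|' 'NF != 4 {printf "Row %d contains %d fields\n", NR, NF; badfile = 1} {out[NR] = $1 "|" $4} END {if (badfile == 0) for (i = 1; i <= NR; i++) print out[i]; exit badfile}'
printf 'a|b|c|d\nw|x|y\n' | awk -F'|' 'NF != 4 {printf "Row %d contains %d fields\n", NR, NF; badfile = 1} {out[NR] = $1 "|" $4} END {if (badfile == 0) for (i = 1; i <= NR; i++) print out[i]; exit badfile}'
```

For the first input this should print a|d and w|z; for the second it prints only Row 2 contains 3 fields and exits with status 1.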
Oops, I just noticed an error in the shell explanation: the script runs the second awk command if the return value is 0, not if it is 1.
