How to parse a file in unix tcsh shell script

Posted on 2011-05-05
Last Modified: 2012-05-11
Hi experts,
I have a file in Unix like..
As shown above, the words are delimited by '|'.
Can anyone help with how to check that there are only four columns, and how to display, say, the first and fourth columns?
Question by:gaugeta
    LVL 19

    Expert Comment

    awk -F\| '{if (NF == 4){print $1 "|" $4}else{printf "Row %d contains %d fields\n",NR,NF;}}' afile

    Author Comment

    @simon3270: Can you explain the solution? I am a newbie to shell scripting.

    Author Comment

    @simon3270: Sorry if I have not explained it correctly, but I meant that the first and fourth columns should be displayed only if all records have four columns.
    LVL 19

    Expert Comment

    No problem.

    awk is a text manipulation program.  The "-F" parameter sets the delimiter to use between fields - I put an escape (\) before the | to stop the shell seeing it as a pipe between commands.  You can specify a single character to act as the separator between fields (as here), or a set of characters enclosed in square brackets, any of which marks a new field - for example, -F'[|()]' will treat any of |, ( or ) as a field separator.
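A quick way to see the bracketed separator set in action is the sketch below. The input line is made up for illustration; it is split on any of |, ( or ), and awk prints the field count followed by the third field.

```shell
# Made-up sample line: |, ( and ) all act as field separators here.
printf 'a|b(c)d\n' | awk -F'[|()]' '{print NF; print $3}'
```

This should print 4 and then c, since the line breaks into the fields a, b, c and d.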

    The next part (between the ' quotes) is the awk script, to tell awk how to process its input.  The usual format has multiple lines with the format:
       /pattern to match/{commands to run}
       /second pattern/{second set of commands}
    where awk processes each input line in turn.  It first looks for "pattern to match" in the input line, and if it finds it, runs the "commands to run" section.  It then looks for "second pattern" in the same line, and if it finds it, runs "second set of commands".
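As a small illustration of two pattern/command pairs being tried against every line, here is a sketch with made-up input (the words apple and banana are just placeholders). A line that matches both patterns triggers both sets of commands.

```shell
# Each input line is tested against /apple/ first, then against /banana/.
printf 'apple pie\nbanana split\napple banana\n' | awk '/apple/{print "rule 1: " $0} /banana/{print "rule 2: " $0}'
```

The third input line contains both words, so it is printed twice, once by each rule.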

    In the case here, the "pattern to match" is missing, so the "commands to run" section is run on every input line.
    The "commands to run" here, laid out in a more readable way, is:
        if (NF == 4){
           print $1 "|" $4
        }else{
           printf "Row %d contains %d fields\n",NR,NF;
        }
    NF is a variable which awk sets on each input line, giving the number of fields in that line.  NR is an awk variable which gives the line number for the input line being processed.
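To watch NR and NF change from line to line, something like the following can be tried. The two input lines are invented and deliberately have different numbers of fields.

```shell
# NR is the current line number, NF the number of |-separated fields on it.
printf 'a|b|c\nd|e\n' | awk -F'|' '{print "line " NR " has " NF " fields"}'
```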

    If the number of fields equals 4, then awk processes the "print $1 "|" $4" section, which prints out the first field, then a pipe character, then the 4th field.  In awk, if values in the script are separated by a space, they are effectively joined together.
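The joining behaviour is easiest to see side by side. In this sketch (sample data made up), the first print joins the fields with a literal pipe, the second joins them with nothing in between, and the third uses a comma, which makes awk put its output field separator (a space by default) between the values.

```shell
# Space-separated values are concatenated; a comma inserts the output separator.
printf 'x|y|z|w\n' | awk -F'|' '{print $1 "|" $4; print $1 $4; print $1, $4}'
```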

    If the number of fields doesn't equal 4, the "else" part is processed.  In this case I use the printf command; the first argument to printf is a format string which here contains some text ("Row", "contains" and so on) and a couple of value markers - %d shows where a numeric value will appear, alternatives include %s for a text string, and %c for a single character.  printf processes the remaining arguments, and replaces the first "%" value (here %d, but could be %s, %c etc) with the value of the next argument after the format string (here, NR), the next "%" argument (the second %d) with the next argument after that (here NF) and so on.
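Here is a short sketch of the three value markers mentioned above, using an invented input line. %d takes the line number, %s takes the whole second field, and %c, when given a text value, prints just its first character.

```shell
# %d = number, %s = text string, %c = single character (first character of a string).
printf 'alpha|beta\n' | awk -F'|' '{printf "Row %d: %s begins with %c\n", NR, $2, $2}'
```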

    Another common thing to see is a "pattern to match" but no "commands to run", e.g.:
        /third pattern/
    In this case, if a line matches "third pattern", the entire line is printed out.
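For example, with some made-up input, a pattern on its own behaves much like a simple filter:

```shell
# No commands given, so every line matching /three/ is printed unchanged.
printf 'one\ntwo three\nthree\nfour\n' | awk '/three/'
```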

    You can also limit the pattern to a single field. e.g.
      $2 ~ /want me/{print $1}
    which will match the second field ($2) against the pattern "want me", and if it matches, print out the value of the first field (print $1).  "~" is true if the field matches the pattern, while "!~" is true if the field doesn't match it.  The "pattern to match" can be simple text as here, or can contain regular expressions, such as:
       $3 ~ /^[Ff]re[ed]/
    which matches field three so that the start of the field (the ^) matches either "F" or "f", the text "re", followed by either of "d" or "e" - so matches any lines where field 3 starts "Fred", "fred", "Free" or "free".
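To check the field-three pattern against a few cases, the sketch below feeds in four invented rows. Only the rows whose third field starts with "Fre" or "fre" followed by "e" or "d" should get through.

```shell
# Alfred fails because of the ^ anchor; Frodo fails because "Fro" is not "Fre".
printf 'x|y|Fred\nx|y|Alfred\nx|y|freedom\nx|y|Frodo\n' | awk -F'|' '$3 ~ /^[Ff]re[ed]/{print $3}'
```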
    LVL 19

    Accepted Solution

    Ah, just seen your update.

    I think you'd need to do this in two stages.  The first would check that all lines have 4 fields; the second would print fields 1 and 4 if the first succeeded.

    awk -F\| '{if (NF != 4){printf "Row %d contains %d fields\n",NR,NF;badfile=1}}END{exit badfile}' afile
    if ($? == 0) then
      awk -F\| '{print $1 "|" $4}' afile
    endif

    In awk I have used a couple of extra items - I set the badfile variable to 1 if I find a line that does not have 4 fields (awk will assume badfile has value 0 if it isn't set), then the special "END" "pattern to match" is run after all of the input lines have been processed, so is used, as here, to do work once the whole file has been read.  The corresponding "BEGIN" pattern is run before the first input line is processed, so is often used to initialise variables.
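A minimal sketch of BEGIN and END working together, on made-up input: BEGIN initialises a counter (the name bad is just an illustration), the middle rule bumps it for every line that does not have 4 fields, and END reports once all lines have been read.

```shell
# BEGIN runs before the first line, END after the last; NR in END is the total line count.
printf 'a|b|c|d\ne|f\ng|h|i|j\n' | awk -F'|' 'BEGIN{bad=0} NF != 4{bad++} END{print bad " of " NR " lines do not have 4 fields"}'
```

Strictly speaking the BEGIN part is optional here, since awk treats an unset variable as 0, but it shows where initialisation would normally go.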
    LVL 19

    Expert Comment

    As for the shell part, when a program runs, it sets the special variable $? to its exit code - I make the awk script exit with 1 if the file is bad, or 0 if all rows have 4 fields.
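The exit code can also be acted on directly, without an if block. As a rough sketch with invented input: && runs the following command only when the exit code was 0, and || only when it was not. The messages are placeholders.

```shell
# First input has a 2-field row, so awk exits with 1; second input is all 4-field rows.
printf 'a|b|c|d\ne|f\n' | awk -F'|' 'NF != 4{bad=1} END{exit bad}' && echo "all rows have 4 fields" || echo "bad file"
printf 'a|b|c|d\ne|f|g|h\n' | awk -F'|' 'NF != 4{bad=1} END{exit bad}' && echo "all rows have 4 fields" || echo "bad file"
```

The first command should report a bad file and the second should report that all rows have 4 fields.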

    The second line tests the value of $? - if it is 1, it processes the command lines up to the corresponding "endif" line (here, just the second awk command).  If you want one set of commands to run if the test is successful, and another if it isn't, then have an else section, as in:
        if ($my_var == 17) then
           command to run if $my_var variable equals 17
           second command to run if my_var=17
        else
           first command to run if $my_var is not 17
           second command for not 17
           third not-17 command
        endif
    LVL 19

    Expert Comment

    Oops, just noticed an error in the shell explanation: the script processes the second awk if the return is 0, not if it is 1.
