GAUTAM
asked on
How to parse a file in unix tcsh shell script
Hi experts...
I have a file in unix like:
25|aaa|hello|44
33|bbb|goodbye|55
As shown above, the words are delimited by '|'.
Can anyone help me check whether there are only four columns, and show how to display, say, the first and fourth columns?
awk -F\| '{if (NF == 4){print $1 "|" $4}else{printf "Row %d contains %d fields\n",NR,NF;}}' afile
ASKER
@simon3270: Can you explain the solution? I am a newbie to shell scripting.
ASKER
@simon3270: Sorry if I have not explained it correctly, but I meant that the first and fourth columns should be displayed only if all records have four columns.
No problem.
awk is a text manipulation program. The "-F" parameter sets the delimiter to use between fields - I put an escape (\) before the | to stop the shell seeing it as a pipe between commands. You can specify a single character to act as the separator between fields (as here), or a set of characters enclosed in square brackets, any of which marks a new field - for example, -F'[|()]' will treat any of |, ( or ) as a field separator.
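To make the bracketed form concrete, here is a small sketch. The input line is made up for illustration (it is not from the question), and the pipeline is kept on one line so it should behave the same way in a tcsh script as in sh.

```shell
# Made-up input using three different separators: |, ( and )
# -F'[|()]' splits on any one of them, giving the fields 25, aaa, hello, 44
printf '25|aaa(hello)44\n' | awk -F'[|()]' '{print NF, $3}'
# prints: 4 hello
```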
The next part (between the ' quotes) is the awk script, which tells awk how to process its input. The usual layout is a series of lines of the form:
/pattern to match/{commands to run}
/second pattern/{second set of commands}
where awk processes each input line in turn. It first looks for "pattern to match" in the input line, and if it finds it, runs the "commands to run" section. It then looks for "second pattern" in the same line, and if it finds it, runs "second set of commands".
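As a quick illustration of two pattern/command pairs being tried against the same line, here is a sketch with made-up input and made-up patterns (foo and bar are placeholders, not anything from the question):

```shell
# Line 1 contains both "foo" and "bar", so both rules fire for it.
# Line 2 contains only "bar", so only the second rule fires.
printf 'foo bar\nbar only\n' | awk '/foo/{print "first rule: " $0} /bar/{print "second rule: " $0}'
# prints:
# first rule: foo bar
# second rule: foo bar
# second rule: bar only
```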
In the case here, the "pattern to match" is missing, so the "commands to run" section is run on every input line.
The "commands to run" here, laid out in a more readable way, is:
if (NF == 4){
    print $1 "|" $4
}else{
    printf "Row %d contains %d fields\n",NR,NF;
}
NF is a variable which awk sets on each input line, giving the number of fields in that line. NR is an awk variable which gives the line number for the input line being processed.
If the number of fields equals 4, then awk processes the "print $1 "|" $4" section, which prints out the first field, then a pipe character, then the 4th field. In awk, if values in the script are separated by a space, they are effectively joined together.
If the number of fields doesn't equal 4, the "else" part is processed. In this case I use the printf command; the first argument to printf is a format string which here contains some text ("Row", "contains" and so on) and a couple of value markers - %d shows where a numeric value will appear; alternatives include %s for a text string, and %c for a single character. printf processes the remaining arguments, and replaces the first "%" marker (here %d, but could be %s, %c etc) with the value of the next argument after the format string (here, NR), the next "%" marker (the second %d) with the argument after that (here, NF) and so on.
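To see both branches at work, the one-liner can be fed two rows, one good and one deliberately short. The second row here is made up (the "55" field from the question's data has been dropped) so that the else branch fires:

```shell
# Row 1 has four fields, so fields 1 and 4 are printed.
# Row 2 has only three fields, so the printf message appears instead.
printf '25|aaa|hello|44\n33|bbb|goodbye\n' | awk -F\| '{if (NF == 4){print $1 "|" $4}else{printf "Row %d contains %d fields\n",NR,NF;}}'
# prints:
# 25|44
# Row 2 contains 3 fields
```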
Another common thing to see is a "pattern to match" but no "commands to run", e.g.:
/third pattern/
In this case, if a line matches "third pattern", the entire line is printed out.
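For example, with some made-up input and the made-up pattern "an", a pattern on its own behaves much like grep:

```shell
# No {commands} given, so any line containing "an" is printed as-is.
printf 'apple\nbanana\ncherry\n' | awk '/an/'
# prints: banana
```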
You can also limit the pattern to a single field, e.g.:
$2 ~ /want me/{print $1}
which will match the second field ($2) against the pattern "want me", and if it matches, print out the value of the first field (print $1). "~" here is true if the field matches the pattern, while "!~" is true if the field doesn't match it. The "pattern to match" can be simple text as here, or can contain regular expressions, such as:
$3 ~ /^[Ff]re[ed]/
which matches field three if it starts (the ^) with either "F" or "f", then the text "re", then either "e" or "d" - so it matches any line where field 3 starts with "Fred", "fred", "Free" or "free".
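Here is that regular expression tried against some made-up pipe-delimited rows (the data is invented for illustration), printing field 1 of each row that matches:

```shell
# Rows 1 and 2 match: field 3 starts with "Fred" and "free".
# Row 3 does not match: "Alfred" contains "fred" but does not start with it.
# Row 4 does not match: "Frex" ends in x, which is neither e nor d.
printf '1|x|Fred\n2|y|free lunch\n3|z|Alfred\n4|w|Frex\n' | awk -F\| '$3 ~ /^[Ff]re[ed]/{print $1}'
# prints:
# 1
# 2
```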
ASKER CERTIFIED SOLUTION
As for the shell part, when a program finishes, the shell sets the special variable $? to the program's exit code - I make the awk script exit with 1 if the file is bad, or 0 if all rows have 4 fields.
The second line tests the value of $? - if it is 1, it processes the command lines up to the corresponding "endif" line (here, just the second awk command). If you want one set of commands to run if the test is successful, and another if it isn't, then add an else section, as in:
if ($my_var == 17) then
    command to run if $my_var variable equals 17
    second command to run if my_var=17
else
    first command to run if $my_var is not 17
    second command for not 17
    third not-17 command
endif
Oops, just noticed an error in the shell explanation; the script processes the second awk if the return is 0, not if it is 1.
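Since the script itself is not shown in this thread, here is a minimal sketch of the two-pass idea as described: a first awk that exits with 1 as soon as it meets a row without four fields, and a second awk that only runs if the first one returned 0. The awk programs are my own guess at the approach, not the original solution. The && form is used because it should work unchanged in both sh and tcsh; the if/endif form described above is given in the comments.

```shell
# Sample data from the question, written to the file name used in the thread.
printf '25|aaa|hello|44\n33|bbb|goodbye|55\n' > afile

# First awk: exit status 1 at the first row that does not have 4 fields, else 0.
# Second awk: runs only when the first returned 0, and prints fields 1 and 4.
awk -F\| 'NF != 4 {exit 1}' afile && awk -F\| '{print $1 "|" $4}' afile
# prints:
# 25|44
# 33|55

# tcsh equivalent testing the exit status explicitly:
#   awk -F\| 'NF != 4 {exit 1}' afile
#   if ($status == 0) then
#       awk -F\| '{print $1 "|" $4}' afile
#   endif
```

If any row has the wrong number of fields, nothing is printed at all, which is the all-or-nothing behaviour the asker wanted.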