davidw88

Bash/awk: how to write this script

Hi experts,

I am new to awk, so any help is greatly appreciated.

My file has 4 columns: "Id#", "Version#", "Offset" and "Counter". For each Id#, I want to sum the 4th column ("Counter") over the first line, then the first 2 lines, then the first 6, 12 and 72 lines, and finally the first 288 lines if there are that many; otherwise just move on to the next Id#.

Here is an example. Assume the data is:
113148              102     6149464                 1
113148              101     8182239                 1
113148              93      6853673                 1
113148              85      5653822                 1
113148              83      5841268                 2
113148              55      9823293                 1
113148              47      6975500                 1
113148              46      2643610                 1
113148              27      8130717                 1
113148              25      9955716                 1
113148              23      8544588                 1
113148              17      3604921                 1
113148              8       9436067                 1
113148              7       7608340                 1
113144              103     5794705                 1
113144              96      3773021                 1
113144              78      6177799                 1
113144              61      1547860                 2
113144              60      8474118                 3
113144              59      2734357                 1
113144              56      1384793                 1
113144              55      9821781                 3
113144              52      2941869                 1
113144              46      2642528                 2
113144              42      4079550                 2
113144              41      1369991                 1
113144              40      2780453                 1
113144              37      1408274                 1
113144              33      7528543                 2
113144              32      7494823                 2
113144              31      9200646                 1
113144              28      8221181                 2
.............


Then for Id# 113148, I should get "1 2 7 13 15";
for Id# 113144, I should get "1 2 9 19 35".
.........

I have a generic script, attached below, but it does not produce the right numbers. In the script, "$CURRENT_VERSION" corresponds to the 1st line for a given Id#, "$TEN_MIN" to the first 2 lines, "$THIRTY_MIN" to the first 6 lines, "$ONE_HOUR" to the first 12 lines, and "$SIX_HOUR" to the first 72 lines.

This bash/awk script gives the following result:
for Id# 113148:   2  7  15
for Id# 113144:  1  1   14
which is not correct.

Any help? Thanks so much.



INDEX='partnerId.txt'
SUMMARY='partnerId_sum.txt'
 
CURRENT_VERSION=`awk '{ if ($2 > max) max = $2} END { print max }' $INDEX`
let "TEN_MIN = $CURRENT_VERSION - 1"
let "THIRTY_MIN = $CURRENT_VERSION - 5"
let "ONE_HOUR = $CURRENT_VERSION - 10"
let "SIX_HOUR = $CURRENT_VERSION - 60"
 
awk '$2 >= v1 {a[$1]++;b[$1]=b[$1]+$4} END {for (i in a) print i,b[i]}' v1=$CURRENT_VERSION $INDEX > $SUMMARY
echo "---" >> $SUMMARY
awk '$2 >= v1 {a[$1]++;b[$1]=b[$1]+$4} END {for (i in a) print i,b[i]}' v1=$TEN_MIN $INDEX >> $SUMMARY
echo "---" >> $SUMMARY
awk '$2 >= v1 {a[$1]++;b[$1]=b[$1]+$4} END {for (i in a) print i,b[i]}' v1=$THIRTY_MIN $INDEX >> $SUMMARY
echo "---" >> $SUMMARY
awk '$2 >= v1 {a[$1]++;b[$1]=b[$1]+$4} END {for (i in a) print i,b[i]}' v1=$ONE_HOUR $INDEX >> $SUMMARY
echo "---" >> $SUMMARY
awk '$2 >= v1 {a[$1]++;b[$1]=b[$1]+$4} END {for (i in a) print i,b[i]}' v1=$SIX_HOUR $INDEX >> $SUMMARY
echo "---" >> $SUMMARY
awk '{a[$1]++;b[$1]=b[$1]+$4} END {for (i in a) print i,b[i]}' $INDEX >> $SUMMARY
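
One likely cause of the wrong numbers: the script above filters on Version# ($2 >= v1) using a maximum taken over the whole file, but the version numbers have gaps and differ per Id#, so "$2 >= max - 5" does not select the first 6 lines of each Id#. A minimal sketch that counts lines per Id# instead is shown below. It assumes that the lines of each Id# are contiguous and already sorted newest first, as in the sample, and that when an Id# runs out of lines before reaching the next window size, the total so far is printed once and the larger windows are skipped (this matches the expected "1 2 7 13 15" for 14 lines). The function name window_sums is hypothetical.

```shell
#!/bin/sh
# window_sums FILE...: for each Id# (column 1), print the running sum of
# column 4 after the first 1, 2, 6, 12, 72 and 288 lines of that Id#.
# Assumption: lines of one Id# are contiguous and already in the wanted order.
window_sums() {
    awk '
    BEGIN { n = split("1 2 6 12 72 288", win, " ") }

    NF < 4 { next }                     # skip blank or malformed lines

    $1 != id {                          # a new Id# starts here
        if (id != "") flush()
        id = $1; cnt = 0; sum = 0; k = 1; last = 0; out = ""
    }

    {
        cnt++
        sum += $4
        if (k <= n && cnt == win[k] + 0) {   # reached the next window size
            out = out " " sum
            last = cnt
            k++
        }
    }

    END { if (id != "") flush() }

    # Print one line per Id#. If lines remain beyond the last full window
    # and a larger window exists, report the total once as a partial window.
    function flush() {
        if (k <= n && cnt > last) out = out " " sum
        print id out
    }
    ' "$@"
}

# Example call (file names taken from the script above):
# window_sums partnerId.txt > partnerId_sum.txt
```

Run against the 14 sample lines of Id# 113148, this prints "113148 1 2 7 13 15". Doing everything in one awk pass also avoids reading the file six times.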

Shell Scripting
