# Sum column and count lines

I am trying to sum certain numbers in colum 2, it works with my code. But I want to count also how many times the same value in colum 2 is repeated and print in the last column.

file1

``````36  2605 1 2
36  2605 1 2
36  2603 1 2
36  2605 1 2
36  2605 1 2
36  2605 1 2
36  2606 1 2
``````

Output Desired

``````2603  36 1  2 1
2605 180 5 10 5
2606  36 1  2 1
``````

I tried

``````awk '{a[\$2]+=\$1}{b[\$2]+=\$3}{c[\$2]+=\$4;count[\$2]+=\$2}END{for(i in a)print i,a[i],b[i],c[i],count[i]}' file1
``````

Renamed the vars and added pretty print:

``````awk '
{
sum1[\$2]+=\$1
sum3[\$2]+=\$3
sum4[\$2]+=\$4
count[\$2]++

len2=((l=length(\$2))>len2?l:len2)
len1=((l=length(sum1[\$2]))>len1?l:len1)
len3=((l=length(sum3[\$2]))>len3?l:len3)
len4=((l=length(sum4[\$2]))>len4?l:len4)
len5=((l=length(sum5[\$2]))>len5?l:len5)
}
END {
for(i in count) {
printf "%*d %*d %*d %*d %*d\n",
len2,i,len1,sum1[i],len3,sum3[i],len4,sum4[i],len5,count[i]
}
}' file
``````

Output:

``````2603  36 1  2 1
2605 180 5 10 5
2606  36 1  2 1
``````
• James, Thank your for the code – OXXO Apr 21 at 14:40

Space chars are relatively inexpensive these days, you should really consider getting some for your code, especially if you want other people to read it to help you debug it! Here's the code you posted:

``````awk '{a[\$2]+=\$1}{b[\$2]+=\$3}{c[\$2]+=\$4;count[\$2]+=\$2}END{for(i in a)print i,a[i],b[i],c[i],count[i]}' file1
``````

and here it is after having been run through a code beautifier (I used `gawk -o`):

``````{
a[\$2] += \$1
}

{
b[\$2] += \$3
}

{
c[\$2] += \$4
count[\$2] += \$2
}

END {
for (i in a) {
print i, a[i], b[i], c[i], count[i]
}
}
``````

See how just by adding some white space it's now vastly easier to understand and so the bug in how `count[\$2]` is being populated is glaringly obvious? Some meaningful variable names are always extremely useful too and I hear alphanumeric chars are on special right now!

FWIW here's how I'd do this:

``````\$ cat tst.awk
BEGIN { keyFldNr = 2 }
{
numOutFlds = 0
for (i=1; i<=NF; i++) {
if (i != keyFldNr) {
sum[\$keyFldNr,++numOutFlds] += \$i
}
}
cnt[\$keyFldNr]++
}
END {
for (key in cnt) {
printf "%s%s", key, OFS
for (i=1; i<=numOutFlds; i++) {
printf "%s%s", sum[key,i], OFS
}
print cnt[key]
}
}

\$ awk -f tst.awk file
2603 36 1 2 1
2605 180 5 10 5
2606 36 1 2 1

\$ awk -f tst.awk file | column -t
2603  36   1  2   1
2605  180  5  10  5
2606  36   1  2   1
``````

Notice that it'll work as-is no matter how many fields you have on each line and if you need to use a different field for the key that you count and sum on then you just change the value of keyFldNr in the BEGIN section from 2 to whatever you want it to be.

• Ed Morton, Thanks for all explanations and code. Appreciate your help on this – OXXO Apr 22 at 5:01

You've almost nailed it, you're not increasing `count[\$2]` properly.

``````\$ awk '{a[\$2]+=\$1;b[\$2]+=\$3;c[\$2]+=\$4;count[\$2]++}
END{for(i in a) print i,a[i],b[i],c[i],count[i]}' file
2603 36 1 2 1
2605 180 5 10 5
2606 36 1 2 1
``````

A non-awk approach, using the very useful GNU datamash, which is designed for tasks like this one:

``````\$ datamash -Ws groupby 2 sum 1,3,4 count 2 < input.txt
2603    36  1   2   1
2605    180 5   10  5
2606    36  1   2   1
``````

Read as: For each group of rows with the same value in column 2, display that value, the sums of columns 1, 3 and 4, and the number of rows in the group.

no need external program, faster ~21ms, tried on pure gnu awk

``````awk '{if(\$0~/^[A-Za-z0-9]/)a[NR]=\$2" "\$1" "\$3" "\$4}END{asort(a);\$0="";for(;i++<NR;){split(a[i],b);if(\$1==""||b[1]==\$1){\$2+=b[2];\$3+=b[3];\$4+=b[4];\$5++} else {print;\$2=b[2];\$3=b[3];\$4=b[4];\$5=1} \$1=b[1]} print}' file1
``````