AWK Arrays: A Beginner's Guide to the Fundamentals

Introduction

Many people who start learning command-line data processing soon run into awk.

Arrays are among its most important features, but they are also a common source of confusion for beginners.

In this article, we will carefully explain everything from the basics of arrays in awk to practical usage, covering the points where people tend to stumble.

Reference: GNU awk

Basic Rules for Declaring and Initializing Arrays

Create File

cat << 'EOF' > input.txt
apple 100
banana 200
apple 150
orange 300
banana 50
EOF

Command

awk '{ arr[$1] += $2 } END { for (key in arr) print key, arr[key] }' input.txt

Output

apple 250
banana 250
orange 300

How It Works

Element	Description
arr[$1]	Array using $1 (column 1) as the key
+= $2	Adds $2 (column 2) as the value
awk array	Associative array (string keys are OK)
END	Executed after all lines are processed
for (key in arr)	Loops through all keys in the array

Explanation

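awk arrays are associative arrays, and an element is created automatically the first time its key is referenced.
An element that has never been assigned acts as 0 in a numeric context and as an empty string in a string context, so arr[$1] += $2 works without any prior declaration or initialization.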
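A quick way to observe this is a sketch that runs entirely in a BEGIN block, so no input file is needed. The function name autocreate_demo is a hypothetical wrapper for this article. It also shows that merely reading an element creates it, while the in operator only tests.

```shell
# autocreate_demo: hypothetical helper showing how awk treats unset elements
autocreate_demo() {
  awk 'BEGIN {
    print "[" arr["x"] "]"   # unset element in string context: empty string -> []
    print arr["y"] + 0       # unset element in numeric context: 0
    has_x = ("x" in arr)     # 1: reading arr["x"] on the first line created it
    has_z = ("z" in arr)     # 0: "in" only tests and never creates arr["z"]
    print has_x, has_z       # 1 0
  }'
}

autocreate_demo
```

It prints [], then 0, then 1 0.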

How Associative Arrays Work and the Concept of Keys

Create File

cat << 'EOF' > input.txt
apple 100
banana 200
apple 150
orange 300
banana 50
EOF

Command

awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' input.txt

Output

apple 250
banana 250
orange 300

How It Works

Element	Description
Array name	sum
Key	$1 (column 1: strings such as "apple")
Value	Accumulated by adding $2 (numeric)
Behavior	Values are accumulated per identical key
END block	Outputs the total per key after all processing

Explanation

awk arrays are associative arrays, and their defining feature is that strings can be used as keys; in fact, every subscript is stored as a string. Because each distinct key names exactly one element, adding to sum[$1] on every line automatically groups the values by key, which keeps aggregation code short.
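Since subscripts are strings, a number used as a subscript is converted to its string form. In this sketch (key_demo is a hypothetical wrapper), a[1] and a["1"] refer to the same element, while "01" is a separate key.

```shell
# key_demo: hypothetical helper showing that subscripts are compared as strings
key_demo() {
  awk 'BEGIN {
    a[1] = "one"            # numeric subscript 1 is stored as the string "1"
    print a["1"]            # same element -> one
    has_1 = ("1" in a)      # 1: "1" matches the stored key
    has_01 = ("01" in a)    # 0: "01" is a different string
    print has_1, has_01     # 1 0
  }'
}

key_demo
```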

Efficient Loop Processing Using for (index in array)

Create File

cat << 'EOF' > input.txt
apple 3
banana 5
apple 2
orange 4
banana 1
EOF

Command

awk '{ arr[$1] += $2 } END { for (i in arr) print i, arr[i] }' input.txt

Output

apple 5
banana 6
orange 4

How It Works

Element	Description
$1	Column 1 (key)
$2	Column 2 (value to add)
arr[$1] += $2	Accumulates values per key
END	Executed after all lines are processed
for (i in arr)	Loops through all keys in the array
print i, arr[i]	Outputs the key and total value

Explanation

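awk's associative arrays let you aggregate values per key in a single pass, and for (i in arr) iterates over whatever keys were created along the way. Note that the order in which for (i in arr) visits keys is unspecified, so your awk may print these lines in a different order than shown above.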
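When order matters, one portable approach is to pipe the END output through sort. This sketch orders the totals from largest to smallest; totals_by_value is a hypothetical name. gawk also offers PROCINFO["sorted_in"] for sorted traversal, but that is a gawk-only extension.

```shell
# totals_by_value: hypothetical helper; sums column 2 per column-1 key,
# then sorts the report numerically by the total (field 2), largest first
totals_by_value() {
  awk '{ arr[$1] += $2 } END { for (i in arr) print i, arr[i] }' "$@" |
    sort -k2,2nr
}

printf 'apple 3\nbanana 5\napple 2\norange 4\nbanana 1\n' | totals_by_value
```

With the sample data this prints banana 6, apple 5, orange 4, regardless of the order in which awk visited the keys.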

Checking Whether a Specific Element Exists

Create File

cat << 'EOF' > input.txt
apple
banana
orange
apple
grape
EOF

Command

awk '
{
  arr[$1]++
}
END {
  if ("apple" in arr) {
    print "apple exists"
  } else {
    print "apple does not exist"
  }
}' input.txt

Output

apple exists

How It Works

Element	Description
arr[$1]++	Uses column 1 of each line as a key and counts how many times it appears
"apple" in arr	Checks whether a specific key exists in the array
END	Performs the check after all lines are processed

Explanation

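The expression "apple" in arr tests whether a key exists without creating it, and because arr[$1]++ also counts occurrences, the same array tells you both whether a value appeared and how many times.
Avoid testing existence with a direct reference such as arr["apple"] == "": referencing an element creates it, so the check itself would add the key.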
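The difference can be observed by counting the elements after each kind of check. check_demo is a hypothetical wrapper name used only for this illustration.

```shell
# check_demo: hypothetical helper contrasting "in" with a direct comparison
check_demo() {
  awk 'BEGIN {
    arr["apple"] = 1
    if ("kiwi" in arr) print "kiwi found"       # false, and arr is unchanged
    n = 0; for (k in arr) n++
    print "after in-check:", n                  # 1
    if (arr["kiwi"] == "") print "kiwi empty"   # true, but this creates arr["kiwi"]
    n = 0; for (k in arr) n++
    print "after comparison:", n                # 2
  }'
}

check_demo
```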

Deleting Array Elements with the delete Statement and Memory Management

Create File

cat << 'EOF' > input.txt
A 10
B 20
C 30
D 40
EOF

Command

awk '
{
  arr[NR] = $2
}
END {
  delete arr[2]
  for (i=1;i<=NR;i++) {
    if(i in arr) print i, arr[i]
  }
}' input.txt

Output

1 10
3 30
4 40

How It Works

Operation	Description
arr[NR] = $2	Stores the value using the line number as the key
delete arr[2]	Deletes the element with the specified key
for (i=1;i<=NR;i++)	Loops in numeric order
if(i in arr)	Outputs only keys that exist

Explanation

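delete removes the element entirely, so i in arr is false for it afterwards. Combining a numeric loop with an in check skips deleted elements while keeping the output in numeric order, which does not depend on the unspecified order of for (i in arr). Deleting elements you no longer need also keeps an array from growing throughout a long input.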
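To illustrate delete as a memory-management tool, here is a sketch of a sliding window: it keeps only the most recent n lines by deleting the oldest element as each new line arrives, so the array never holds more than n entries. The name last_n is hypothetical, and tail -n already does this job; the point is only the delete pattern. Deleting a key that does not exist is harmless in awk, which is why the first few lines need no special case.

```shell
# last_n: hypothetical helper that prints the last n lines of its input
# while keeping at most n lines in the array (similar to tail -n)
last_n() {
  awk -v n="$1" '
    { buf[NR] = $0; delete buf[NR - n] }   # drop the line that fell out of the window
    END { for (i = NR - n + 1; i <= NR; i++) if (i in buf) print buf[i] }
  '
}

printf 'l1\nl2\nl3\nl4\nl5\n' | last_n 3
```

This prints l3, l4, and l5. To empty a whole array at once, most modern awks (including gawk and mawk) accept delete arr with no subscript; split("", arr) is the traditional portable alternative.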

Generating an Array from a String Using the split Function

Create File

cat << 'EOF' > input.txt
apple,banana,grape
dog,cat,bird
EOF

Command

awk '{ n = split($0, arr, ","); for(i=1;i<=n;i++) print arr[i] }' input.txt

Output

apple
banana
grape
dog
cat
bird

How It Works

Element	Description
split($0, arr, ",")	Splits the string by comma and stores the result in array arr
n	Number of elements after splitting
arr[i]	Each individual element after splitting
for loop	Processes each element of the array in order

Explanation

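split(string, array, separator) breaks a string into the elements array[1], array[2], and so on, deleting any previous contents of array first. The separator may be a single character or a regular expression; if it is omitted, the current field separator FS is used.
The return value is the number of elements, which makes it a natural upper bound for a for loop.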
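Because the separator can be a regular expression, one split call can handle several delimiters at once. In this sketch (split_demo is a hypothetical wrapper), a timestamp is split on dashes, colons, and spaces in a single step.

```shell
# split_demo: hypothetical helper splitting on a regex of several delimiters
split_demo() {
  awk 'BEGIN {
    n = split("2024-01-15 10:30", p, /[-: ]/)   # dash, colon, or space
    print n                                      # 5
    print p[1], p[2], p[3], p[4], p[5]           # 2024 01 15 10 30
  }'
}

split_demo
```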

Simulating Multidimensional Arrays and the Role of the SUBSEP Variable

Create File

cat << 'EOF' > input.txt
A 1 x
A 2 y
B 1 z
B 2 w
EOF

Command

awk '{
  key = $1 SUBSEP $2
  arr[key] = $3
}
END {
  for (k in arr) {
    split(k, idx, SUBSEP)
    printf("arr[%s][%s] = %s\n", idx[1], idx[2], arr[k])
  }
}' input.txt

Output

arr[A][1] = x
arr[A][2] = y
arr[B][1] = z
arr[B][2] = w

Command

awk 'BEGIN { print "SUBSEP =", SUBSEP }'

Output

SUBSEP =

How It Works

Element	Description	Role
arr[key]	One-dimensional array	POSIX awk has no true multidimensional arrays (gawk adds arrays of arrays as an extension)
SUBSEP	Separator character (default is \034)	Combines multiple keys into one
$1 SUBSEP $2	Key generation	Achieves a pseudo two-dimensional array
split()	Key decomposition	Restores the original indices

Explanation

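awk stores each "multidimensional" element under a single string key built by joining the indices with SUBSEP, and split(k, idx, SUBSEP) recovers the original indices. Because the default SUBSEP is the non-printing control character \034, the second command shows nothing visible after "SUBSEP =", and the for (k in arr) loop may list the elements in a different order than shown.
The same technique extends to any number of dimensions.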
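awk also has built-in syntax for this: arr[i, j] builds the same key as arr[i SUBSEP j], and (i, j) in arr tests for it. A minimal sketch using the sample data above, with grid_lookup as a hypothetical name:

```shell
# grid_lookup: hypothetical helper using awk's built-in comma subscripts
grid_lookup() {
  awk '{ cell[$1, $2] = $3 }              # same key as $1 SUBSEP $2
    END {
      if (("A", 2) in cell) print "A,2 ->", cell["A", 2]
      if (!(("C", 1) in cell)) print "C,1 missing"
    }' "$@"
}

printf 'A 1 x\nA 2 y\nB 1 z\nB 2 w\n' | grid_lookup
```

It prints A,2 -> y and then C,1 missing.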

How to Output Array Aggregation Results in the END Block

Create File

cat << 'EOF' > input.txt
apple 10
banana 20
apple 15
orange 5
banana 25
EOF

Command

awk '{ sum[$1] += $2 } END { for (i in sum) print i, sum[i] }' input.txt

Output

apple 25
banana 45
orange 5

How It Works

Element	Description
$1	Column 1 (key: fruit name)
$2	Column 2 (value: numeric)
sum[$1] += $2	Adds to the array per key
END	Executed after all lines are processed
for (i in sum)	Loops through all keys in the array
print i, sum[i]	Outputs the aggregation result

Explanation

The standard pattern is to accumulate per-key values in the main rule while lines are read, then print them in the END block.
END runs only after the last line has been processed, so every total is complete by the time it is printed.
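Several arrays can share the same keys, which makes it easy to report more than one statistic in END. This sketch keeps a sum and a count per key to print an average, then sorts the result; key_stats is a hypothetical name.

```shell
# key_stats: hypothetical helper; per key, prints the total and the average
key_stats() {
  awk '{ sum[$1] += $2; cnt[$1]++ }
    END { for (k in sum) printf "%s total=%d avg=%.1f\n", k, sum[k], sum[k] / cnt[k] }' "$@" |
    sort
}

printf 'apple 10\nbanana 20\napple 15\norange 5\nbanana 25\n' | key_stats
```

With the sample data it prints apple total=25 avg=12.5, banana total=45 avg=22.5, and orange total=5 avg=5.0.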

Joining Two Files on a Common Key (JOIN) Using Arrays

Create File

cat << 'EOF' > file1.txt
1 Alice 25
2 Bob 30
3 Carol 28
EOF

Create File

cat << 'EOF' > file2.txt
1 Tokyo
2 Osaka
4 Fukuoka
EOF

Command

awk 'NR==FNR {a[$1]=$2; next} ($1 in a) {print $1, a[$1], $2, $3}' file2.txt file1.txt

Output

1 Tokyo Alice 25
2 Osaka Bob 30

How It Works

Step	Content	Description
1	NR==FNR	Processes the first file (file2.txt)
2	a[$1]=$2	Stores the value ($2) in the array using the key ($1)
3	next	Moves to the next line
4	($1 in a)	Checks whether file1's key exists in the array
5	print	Outputs the JOIN result

Explanation

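The first file is loaded into an array as a key-to-value lookup table, and the second file is then streamed against it, so each file is read only once and only the first file has to fit in memory.
Like an SQL inner join, only matching keys are output: id 3 (Carol) and id 4 (Fukuoka) are dropped. One caveat: if the first file is empty, NR==FNR is also true while reading the second file, so this idiom misbehaves in that case.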
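If you want something closer to an SQL left outer join, keep every line of the main file and fill in a placeholder when the key is missing from the lookup array. This is a sketch; left_join and the "NA" placeholder are hypothetical choices.

```shell
# left_join: hypothetical helper; $1 is the lookup file (id city),
# $2 is the main file (id name age). Unmatched rows get "NA" as the city.
left_join() {
  awk 'NR == FNR { city[$1] = $2; next }
       { print $1, (($1 in city) ? city[$1] : "NA"), $2, $3 }' "$1" "$2"
}
```

Called as left_join file2.txt file1.txt with the files above, it prints 1 Tokyo Alice 25, 2 Osaka Bob 30, and 3 NA Carol 28.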

Key Takeaways for Mastering Arrays in awk

Arrays in awk are more than plain indexed arrays: they are associative arrays with a great deal of flexibility.

Once you understand the basics (no declaration required, free choice of keys, and iteration with for loops and the in operator), the range of problems you can solve grows considerably.

They are especially useful in practical tasks such as log analysis, data aggregation, and joining files.

Beginners may be tripped up by behaviors such as automatic element creation and the unspecified order of for (key in array), but working through each rule one at a time will give you skills you can use in practice.

Other Articles on awk

The following link leads to an article about the awk command as a whole.

If you want to learn awk comprehensively, please take a look.

Mastering the awk Command
