Introduction
awk is a text-processing command widely used for log analysis, CSV processing, and similar tasks. The BEGIN block is the starting point of an awk program, and it is also where beginners tend to stumble. This article covers the basic structure and processing flow of awk and organizes practical knowledge centered on how to use BEGIN.
Reference: gawk manual
AWK Basic Structure and Processing Flow (BEGIN | MAIN | END)
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 300
EOF
Command
awk 'BEGIN { print "=== Start Processing ===" }
{ print "Item:", $1, "Price:", $2 }
END { print "=== End Processing ===" }' input.txt
Output
=== Start Processing ===
Item: apple Price: 100
Item: banana Price: 200
Item: orange Price: 300
=== End Processing ===
How It Works
| Phase | Timing | Role | Example |
|---|---|---|---|
| BEGIN | Before input processing | Initialization / Header output | BEGIN { print "Start" } |
| MAIN | Per line | Data processing | { print $1, $2 } |
| END | After input processing | Aggregation / Cleanup | END { print "End" } |
Explanation
awk has a three-phase structure: BEGIN runs once before any input is read, MAIN runs once per line, and END runs once after all input has been processed. Understanding this flow makes it straightforward to design data processing with awk.
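The ordering described above can be observed directly by printing NR (the number of records read so far) in each phase. The following is a minimal sketch; the inline sample data and the variable name out are only for illustration.

```shell
# Trace the three phases with NR, the number of records read so far.
out=$(printf 'apple 100\nbanana 200\norange 300\n' | awk '
BEGIN { print "BEGIN: NR=" NR }      # runs once, before any record is read
      { print "MAIN: NR=" NR, $1 }   # runs once per input line
END   { print "END: NR=" NR }        # runs once, NR still holds the last count
')
printf '%s\n' "$out"
```

NR is 0 in BEGIN and keeps the total record count in END, which matches the phase table.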
Variable and Array Initialization: Preparation Before Program Execution
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
apple 50
EOF
Command
awk 'BEGIN { sum=0; count=0 } { sum+=$2; count++ } END { print "sum="sum, "avg="sum/count }' input.txt
Output
sum=500 avg=125
Command
awk 'BEGIN { } { arr[$1]+=$2 } END { for (key in arr) print key, arr[key] }' input.txt
Output
apple 150
banana 200
orange 150
How It Works
| Section | Timing | Content | Role |
|---|---|---|---|
| BEGIN | Before execution | sum=0, count=0 | Variable initialization |
| Main processing | Per line | sum+=$2, count++ | Data aggregation |
| Main processing | Per line | arr[$1]+=$2 | Per-key aggregation |
| END | After execution | Average calculation / output | Display results |
Explanation
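In the first command, variables are initialized in BEGIN; in the second, values are accumulated per key during processing. awk treats unset variables as 0 or an empty string, so the explicit initialization is optional, though writing it out makes the starting values explicit. Since awk arrays are associative arrays, they allow flexible aggregation using string keys.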
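One thing the sum/average command does not handle is empty input: count stays 0 and the division in END fails. A possible guard is sketched below; the function name avg_line and the "no data" message are hypothetical choices, not part of the original commands.

```shell
# Sum and average of column 2, with a guard for input that has no lines.
avg_line() {
    awk '{ sum += $2; count++ }
         END {
             if (count > 0) {
                 print "sum=" sum, "avg=" sum / count
             } else {
                 print "no data"        # avoid dividing by zero
             }
         }'
}

full=$(printf 'apple 100\nbanana 200\norange 150\napple 50\n' | avg_line)
empty=$(avg_line < /dev/null)
printf '%s\n%s\n' "$full" "$empty"
```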
Defining Built-in Variables (FS, OFS, RS, ORS)
Creating the File
cat << 'EOF' > input.txt
A 1
B 2
C 3
EOF
Command
awk 'BEGIN{FS=" "; OFS=","} {print $1,$2}' input.txt
Output
A,1
B,2
C,3
Command
awk 'BEGIN{RS="\n"; ORS="---\n"} {print $0}' input.txt
Output
A 1---
B 2---
C 3---
How It Works
| Variable | Meaning | Example | Effect |
|---|---|---|---|
| FS | Input field separator | FS=" " | Split on runs of whitespace (the default) |
| OFS | Output field separator | OFS="," | Output with comma separation |
| RS | Input record separator | RS="\n" | Process line by line (the default) |
| ORS | Output record separator | ORS="---\n" | Change end of output line |
Explanation
Defining built-in variables in the BEGIN block allows you to change separator rules before input processing begins. This enables flexible control over both input parsing and output format.
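A related detail is that OFS is only applied when awk builds the output itself, either through print with comma-separated arguments or when the record is rebuilt after a field assignment. The sketch below illustrates the difference; the sample data and variable names are only for illustration.

```shell
input=$(printf 'A,1\nB,2\n')

# An unmodified $0 keeps its original separators, even though OFS is set.
plain=$(printf '%s\n' "$input" | awk 'BEGIN { FS = ","; OFS = ":" } { print }')

# Assigning to a field makes awk rebuild $0 using OFS.
rebuilt=$(printf '%s\n' "$input" | awk 'BEGIN { FS = ","; OFS = ":" } { $1 = $1; print }')

printf '%s\n---\n%s\n' "$plain" "$rebuilt"
```

The first result still uses commas, while the second uses colons.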
Using Only BEGIN: Calculations and Output That Require No File Input
Command
awk 'BEGIN { print "Hello, awk BEGIN!" }'
Output
Hello, awk BEGIN!
Command
awk 'BEGIN { sum=0; for(i=1;i<=5;i++) sum+=i; print sum }'
Output
15
How It Works
| Element | Content | Description |
|---|---|---|
| BEGIN | Pre-processing block | Executed before reading any input file |
| print | Output instruction | Displays results to standard output |
| Variables | sum, i | Automatically created within awk |
| for statement | Repetition processing | Executes calculations in a loop |
| Input file | Not required | Processing is self-contained within BEGIN |
Explanation
When a program consists only of BEGIN, awk exits without reading any input, so awk alone can perform calculations and produce output. This is useful for simple scripts and one-liner calculations.
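As a small extension of the loop above, printf can be used inside BEGIN to format the result. This is only a sketch of a BEGIN-only calculation; the output layout is an arbitrary choice.

```shell
# A BEGIN-only program: no input file is needed or read.
out=$(awk 'BEGIN {
    for (i = 1; i <= 5; i++) sum += i          # sum of 1..5
    printf "sum=%d avg=%.1f\n", sum, sum / 5   # formatted output
}')
printf '%s\n' "$out"
```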
Outputting Header Information in CSV Processing
Creating the File
cat << 'EOF' > input.txt
name,age,city
Alice,30,Tokyo
Bob,25,Osaka
Charlie,35,Nagoya
EOF
Command
awk 'BEGIN { print "name,age,city" } { print }' input.txt
Output
name,age,city
name,age,city
Alice,30,Tokyo
Bob,25,Osaka
Charlie,35,Nagoya
Command
awk 'BEGIN { print "HEADER: name,age,city" } NR>1 { print }' input.txt
Output
HEADER: name,age,city
Alice,30,Tokyo
Bob,25,Osaka
Charlie,35,Nagoya
How It Works
| Element | Content |
|---|---|
| BEGIN | A block executed only once before input processing |
| NR | Current record number (line number) |
| NR>1 | Skip the first line (original header) |
| print | Output a line or string |
Explanation
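Using BEGIN allows you to output a header independently of the input file, before any input is processed. In the first command the header appears twice, because the file's own header line is still passed through by { print }. Adding the NR>1 condition skips that first line, so you can flexibly exclude or replace the existing header.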
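The same idea extends to reshaping a CSV: set the separators in BEGIN, print a new header there, and skip the original one with NR > 1. The column selection below is a hypothetical example, not something taken from the commands above.

```shell
# Replace the header and reorder columns of a small CSV.
out=$(printf 'name,age,city\nAlice,30,Tokyo\nBob,25,Osaka\n' | awk '
BEGIN  { FS = OFS = ","; print "city,name" }   # separators and new header
NR > 1 { print $3, $1 }                        # skip the original header line
')
printf '%s\n' "$out"
```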
External Arguments (-v Option) and BEGIN Block Priority
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
EOF
Command
awk 'BEGIN { x=1 } { print x, $0 }' input.txt
Output
1 apple 100
1 banana 200
Command
awk -v x=2 'BEGIN { } { print x, $0 }' input.txt
Output
2 apple 100
2 banana 200
Command
awk -v x=2 'BEGIN { x=1 } { print x, $0 }' input.txt
Output
1 apple 100
1 banana 200
How It Works
| Item | Content |
|---|---|
| -v option | Assigns a variable before the program starts, so it is already set in BEGIN |
| BEGIN block | Executed at the start of the script |
| Priority | Assignment inside BEGIN ultimately overwrites |
| Evaluation order | -v → BEGIN → Main processing |
Explanation
The value passed with -v is set as an initial value, and if reassigned inside the BEGIN block, it is overwritten. In other words, the result of BEGIN processing takes final precedence.
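If the goal is for -v to win when it is given, one possible pattern is to let BEGIN assign only a default. The sketch below assumes that an unset variable and an explicitly empty one can be treated the same way.

```shell
# BEGIN supplies a default only when x was not set with -v.
with_default=$(printf 'apple 100\n' | awk 'BEGIN { if (x == "") x = 1 } { print x, $0 }')
overridden=$(printf 'apple 100\n' | awk -v x=2 'BEGIN { if (x == "") x = 1 } { print x, $0 }')
printf '%s\n%s\n' "$with_default" "$overridden"
```

Without -v the default 1 is used; with -v x=2 the external value is kept.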
Complete Aggregation Processing by Combining with the END Block
Creating the File
cat << 'EOF' > input.txt
Alice 10
Bob 20
Alice 15
Bob 5
Charlie 30
EOF
Command
awk '
BEGIN { print "Aggregation Start" }
{ sum[$1] += $2 }
END {
print "Aggregation Results"
for (name in sum) {
print name, sum[name]
}
}
' input.txt
Output
Aggregation Start
Aggregation Results
Bob 25
Alice 25
Charlie 30
How It Works
| Block | Role | Content |
|---|---|---|
| BEGIN | Initial processing | Executed once at the start of processing |
| Main processing | Per-line processing | Add numeric values per name |
| END | Post-processing | Output aggregated results after all processing is complete |
Explanation
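By handling setup in BEGIN, accumulating data in the main block, and outputting everything in END, awk can handle the entire aggregation process on its own. Note that for (name in sum) visits keys in an unspecified order, which is why Bob appears before Alice here; the order may differ between awk implementations.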
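When a stable order is needed, a common approach is to pipe the END output through sort. The sketch below also keeps a per-name count so that an average can be printed; the cnt array and the output format are illustrative additions. The start and result headings are left out because sort would reorder them along with the data.

```shell
# Per-name total and average, sorted for a reproducible order.
out=$(printf 'Alice 10\nBob 20\nAlice 15\nBob 5\nCharlie 30\n' | awk '
{ sum[$1] += $2; cnt[$1]++ }
END {
    for (name in sum)
        printf "%s total=%d avg=%.1f\n", name, sum[name], sum[name] / cnt[name]
}' | sort)
printf '%s\n' "$out"
```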
Tips for Writing Readable Scripts by Mastering BEGIN
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
EOF
Command
awk 'BEGIN { print "=== Fruit Price List ===" } { print $1, $2 "yen" } END { print "=== End ===" }' input.txt
Output
=== Fruit Price List ===
apple 100yen
banana 200yen
orange 150yen
=== End ===
How It Works
| Block | Timing | Role |
|---|---|---|
| BEGIN | Before reading | Header output / initialization |
| Body {} | Per line | Data processing (field manipulation, etc.) |
| END | After reading | Footer output / display aggregated results |
Explanation
Using BEGIN clearly separates pre-processing and greatly improves script readability. Writing headers and initialization logic there in particular helps keep the structure organized.
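One way to apply this separation is to define shared settings, such as a printf format, once in BEGIN and reuse them in the other blocks. The following is a sketch under that assumption; the fmt variable, the column widths, and the Total row are hypothetical.

```shell
# Define the layout once in BEGIN, then reuse it for the body and footer.
out=$(printf 'apple 100\nbanana 200\n' | awk '
BEGIN { fmt = "%-8s %6s\n"; printf fmt, "Item", "Price" }   # layout and header
      { printf fmt, $1, $2 "yen"; total += $2 }             # one row per line
END   { printf fmt, "Total", total "yen" }                  # footer
')
printf '%s\n' "$out"
```

Because the format lives in one place, changing the column widths only requires editing the BEGIN block.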
Common Errors in BEGIN Blocks and How to Handle Them
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 300
EOF
Command
awk 'BEGIN { print total } { total += $2 } END { print total }' input.txt
Output
(empty line)
600
Command
awk 'BEGIN { print $1 }' input.txt
Output
(empty line)
How It Works
| Item | Content |
|---|---|
| BEGIN block | Executed only once before input processing |
| Field variables (e.g. $1) | Only available after an input line has been read |
| Uninitialized variables | Treated as 0 in numeric context or an empty string in string context |
| Main block {} | Processed per line |
| END block | Executed once after all processing is complete |
Explanation
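Since BEGIN executes before any input is read, line data and fields are not yet available, which is a classic pitfall. In the first command, print total in BEGIN outputs an empty line because total has not been assigned yet; in the second, $1 is empty for the same reason. It is safest to limit BEGIN to work that does not depend on input, such as variable initialization and header output.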
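If the intent was to act on the first line, the usual fix is to move that logic from BEGIN to a rule with the NR == 1 condition. The sketch below contrasts the two; the empty value in BEGIN is what common implementations such as gawk and mawk produce, and the bracket markers are only there to make the empty value visible.

```shell
# $1 is empty in BEGIN, but available in a rule that matches the first record.
out=$(printf 'apple 100\nbanana 200\n' | awk '
BEGIN   { print "in BEGIN: [" $1 "]" }     # no record has been read yet
NR == 1 { print "first line: [" $1 "]" }   # runs after line 1 is read
')
printf '%s\n' "$out"
```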
Key Points for Mastering awk and BEGIN
BEGIN in awk is not merely a startup step; it is an important element that affects the overall design of the script.
When initialization and configuration in BEGIN are done properly, the MAIN and END processing becomes simple and clear. This effect is especially noticeable in CSV processing and aggregation tasks.
For beginners, the fastest path to understanding is to keep asking yourself "which processing belongs where?" while experimenting with small scripts through trial and error.
