Introduction
awk is a powerful command specialized for text processing, widely used for everything from simple one-liners to advanced data analysis.
This article walks through the points that beginners tend to stumble on, focusing on the basic structure of the awk command and how script execution works.
Reference: GNU awk
Basic Structure of the awk Command and How Script Execution Works
Create File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
EOF
Command
awk '{print $1, $2}' input.txt
Output
apple 100
banana 200
orange 150
Command
awk '$2 > 150 {print $1}' input.txt
Output
banana
Command
awk '{sum += $2} END {print sum}' input.txt
Output
450
How It Works
| Element | Description |
|---|---|
| Pattern | Condition expression (e.g., $2 > 150) |
| Action | Execution process (e.g., {print $1}) |
| Field | References column data such as $1, $2 |
| END block | Process executed after all lines are processed |
| Script execution | Can be executed inline or via a file |
Explanation
awk is a stream-oriented tool that processes each line using a "pattern + action" model. It flexibly handles everything from simple one-liners to full script files.
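Both halves of the model are optional. As a minimal sketch (using a throwaway file name, defaults.txt), a pattern with no action falls back to the default action {print}, and an action with no pattern runs for every line:

```shell
cat << 'EOF' > defaults.txt
apple 100
banana 200
orange 150
EOF

# Pattern only: the default action {print} outputs each matching line
awk '$2 > 100' defaults.txt

# Action only: with no pattern, the action runs for every line
awk '{print $1}' defaults.txt
```

The first command prints only banana and orange lines; the second prints the first column of every line.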
Two Basic Methods for Saving and Executing awk Script Files
Create File
cat << 'EOF' > input.txt
apple
banana
cherry
EOF
Create File
cat << 'EOF' > script.awk
BEGIN { print "=== awk script start ===" }
{ print NR ":" $0 }
END { print "=== awk script end ===" }
EOF
Command
awk -f script.awk input.txt
Output
=== awk script start ===
1:apple
2:banana
3:cherry
=== awk script end ===
Command
sed -i '1i #!/usr/bin/awk -f' script.awk
chmod +x script.awk
./script.awk input.txt
Output
=== awk script start ===
1:apple
2:banana
3:cherry
=== awk script end ===
How It Works
| Method | How to Run | Mechanism |
|---|---|---|
| -f option | awk -f script.awk | Loads an external script file into the awk command |
| Executable file | ./script.awk | Specifies awk via shebang and runs the script directly |
Explanation
Separating an awk script into an external file improves reusability, and you can choose the execution method to suit the use case: use -f for ad-hoc processing, and add a shebang plus the execute bit when you want to run the script like a standalone tool.
Initialization Processing Using the BEGIN Block
Create File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
EOF
Command
awk 'BEGIN { sum=0; print "=== Aggregation Start ===" } { sum += $2 } END { print "Total:", sum }' input.txt
Output
=== Aggregation Start ===
Total: 450
How It Works
| Block | Timing | Process |
|---|---|---|
| BEGIN | Before reading input | Variable initialization, header output |
| Main body | Each line | Adds up the numeric value in column 2 |
| END | After reading input | Outputs the total value |
Explanation
The BEGIN block executes only once before input processing begins, making it ideal for initialization.
It is frequently used in awk scripts as preparation before aggregation processing.
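BEGIN is also the standard place to set FS and OFS. A small sketch (with an assumed file name, csv.txt) that is equivalent to using -F',' while additionally controlling the output separator:

```shell
cat << 'EOF' > csv.txt
apple,100
banana,200
EOF

# FS set in BEGIN takes effect before the first record is split;
# OFS controls the separator that print inserts between arguments
awk 'BEGIN { FS=","; OFS=" | " } { print $1, $2 }' csv.txt
```

This prints `apple | 100` and `banana | 200`.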
How to Handle Command-Line Arguments as Variables Inside a Script
Create File
cat << 'EOF' > script.awk
BEGIN {
arg1 = ARGV[1]
arg2 = ARGV[2]
print "arg1 =", arg1
print "arg2 =", arg2
# Delete so awk does not treat them as regular files
delete ARGV[1]
delete ARGV[2]
}
{
print "input:", $0
}
EOF
Command
echo "hello world" | awk -f script.awk foo bar
Output
arg1 = foo
arg2 = bar
input: hello world
How It Works
| Element | Description |
|---|---|
| ARGV | Array of command-line arguments |
| ARGV[0] | The awk command itself |
| ARGV[1..] | User-specified arguments |
| delete | Prevents awk from treating the entry as an input file |
| BEGIN | Block executed before input processing |
Explanation
Using ARGV allows you to handle command-line arguments inside an awk script.
It is important to delete arguments that are not input files; otherwise awk will try to open them as files after the BEGIN block, and fail if they do not exist.
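For simple cases, the -v option is a common alternative that avoids the ARGV bookkeeping entirely: it assigns a variable before BEGIN runs, and the value never appears in ARGV, so nothing has to be deleted. A minimal sketch:

```shell
# -v sets an awk variable on the command line; stdin is still read normally
echo "hello world" | awk -v greeting=hi '{ print greeting ":", $0 }'
```

This prints `hi: hello world`. Use ARGV when you need positional arguments; use -v when a named variable is enough.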
Efficient Use of Regular Expressions in External Script Files
Create File
cat << 'EOF' > input.txt
apple 100
banana 200
apricot 150
grape 300
EOF
Create File
cat << 'EOF' > script.awk
/^(a|b)/ {
if ($2 ~ /^[0-9]+$/) {
sum += $2
print $1, $2
}
}
END {
print "TOTAL:", sum
}
EOF
Command
awk -f script.awk input.txt
Output
apple 100
banana 200
apricot 150
TOTAL: 450
How It Works
| Element | Description |
|---|---|
| Regex `/^(a\|b)/` | Targets lines starting with a or b |
| Numeric check /^[0-9]+$/ | Validates that the field contains only digits |
| Action {...} | Describes the process when condition matches |
| sum += $2 | Accumulates numeric values |
| END | Outputs total in final processing |
Explanation
Writing regular expressions directly in the pattern section keeps each condition next to its action and removes a layer of explicit if branching. Because awk evaluates the pattern before running the action, even external scripts can handle matching, validation, and aggregation concisely.
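Patterns can also be combined with && and ||, which often removes the need for the inner if entirely. A sketch against the same style of input (assumed file name combo.txt):

```shell
cat << 'EOF' > combo.txt
apple 100
banana 200
apricot 150
grape 300
EOF

# Regex match and numeric comparison combined in one pattern expression
awk '/^a/ && $2 >= 150 { print $1, $2 }' combo.txt
```

Only `apricot 150` satisfies both conditions, so that is the only line printed.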
Record Control Using Built-in Variables (NR, NF, FS)
Create File
cat << 'EOF' > input.txt
apple,fruit,100
carrot,vegetable,80
banana,fruit,120
EOF
Create File
cat << 'EOF' > script_all.sh
#!/bin/bash
awk -F',' '{ print "NR=" NR, "NF=" NF, "1=" $1, "2=" $2, "3=" $3 }' input.txt
EOF
Create File
cat << 'EOF' > script_line2.sh
#!/bin/bash
awk -F',' 'NR==2 { print "Line 2:", $1, $2, $3 }' input.txt
EOF
Create File
cat << 'EOF' > script_fruit.sh
#!/bin/bash
awk -F',' '$2=="fruit" { print "Fruit:", $1, $3 }' input.txt
EOF
Command
chmod +x script_all.sh script_line2.sh script_fruit.sh
Command
./script_all.sh
Output
NR=1 NF=3 1=apple 2=fruit 3=100
NR=2 NF=3 1=carrot 2=vegetable 3=80
NR=3 NF=3 1=banana 2=fruit 3=120
Command
./script_line2.sh
Output
Line 2: carrot vegetable 80
Command
./script_fruit.sh
Output
Fruit: apple 100
Fruit: banana 120
How It Works
| Variable | Meaning | Role |
|---|---|---|
| NR | Current record number | Line identification and conditional branching |
| NF | Number of fields | Understanding the column count |
| FS | Field separator | Field splitting (specified with -F) |
Explanation
Printing NR and NF alongside each record makes it easy to see exactly how awk splits the input, which helps when debugging scripts.
Combining NR, NF, and FS enables flexible extraction by row and by column.
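Because NF is recomputed for every record, $NF always names the last field. A short sketch (assumed file name last.txt) that works even when rows have different column counts:

```shell
cat << 'EOF' > last.txt
apple,fruit,100
carrot,80
EOF

# $NF refers to the last field of the current record, whatever NF is
awk -F',' '{ print $1, "->", $NF }' last.txt
```

This prints `apple -> 100` and `carrot -> 80` despite the differing column counts.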
Aggregating Results and Generating Reports with the END Block
Create File
cat << 'EOF' > input.txt
apple 100
banana 200
apple 150
banana 50
orange 300
EOF
Command
awk '{ sum[$1]+=$2 } END { for (k in sum) print k, sum[k] }' input.txt
Output
apple 250
banana 250
orange 300
How It Works
| Element | Description |
|---|---|
| $1 | Column 1 (key: product name) |
| $2 | Column 2 (value: numeric) |
| sum[$1]+=$2 | Adds to the total for each product |
| END | Block executed after all lines are processed |
| for (k in sum) | Loops over all keys in the associative array |
| print k, sum[k] | Outputs the aggregated result |
Explanation
Using the END block in awk lets you aggregate and generate a report in a single pass, after all data has been processed.
Associative arrays allow flexible aggregation by key, but note that for (k in sum) visits keys in an unspecified order that can differ between awk implementations.
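Because the for-in iteration order is unspecified, piping the result through sort is a portable way to get stable output. A sketch reusing the same aggregation (assumed file name totals.txt):

```shell
cat << 'EOF' > totals.txt
apple 100
banana 200
apple 150
EOF

# sort makes the unspecified for-in iteration order deterministic
awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' totals.txt | sort
```

This reliably prints `apple 250` before `banana 200` regardless of the awk implementation.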
Building Complex Logic with Control Structures (if, for, while)
Create File
cat << 'EOF' > input.txt
apple 10
banana 5
orange 20
grape 15
EOF
Create File
cat << 'EOF' > script.awk
{
if ($2 >= 15) {
print $1 " is high"
} else if ($2 >= 10) {
print $1 " is medium"
} else {
print $1 " is low"
}
}
EOF
Command
awk -f script.awk input.txt
Output
apple is medium
banana is low
orange is high
grape is high
How It Works
| Element | Description |
|---|---|
| script.awk | The awk script body |
| $1, $2 | Fields (column 1: name, column 2: numeric value) |
| if | Conditional branch (15 or more) |
| else if | Intermediate condition (10 or more) |
| else | Processing for all other cases |
| -f | Specifies a script file for awk |
Explanation
Externalizing an awk script into a file improves reusability and readability. Complex conditional branching can also be organized and managed more easily.
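A for loop pairs naturally with NF when rows have a variable number of columns. A small sketch (assumed file name rows.txt) summing every numeric column after the name:

```shell
cat << 'EOF' > rows.txt
apple 10 20 30
banana 5 15
EOF

# Loop over fields 2..NF; total is reset for each line before accumulating
awk '{ total = 0; for (i = 2; i <= NF; i++) total += $i; print $1, total }' rows.txt
```

This prints `apple 60` and `banana 20`, handling each row's column count automatically.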
Extending Scripts with the system Function
Create File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
EOF
Command
awk '{ system("echo Item:" $1 ", Price:" $2) }' input.txt
Output
Item:apple, Price:100
Item:banana, Price:200
Item:orange, Price:150
How It Works
| Element | Description |
|---|---|
| awk | A scripting language that processes text line by line and field by field |
| $1, $2 | Represent column 1 and column 2 of each line |
| system function | A function that executes an external command |
| echo | A command that outputs a string to standard output |
| Processing flow | Read line → split fields → execute externally with system |
Explanation
Using awk's system function allows you to dynamically execute shell commands for each line. This enables flexible extensions that combine text processing with external commands.
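When the goal is just to feed output to one external command, awk's built-in output pipe, print | "cmd", is an alternative to per-line system() calls: it keeps a single external process open for all lines, and close() flushes it. A sketch (assumed file name pipe.txt):

```shell
cat << 'EOF' > pipe.txt
banana 200
apple 100
EOF

# All print output is fed to one sort process; close() flushes it
awk '{ print $2, $1 | "sort -n" } END { close("sort -n") }' pipe.txt
```

This prints `100 apple` then `200 banana`, having started sort only once.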
How to Embed an awk Script Inside a Shell Script (bash)
Create File
cat << 'DATA' > input.txt
Alice 80
Bob 65
Charlie 90
DATA
Create File
cat << 'EOF' > script.sh
#!/bin/bash
# Embed and run an awk script
awk '{
if ($2 >= 70) {
print $1 " : Pass"
} else {
print $1 " : Fail"
}
}' input.txt
EOF
Command
chmod +x script.sh
./script.sh
Output
Alice : Pass
Bob : Fail
Charlie : Pass
How It Works
| Element | Content | Description |
|---|---|---|
| Shell script | script.sh | Controls the overall flow |
| Here document | cat << 'DATA' | Generates input data |
| awk script | awk '...' | Text processing logic |
| Field reference | $1, $2 | Space-delimited columns |
| Conditional branch | if ($2 >= 70) | Numeric judgment |
| Output | print | Displays the pass/fail result |
Explanation
Writing awk directly inside bash enables concise text processing without external files.
Combining it with here documents creates highly reproducible scripts.
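To pass a bash variable into the embedded awk program, -v is safer than splicing the variable into the single-quoted script. A sketch (assumed file name grades.txt and a hypothetical threshold variable):

```shell
cat << 'EOF' > grades.txt
Alice 80
Bob 65
EOF

# The shell expands $threshold; -v hands the value to awk as "limit"
threshold=70
awk -v limit="$threshold" '{ print $1, ($2 >= limit ? "Pass" : "Fail") }' grades.txt
```

This prints `Alice Pass` and `Bob Fail`, and the threshold can now be changed without touching the awk code.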
Performance Optimization and Considerations for Large-Scale Data Processing
Create File
cat << 'EOF' > input.txt
id,name,score
1,Alice,82
2,Bob,91
3,Charlie,78
4,David,88
5,Eve,95
EOF
Create File
cat << 'EOF' > process.awk
BEGIN { FS="," }
NR>1 {
sum += $3
count++
}
END {
print "Average:", sum/count
}
EOF
Create File
cat << 'EOF' > filter.awk
BEGIN { FS="," }
NR==1 || $3 >= 90
EOF
Create File
cat << 'EOF' > skip_header.awk
BEGIN { FS="," }
NR>1 {
print $0
}
EOF
Command
awk -f process.awk input.txt
Output
Average: 86.8
Command
awk -f filter.awk input.txt
Output
id,name,score
2,Bob,91
5,Eve,95
Command
awk -f skip_header.awk input.txt
Output
1,Alice,82
2,Bob,91
3,Charlie,78
4,David,88
5,Eve,95
Command
awk -f skip_header.awk input.txt | sort -t',' -k3 -nr
Output
5,Eve,95
2,Bob,91
4,David,88
1,Alice,82
3,Charlie,78
How It Works
| Item | Description |
|---|---|
| Input splitting | FS="," enables efficient CSV processing |
| Skip | NR>1 excludes the header |
| Aggregation | Sequential addition saves memory |
| Conditional extraction | Outputs only matching conditions |
| Pipe integration | Delegates to sort for external processing |
| Delimiter specification | -t',' specifies the column delimiter |
| Key specification | -k3 uses column 3 as the sort key |
| Numeric sort | -n for numeric comparison |
| Descending sort | -r for descending order |
Explanation
Scripting awk makes it easy to split processing into reusable pieces. For large-scale data, minimize passes over the input, skip lines you do not need as early as possible, and delegate work such as sorting to specialized tools through pipes.
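Another I/O saver is exit: once the answer is found, stopping awk avoids reading the rest of the file. A sketch against the same CSV layout (assumed file name big.csv, standing in for a much larger input):

```shell
cat << 'EOF' > big.csv
id,name,score
1,Alice,82
2,Bob,91
3,Charlie,78
EOF

# exit stops the read loop at the first match instead of scanning to EOF
awk -F',' '$2 == "Alice" { print $3; exit }' big.csv
```

This prints `82` and never reads past the matching line.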
Creating Practical Scripts to Automate Log File Analysis
Create File
cat << 'EOF' > input.txt
2026-05-01 INFO User login success
2026-05-01 ERROR Database connection failed
2026-05-02 INFO File uploaded
2026-05-02 WARNING Disk space low
2026-05-03 ERROR Timeout occurred
EOF
Command
awk '$2 == "ERROR" {print $0}' input.txt
Output
2026-05-01 ERROR Database connection failed
2026-05-03 ERROR Timeout occurred
Command
awk '{count[$2]++} END {for (level in count) print level, count[level]}' input.txt
Output
INFO 2
ERROR 2
WARNING 1
Command
awk '$2=="ERROR" {print $1, $3, $4, $5}' input.txt
Output
2026-05-01 Database connection failed
2026-05-03 Timeout occurred
How It Works
| Process | awk Expression | Description |
|---|---|---|
| Conditional extraction | $2 == "ERROR" | Extracts only lines where column 2 is ERROR |
| Count | count[$2]++ | Counts occurrences by log level |
| End processing | END {} | Outputs results after all lines are processed |
| Field reference | $1, $2 ... | Specifies columns by space delimiter |
Explanation
Using awk allows you to filter, aggregate, and format logs in a single one-liner.
Simple yet powerful, it is highly effective for automation in operational environments; just note that the order of the count output follows the implementation's array iteration order, so it may differ from the sample shown.
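The same counting idea extends to grouping by another key, for example errors per day. A sketch (assumed file name app.log), with sort appended because the for-in output order is unspecified:

```shell
cat << 'EOF' > app.log
2026-05-01 INFO User login success
2026-05-01 ERROR Database connection failed
2026-05-03 ERROR Timeout occurred
EOF

# Count ERROR lines keyed by the date in column 1
awk '$2 == "ERROR" { errs[$1]++ } END { for (d in errs) print d, errs[d] }' app.log | sort
```

This prints one line per date with its error count, ready to feed into an alerting script.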
Avoiding Syntax Errors and Unintended Behavior
Create File
cat << 'EOF' > input.txt
apple 10
banana 20
orange 30
EOF
Command
awk '{ print $1, $2 * 2 }' input.txt
Output
apple 20
banana 40
orange 60
Command
awk 'NF == 2 { sum += $2 } END { print sum }' input.txt
Output
60
Command
awk '{ if ($2 ~ /^[0-9]+$/) print $1 ":" $2 }' input.txt
Output
apple:10
banana:20
orange:30
How It Works
| Element | Description | Error Prevention Point |
|---|---|---|
| awk '{ ... }' | Executes processing for each line | Watch out for unclosed quotes |
| $1, $2 | Field (column) references | Avoid misconfiguring the delimiter |
| NF == 2 | Checks the number of fields | Prevents processing of invalid lines |
| ~ /^[0-9]+$/ | Validates numeric value with regex | Avoids malfunctions due to type mismatch |
| END { ... } | Processing after all lines | Prevents forgotten initialization or undefined variables |
Explanation
In awk scripts, clarifying the assumptions about the input format and including condition checks (NF and regular expressions) prevents syntax errors and unintended behavior.
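One more defensive habit: a variable that is never assigned prints as an empty string, so a report line can silently come out blank when no input matched. Forcing numeric context with + 0 guarantees a number. A sketch (assumed file name nonum.txt, deliberately containing no numeric second column):

```shell
cat << 'EOF' > nonum.txt
apple x
banana y
EOF

# No line matches, so sum is never assigned; sum + 0 prints 0 instead of ""
awk '$2 ~ /^[0-9]+$/ { sum += $2 } END { print sum + 0 }' nonum.txt
```

This prints `0`, making the "nothing matched" case explicit in the report.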
Summary: Making the Most of awk Scripts
awk is a lightweight yet powerful text processing tool that truly shines when used systematically as a script.
Making use of BEGIN and END, understanding built-in variables, and mastering control structures form the foundation.
Furthermore, external integration and bash embedding enable automation at a practical level.
It is important to be mindful of error avoidance and performance, and to build up skills step by step.
