Mastering Text Processing: Learning awk and Scripting

updated: 2026/05/05 created: 2026/05/05

Introduction

awk is a powerful command specialized for text processing, widely used for everything from simple scripts to advanced data analysis.

This article organizes the points that beginners tend to stumble on, focusing on the basic structure of the awk command and how script execution works.

Reference: GNU awk

Basic Structure of the awk Command and How Script Execution Works

Create File

cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
EOF

Command

awk '{print $1, $2}' input.txt

Output

apple 100
banana 200
orange 150

Command

awk '$2 > 150 {print $1}' input.txt

Output

banana

Command

awk '{sum += $2} END {print sum}' input.txt

Output

450

How It Works

| Element | Description |
| --- | --- |
| Pattern | Condition expression (e.g., $2 > 150) |
| Action | Execution process (e.g., {print $1}) |
| Field | References column data such as $1, $2 |
| END block | Process executed after all lines are processed |
| Script execution | Can be executed inline or via a file |

Explanation

awk is a stream-oriented tool that processes each line using a "pattern + action" model. It flexibly handles everything from simple one-liners to full script files.
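As a small illustration of the "pattern + action" model, a rule may leave out either half. The sketch below assumes the same three-line sample data, inlined so the snippet is self-contained; the variable names are chosen only for this example.

```shell
# Sample data with the same values as input.txt above (inlined here).
data='apple 100
banana 200
orange 150'

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$data" | awk '$2 >= 150')

# Action only: with no pattern, the action runs for every line.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

printf '%s\n' "$pattern_only"
printf '%s\n' "$action_only"
```

The first command prints the banana and orange lines in full; the second prints the first column of all three lines.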

Two Basic Methods for Saving and Executing awk Script Files

Create File

cat << 'EOF' > input.txt
apple
banana
cherry
EOF

Create File

cat << 'EOF' > script.awk
BEGIN { print "=== awk script start ===" }
{ print NR ":" $0 }
END { print "=== awk script end ===" }
EOF

Command

awk -f script.awk input.txt

Output

=== awk script start ===
1:apple
2:banana
3:cherry
=== awk script end ===

Command

sed -i '1i #!/usr/bin/awk -f' script.awk
chmod +x script.awk
./script.awk input.txt

Output

=== awk script start ===
1:apple
2:banana
3:cherry
=== awk script end ===

How It Works

| Method | How to Run | Mechanism |
| --- | --- | --- |
| -f option | awk -f script.awk | Loads an external script file into the awk command |
| Executable file | ./script.awk | Specifies awk via shebang and runs the script directly |

Explanation

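Separating an awk script into an external file improves reusability, and you can choose the execution method to suit the use case: use -f for ad hoc runs, and make the file executable when it should behave like a standalone tool. The shebang path must match where awk is installed (/usr/bin/awk here), and the sed -i '1i ...' one-liner used to insert it relies on GNU sed.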
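One detail that makes the two methods interchangeable: to awk, a line starting with # is a comment, so a script that already contains a shebang still runs with -f. A minimal sketch, assuming a hypothetical script name numbered.awk and a temporary directory so nothing existing is overwritten:

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# The shebang is written as the first line of the script from the start.
cat << 'EOF' > "$workdir/numbered.awk"
#!/usr/bin/awk -f
{ print NR ":" $0 }
EOF

# With -f, awk reads the shebang line as an ordinary comment.
result=$(printf 'apple\nbanana\n' | awk -f "$workdir/numbered.awk")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Writing the shebang into the here document also avoids the separate sed step. If awk is installed somewhere other than /usr/bin/awk, `command -v awk` shows the path to use.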

Initialization Processing Using the BEGIN Block

Create File

cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
EOF

Command

awk 'BEGIN { sum=0; print "=== Aggregation Start ===" } { sum += $2 } END { print "Total:", sum }' input.txt

Output

=== Aggregation Start ===
Total: 450

How It Works

| Block | Timing | Process |
| --- | --- | --- |
| BEGIN | Before reading input | Variable initialization, header output |
| Main body | Each line | Adds up the numeric value in column 2 |
| END | After reading input | Outputs the total value |

Explanation

The BEGIN block runs exactly once, before any input is read, which makes it the natural place for initialization such as setting variables or printing a header.
Because awk treats an unset variable as 0 in arithmetic, sum=0 is optional here, but writing it makes the intent explicit.
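BEGIN is also a common place to set built-in variables before the first line is read. The following sketch sets the output field separator OFS and prints a header row; the column names "name" and "price" are assumptions made for this example.

```shell
# OFS is the separator print places between comma-separated arguments.
result=$(printf 'apple 100\nbanana 200\n' |
  awk 'BEGIN { OFS = ","; print "name", "price" } { print $1, $2 }')
printf '%s\n' "$result"
```

Because OFS is set in BEGIN, it already applies to the header line as well as to every data line.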

How to Handle Command-Line Arguments as Variables Inside a Script

Create File

cat << 'EOF' > script.awk
BEGIN {
    arg1 = ARGV[1]
    arg2 = ARGV[2]
    print "arg1 =", arg1
    print "arg2 =", arg2
    # Delete so awk does not treat them as regular files
    delete ARGV[1]
    delete ARGV[2]
}
{
    print "input:", $0
}
EOF

Command

echo "hello world" | awk -f script.awk foo bar

Output

arg1 = foo
arg2 = bar
input: hello world

How It Works

| Element | Description |
| --- | --- |
| ARGV | Array of command-line arguments |
| ARGV[0] | The awk command itself |
| ARGV[1..] | User-specified arguments |
| delete | Prevents awk from treating the entry as an input file |
| BEGIN | Block executed before input processing |

Explanation

Using ARGV lets an awk script read its own command-line arguments.
Any argument that is not an input file must be deleted from ARGV inside BEGIN; otherwise awk tries to open it as a file as soon as BEGIN finishes.
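When the goal is only to pass a value into the script, the -v option is a common alternative to reading ARGV, since the value never appears in the file list. A minimal sketch, assuming a hypothetical variable name threshold:

```shell
# -v assigns the awk variable before the program (including BEGIN) starts.
result=$(printf 'apple 100\nbanana 200\norange 150\n' |
  awk -v threshold=150 '$2 >= threshold { print $1 }')
printf '%s\n' "$result"
```

Nothing has to be deleted afterwards, because threshold=150 is consumed by -v and is not treated as an input file.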

Efficient Use of Regular Expressions in External Script Files

Create File

cat << 'EOF' > input.txt
apple 100
banana 200
apricot 150
grape 300
EOF

Create File

cat << 'EOF' > script.awk
/^(a|b)/ {
    if ($2 ~ /^[0-9]+$/) {
        sum += $2
        print $1, $2
    }
}
END {
    print "TOTAL:", sum
}
EOF

Command

awk -f script.awk input.txt

Output

apple 100
banana 200
apricot 150
TOTAL: 450

How It Works

| Element | Description |
| --- | --- |
| Regex `/^(a\|b)/` | Targets lines starting with a or b |
| Numeric check /^[0-9]+$/ | Validates that the field contains only digits |
| Action {...} | Describes the process when condition matches |
| sum += $2 | Accumulates numeric values |
| END | Outputs total in final processing |

Explanation

Putting the regular expression in the pattern position means the action runs only for matching lines, so the script needs one less if statement. Because a rule couples a condition with its processing, the logic stays concise even in an external script file.
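A pattern does not have to be a bare /regex/; any expression works, including a match against a transformed field. The sketch below, using made-up sample data with mixed case, applies the same "starts with a or b" test case-insensitively by lowercasing the first field before matching.

```shell
# tolower() returns a lowercase copy; the original field is printed unchanged.
result=$(printf 'Apple 100\nbanana 200\nGrape 300\n' |
  awk 'tolower($1) ~ /^(a|b)/ { print $1, $2 }')
printf '%s\n' "$result"
```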

Record Control Using Built-in Variables (NR, NF, FS)

Create File

cat << 'EOF' > input.txt
apple,fruit,100
carrot,vegetable,80
banana,fruit,120
EOF

Create File

cat << 'EOF' > script_all.sh
#!/bin/bash
awk -F',' '{ print "NR=" NR, "NF=" NF, "1=" $1, "2=" $2, "3=" $3 }' input.txt
EOF

Create File

cat << 'EOF' > script_line2.sh
#!/bin/bash
awk -F',' 'NR==2 { print "Line 2:", $1, $2, $3 }' input.txt
EOF

Create File

cat << 'EOF' > script_fruit.sh
#!/bin/bash
awk -F',' '$2=="fruit" { print "Fruit:", $1, $3 }' input.txt
EOF

Command

chmod +x script_all.sh script_line2.sh script_fruit.sh

Command

./script_all.sh

Output

NR=1 NF=3 1=apple 2=fruit 3=100
NR=2 NF=3 1=carrot 2=vegetable 3=80
NR=3 NF=3 1=banana 2=fruit 3=120

Command

./script_line2.sh

Output

Line 2: carrot vegetable 80

Command

./script_fruit.sh

Output

Fruit: apple 100
Fruit: banana 120

How It Works

| Variable | Meaning | Role |
| --- | --- | --- |
| NR | Current record number | Line identification and conditional branching |
| NF | Number of fields | Understanding the column count |
| FS | Field separator | Field splitting (specified with -F) |

Explanation

Printing NR and NF next to each field, as script_all.sh does, is a quick way to see how awk splits the input.
Combining NR, NF, and FS then enables flexible row- and column-based extraction.
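Since NF holds the number of fields, $NF refers to the last field of the current line. The following sketch combines the three variables on made-up data in which one line is missing a column: the NF check skips the short line, and NR still reports the original line number.

```shell
# Line 2 has only two fields, so NF == 3 filters it out.
result=$(printf 'apple,fruit,100\ncarrot,vegetable\nbanana,fruit,120\n' |
  awk -F',' 'NF == 3 { print NR ": " $1 " -> " $NF }')
printf '%s\n' "$result"
```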

Aggregating Results and Generating Reports with the END Block

Create File

cat << 'EOF' > input.txt
apple 100
banana 200
apple 150
banana 50
orange 300
EOF

Command

awk '{ sum[$1]+=$2 } END { for (k in sum) print k, sum[k] }' input.txt

Output

apple 250
banana 250
orange 300

How It Works

| Element | Description |
| --- | --- |
| $1 | Column 1 (key: product name) |
| $2 | Column 2 (value: numeric) |
| sum[$1]+=$2 | Adds to the total for each product |
| END | Block executed after all lines are processed |
| for (k in sum) | Loops over all keys in the associative array |
| print k, sum[k] | Outputs the aggregated result |

Explanation

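Doing the output in the END block lets awk produce the aggregated report in one step after all input has been read.
Associative arrays make it easy to aggregate by key. Note that the order in which for (k in sum) visits the keys is unspecified, so the lines may appear in a different order than shown; pipe the result through sort when a stable order matters.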
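Because the iteration order of for (k in sum) is not defined, one simple way to get a reproducible report is to sort the output afterwards. A sketch under that assumption, with the sample data inlined:

```shell
# LC_ALL=C makes the sort order independent of the locale.
result=$(printf 'apple 100\nbanana 200\napple 150\nbanana 50\norange 300\n' |
  awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' |
  LC_ALL=C sort)
printf '%s\n' "$result"
```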

Building Complex Logic with Control Structures (if, for, while)

Create File

cat << 'EOF' > input.txt
apple 10
banana 5
orange 20
grape 15
EOF

Create File

cat << 'EOF' > script.awk
{
    if ($2 >= 15) {
        print $1 " is high"
    } else if ($2 >= 10) {
        print $1 " is medium"
    } else {
        print $1 " is low"
    }
}
EOF

Command

awk -f script.awk input.txt

Output

apple is medium
banana is low
orange is high
grape is high

How It Works

| Element | Description |
| --- | --- |
| script.awk | The awk script body |
| $1, $2 | Fields (column 1: name, column 2: numeric value) |
| if | Conditional branch (15 or more) |
| else if | Intermediate condition (10 or more) |
| else | Processing for all other cases |
| -f | Specifies a script file for awk |

Explanation

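An if / else if / else chain is evaluated from top to bottom and only the first matching branch runs, so the stricter condition ($2 >= 15) must come before the looser one ($2 >= 10). Keeping this kind of multi-branch logic in an external script file makes it easier to read and reuse.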
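The for and while loops named in the heading follow the same C-like syntax. The sketch below uses made-up data: the for loop sums every field after the first one, and the while loop repeats a character as many times as the second field says.

```shell
# for: add up fields 2..NF of each line.
result_for=$(printf 'alice 80 90 70\nbob 60 75\n' | awk '{
  total = 0
  for (i = 2; i <= NF; i++) total += $i
  print $1, total
}')

# while: build a bar of "#" characters, one per unit in column 2.
result_while=$(printf 'apple 3\nbanana 1\n' | awk '{
  bar = ""
  n = $2
  while (n > 0) {
    bar = bar "#"
    n--
  }
  print $1, bar
}')

printf '%s\n' "$result_for"
printf '%s\n' "$result_while"
```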

Extending Scripts with the system Function

Create File

cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
EOF

Command

awk '{ system("echo Item:" $1 ", Price:" $2) }' input.txt

Output

Item:apple, Price:100
Item:banana, Price:200
Item:orange, Price:150

How It Works

| Element | Description |
| --- | --- |
| awk | A scripting language that processes text line by line and field by field |
| $1, $2 | Represent column 1 and column 2 of each line |
| system function | A function that executes an external command |
| echo | A command that outputs a string to standard output |
| Processing flow | Read line → split fields → execute externally with system |

Explanation

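awk's system function runs a shell command for each line, which makes it possible to combine text processing with external commands. The command string is passed to the shell as-is, so fields that contain spaces or shell metacharacters can change what is executed; use it only with trusted input. A new shell process is also started on every call, which adds up on large inputs.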
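When the external command only needs to receive awk's output, an output pipe is an alternative to calling system once per line: the command is started once and every print feeds it. This is a different technique from the system call shown above, sketched here with sort as the external command.

```shell
result=$(printf 'apple 100\nbanana 200\norange 150\n' | awk '
  { print $2, $1 | "sort -n" }   # a single sort process receives every line
  END { close("sort -n") }       # close the pipe so sort can finish and print
')
printf '%s\n' "$result"
```

The string passed to close must be identical to the one used after the | operator.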

How to Embed an awk Script Inside a Shell Script (bash)

Create File

cat << 'DATA' > input.txt
Alice 80
Bob 65
Charlie 90
DATA

Create File

cat << 'EOF' > script.sh
#!/bin/bash
# Embed and run an awk script
awk '{
    if ($2 >= 70) {
        print $1 " : Pass"
    } else {
        print $1 " : Fail"
    }
}' input.txt
EOF

Command

chmod +x script.sh
./script.sh

Output

Alice : Pass
Bob : Fail
Charlie : Pass

How It Works

| Element | Content | Description |
| --- | --- | --- |
| Shell script | script.sh | Controls the overall flow |
| Here document | cat << 'DATA' | Generates input data |
| awk script | awk '...' | Text processing logic |
| Field reference | $1, $2 | Space-delimited columns |
| Conditional branch | if ($2 >= 70) | Numeric judgment |
| Output | print | Displays processed result |

Explanation

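Writing the awk program directly inside a bash script keeps everything in one file, with no separate .awk script to manage. Because the awk program is wrapped in single quotes, the shell leaves $1 and $2 untouched and awk interprets them as fields.
Creating the input with a here document makes the example easy to reproduce.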
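Since the single quotes keep the shell from expanding anything inside the awk program, shell values have to be handed over explicitly. One way, sketched here, is an exported environment variable read through awk's ENVIRON array; MIN_SCORE is a hypothetical name chosen for this example.

```shell
export MIN_SCORE=70   # hypothetical variable name, not part of the original script

result=$(printf 'Alice 80\nBob 65\nCharlie 90\n' | awk '
  BEGIN { min = ENVIRON["MIN_SCORE"] + 0 }   # adding 0 forces a numeric value
  { print $1 " : " ($2 >= min ? "Pass" : "Fail") }
')
printf '%s\n' "$result"
```

This keeps the pass mark out of the awk code, so the same embedded program can be reused with a different threshold.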

Performance Optimization and Considerations for Large-Scale Data Processing

Create File

cat << 'EOF' > input.txt
id,name,score
1,Alice,82
2,Bob,91
3,Charlie,78
4,David,88
5,Eve,95
EOF

Create File

cat << 'EOF' > process.awk
BEGIN { FS="," }
NR>1 {
    sum += $3
    count++
}
END { print "Average:", sum/count }
EOF

Create File

cat << 'EOF' > filter.awk
BEGIN { FS="," }
NR==1 || $3 >= 90
EOF

Create File

cat << 'EOF' > skip_header.awk
BEGIN { FS="," }
NR>1 { print $0 }
EOF

Command

awk -f process.awk input.txt

Output

Average: 86.8

Command

awk -f filter.awk input.txt

Output

id,name,score
2,Bob,91
5,Eve,95

Command

awk -f skip_header.awk input.txt

Output

1,Alice,82
2,Bob,91
3,Charlie,78
4,David,88
5,Eve,95

Command

awk -f skip_header.awk input.txt | sort -t',' -k3 -nr

Output

5,Eve,95
2,Bob,91
4,David,88
1,Alice,82
3,Charlie,78

How It Works

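| Item | Description |
| --- | --- |
| Input splitting | FS="," splits each line on commas (suitable for simple CSV without quoted fields) |
| Skip | NR>1 excludes the header |
| Aggregation | Keeps only a running sum and count, so memory use stays constant |
| Conditional extraction | Outputs only the lines that match the condition |
| Pipe integration | Delegates sorting to the external sort command |
| Delimiter specification | -t',' specifies the column delimiter |
| Key specification | -k3 starts the sort key at column 3 (-k3,3 would limit the key to that column) |
| Numeric sort | -n for numeric comparison |
| Descending sort | -r for descending order |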

Explanation

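Splitting the work into small awk scripts makes each piece easy to reuse. For large inputs, awk reads the data as a stream, so try to do the work in a single pass over the file and hand tasks such as sorting to dedicated tools through a pipe.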
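As an illustration of the single-pass idea, several statistics can be collected while the file is read once, instead of running one awk command per statistic. This is a sketch on the same sample data (inlined); the output format and variable names are assumptions for this example.

```shell
result=$(printf 'id,name,score\n1,Alice,82\n2,Bob,91\n3,Charlie,78\n4,David,88\n5,Eve,95\n' |
  awk -F',' '
    NR > 1 {
      score = $3 + 0                        # force a numeric value
      sum += score
      n++
      if (n == 1 || score > max) { max = score; top = $2 }
    }
    END {
      # Guard against division by zero when there are no data rows.
      if (n > 0) printf "avg=%.1f max=%d top=%s\n", sum / n, max, top
    }
  ')
printf '%s\n' "$result"
```

Only four scalar variables are kept, regardless of how many rows the input has.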

Creating Practical Scripts to Automate Log File Analysis

Create File

cat << 'EOF' > input.txt
2026-05-01 INFO User login success
2026-05-01 ERROR Database connection failed
2026-05-02 INFO File uploaded
2026-05-02 WARNING Disk space low
2026-05-03 ERROR Timeout occurred
EOF

Command

awk '$2 == "ERROR" {print $0}' input.txt

Output

2026-05-01 ERROR Database connection failed
2026-05-03 ERROR Timeout occurred

Command

awk '{count[$2]++} END {for (level in count) print level, count[level]}' input.txt

Output

INFO 2
ERROR 2
WARNING 1

Command

awk '$2=="ERROR" {print $1, $3, $4, $5}' input.txt

Output

2026-05-01 Database connection failed
2026-05-03 Timeout occurred

How It Works

| Process | awk Expression | Description |
| --- | --- | --- |
| Conditional extraction | $2 == "ERROR" | Extracts only lines where column 2 is ERROR |
| Count | count[$2]++ | Counts occurrences by log level |
| End processing | END {} | Outputs results after all lines are processed |
| Field reference | $1, $2 ... | Specifies columns by space delimiter |

Explanation

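With awk, filtering, counting, and reformatting log lines each take only a one-liner. The order of the lines printed by for (level in count) is unspecified, so the counts may appear in a different order than shown.
Simple yet powerful, awk is well suited to automation in operational environments.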
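Log messages rarely have a fixed number of words, so printing $3, $4, $5 either truncates long messages or leaves a trailing space on short ones. A sketch of one way around this is to join every field from the third onward with a loop:

```shell
result=$(printf '2026-05-01 ERROR Database connection failed\n2026-05-02 INFO File uploaded\n2026-05-03 ERROR Timeout occurred\n' |
  awk '$2 == "ERROR" {
    msg = $3
    for (i = 4; i <= NF; i++) msg = msg " " $i   # append the remaining words
    print $1, msg
  }')
printf '%s\n' "$result"
```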

Avoiding Syntax Errors and Unintended Behavior

Create File

cat << 'EOF' > input.txt
apple 10
banana 20
orange 30
EOF

Command

awk '{ print $1, $2 * 2 }' input.txt

Output

apple 20
banana 40
orange 60

Command

awk 'NF == 2 { sum += $2 } END { print sum }' input.txt

Output

60

Command

awk '{ if ($2 ~ /^[0-9]+$/) print $1 ":" $2 }' input.txt

Output

apple:10
banana:20
orange:30

How It Works

| Element | Description | Error Prevention Point |
| --- | --- | --- |
| awk '{ ... }' | Executes processing for each line | Watch out for unclosed quotes |
| $1, $2 | Field (column) references | Avoid misconfiguring the delimiter |
| NF == 2 | Checks the number of fields | Prevents processing of invalid lines |
| ~ /^[0-9]+$/ | Validates numeric value with regex | Avoids non-numeric text being silently converted to a number (often 0) in arithmetic |
| END { ... } | Processing after all lines | sum is empty if no line matched; print sum + 0 to always output a number |

Explanation

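In awk scripts, stating the assumptions about the input format explicitly and adding condition checks (NF and regular expressions) prevents unintended behavior on malformed lines. Syntax errors are mostly a quoting matter: keep the whole program inside one pair of single quotes, or move it to a script file.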
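The checks can be combined into one defensive script that sums only well-formed lines and counts the rest. The sample data below is made up to include two malformed lines, and the output format is an assumption for this sketch.

```shell
result=$(printf 'apple 10\nbroken\nbanana x\norange 30\n' | awk '
  NF == 2 && $2 ~ /^[0-9]+$/ { sum += $2; next }   # valid line: add and move on
  { bad++ }                                        # anything else is counted as bad
  END { print "sum=" (sum + 0), "bad=" (bad + 0) } # adding 0 prints 0 instead of an empty value
')
printf '%s\n' "$result"
```

The next statement skips the remaining rules for valid lines, so the second rule only sees lines that failed the check.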

Summary: Making the Most of awk Scripts

awk is a lightweight yet powerful text processing tool that truly shines when used systematically as a script.

Making use of BEGIN and END, understanding built-in variables, and mastering control structures form the foundation.

Furthermore, external integration and bash embedding enable automation at a practical level.

It is important to be mindful of error avoidance and performance, and to build up skills step by step.


©︎ 2025-2026 running terminal commands