Mastering the sub Function in awk: A Beginner’s Guide

updated: 2026/05/05 created: 2026/04/30

Introduction

awk is a powerful tool for streamlining text processing.
One of its most fundamental features is the sub function, which replaces part of a string.
Beginners, however, often get stuck on how to use it, how it differs from gsub, and how to handle regular expressions.
This article covers everything from the role and syntax of the sub function to practical usage in real-world scenarios.
Reference: GNU awk

Role and Syntax of the sub Function

Creating the File

cat << 'EOF' > input.txt
apple orange apple
banana apple grape
EOF

Command to Run

awk '{ sub("apple","APPLE"); print }' input.txt

Output

APPLE orange apple
banana APPLE grape

How It Works

Item | Description
Function name | sub
Role | Replaces only the first matching string
Syntax | sub(regular expression, replacement string, target)
Default target | $0 (entire line)
Return value | 1 if replaced, 0 if not

Explanation

sub is a function that replaces only the first matching part in each line.
If you want to replace all occurrences, use gsub.
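
The return value listed in the table can be captured like any other expression. The following is a minimal sketch, not part of the original examples: the helper name mark_first_apple and the sample input are assumptions made for illustration. It prints what sub returned in front of each line.

```shell
# Hypothetical helper: uppercase the first "apple" on each line and
# show what sub returned (1 = a replacement happened, 0 = no match).
mark_first_apple() {
  awk '{ n = sub(/apple/, "APPLE"); print n, $0 }'
}

printf 'apple orange apple\nbanana grape\n' | mark_first_apple
```

The first line should come back prefixed with 1 and the second with 0, because the second line contains no match and is left unchanged.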

Difference Between sub and gsub

Creating the File

cat << 'EOF' > input.txt
apple apple apple
banana apple banana
EOF

Command to Run

awk '{ sub(/apple/, "orange"); print }' input.txt

Output

orange apple apple
banana orange banana

Command to Run

awk '{ gsub(/apple/, "orange"); print }' input.txt

Output

orange orange orange
banana orange banana

How It Works

Function | Number of replacements | Target | Return value | Characteristic
sub | Only once | First matched part | Number of replacements (0 or 1) | Replaces only the first match
gsub | All occurrences | All matches | Number of replacements (integer) | Replaces all at once

Explanation

sub replaces only the first match, while gsub replaces all matches.
Use them appropriately depending on the situation.
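
One way to see the difference in return values is to run both functions on separate copies of the same line. This is a sketch under assumptions: the helper name count_replacements and the third input line are made up for illustration.

```shell
# Hypothetical helper: run sub and gsub on separate copies of the line
# and report how many replacements each one made.
count_replacements() {
  awk '{
    a = $0; b = $0
    n_sub  = sub(/apple/, "orange", a)    # 0 or 1
    n_gsub = gsub(/apple/, "orange", b)   # 0 or more
    print "sub=" n_sub, "gsub=" n_gsub
  }'
}

printf 'apple apple apple\nbanana apple banana\ncherry\n' | count_replacements
```

Working on copies keeps $0 untouched, so the second call sees the same text as the first.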

Regular Expressions with the sub Function

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
apple 300
EOF

Command to Run

awk '{ sub(/apple/, "orange"); print }' input.txt

Output

orange 100
banana 200
orange 300

Command to Run

awk '{ sub(/[0-9]+/, "NUM"); print }' input.txt

Output

apple NUM
banana NUM
apple NUM

How It Works

Element | Description
sub function | Replaces only the first matching part
First argument | Regular expression (e.g., /apple/, /[0-9]+/)
Second argument | Replacement string
Target | Processed against each line ($0)
Return value | 1 if replacement succeeds, 0 if it fails

Explanation

Whether the pattern is a fixed word or a character class, sub replaces only the first match on each line.
Use gsub if you want to replace every match.
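
Anchors are a case where sub is a natural fit, since a pattern tied to the start of the line can match at most once. The sketch below is an illustration only; the helper name ltrim and the sample input are assumptions.

```shell
# Hypothetical helper: strip leading spaces and tabs from each line.
# The ^ anchor ties the match to the start of the line, so at most one
# match per line is possible and sub is enough.
ltrim() {
  awk '{ sub(/^[ \t]+/, ""); print }'
}

printf '   apple 100\n\tbanana 200\ncherry 300\n' | ltrim
```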

How to Specify a Particular Column (Field) as the Replacement Target

Creating the File

cat << 'EOF' > input.txt
id name age city
1 Alice 25 Tokyo
2 Bob 30 Osaka
3 Carol 22 Nagoya
EOF

Command to Run

awk '{ sub(/2[0-9]/, "XX", $3); print }' input.txt

Output

id name age city
1 Alice XX Tokyo
2 Bob 30 Osaka
3 Carol XX Nagoya

Command to Run

awk '{ sub(/A.*/, "NAME", $2); print }' input.txt

Output

id name age city
1 NAME 25 Tokyo
2 Bob 30 Osaka
3 Carol 22 Nagoya

How It Works

Element | Description
$1, $2, ... | Specifies a field (column)
sub(regex, replacement, target) | Replaces only within the specified field
$3 | Targets only the 3rd column (age)
$2 | Targets only the 2nd column (name)
print | Outputs the entire line (with changes applied)

Explanation

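By specifying a field as the third argument of awk's sub, you can replace text in that column only.
The content of the other columns is not affected, although awk rebuilds the line using the output field separator (OFS, a single space by default) whenever a field is modified, so the original spacing between columns may change.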
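
The same idea carries over to comma-separated input. The following is a sketch under assumptions: the helper name mask_age_csv and the CSV sample are hypothetical. FS and OFS are both set to a comma so that the commas survive when awk reassembles the line after the field changes.

```shell
# Hypothetical helper: mask ages in the 20s in the 3rd column of
# comma-separated input. Setting OFS to match FS keeps the commas when
# awk rebuilds the line after the field is changed.
mask_age_csv() {
  awk 'BEGIN { FS = OFS = "," } { sub(/^2[0-9]$/, "XX", $3); print }'
}

printf 'id,name,age\n1,Alice,25\n2,Bob,30\n' | mask_age_csv
```

Anchoring the pattern with ^ and $ is a design choice here: it restricts the match to a field that consists of exactly two digits starting with 2.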

Variable Assignment and Its Effect on the Original Data

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
apple 300
EOF

Command to Run

awk '{ tmp=$1; sub("apple","orange",tmp); print $1, tmp }' input.txt

Output

apple orange
banana banana
apple orange

Command to Run

awk '{ sub("apple","orange",$1); print $1 }' input.txt

Output

orange
banana
orange

How It Works

Operation | Target | Effect on original data | Description
tmp=$1; sub(..., tmp) | Variable tmp | None | Replaces the copy, so the original is unchanged
sub(..., $1) | Field $1 | Yes | Overwrites the field itself

Explanation

sub is a function that directly overwrites the specified variable or field.
Whether you apply it to a copied variable or not determines whether the original data is affected.
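
The copy-then-replace pattern can be wrapped in a user-defined function. This is a minimal sketch: the function name replaced and the helper name show_copy are hypothetical. It relies on awk passing scalar arguments by value, so sub only modifies the local copy.

```shell
# Hypothetical awk function "replaced": awk passes scalar arguments by
# value, so sub changes only the local copy s and the caller keeps the
# original.
show_copy() {
  awk '
    function replaced(s, re, to) { sub(re, to, s); return s }
    { print $1, replaced($1, "apple", "orange") }
  '
}

printf 'apple 100\nbanana 200\n' | show_copy
```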

Deleting Strings by Replacing with an Empty String Using the sub Function

Creating the File

cat << 'EOF' > input.txt
apple 123
banana 456
cherry 789
EOF

Command to Run

awk '{ sub(/[0-9]+/, ""); print }' input.txt

Output

apple 
banana 
cherry 

How It Works

Element | Description
sub | Replaces the first matched pattern
/[0-9]+/ | Matches one or more digits
"" | Empty string (means deletion)
$0 | Entire line (applied automatically when omitted)
print | Outputs the line after replacement

Explanation

By replacing the matched part with an empty string using the sub function, you can delete a specific string.
In this example, only the first number on each line is deleted. The space in front of it is not part of the match, so each output line ends with a trailing space.
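
If the trailing space in the output is unwanted, the pattern can include the spaces in front of the number. The sketch below is an illustration only; the helper name drop_trailing_number is hypothetical.

```shell
# Hypothetical helper: delete a trailing number together with the
# spaces in front of it, so no trailing space is left behind.
drop_trailing_number() {
  awk '{ sub(/ +[0-9]+$/, ""); print }'
}

printf 'apple 123\nbanana 456\n' | drop_trailing_number
```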

How to Reuse the Matched String

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
apple 300
EOF

Command to Run

awk '{ if (sub(/apple/, "&_fruit")) print }' input.txt

Output

apple_fruit 100
apple_fruit 300

How It Works

Element | Description
sub | Replaces the first matched string
/apple/ | Regular expression for the replacement target
"&_fruit" | & represents the matched string itself (here, apple)
if (sub(...)) | Processes only lines where replacement succeeded (return value is 1)
print | Outputs lines that matched the condition

Explanation

The matched part can be reused by using &.
This allows flexible processing while retaining the original string.
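
Because & has this special meaning, a literal ampersand in the replacement has to be written as "\\&". The following sketch contrasts the two forms; the helper name amp_demo and the sample sentence are assumptions made for illustration.

```shell
# Hypothetical helper: "[&]" wraps the matched text in brackets, while
# "\\&" inserts a literal ampersand instead of the matched text.
amp_demo() {
  awk '{
    a = $0; b = $0
    sub(/and/, "[&]", a)    # & stands for the matched text
    sub(/and/, "\\&", b)    # \\& is a literal ampersand
    print a
    print b
  }'
}

printf 'salt and pepper\n' | amp_demo
```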

Examples of the sub Function Useful for Log File Formatting and CSV Processing

Creating the File

cat << 'EOF' > input.txt
2026-04-30 ERROR UserID=1234 Message=LoginFailed
2026-04-30 INFO UserID=5678 Message=LoginSuccess
2026-04-30 ERROR UserID=9999 Message=Timeout
EOF

Command to Run

awk '{ sub(/ERROR/, "WARN"); print }' input.txt

Output

2026-04-30 WARN UserID=1234 Message=LoginFailed
2026-04-30 INFO UserID=5678 Message=LoginSuccess
2026-04-30 WARN UserID=9999 Message=Timeout

Command to Run

awk '{ sub(/UserID=[0-9]+/, "UserID=****"); print }' input.txt

Output

2026-04-30 ERROR UserID=**** Message=LoginFailed
2026-04-30 INFO UserID=**** Message=LoginSuccess
2026-04-30 ERROR UserID=**** Message=Timeout

Command to Run

awk '{ sub(/Message=/, "Msg="); print }' input.txt

Output

2026-04-30 ERROR UserID=1234 Msg=LoginFailed
2026-04-30 INFO UserID=5678 Msg=LoginSuccess
2026-04-30 ERROR UserID=9999 Msg=Timeout

How It Works

Element | Description
sub function | Replaces only the first part that matches the specified regular expression
Syntax | sub(regular expression, replacement string, target)
Omitting target | $0 (entire line) becomes the target
Difference from gsub | sub replaces once; gsub replaces all
Return value | 1 if replacement succeeds, 0 if it fails

Explanation

awk's sub is useful when you want to change only one part of each line, as in log formatting or CSV processing.
The key is to choose between sub and gsub depending on whether every occurrence needs to be replaced.
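
For CSV input, the same kind of masking can be limited to one column and to selected rows. This is a sketch under assumptions: the column layout date,level,user_id and the helper name mask_error_users are hypothetical and do not come from the log file above.

```shell
# Hypothetical CSV layout: date,level,user_id. The user ID in the 3rd
# column is masked only on rows whose 2nd column is ERROR; every row is
# printed.
mask_error_users() {
  awk 'BEGIN { FS = OFS = "," } $2 == "ERROR" { sub(/[0-9]+/, "****", $3) } { print }'
}

printf 'date,level,user_id\n2026-04-30,ERROR,1234\n2026-04-30,INFO,5678\n' | mask_error_users
```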

Replacement Efficiency to Keep in Mind When Processing Large Volumes of Data

Creating the File

cat << 'EOF' > input.txt
apple banana apple grape apple
banana apple orange apple
EOF

Command to Run

awk '{ sub("apple", "APPLE"); print }' input.txt

Output

APPLE banana apple grape apple
banana APPLE orange apple

Command to Run

awk '{ gsub("apple", "APPLE"); print }' input.txt

Output

APPLE banana APPLE grape APPLE
banana APPLE orange APPLE

How It Works

Item | sub | gsub
Number of replacements | Only the first per line | All occurrences per line
Performance | Fast (fewer replacements) | Slightly slower (replaces all)
Use case | When you want to change only the first match | When you want to change all matches
Unit of operation | Per line | Per line

Explanation

With large volumes of data, using sub when only the first match matters avoids unnecessary full replacements and keeps the work per line to a minimum.
It is important to use sub and gsub appropriately depending on the use case.
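
To get a feel for how much work each function does, the replacements can be totaled over a generated sample. This is only a sketch: the sample size and line content are arbitrary assumptions, and it counts replacements rather than measuring time, since actual timings depend on the awk implementation and the data.

```shell
# Build a throwaway sample (size and content are arbitrary), then run
# both functions over it and total the replacements each one made.
sample=$(mktemp)
awk 'BEGIN { for (i = 0; i < 10000; i++) print "apple banana apple grape apple" }' > "$sample"

sub_total=$(awk  '{ n += sub(/apple/, "APPLE") }  END { print n }' "$sample")
gsub_total=$(awk '{ n += gsub(/apple/, "APPLE") } END { print n }' "$sample")

echo "sub: $sub_total replacements, gsub: $gsub_total replacements"
rm -f "$sample"
```

With three matches per line, gsub performs three times as many replacements as sub on this sample. Running each awk command under your shell's timing facility on real data is the way to confirm whether that translates into a measurable difference.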

Key Points to Avoid Getting Stuck with Escape Processing

Creating the File

cat << 'EOF' > input.txt
path=/home/user/docs
EOF

Command to Run

awk '{ sub(/\//, "_"); print }' input.txt

Output

path=_home/user/docs

Command to Run

awk '{ sub(/\//, "\\/"); print }' input.txt

Output

path=\/home/user/docs

How It Works

Element | Description | Key point
/\// | Regular expression representing / | / is a delimiter and must be escaped
sub() | Replaces the first matching string | Replaces only once
"\\/" | Outputs \/ | Backslashes must be double-escaped
"_" | Simple replacement character | No escaping needed

Explanation

In awk's sub, escaping applies to both the regular expression and the replacement string, so keep in mind that backslashes tend to multiply.
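
The same multiplication happens when the pattern is written as a string instead of a regex literal. The sketch below is an illustration only; the helper names dot_literal and dot_string and the sample input are assumptions.

```shell
# Both helpers replace the first literal dot. A regex literal needs one
# backslash; the same pattern written as a string needs two, because the
# string is unescaped once before it is used as a regular expression.
dot_literal() { awk '{ sub(/\./, "-"); print }'; }
dot_string()  { awk '{ sub("\\.", "-"); print }'; }

echo 'a.b.c' | dot_literal
echo 'a.b.c' | dot_string
```

Both commands should print the same line with only the first dot replaced. Preferring the /.../ form where possible keeps the number of backslashes down.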

Summary: Mastering the awk sub Function

The sub function is simple, but there is more depth to it than first appears.
By understanding its characteristic of replacing only the first match and being aware of the difference from gsub, more precise text processing becomes possible.
By also grasping key points such as regular expressions, field specification, and escape processing, you will be able to handle data manipulation at a practical level.
To achieve efficient processing with awk, it is important to start by thoroughly understanding the sub function.

©︎ 2025-2026 running terminal commands