Introduction
awk is a powerful tool for streamlining text processing.
Among its features, the sub function is the basic building block for replacing part of a string.
Beginners often get stuck on how to call it, how it differs from gsub, and how it handles regular expressions.
This article covers the sub function from its role and syntax through to practical usage in real-world scenarios.
Reference: GNU awk
Role and Syntax of the sub Function
Creating the File
cat << 'EOF' > input.txt
apple orange apple
banana apple grape
EOF
Command to Run
awk '{ sub("apple","APPLE"); print }' input.txt
Output
APPLE orange apple
banana APPLE grape
How It Works
| Item | Description |
|---|---|
| Function name | sub |
| Role | Replaces only the first matching string |
| Syntax | sub(regular expression, replacement string, target) |
| Default target | $0 (entire line) |
| Return value | 1 if replaced, 0 if not |
Explanation
sub is a function that replaces only the first matching part in each line.
If you want to replace all occurrences, use gsub.
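The return value listed in the table can be captured and used directly. The following is a minimal sketch with inline sample data (the variable name n is only illustrative) that prints the return value next to each resulting line:

```shell
# Print sub's return value (1 = replaced, 0 = no match) before each line
printf 'apple orange apple\nbanana grape\n' |
  awk '{ n = sub(/apple/, "APPLE"); print n ": " $0 }'
# Expected output:
# 1: APPLE orange apple
# 0: banana grape
```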
Difference Between sub and gsub
Creating the File
cat << 'EOF' > input.txt
apple apple apple
banana apple banana
EOF
Command to Run
awk '{ sub(/apple/, "orange"); print }' input.txt
Output
orange apple apple
banana orange banana
Command to Run
awk '{ gsub(/apple/, "orange"); print }' input.txt
Output
orange orange orange
banana orange banana
How It Works
| Function | Number of replacements | Target | Return value | Characteristic |
|---|---|---|---|---|
| sub | Only once | First matched part | Number of replacements (0 or 1) | Replaces only the first match |
| gsub | All occurrences | All matches | Number of replacements (integer) | Replaces all at once |
Explanation
sub replaces only the first match, while gsub replaces all matches.
Use them appropriately depending on the situation.
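The difference in return values from the table can be observed with a short sketch. It reuses the sample lines above as inline data; the variable name n is only illustrative:

```shell
# gsub returns how many replacements it made on each line
printf 'apple apple apple\nbanana apple banana\n' |
  awk '{ n = gsub(/apple/, "orange"); print n, $0 }'
# Expected output:
# 3 orange orange orange
# 1 banana orange banana
```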
Regular Expressions with the sub Function
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
apple 300
EOF
Command to Run
awk '{ sub(/apple/, "orange"); print }' input.txt
Output
orange 100
banana 200
orange 300
Command to Run
awk '{ sub(/[0-9]+/, "NUM"); print }' input.txt
Output
apple NUM
banana NUM
apple NUM
How It Works
| Element | Description |
|---|---|
| sub function | Replaces only the first matching part |
| First argument | Regular expression (e.g., /apple/, /[0-9]+/) |
| Second argument | Replacement string |
| Target | Processed against each line ($0) |
| Return value | 1 if a replacement was made, 0 if nothing matched |
Explanation
The characteristic of awk's sub is that it replaces only the first occurrence.
Use gsub if you want to replace multiple occurrences.
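Because the first argument is a regular expression, anchors can narrow down where the match is allowed. The sketch below uses made-up input to show the ^ anchor, and then passes the same pattern as a string through -v (the variable name pat is an assumption for illustration):

```shell
# ^ anchors the match to the start of the line, so "pineapple" is left alone
printf 'apple 100\npineapple 200\n' |
  awk '{ sub(/^apple/, "orange"); print }'
# Expected output:
# orange 100
# pineapple 200

# The same pattern supplied as a string through -v (a dynamic regex)
printf 'apple 100\npineapple 200\n' |
  awk -v pat='^apple' '{ sub(pat, "orange"); print }'
```

Passing the pattern as a variable can be convenient when it comes from a surrounding shell script rather than being fixed in the awk program.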
How to Specify a Particular Column (Field) as the Replacement Target
Creating the File
cat << 'EOF' > input.txt
id name age city
1 Alice 25 Tokyo
2 Bob 30 Osaka
3 Carol 22 Nagoya
EOF
Command to Run
awk '{ sub(/2[0-9]/, "XX", $3); print }' input.txt
Output
id name age city
1 Alice XX Tokyo
2 Bob 30 Osaka
3 Carol XX Nagoya
Command to Run
awk '{ sub(/A.*/, "NAME", $2); print }' input.txt
Output
id name age city
1 NAME 25 Tokyo
2 Bob 30 Osaka
3 Carol 22 Nagoya
How It Works
| Element | Description |
|---|---|
| $1, $2, ... | Specifies a field (column) |
| sub(regex, replacement, target) | Replaces only within the specified field |
| $3 | Targets only the 3rd column (age) |
| $2 | Targets only the 2nd column (name) |
| print | Outputs the entire line (with changes applied) |
Explanation
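By specifying a field as the third argument of awk's sub, you can restrict the replacement to a specific column.
The contents of the other columns are not affected. Note, however, that modifying a field makes awk rebuild $0 by joining the fields with OFS (a single space by default), so any runs of whitespace between columns are collapsed.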
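The effect of field targeting on column spacing can be checked with a small experiment. This sketch uses a made-up line with irregular spacing and compares replacing within $3 against replacing within the whole line:

```shell
# Editing a field rebuilds $0 with OFS, so repeated spaces collapse
printf '1  Alice   25\n' | awk '{ sub(/25/, "XX", $3); print }'
# Expected output:
# 1 Alice XX

# Editing $0 directly keeps the original spacing
printf '1  Alice   25\n' | awk '{ sub(/25/, "XX"); print }'
# Expected output:
# 1  Alice   XX
```

If the original alignment matters, replacing within $0 with a more specific pattern may be the safer choice.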
Variable Assignment and Its Effect on the Original Data
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
apple 300
EOF
Command to Run
awk '{ tmp=$1; sub("apple","orange",tmp); print $1, tmp }' input.txt
Output
apple orange
banana banana
apple orange
Command to Run
awk '{ sub("apple","orange",$1); print $1 }' input.txt
Output
orange
banana
orange
How It Works
| Operation | Target | Effect on original data | Description |
|---|---|---|---|
| tmp=$1; sub(..., tmp) | Variable tmp | None | Replaces the copy, so the original is unchanged |
| sub(..., $1) | Field $1 | Yes | Overwrites the field itself |
Explanation
sub is a function that directly overwrites the specified variable or field.
Whether you apply it to a copied variable or not determines whether the original data is affected.
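The copy-then-replace pattern can also be wrapped in a user-defined function. The following is a sketch, not a standard awk facility: replace_first and its parameter names are hypothetical. It relies on awk passing scalar arguments by value, so sub modifies only the function's local copy:

```shell
# replace_first is a hypothetical helper: awk passes scalars by value,
# so sub() changes only the local copy s and the caller's $1 is untouched
printf 'apple 100\nbanana 200\n' |
  awk 'function replace_first(s, re, repl) { sub(re, repl, s); return s }
       { print $1, replace_first($1, "apple", "orange") }'
# Expected output:
# apple orange
# banana banana
```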
Deleting Strings by Replacing with an Empty String Using the sub Function
Creating the File
cat << 'EOF' > input.txt
apple 123
banana 456
cherry 789
EOF
Command to Run
awk '{ sub(/[0-9]+/, ""); print }' input.txt
Output
apple
banana
cherry
How It Works
| Element | Description |
|---|---|
| sub | Replaces the first matched pattern |
| /[0-9]+/ | Matches one or more digits |
| "" | Empty string (means deletion) |
| $0 | Entire line (applied automatically when omitted) |
| print | Outputs the line after replacement |
Explanation
By replacing the matched part with an empty string using the sub function, you can delete a specific string.
In this example, only the digits are deleted; the space that preceded them remains at the end of each line.
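When the surrounding spaces should go as well, the pattern can be widened to include them. This is a minimal sketch with made-up input; the square brackets in the output are added only to make the line boundaries visible:

```shell
# Delete the trailing digits together with the spaces around them
printf 'apple 123\nbanana 456  \n' |
  awk '{ sub(/ *[0-9]+ *$/, ""); print "[" $0 "]" }'
# Expected output:
# [apple]
# [banana]
```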
How to Reuse the Matched String
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
apple 300
EOF
Command to Run
awk '{ if (sub(/apple/, "&_fruit")) print }' input.txt
Output
apple_fruit 100
apple_fruit 300
How It Works
| Element | Description |
|---|---|
| sub | Replaces the first matched string |
| /apple/ | Regular expression for the replacement target |
| "&_fruit" | & represents the matched string itself (here, apple) |
| if (sub(...)) | Processes only lines where replacement succeeded (return value is 1) |
| print | Outputs lines that matched the condition |
Explanation
The matched part can be reused by using &.
This allows flexible processing while retaining the original string.
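Since & has a special meaning in the replacement string, inserting a literal ampersand requires escaping it. The sketch below contrasts the two cases using a single made-up input line:

```shell
# & inserts the matched text
printf 'apple 100\n' | awk '{ sub(/apple/, "[&]"); print }'
# Expected output:
# [apple] 100

# "\\&" is read by awk as \& and then by sub as a literal ampersand
printf 'apple 100\n' | awk '{ sub(/apple/, "\\&"); print }'
# Expected output:
# & 100
```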
Examples of the sub Function Useful for Log File Formatting and CSV Processing
Creating the File
cat << 'EOF' > input.txt
2026-04-30 ERROR UserID=1234 Message=LoginFailed
2026-04-30 INFO UserID=5678 Message=LoginSuccess
2026-04-30 ERROR UserID=9999 Message=Timeout
EOF
Command to Run
awk '{ sub(/ERROR/, "WARN"); print }' input.txt
Output
2026-04-30 WARN UserID=1234 Message=LoginFailed
2026-04-30 INFO UserID=5678 Message=LoginSuccess
2026-04-30 WARN UserID=9999 Message=Timeout
Command to Run
awk '{ sub(/UserID=[0-9]+/, "UserID=****"); print }' input.txt
Output
2026-04-30 ERROR UserID=**** Message=LoginFailed
2026-04-30 INFO UserID=**** Message=LoginSuccess
2026-04-30 ERROR UserID=**** Message=Timeout
Command to Run
awk '{ sub(/Message=/, "Msg="); print }' input.txt
Output
2026-04-30 ERROR UserID=1234 Msg=LoginFailed
2026-04-30 INFO UserID=5678 Msg=LoginSuccess
2026-04-30 ERROR UserID=9999 Msg=Timeout
How It Works
| Element | Description |
|---|---|
| sub function | Replaces only the first part that matches the specified regular expression |
| Syntax | sub(regular expression, replacement string, target) |
| Omitting target | $0 (entire line) becomes the target |
| Difference from gsub | sub replaces once; gsub replaces all |
| Return value | 1 if a replacement was made, 0 if nothing matched |
Explanation
awk's sub is useful when you want to change only one part of each line while formatting logs or processing CSV data.
The key is to use sub and gsub appropriately depending on whether you need full replacement.
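The same approach can be applied to comma-separated data by setting the field separators. The following sketch assumes a simple CSV with no quoted fields or embedded commas; the column names and values are made up for illustration:

```shell
# Hypothetical CSV: change the status only when the 3rd column is exactly "inactive"
printf 'id,name,status\n1,Alice,active\n2,Bob,inactive\n' |
  awk 'BEGIN { FS = OFS = "," } NR > 1 { sub(/^inactive$/, "disabled", $3) } { print }'
# Expected output:
# id,name,status
# 1,Alice,active
# 2,Bob,disabled
```

Setting OFS to a comma keeps the output comma-separated when awk rebuilds the line after the field is modified, and NR > 1 leaves the header row alone.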
Replacement Efficiency to Keep in Mind When Processing Large Volumes of Data
Creating the File
cat << 'EOF' > input.txt
apple banana apple grape apple
banana apple orange apple
EOF
Command to Run
awk '{ sub("apple", "APPLE"); print }' input.txt
Output
APPLE banana apple grape apple
banana APPLE orange apple
Command to Run
awk '{ gsub("apple", "APPLE"); print }' input.txt
Output
APPLE banana APPLE grape APPLE
banana APPLE orange APPLE
How It Works
| Item | sub | gsub |
|---|---|---|
| Number of replacements | Only the first per line | All occurrences per line |
| Performance | Less work (stops after the first match) | More work (scans the whole line) |
| Use case | When you want to change only the first match | When you want to change all matches |
| Unit of operation | Per line | Per line |
Explanation
With large volumes of data, using sub when only the first match matters avoids the extra scanning and replacing that gsub performs on the rest of each line. The size of the gain depends on the awk implementation and the data.
It is important to use sub and gsub appropriately depending on the use case.
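One rough way to see how much replacement work each function does is to total their return values. This sketch does not measure time; it only counts replacements on the sample lines above, working on copies (the names a, b, s, and g are illustrative):

```shell
# Total the replacements made by sub and gsub on copies of each line
printf 'apple banana apple grape apple\nbanana apple orange apple\n' |
  awk '{ a = $0; b = $0
         s += sub(/apple/, "APPLE", a)
         g += gsub(/apple/, "APPLE", b) }
       END { print "sub:", s, "gsub:", g }'
# Expected output:
# sub: 2 gsub: 5
```

For real workloads, timing both variants on representative data is a more reliable guide than the replacement count alone.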
Key Points to Avoid Getting Stuck with Escape Processing
Creating the File
cat << 'EOF' > input.txt
path=/home/user/docs
EOF
Command to Run
awk '{ sub(/\//, "_"); print }' input.txt
Output
path=_home/user/docs
Command to Run
awk '{ sub(/\//, "\\/"); print }' input.txt
Output
path=\/home/user/docs
How It Works
| Element | Description | Key point |
|---|---|---|
| /\// | Regular expression representing / | / is a delimiter and must be escaped |
| sub() | Replaces the first matching string | Replaces only once |
| "\\/" | Outputs \/ | In a string literal, \\ stands for a single backslash |
| "_" | Simple replacement character | No escaping needed |
Explanation
In awk's sub, backslashes are interpreted at more than one level: once when awk reads the regular expression or string literal, and, for the replacement string, again when sub processes it. This is why backslashes tend to multiply.
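A few variations can make the escaping rules easier to see. The sketch below uses made-up input: a slash needs no escape when the pattern is written as a string, and a literal dot needs one backslash in a regex literal but two in a string:

```shell
# Written as a string, the pattern "/" needs no escaping
printf 'path=/home/user/docs\n' | awk '{ sub("/", "_"); print }'
# Expected output:
# path=_home/user/docs

# A literal dot: one backslash inside /.../
printf 'file.v1.txt\n' | awk '{ sub(/\./, "_"); print }'
# Expected output:
# file_v1.txt

# The same pattern as a string needs two backslashes
printf 'file.v1.txt\n' | awk '{ sub("\\.", "_"); print }'
# Expected output:
# file_v1.txt
```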
Summary: Mastering the awk sub Function
The sub function is simple, yet it has real depth.
By understanding its characteristic of replacing only the first match and being aware of the difference from gsub, more precise text processing becomes possible.
By also grasping key points such as regular expressions, field specification, and escape processing, you will be able to handle data manipulation at a practical level.
To achieve efficient processing with awk, it is important to start by thoroughly understanding the sub function.
