Introduction
When you want to process text efficiently, awk is a very useful tool.
Among its features, gsub is an important function that allows flexible string replacement.
However, for beginners, the syntax and behavior can be a bit confusing, and things often don't work as expected.
This article walks through awk's gsub step by step, from its basic structure to practical usage.
It pays particular attention to the points where beginners tend to struggle.
Reference: GNU awk
Basic Structure of the gsub Function
Creating the File
cat << 'EOF' > input.txt
apple orange apple grape
EOF
Command
awk '{ gsub("apple", "banana"); print }' input.txt
Output
banana orange banana grape
Command
awk '{ count = gsub("apple", "banana"); print count, $0 }' input.txt
Output
2 banana orange banana grape
How It Works
| Element | Description |
|---|---|
| gsub(regex, replacement) | Replaces every match of the regular expression in the entire line ($0) |
| Return value | Returns the number of replacements made |
| Default target | $0 (entire line) |
| Scope | All matching positions within the line |
Explanation
By default, gsub is applied to the entire line $0. The first argument is treated as a regular expression even when it is written as a string such as "apple".
The return value lets you check the number of replacements in the same step.
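The default target can also be overridden. As a minimal sketch, gsub accepts an optional third argument naming the variable to modify; the variable name line below is only illustrative. Working on a copy leaves $0 untouched:

```shell
# Run gsub on a copy of the record; $0 itself is not modified.
result=$(printf 'apple orange apple grape\n' | awk '{
    line = $0                            # copy of the record (illustrative name)
    n = gsub(/apple/, "banana", line)    # third argument selects the target
    print n ": " line " | original: " $0
}')
echo "$result"
```

This should print the count, the modified copy, and the unchanged original on one line.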
Choosing Between Global Replacement and First-Match Replacement
Creating the File
cat << 'EOF' > input.txt
apple apple banana apple
EOF
Command
awk '{ gsub("apple","orange"); print }' input.txt
Output
orange orange banana orange
Command
awk '{ sub("apple","orange"); print }' input.txt
Output
orange apple banana apple
How It Works
| Function | Behavior | Replacement Count | Use Case |
|---|---|---|---|
| gsub | Replaces all occurrences of the specified string | Multiple times | When you want to replace all occurrences at once |
| sub | Replaces only the first matching string | Once only | When you want to change only the first occurrence |
Explanation
gsub performs a global replacement on the entire line, while sub replaces only the first matching occurrence.
Using them appropriately based on your needs prevents unintended over-replacement.
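Over-replacement can also come from the pattern itself rather than from the choice of function. The sketch below uses a made-up input line to show how an unanchored pattern also hits "pineapple", and one possible way to limit the change to fields that are exactly "apple":

```shell
input='apple pineapple apple'

# Unanchored: the "apple" inside "pineapple" is replaced too.
loose=$(printf '%s\n' "$input" | awk '{ gsub(/apple/, "orange"); print }')

# Anchored per field: only fields that are exactly "apple" change.
strict=$(printf '%s\n' "$input" | awk '{
    for (i = 1; i <= NF; i++) gsub(/^apple$/, "orange", $i)
    print
}')

echo "$loose"    # orange pineorange orange
echo "$strict"   # orange pineapple orange
```

Anchoring with ^ and $ works here because each field is matched as its own string.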
Replacement Using Pattern Matching with Regular Expressions
Creating the File
cat << 'EOF' > input.txt
apple 100
banana 200
apple 300
grape 150
EOF
Command
awk '{ gsub(/apple/, "orange"); print }' input.txt
Output
orange 100
banana 200
orange 300
grape 150
Command
awk '{ gsub(/[0-9]+/, "&yen"); print }' input.txt
Output
apple 100yen
banana 200yen
apple 300yen
grape 150yen
How It Works
| Element | Description |
|---|---|
| gsub | Replaces all regex-matched parts in the entire string |
| /apple/ | The pattern to replace (regular expression) |
| "&" | Reuses the matched string itself |
| print | Outputs the line after replacement |
Explanation
A key feature of awk's gsub is that it can replace all occurrences matching a regular expression at once.
By flexibly combining patterns and replacement strings, a wide variety of text transformations are possible.
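Because & has a special meaning in the replacement string, inserting a literal ampersand takes an extra step. The following sketch, using made-up sample text, shows both uses side by side:

```shell
# "&" in the replacement stands for the matched text.
wrapped=$(printf 'apple 100\n' | awk '{ gsub(/[0-9]+/, "[&]"); print }')

# "\\&" in the awk source yields a literal ampersand.
literal=$(printf 'salt and pepper\n' | awk '{ gsub(/and/, "\\&"); print }')

echo "$wrapped"   # apple [100]
echo "$literal"   # salt & pepper
```

The backslash is doubled because awk's string parser consumes one level of escaping before gsub sees the replacement.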
Escaping Metacharacters ($, *, +, etc.) When Using Them as Replacement Targets
Creating the File
cat << 'EOF' > input.txt
price=$100
pattern=***
math=a+b+c
EOF
Command
awk '{ gsub(/\$/, "USD"); print }' input.txt
Output
price=USD100
pattern=***
math=a+b+c
Command
awk '{ gsub(/\*\*\*/, "###"); print }' input.txt
Output
price=$100
pattern=###
math=a+b+c
Command
awk '{ gsub(/\+/, "-"); print }' input.txt
Output
price=$100
pattern=***
math=a-b-c
How It Works
| Metacharacter | Escape Method | Reason |
|---|---|---|
| $ | \$ | Interpreted as an end-of-string anchor (the end of the line or field being matched) |
| * | \* | Treated as a repetition of the preceding element |
| + | \+ | Carries the meaning of one or more repetitions |
| \ | \\ | To represent the escape character itself |
Explanation
Because the first argument of gsub is a regular expression, metacharacters keep their special meaning unless they are escaped.
To match them as literal characters, escape them with a backslash.
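The amount of escaping depends on how the pattern is written. As a rough sketch, the same literal match can be expressed as a regex constant, as a string, or with a bracket expression:

```shell
# Regex constant: one backslash is enough.
a=$(printf 'math=a+b+c\n' | awk '{ gsub(/\+/, "-"); print }')

# String used as a regex: the string parser consumes one backslash first,
# so the backslash has to be doubled.
b=$(printf 'math=a+b+c\n' | awk '{ gsub("\\+", "-"); print }')

# Bracket expression: no backslash needed at all.
c=$(printf 'price=$100\n' | awk '{ gsub(/[$]/, "USD"); print }')

echo "$a"   # math=a-b-c
echo "$b"   # math=a-b-c
echo "$c"   # price=USD100
```

The bracket form is often the easiest to read when only one or two metacharacters are involved.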
How to Apply gsub to a Specific Column (Field) Only
Creating the File
cat << 'EOF' > input.txt
id,name,price
1,apple,100yen
2,banana,200yen
3,orange,300yen
EOF
Command
awk -F',' '{gsub(/yen/, "", $3); print}' OFS=',' input.txt
Output
id,name,price
1,apple,100
2,banana,200
3,orange,300
How It Works
| Element | Description |
|---|---|
| -F',' | Sets the field separator to a comma |
| $3 | Specifies the 3rd column (price column) |
| gsub(/yen/,"",$3) | Removes "yen" within the 3rd column |
| OFS=',' | Uses comma as the output separator |
| print | Outputs the entire line |
Explanation
Passing $n as the third argument of gsub limits the replacement to that column.
Because modifying a field makes awk rebuild $0 using OFS, set OFS to the input separator to keep the line comma-separated.
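To see why the output separator matters here, the sketch below runs the same replacement on one sample row with and without OFS. Setting FS and OFS together in BEGIN is one common alternative to the command-line assignment:

```shell
# Without OFS, the rebuilt record is joined with spaces.
no_ofs=$(printf '1,apple,100yen\n' | awk -F',' '{ gsub(/yen/, "", $3); print }')

# Setting FS and OFS together in BEGIN keeps the comma-separated layout.
with_ofs=$(printf '1,apple,100yen\n' | awk 'BEGIN { FS = OFS = "," } { gsub(/yen/, "", $3); print }')

echo "$no_ofs"     # 1 apple 100
echo "$with_ofs"   # 1,apple,100
```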
Implementing Conditional Branching Using gsub’s Return Value (Replacement Count)
Creating the File
cat << 'EOF' > input.txt
apple orange apple banana
grape apple melon
banana orange
EOF
Command
awk '{
    count = gsub(/apple/, "APPLE")
    if (count > 0) {
        print "[Replaced:" count "] " $0
    } else {
        print "[Not replaced] " $0
    }
}' input.txt
Output
[Replaced:2] APPLE orange APPLE banana
[Replaced:1] grape APPLE melon
[Not replaced] banana orange
How It Works
| Element | Description |
|---|---|
| gsub(/apple/, "APPLE") | Replaces all occurrences of apple with APPLE |
| Return value | Returns the number of replacements made |
| count > 0 | Determines whether the line was replaced at least once |
| Conditional branch | Changes output content based on whether replacement occurred |
Explanation
Since gsub returns the number of replacements, you can use that value to determine whether a target string was present.
This allows flexible conditional branching to be implemented with awk alone.
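The return value can also be accumulated across lines. As a small sketch, the following totals the replacements for the same three sample lines and reports them in an END block; the variable names total and changed are illustrative:

```shell
summary=$(printf '%s\n' \
    'apple orange apple banana' \
    'grape apple melon' \
    'banana orange' | awk '
{
    n = gsub(/apple/, "APPLE")   # replacements on this line
    total += n                   # running total across the input
    if (n > 0) changed++         # lines with at least one replacement
}
END { printf "replacements=%d lines_changed=%d\n", total, changed }')

echo "$summary"   # replacements=3 lines_changed=2
```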
Technique for Dynamically Passing Replacement Strings from External or Shell Variables
Creating the File
cat << 'EOF' > input.txt
Hello NAME, welcome to PLACE.
EOF
Command
name="Alice"
place="Tokyo"
awk -v name="$name" -v place="$place" '{
    gsub(/NAME/, name)
    gsub(/PLACE/, place)
    print
}' input.txt
Output
Hello Alice, welcome to Tokyo.
How It Works
| Element | Description | Role |
|---|---|---|
| -v name=... | Passes a shell variable to awk | Receiving external variables |
| gsub(/A/, B) | Replaces all occurrences of pattern A with B | Dynamic string replacement |
| /NAME/ | Regular expression for the replacement target | Match condition |
| print | Outputs the processed line | Displaying results |
Explanation
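awk's -v option passes shell variables into the program without splicing them into the script text, so gsub can use them as replacement strings.
Note that -v interprets backslash escape sequences in the value, and gsub treats & in the replacement as the matched text, so values containing \ or & need extra care.
This approach is well-suited to flexible text processing based on environment variables or arguments.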
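One case worth checking is a value that contains &, since gsub expands it to the matched text. The sketch below shows the effect and one possible workaround, escaping the value inside awk before using it. The variable name repl is illustrative, and the handling of backslash sequences in replacement strings has varied among awk implementations, so treat this as an assumption to verify on your own awk:

```shell
name='R&D'

# Naive: "&" in the value is expanded to the matched text "NAME".
naive=$(printf 'Hello NAME.\n' | awk -v name="$name" '{ gsub(/NAME/, name); print }')

# Escaped: prefix every "\" and "&" in the value with a backslash first.
escaped=$(printf 'Hello NAME.\n' | awk -v name="$name" '{
    repl = name
    gsub(/[\\&]/, "\\\\&", repl)   # replacement: a literal backslash, then the match
    gsub(/NAME/, repl)
    print
}')

echo "$naive"     # Hello RNAMED.
echo "$escaped"   # Hello R&D.
```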
Tips for Optimizing gsub to Process Large Log Files Efficiently
Creating the File
cat << 'EOF' > input.txt
ERROR: user_id=123 failed login
INFO: user_id=456 success
ERROR: user_id=789 failed login
EOF
Command
awk '{ gsub(/ERROR/, "WARN"); print }' input.txt
Output
WARN: user_id=123 failed login
INFO: user_id=456 success
WARN: user_id=789 failed login
Command
awk 'BEGIN{pattern="ERROR"} { gsub(pattern, "WARN"); print }' input.txt
Output
WARN: user_id=123 failed login
INFO: user_id=456 success
WARN: user_id=789 failed login
Command
awk '{ if ($0 ~ /ERROR/) gsub(/ERROR/, "WARN"); print }' input.txt
Output
WARN: user_id=123 failed login
INFO: user_id=456 success
WARN: user_id=789 failed login
How It Works
| Optimization Point | Description | Effect |
|---|---|---|
| Storing fixed strings as variables | Store the pattern in a variable in BEGIN | Keeps the pattern in one place; a string used as a pattern is a dynamic regex, so it is generally no faster than a regex constant such as /ERROR/ |
| Conditional gsub | Pre-check with if ($0 ~ /ERROR/) | Skips gsub on lines without a match; matching lines are scanned twice, so the gain is often small |
| Minimal processing | Execute gsub only on lines that need it | Can speed up processing for large logs when matches are rare |
Explanation
On large inputs the cost of running gsub on every line adds up, so it can help to skip lines that cannot match and to write fixed patterns as regex constants.
Whether this pays off depends on the awk implementation and on how often the pattern matches, so measure on real log data before relying on it.
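The conditional form can also be written as a pattern-action rule, which is the more idiomatic awk layout. This is only a sketch on two sample lines; it is not claimed to be faster, and a tool such as time on a real log is the way to find out:

```shell
out=$(printf '%s\n' \
    'ERROR: user_id=123 failed login' \
    'INFO: user_id=456 success' | awk '
/ERROR/ { gsub(/ERROR/, "WARN") }   # action runs only on lines that match
{ print }                           # every line is printed, changed or not
')

echo "$out"
```

The first rule rewrites matching lines and the second prints every line, so the output has the same shape as the earlier commands.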
Reconstruction Behavior of $0 (Entire Record) After Applying gsub
Creating the File
cat << 'EOF' > input.txt
foo bar baz
EOF
Command
awk '{ gsub(/foo/, "XXX"); print $0 }' input.txt
Output
XXX bar baz
Command
awk '{ gsub(/foo/, "XXX", $1); print $0 }' input.txt
Output
XXX bar baz
Command
awk '{ gsub(/foo/, "XXX", $1); print $1, $2, $3 }' input.txt
Output
XXX bar baz
How It Works
| Operation | Target | Change to $0 (Entire Record) | Notes |
|---|---|---|---|
| gsub(/foo/, "XXX") | $0 | Modified directly; fields are re-split | Applied directly to the whole line |
| gsub(/foo/, "XXX", $1) | $1 | $0 is also reconstructed | Field change triggers reconstruction |
| print after rewriting $1 | $1 | Reconstructed $0 is output | Fields are joined by OFS |
| print without rewriting $0 | - | Original $0 is output | No field change |
Explanation
In awk, when you rewrite a field (such as $1), $0 is automatically reconstructed by joining the fields with OFS, so the original spacing between fields is not preserved.
Not knowing this behavior can be a pitfall where "a partial change unexpectedly affects the whole line."
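The effect is easiest to see on input with irregular spacing. In this sketch, using a made-up line with extra spaces, the same replacement is applied once to $0 and once to $1:

```shell
input='foo   bar  baz'

# Editing $0 directly keeps the original spacing.
whole=$(printf '%s\n' "$input" | awk '{ gsub(/foo/, "XXX"); print }')

# Editing $1 rebuilds $0 with OFS (a single space by default).
field=$(printf '%s\n' "$input" | awk '{ gsub(/foo/, "XXX", $1); print }')

echo "$whole"   # XXX   bar  baz
echo "$field"   # XXX bar baz
```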
Key Points for Efficient Text Processing with awk and gsub
awk's gsub is not limited to simple string replacement: it can be combined with regular expressions, conditional branching, and external variables.
By understanding the basic structure and grasping the difference between global and first-match replacement, you will be able to process text exactly as intended.
In addition, by mastering key points such as escaping, column targeting, and using return values, you will be able to write more practical scripts.
When handling large volumes of data, keep optimization in mind as well, and make the most of awk and gsub to achieve efficient text processing.
