Mastering awk gsub: A Beginner’s Guide to String Replacement from Basics to Advanced

updated: 2026/05/05 created: 2026/04/30

Introduction

awk is a very useful tool for processing text efficiently.
Among its features, gsub is one of the most important, allowing flexible string replacement.
For beginners, however, its syntax and behavior can be confusing, and commands often don't work as expected.
This article explains awk's gsub step by step, from its basic structure to practical usage.
It is written as a readable guide, with particular attention to the points where beginners tend to struggle.

Reference: GNU awk

Basic Structure of the gsub Function

Creating the File

cat << 'EOF' > input.txt
apple orange apple grape
EOF

Command

awk '{ gsub("apple", "banana"); print }' input.txt

Output

banana orange banana grape

Command

awk '{ count = gsub("apple", "banana"); print count, $0 }' input.txt

Output

2 banana orange banana grape

How It Works

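| Element | Description |
|---|---|
| gsub(search, replace) | Replaces every match of search (treated as a regular expression) in the entire line ($0) |
| Return value | The number of replacements made |
| Default target | $0 (the entire line) when no third argument is given |
| Scope | All matching positions within the target |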

Explanation

By default, gsub is applied to the entire line $0.
Using the return value also allows you to check the number of replacements at the same time.
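A minimal sketch of the other parts of the call, assuming any POSIX-compatible awk and feeding sample text through printf instead of input.txt. It shows the optional third argument, which restricts the replacement to a variable (or field) and leaves $0 untouched, and that a quoted first argument is still treated as a regular expression:

```shell
# Third argument: only the variable s is modified; $0 keeps its original text
printf 'apple orange\n' | awk '{ s = $0; n = gsub(/apple/, "banana", s); print n, s; print $0 }'
# -> 1 banana orange
# -> apple orange

# A quoted first argument is still a regex: "." matches any character,
# so both "abc" and "a.c" are replaced
printf 'abc a.c\n' | awk '{ n = gsub("a.c", "X"); print n, $0 }'
# -> 2 X X
```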

Choosing Between Global Replacement and First-Match Replacement

Creating the File

cat << 'EOF' > input.txt
apple apple banana apple
EOF

Command

awk '{ gsub("apple","orange"); print }' input.txt

Output

orange orange banana orange

Command

awk '{ sub("apple","orange"); print }' input.txt

Output

orange apple banana apple

How It Works

| Function | Behavior | Replacement count | Use case |
|---|---|---|---|
| gsub | Replaces every match of the pattern | As many as there are matches (returned as a count) | When you want to replace all occurrences at once |
| sub | Replaces only the first match | At most once (returns 1 or 0) | When you want to change only the first occurrence |

Explanation

gsub performs a global replacement on the entire line, while sub replaces only the first matching occurrence.
Using them appropriately based on your needs prevents unintended over-replacement.
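The return values differ in the same way. In this sketch (sample text piped in with printf, any POSIX-compatible awk assumed), sub is applied to a copy of the line and gsub to $0, so both results can be compared side by side:

```shell
# sub returns 1 if it replaced something (0 otherwise); gsub returns the total count
printf 'apple apple banana apple\n' | awk '{
  line = $0
  s = sub(/apple/, "orange", line)   # first match only, in the copy
  g = gsub(/apple/, "orange")        # every match, in $0
  print s ": " line
  print g ": " $0
}'
# -> 1: orange apple banana apple
# -> 3: orange orange banana orange
```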

Replacement Using Pattern Matching with Regular Expressions

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
apple 300
grape 150
EOF

Command

awk '{ gsub(/apple/, "orange"); print }' input.txt

Output

orange 100
banana 200
orange 300
grape 150

Command

awk '{ gsub(/[0-9]+/, "&yen"); print }' input.txt

Output

apple 100yen
banana 200yen
apple 300yen
grape 150yen

How It Works

| Element | Description |
|---|---|
| gsub | Replaces every regex match in the target (here, the entire line) |
| /apple/, /[0-9]+/ | The patterns to match (regular expressions) |
| & in the replacement | Inserts the matched text itself |
| print | Outputs the line after replacement |

Explanation

A key feature of awk's gsub is that it can replace all occurrences matching a regular expression at once.
By flexibly combining patterns and replacement strings, a wide variety of text transformations are possible.
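Because & has a special meaning in the replacement string, a literal & must be escaped. A minimal sketch, assuming a POSIX-compatible awk and sample text piped in with printf: inside an awk string literal, "\\&" reaches gsub as \&, which it outputs as a plain &:

```shell
# & inserts the matched text
printf 'apple 100\n' | awk '{ gsub(/[0-9]+/, "[&]"); print }'
# -> apple [100]

# "\\&" in the awk source becomes \& after string parsing, which gsub emits as a literal &
printf 'salt pepper\n' | awk '{ gsub(/ /, " \\& "); print }'
# -> salt & pepper
```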

Escaping Metacharacters ($, *, +, etc.) When Using Them as Replacement Targets

Creating the File

cat << 'EOF' > input.txt
price=$100
pattern=***
math=a+b+c
EOF

Command

awk '{ gsub(/\$/, "USD"); print }' input.txt

Output

price=USD100
pattern=***
math=a+b+c

Command

awk '{ gsub(/\*\*\*/, "###"); print }' input.txt

Output

price=$100
pattern=###
math=a+b+c

Command

awk '{ gsub(/\+/, "-"); print }' input.txt

Output

price=$100
pattern=***
math=a-b-c

How It Works

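| Metacharacter | Escaped form | Reason |
|---|---|---|
| $ | \$ | Otherwise an anchor matching the end of the target string (the end of the line for $0) |
| * | \* | Otherwise means zero or more repetitions of the preceding element |
| + | \+ | Otherwise means one or more repetitions of the preceding element |
| \ | \\ | Needed to match a literal backslash |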

Explanation

Because gsub interprets its first argument as a regular expression, metacharacters keep their special meanings unless escaped.
To match them as literal characters, prefix them with a backslash.
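The backslash count changes when the pattern is written as a string instead of a /.../ literal, because awk's string parser consumes one backslash before the regex engine sees it. A short sketch comparing the forms, with sample text piped in via printf (a bracket expression such as [+] is another common way to make a single metacharacter literal):

```shell
# Regex literal: one backslash
printf 'price=$100\n' | awk '{ gsub(/\$/, "USD"); print }'
# -> price=USD100

# String (dynamic regex): "\\$" becomes \$ after string parsing
printf 'price=$100\n' | awk '{ gsub("\\$", "USD"); print }'
# -> price=USD100

# Bracket expression: + inside [ ] is an ordinary character
printf 'math=a+b+c\n' | awk '{ gsub(/[+]/, "-"); print }'
# -> math=a-b-c
```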

How to Apply gsub to a Specific Column (Field) Only

Creating the File

cat << 'EOF' > input.txt
id,name,price
1,apple,100yen
2,banana,200yen
3,orange,300yen
EOF

Command

awk -F',' '{gsub(/yen/, "", $3); print}' OFS=',' input.txt

Output

id,name,price
1,apple,100
2,banana,200
3,orange,300

How It Works

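| Element | Description |
|---|---|
| -F',' | Sets the input field separator to a comma |
| $3 | The 3rd column (price) |
| gsub(/yen/, "", $3) | Removes "yen" from the 3rd column only |
| OFS=',' | A variable assignment given as an operand; awk applies it before reading input.txt, so the output separator is a comma |
| print | Outputs the entire line, rebuilt with OFS |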

Explanation

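Passing a field such as $n as the third argument of gsub limits the replacement to that column.
Because changing a field rebuilds the line with OFS, set OFS to the same separator as the input (here, a comma) to keep the original format.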
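To see why OFS matters, compare the same command with and without it. This sketch pipes one sample row in with printf; -v OFS=',' is an alternative to the OFS=',' operand placed after the program in the command above:

```shell
# Without OFS, the rebuilt line uses the default OFS (a single space)
printf '1,apple,100yen\n' | awk -F',' '{ gsub(/yen/, "", $3); print }'
# -> 1 apple 100

# With OFS set via -v, the commas are preserved
printf '1,apple,100yen\n' | awk -F',' -v OFS=',' '{ gsub(/yen/, "", $3); print }'
# -> 1,apple,100
```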

Implementing Conditional Branching Using gsub’s Return Value (Replacement Count)

Creating the File

cat << 'EOF' > input.txt
apple orange apple banana
grape apple melon
banana orange
EOF

Command

awk '{ count = gsub(/apple/, "APPLE"); if (count > 0) { print "[Replaced:" count "] " $0 } else { print "[Not replaced] " $0 } }' input.txt

Output

[Replaced:2] APPLE orange APPLE banana
[Replaced:1] grape APPLE melon
[Not replaced] banana orange

How It Works

| Element | Description |
|---|---|
| gsub(/apple/, "APPLE") | Replaces all occurrences of apple with APPLE |
| Return value | The number of replacements made, stored in count |
| count > 0 | Determines whether the line was replaced at least once |
| Conditional branch | Changes the output depending on whether a replacement occurred |

Explanation

Since gsub returns the number of replacements, you can use that value to determine whether a target string was present.
This allows flexible conditional branching to be implemented with awk alone.
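The return value can also be accumulated across lines. A common idiom, sketched here with the sample data piped in through printf, replaces each match with & (the match itself), so the text is unchanged and gsub acts purely as an occurrence counter:

```shell
# gsub(/apple/, "&") rewrites each match with itself, so only the count matters;
# total + 0 prints 0 instead of an empty string when nothing matched
printf 'apple orange apple banana\ngrape apple melon\nbanana orange\n' |
  awk '{ total += gsub(/apple/, "&") } END { print total + 0 }'
# -> 3
```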

Technique for Dynamically Passing Replacement Strings from External or Shell Variables

Creating the File

cat << 'EOF' > input.txt
Hello NAME, welcome to PLACE.
EOF

Command

name="Alice"
place="Tokyo"
awk -v name="$name" -v place="$place" '{ gsub(/NAME/, name); gsub(/PLACE/, place); print }' input.txt

Output

Hello Alice, welcome to Tokyo.

How It Works

| Element | Description | Role |
|---|---|---|
| -v name=... | Passes a shell variable to awk | Receives external values |
| gsub(/A/, B) | Replaces all occurrences of pattern A with B | Dynamic string replacement |
| /NAME/ | Regular expression for the replacement target | Match condition |
| print | Outputs the processed line | Displays the result |

Explanation

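Using awk's -v option, you can pass shell variables into awk and use them as replacement strings in gsub.
Keep in mind that -v processes backslash escape sequences in the value, and gsub treats & in the replacement as the matched text, so values containing \ or & need extra care.
With that caveat, this approach is well-suited to text processing driven by environment variables or script arguments.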
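One pitfall when passing values this way: gsub treats & in the replacement as the matched text, so a value containing & is not inserted literally. The first command below shows the problem with a hypothetical value R&D. The second is one possible workaround, a sketch that swaps gsub for a literal search with index() and substr() and reads the value from ENVIRON, which (unlike -v) does not process escape sequences. replace_all is a hypothetical helper written for this example, not a built-in:

```shell
name='R&D'

# Problem: gsub expands & into the matched text "NAME"
printf 'Hello NAME\n' | awk -v name="$name" '{ gsub(/NAME/, name); print }'
# -> Hello RNAMED

# Workaround sketch: literal replacement; the value is passed through the environment
printf 'Hello NAME\n' | name="$name" awk '
  # Hypothetical helper: replace every literal occurrence of find (must be non-empty) in s
  function replace_all(s, find, repl,    out, i) {
    out = ""
    while ((i = index(s, find)) > 0) {
      out = out substr(s, 1, i - 1) repl
      s = substr(s, i + length(find))
    }
    return out s
  }
  { print replace_all($0, "NAME", ENVIRON["name"]) }'
# -> Hello R&D
```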

Tips for Optimizing gsub to Process Large Log Files Efficiently

Creating the File

cat << 'EOF' > input.txt
ERROR: user_id=123 failed login
INFO: user_id=456 success
ERROR: user_id=789 failed login
EOF

Command

awk '{ gsub(/ERROR/, "WARN"); print }' input.txt

Output

WARN: user_id=123 failed login
INFO: user_id=456 success
WARN: user_id=789 failed login

Command

awk 'BEGIN{pattern="ERROR"} { gsub(pattern, "WARN"); print }' input.txt

Output

WARN: user_id=123 failed login
INFO: user_id=456 success
WARN: user_id=789 failed login

Command

awk '{ if ($0 ~ /ERROR/) gsub(/ERROR/, "WARN"); print }' input.txt

Output

WARN: user_id=123 failed login
INFO: user_id=456 success
WARN: user_id=789 failed login

How It Works

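| Optimization point | Description | Effect |
|---|---|---|
| Storing the pattern in a variable | Assign the pattern in BEGIN and pass the variable to gsub | Keeps the pattern in one place and lets it be supplied from outside (for example with -v); it is not a performance optimization in itself, because a string is used as a dynamic regular expression while a regex literal such as /ERROR/ is compiled up front |
| Conditional gsub | Pre-check with if ($0 ~ /ERROR/) | Skips the gsub call on lines that cannot match |
| Minimal processing | Run gsub only on lines that need it | May reduce work on large logs where few lines match; measure to confirm |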

Explanation

Calling gsub on every line has a cost, so on large logs it can help to skip lines that do not contain the pattern.
How much this saves depends on the awk implementation and on how many lines match, so measure it on real data (for example, with the shell's time command) before relying on it.
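A more idiomatic way to write the conditional version is to use the pattern as the rule's condition. This sketch pipes sample log lines in with printf; the second command shows the main practical benefit of keeping the pattern in a variable, which is supplying it from outside with -v:

```shell
# The first rule runs gsub only on lines matching /ERROR/; the second rule prints every line
printf 'ERROR: user_id=123 failed login\nINFO: user_id=456 success\n' |
  awk '/ERROR/ { gsub(/ERROR/, "WARN") } { print }'
# -> WARN: user_id=123 failed login
# -> INFO: user_id=456 success

# Pattern supplied from the command line instead of being hard-coded
printf 'ERROR: user_id=123 failed login\n' | awk -v pattern='ERROR' '{ gsub(pattern, "WARN"); print }'
# -> WARN: user_id=123 failed login
```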

Reconstruction Behavior of $0 (Entire Record) After Applying gsub

Creating the File

cat << 'EOF' > input.txt
foo bar baz
EOF

Command

awk '{ gsub(/foo/, "XXX"); print $0 }' input.txt

Output

XXX bar baz

Command

awk '{ gsub(/foo/, "XXX", $1); print $0 }' input.txt

Output

XXX bar baz

Command

awk '{ gsub(/foo/, "XXX", $1); print $1, $2, $3 }' input.txt

Output

XXX bar baz

How It Works

| Operation | Target | Effect on $0 and fields | Notes |
|---|---|---|---|
| gsub(/foo/, "XXX") | $0 | $0 is changed directly; fields are re-split from it | Applied to the whole line |
| gsub(/foo/, "XXX", $1) | $1 | $0 is rebuilt from the fields | The field change triggers reconstruction |
| print after rewriting $1 | $1 | The rebuilt $0 is output | Fields are joined by OFS |
| print without modifying any field | - | The original $0 is output | No field changed |

Explanation

In awk, when you modify a field (such as $1), $0 is automatically rebuilt by joining all fields with OFS.
Not knowing this behavior can be a pitfall where "a partial change unexpectedly affects the whole line": for example, runs of spaces between fields collapse into a single OFS.
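One concrete case of this pitfall: when $0 is rebuilt, runs of spaces between fields collapse into a single OFS. The sample file above uses single spaces, so the effect is invisible there; this sketch pipes in text with extra spaces via printf:

```shell
# gsub on $0 edits the line text directly, so the original spacing survives
printf 'foo   bar  baz\n' | awk '{ gsub(/foo/, "XXX"); print }'
# -> XXX   bar  baz

# gsub on $1 changes a field, so $0 is rebuilt with OFS and the runs of spaces collapse
printf 'foo   bar  baz\n' | awk '{ gsub(/foo/, "XXX", $1); print }'
# -> XXX bar baz
```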

Key Points for Efficient Text Processing with awk and gsub

awk's gsub is not limited to simple string replacement: combined with regular expressions, conditional branching, and external variables, it becomes a powerful text-processing tool.

By understanding the basic structure and grasping the difference between global and first-match replacement, you will be able to process text exactly as intended.

In addition, by mastering key points such as escaping, column targeting, and using return values, you will be able to write more practical scripts.

When handling large volumes of data, keep optimization in mind as well, and make the most of awk and gsub to achieve efficient text processing.


©︎ 2025-2026 running terminal commands