
Mastering the awk substr Function: A Hands-on Guide

updated: 2026/05/05 created: 2026/05/02

Introduction

awk is a powerful tool specialized in text processing, widely used in log analysis and data formatting.

Among its features, substr is a fundamental and important function for extracting parts of strings.

Beginners, however, often stumble over what each argument means and how to combine substr with other functions.

This article covers awk's substr from the basics to advanced usage, with a focus on techniques that are useful in day-to-day work.

Reference: GNU awk

Basic Syntax and Argument Definitions of awk’s substr Function

Creating the File

cat << 'EOF' > input.txt
Hello,awk,substr,function
EOF

Command

awk '{ print substr($0,1,5) }' input.txt

Output

Hello

Command

awk '{ print substr($0,7,3) }' input.txt

Output

awk

Command

awk '{ print substr($0,11) }' input.txt

Output

substr,function

How It Works

Element | Description
substr(string, start, length) | Basic syntax
string | The target string (e.g., $0 represents the entire line)
start | Index starting from 1
length | Optional (if omitted, extracts to the end)
Return value | The substring within the specified range

Explanation

substr is a function that extracts a string starting from a specified position.
The start position is 1-based, and if the length is omitted, the string is extracted to the end.
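The edge cases of the start and length arguments can be checked quickly from the shell. The sketch below is only an illustration: cut_at is a hypothetical helper name, not part of awk, and it simply passes its arguments through to substr.

```shell
# cut_at is a hypothetical helper: it applies substr to a literal string.
# Usage: cut_at STRING START LENGTH
cut_at() {
  awk -v s="$1" -v start="$2" -v len="$3" 'BEGIN { print substr(s, start, len) }'
}

cut_at "Hello,awk" 1 5     # prints: Hello
cut_at "Hello,awk" 7 100   # prints: awk (the request runs past the end, so it is clipped)
cut_at "Hello,awk" 10 3    # prints an empty line (start is beyond the last character)
```

A length that reaches past the end of the string is not an error; substr returns whatever characters are available.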

Behavior When the Third Argument (Length) Is Omitted and Use Cases

Creating the File

cat << 'EOF' > input.txt
HelloWorld
AWKsubstrExample
EOF

Command

awk '{print substr($0,6)}' input.txt

Output

World
bstrExample

Command

awk '{print substr($0,1)}' input.txt

Output

HelloWorld
AWKsubstrExample

How It Works

Item | Description
Function | substr(string, start, length)
When third argument is omitted | Retrieves everything from the start position to the end
Start position basis | Starts from 1 (not 0)
Return value | Substring from the specified position onward
Use cases | Log analysis and string processing where you want to retrieve everything from a midpoint to the end

Explanation

Omitting the third argument retrieves everything from the start position to the end in one call, which is convenient when the trailing part of the data varies in length.
It keeps the code concise when processing logs or strings that have no reliable delimiter to split on.
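As a sketch of the log-processing use case, the example below drops a fixed-length timestamp prefix and keeps the rest of each line. The log format and the helper name strip_timestamp are assumptions made for illustration: the prefix "YYYY-MM-DD HH:MM:SS " occupies 20 characters, so the message starts at position 21.

```shell
# Hypothetical log format: "YYYY-MM-DD HH:MM:SS message".
# The timestamp plus the following space is 20 characters wide,
# so the message begins at position 21 and runs to the end of the line.
strip_timestamp() {
  awk '{ print substr($0, 21) }'
}

printf '%s\n' \
  '2026-05-02 10:15:30 disk usage at 91%' \
  '2026-05-02 10:15:31 retrying connection' | strip_timestamp
# prints:
# disk usage at 91%
# retrying connection
```

Because the length is omitted, messages of any length come through intact.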

Dynamically Identifying the Extraction Start Position by Combining with the index Function

Creating the File

cat << 'EOF' > input.txt
apple:100
banana:200
cherry:300
EOF

Command

awk '{ pos = index($0, ":"); print substr($0, pos+1) }' input.txt

Output

100
200
300

Command

awk '{ pos = index($0, ":"); print substr($0, 1, pos-1) }' input.txt

Output

apple
banana
cherry

How It Works

Element | Description
index($0, ":") | Gets the position of ":" within the line
pos | The reference position for extraction start
substr($0, pos+1) | Extracts everything after ":" (the value part)
substr($0, 1, pos-1) | Extracts everything before ":" (the key part)
$0 | Processes the entire line

Explanation

By dynamically obtaining the position with index, you can flexibly handle cases where the delimiter position changes.
Combining it with substr makes it easy to extract strings from any position.
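One detail worth guarding against is that index returns 0 when the delimiter is not found. The following sketch, with a hypothetical helper name split_kv and made-up sample lines, reports such lines instead of extracting from them. It also shows that index finds only the first ":", so later colons stay in the value.

```shell
# split_kv is a hypothetical helper that splits each line at the first ":".
split_kv() {
  awk '{
    pos = index($0, ":")            # 0 means the line has no ":" at all
    if (pos == 0) {
      print "no delimiter: " $0
      next
    }
    printf "key=%s value=%s\n", substr($0, 1, pos - 1), substr($0, pos + 1)
  }'
}

printf 'apple:100\nbroken\nsite:http://example.com\n' | split_kv
# prints:
# key=apple value=100
# no delimiter: broken
# key=site value=http://example.com
```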

Extracting Strings from the End Using the length Function Together

Creating the File

cat << 'EOF' > input.txt
apple
banana
cherry
EOF

Command

awk '{ print substr($0, length($0)-2, 3) }' input.txt

Output

ple
ana
rry

How It Works

Element | Description
length($0) | Gets the number of characters in the entire line
substr($0, start, count) | Extracts a string from the specified position
length($0)-2 | Calculates the start position of the last 3 characters
$0 | Represents the entire line

Explanation

Getting the character count with length lets you compute a start position relative to the end of the line: the last n characters begin at position length - n + 1, which is length($0)-2 for n = 3.
Passing that start position to substr extracts from the back of the string.
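The same calculation can be generalized to any number of trailing characters. This is a minimal sketch with a hypothetical helper name last_n; the explicit check for short lines is a design choice so that the result does not depend on how a given awk treats a start position below 1.

```shell
# last_n is a hypothetical helper: print the last N characters of each line.
# Usage: last_n N
last_n() {
  awk -v n="$1" '{
    len = length($0)
    # Lines with n characters or fewer are printed whole.
    start = (len > n) ? len - n + 1 : 1
    print substr($0, start)
  }'
}

printf 'apple\nfig\n' | last_n 4
# prints:
# pple
# fig
```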

Filtering Lines with Specific Patterns by Combining if Statements with substr

Creating the File

cat << 'EOF' > input.txt
apple_001
banana_002
apple_123
orange_999
apple_abc
EOF

Command

awk '{ if (substr($0,1,5) == "apple" && substr($0,7,3) ~ /^[0-9]{3}$/) print }' input.txt

Output

apple_001
apple_123

How It Works

Element | Description
substr($0,1,5) | Gets the first 5 characters (checks for "apple")
substr($0,7,3) | Gets 3 characters starting from position 7 (the numeric part)
~ /^[0-9]{3}$/ | Checks with a regex whether it is a 3-digit number
if condition | Filters for "starts with apple AND is a 3-digit number"
print | Outputs only lines matching the condition

Explanation

By extracting character positions with substr and branching with if, you can efficiently extract only lines matching a specific pattern.
Flexible filtering is possible with awk alone.
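The same filter can also be written in awk's pattern form, where a condition with no action block prints the matching line. The sketch below uses a hypothetical helper name filter_apple and spells the three digits out as [0-9][0-9][0-9], on the assumption that the script may need to run on older awk builds that lack {3} interval expressions.

```shell
# filter_apple is a hypothetical helper: keep lines that start with "apple"
# and have exactly three digits at positions 7 to 9.
filter_apple() {
  awk 'substr($0, 1, 5) == "apple" && substr($0, 7, 3) ~ /^[0-9][0-9][0-9]$/'
}

printf 'apple_001\nbanana_002\napple_123\norange_999\napple_abc\n' | filter_apple
# prints:
# apple_001
# apple_123
```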

Splitting a String Character by Character and Storing in an Array Using a for Loop

Creating the File

cat << 'EOF' > input.txt
hello
EOF

Command

awk '{ for(i=1;i<=length($0);i++){ arr[i]=substr($0,i,1) } for(i=1;i<=length($0);i++){ print arr[i] } }' input.txt

Output

h
e
l
l
o

How It Works

Process | Description
length($0) | Gets the number of characters in the line
substr($0,i,1) | Gets the i-th single character
arr[i] | Stores one character at a time in the array
for(i=1;i<=length($0);i++) | Processes sequentially from the beginning
print arr[i] | Outputs the array contents in order

Explanation

Using awk's substr, you can decompose a string one character at a time.
Note that for(i in arr) visits array elements in an implementation-dependent order, so the output may come out shuffled (for example 2 3 4 5 1). An index-based for loop keeps the characters in their original order and is the safer approach.
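Character-by-character access with substr also makes it possible to walk a string backwards. As a small illustration, the sketch below reverses each input line; reverse_line is a hypothetical helper name, and the example assumes single-byte (ASCII) input.

```shell
# reverse_line is a hypothetical helper: reverse the characters of each line.
reverse_line() {
  awk '{
    out = ""
    # Walk from the last character to the first, appending one at a time.
    for (i = length($0); i >= 1; i--) {
      out = out substr($0, i, 1)
    }
    print out
  }'
}

printf 'hello\n' | reverse_line   # prints: olleh
```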

Efficient Use of substr for Parsing Fixed-Width Text Data

Creating the File

cat << 'EOF' > input.txt
00001Yamada   Tokyo     030
00002Suzuki   Osaka     045
00003Tanaka   Nagoya    028
EOF

Command

awk '{id=substr($0,1,5); name=substr($0,6,9); city=substr($0,15,10); age=substr($0,25,3); printf "ID:%s Name:%s City:%s Age:%s\n", id, name, city, age}' input.txt

Output

ID:00001 Name:Yamada    City:Tokyo      Age:030
ID:00002 Name:Suzuki    City:Osaka      Age:045
ID:00003 Name:Tanaka    City:Nagoya     Age:028

Command

awk '{id=substr($0,1,5); name=substr($0,6,9); city=substr($0,15,10); age=substr($0,25,3); gsub(/^ +| +$/,"",name); gsub(/^ +| +$/,"",city); printf "%s,%s,%s,%s\n", id, name, city, age}' input.txt

Output

00001,Yamada,Tokyo,030
00002,Suzuki,Osaka,045
00003,Tanaka,Nagoya,028

How It Works

Item | Description
substr($0,1,5) | Gets 5 characters from position 1 (ID)
substr($0,6,9) | Gets 9 characters from position 6 (name)
substr($0,15,10) | Gets 10 characters from position 15 (city)
substr($0,25,3) | Gets 3 characters from position 25 (age)
$0 | The entire line string
gsub | Trims whitespace

Explanation

Fixed-width data has no delimiter characters, so field splitting does not apply; position-based extraction with substr handles it simply, with no pattern matching involved.
The advantage is that parsing remains stable as long as the column positions are fixed in advance.
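When a layout has many columns, hard-coding every start position becomes error-prone. One possible approach, sketched here with a hypothetical helper name fixed_to_csv, is to pass only the column widths and let awk accumulate the start positions. The widths "5 9 10 3" match the ID, name, city, and age columns described above.

```shell
# fixed_to_csv is a hypothetical helper: convert fixed-width lines to CSV.
# Usage: fixed_to_csv "WIDTH1 WIDTH2 ..."
fixed_to_csv() {
  awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }   # w[1..n] holds the column widths
    {
      pos = 1
      line = ""
      for (i = 1; i <= n; i++) {
        field = substr($0, pos, w[i])
        gsub(/^ +| +$/, "", field)        # trim the space padding
        sep = (i > 1) ? "," : ""
        line = line sep field
        pos += w[i]                       # next column starts right after this one
      }
      print line
    }'
}

printf '00001Yamada   Tokyo     030\n' | fixed_to_csv "5 9 10 3"
# prints: 00001,Yamada,Tokyo,030
```

Changing the layout then only requires editing the width list, not each substr call.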

Criteria for Choosing Between Regex Replacement Functions (sub|gsub) and substr

Creating the File

cat << 'EOF' > input.txt
apple 123 orange
banana 456 grape
cherry 789 melon
EOF

Command

awk '{print substr($1,1,3)}' input.txt

Output

app
ban
che

Command

awk '{sub(/[0-9]+/,"NUM"); print}' input.txt

Output

apple NUM orange
banana NUM grape
cherry NUM melon

Command

awk '{gsub(/[aeiou]/,"_"); print}' input.txt

Output

_ppl_ 123 _r_ng_
b_n_n_ 456 gr_p_
ch_rry 789 m_l_n

How It Works

Feature | Function | Scope | Characteristic | Difference from awk substr
Partial extraction | substr | Position-specified | Extracts string from the specified position | Does not replace
Single replacement | sub | First match only | Replaces only the first matched portion | Pattern-based
Global replacement | gsub | All matching locations | Replaces all matched portions | Applied repeatedly

Explanation

substr extracts by position, while sub and gsub replace by regex pattern; that is the deciding criterion.
Choose substr when the data is defined by its structure (fixed positions) and sub or gsub when it is defined by a pattern.
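The two approaches can also work together: substr selects a region by position and gsub rewrites what is inside it. The sketch below masks everything except the last four characters of each line; mask_tail is a hypothetical helper name and the sample value is made up.

```shell
# mask_tail is a hypothetical helper: replace all but the last 4 characters with "*".
mask_tail() {
  awk '{
    keep = 4
    len = length($0)
    if (len <= keep) { print; next }      # nothing to mask on short lines
    head = substr($0, 1, len - keep)      # position-based: the part to hide
    tail = substr($0, len - keep + 1)     # position-based: the part to keep
    gsub(/./, "*", head)                  # pattern-based: every character becomes "*"
    print head tail
  }'
}

printf '1234567812345678\n' | mask_tail   # prints: ************5678
```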

How to Extract Strings Between Specific Symbols

Creating the File

cat << 'EOF' > input.txt
abc[hello]def
123[world]456
xxx[test123]yyy
EOF

Command

awk '{
  start = index($0, "[") + 1
  end = index($0, "]")
  print substr($0, start, end - start)
}' input.txt

Output

hello
world
test123

How It Works

Element | Description
index($0, "[") | Gets the position of [
index($0, "]") | Gets the position of ]
start | Extraction start position (the character after [)
end - start | Number of characters to extract
substr($0, start, length) | Extracts the string within the specified range

Explanation

The start and end positions are obtained with index, and that range is extracted with substr.
Because the positions are looked up on every line, the extraction works wherever the brackets appear, as long as the line contains a [ followed later by a ].
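Real input does not always contain a well-formed pair of brackets. A slightly more defensive variant, sketched below with a hypothetical helper name between_brackets, searches for "]" only in the text after the first "[" and silently skips lines where either bracket is missing.

```shell
# between_brackets is a hypothetical helper: print the text between the first "["
# and the first "]" that follows it; lines without such a pair are skipped.
between_brackets() {
  awk '{
    lb = index($0, "[")
    if (lb == 0) next                 # no opening bracket
    rest = substr($0, lb + 1)         # everything after "["
    rb = index(rest, "]")
    if (rb == 0) next                 # no closing bracket after the opening one
    print substr(rest, 1, rb - 1)
  }'
}

printf 'abc[hello]def\nno brackets here\n]x[late]y\n' | between_brackets
# prints:
# hello
# late
```

Searching in rest rather than in $0 is what keeps a stray "]" earlier in the line from producing a negative length.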

A Practical Summary for Mastering awk and substr

awk's substr is a simple yet broadly applicable function.

By understanding the basic syntax and then learning how to omit the third argument and combine it with index and length, you will be able to handle production-level processing.

Furthermore, combining it with if statements and for loops enables even more flexible data manipulation.

It is especially powerful for processing fixed-width data and pattern extraction. Knowing when to use sub and gsub instead improves both code readability and efficiency.

Properly understanding awk and substr, and building up from small tasks, is the fastest path to improving your skills.


©︎ 2025-2026 running terminal commands