
A Beginner’s Guide to awk and Regular Expressions

updated: 2026/05/05 created: 2026/05/02

Introduction

AWK is a powerful tool commonly used in text processing scenarios such as log analysis and CSV manipulation.

Of all its features, regular expressions are the one topic you cannot avoid if you want to master AWK.

However, they can confuse beginners: there are many symbols to learn, and the syntax differs between awk implementations.

This article provides a thorough explanation of regular expressions in AWK, from the basics to practical usage, with careful attention to common stumbling points.

Reference: GNU awk

Basic Syntax and the Role of Metacharacters in AWK Regular Expressions

Creating the File

cat << 'EOF' > input.txt
apple 123
banana 456
cherry_789
grape-001
EOF

Command

awk '/[a-z]+ [0-9]+/' input.txt

Output

apple 123
banana 456

Command

awk '/^cherry_[0-9]+$/' input.txt

Output

cherry_789

Command

awk '/-/' input.txt

Output

grape-001

How It Works

Element          Syntax  Role                                  Example match
Character class  [a-z]   One lowercase letter                  apple (as [a-z]+)
Repetition       +       One or more of the preceding item     123, abc
Start of line    ^       Anchors the match to the line start   cherry_789
End of line      $       Anchors the match to the line end     cherry_789
Wildcard         .       Any single character                  a1, b-
Escape           \       Makes the next metacharacter literal  \. matches a literal .

Explanation

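In AWK, a bare /regex/ pattern selects the lines that match it, and metacharacters make that matching flexible.
Because a pattern can match anywhere in the line, combine ^ (start of line), $ (end of line), and repetition to pin down exactly what you want. Note that _ and - are ordinary characters outside brackets and need no escaping.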
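As a small illustration of the anchoring point, the sketch below feeds two lines to awk with printf instead of a file (the line "my apple 123 box" is made up for illustration). The unanchored pattern also matches text inside the second line, while the anchored version accepts only a line that consists entirely of letters, one space, and digits.

```shell
# Sample input: "my apple 123 box" is a made-up line for illustration.
# Unanchored: [a-z]+ [0-9]+ may match anywhere in the line,
# so both lines are printed
printf '%s\n' 'apple 123' 'my apple 123 box' | awk '/[a-z]+ [0-9]+/'

# Anchored with ^ and $: the whole line must match,
# so only "apple 123" is printed
printf '%s\n' 'apple 123' 'my apple 123 box' | awk '/^[a-z]+ [0-9]+$/'
```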

Conditional Extraction Using Pattern Matching Operators (~ and !~)

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
grape 150
pineapple 300
EOF

Command

awk '$1 ~ /apple/' input.txt

Output

apple 100
pineapple 300

Command

awk '$1 !~ /apple/' input.txt

Output

banana 200
grape 150

How It Works

Operator  Meaning                                Example        Behavior
~         Matches the regular expression         $1 ~ /apple/   Keeps lines whose first field contains "apple"
!~        Does not match the regular expression  $1 !~ /apple/  Keeps lines whose first field does not contain "apple"

Explanation

AWK's ~ and !~ operators apply a regular expression to a specific field rather than to the whole line.
The match is partial, so pineapple also matches /apple/; anchor the pattern (for example $1 ~ /^apple$/) when the field must be exactly that word.
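To see the difference between a partial and an exact match, here is a sketch using two of the sample lines. The anchored regex and the plain string comparison with == both reject pineapple; == is often the simpler choice when no pattern is involved.

```shell
# Partial match: "pineapple" contains "apple", so both lines print
printf '%s\n' 'apple 100' 'pineapple 300' | awk '$1 ~ /apple/'

# Anchored regex: the entire first field must be "apple"
printf '%s\n' 'apple 100' 'pineapple 300' | awk '$1 ~ /^apple$/'

# Plain string equality gives the same result without a regex
printf '%s\n' 'apple 100' 'pineapple 300' | awk '$1 == "apple"'
```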

Filtering Specific Columns (Fields) Using Regular Expressions

Creating the File

cat << 'EOF' > input.txt
id,name,age
1,Alice,23
2,Bob,17
3,Charlie,30
4,David,15
EOF

Command

awk -F',' '$3 ~ /^[2-9][0-9]$/ {print $0}' input.txt

Output

1,Alice,23
3,Charlie,30

How It Works

Element         Content                        Description
-F','           Field separator specification  Splits each line on commas (CSV)
$3              Third column (age)             The column to filter on
~               Regular expression match       Tests whether the field matches the pattern
/^[2-9][0-9]$/  Matches values from 20 to 99   Keeps only two-digit ages of 20 or more
{print $0}      Outputs the entire line        Prints each line that matches the condition

Explanation

Applying AWK's regular expression match (~) to a specific column lets you filter on that column alone.
In this example, only rows whose third column is a two-digit number from 20 to 99 are printed; the header row (age) and values of 100 or more do not match.
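Because the regex only accepts two-digit values, an age such as 105 would be skipped. The sketch below adds a made-up row 5,Eve,105 to the sample data and compares the regex filter with a numeric comparison. NR > 1 skips the header line, which would otherwise pass the test: "age" is not numeric, so "age" >= 20 falls back to a string comparison, and "a" sorts after "2".

```shell
# Sample data; the row 5,Eve,105 is made up for illustration.
# Regex filter: only two-digit ages 20-99, so 105 is missed
printf '%s\n' 'id,name,age' '1,Alice,23' '2,Bob,17' '5,Eve,105' |
  awk -F',' '$3 ~ /^[2-9][0-9]$/'

# Numeric filter: NR > 1 skips the header, then $3 is compared as a number
printf '%s\n' 'id,name,age' '1,Alice,23' '2,Bob,17' '5,Eve,105' |
  awk -F',' 'NR > 1 && $3 >= 20'
```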

Escaping Metacharacters and Handling Literal Characters

Creating the File

cat << 'EOF' > input.txt
abc$def
abc.def
abc\def
abc*def
EOF

Command

awk '/abc\$def/' input.txt

Output

abc$def

Command

awk '/abc\.def/' input.txt

Output

abc.def

Command

awk '/abc\\def/' input.txt

Output

abc\def

Command

awk '/abc\*def/' input.txt

Output

abc*def

How It Works

Metacharacter  Meaning in a regex        Escaped form  Matches
$              End of line               \$            A literal $
.              Any single character      \.            A literal .
\              Escape character          \\            A literal \
*              Zero or more repetitions  \*            A literal *

Explanation

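In AWK regular expressions, metacharacters have special meanings; prefixing one with \ makes it a literal character.
Inside single quotes the shell passes backslashes to AWK unchanged, so one backslash is enough in a /regex/ literal. Double interpretation only becomes an issue when the regex is written as a string (for example "abc\\.def"): the string literal consumes one level of backslashes before the regex engine sees the pattern.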
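The sketch below contrasts three ways of matching a literal dot; the line abcXdef is made up to show what an unescaped dot would also accept. A bracket expression such as [.] is a common alternative that avoids backslashes altogether.

```shell
# Sample input; "abcXdef" is made up for illustration.
# Regex literal: a single backslash makes the dot literal
printf '%s\n' 'abc.def' 'abcXdef' | awk '/abc\.def/'

# Regex written as a string: "\\" in the string becomes one "\",
# so the regex engine receives abc\.def
printf '%s\n' 'abc.def' 'abcXdef' | awk '$0 ~ "abc\\.def"'

# Bracket expression: a dot inside [ ] is already literal
printf '%s\n' 'abc.def' 'abcXdef' | awk '/abc[.]def/'
```

Each of the three commands prints only abc.def.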

How to Configure Case-Insensitive Matching

Creating the File

cat << 'EOF' > input.txt
Apple
apple
APPLE
Banana
EOF

Command

awk '/apple/' input.txt

Output

apple

Command

awk '{ if (tolower($0) ~ /apple/) print }' input.txt

Output

Apple
apple
APPLE

How It Works

Item              Content
Default behavior  AWK regular expressions are case-sensitive
Workaround        Convert the text with tolower() before comparing
Match condition   tolower($0), a lowercase copy of the whole line, is tested against the regex
Scope of effect   $0 itself is unchanged, so print outputs the line in its original case

Explanation

Converting the text before the regular expression comparison lets you match regardless of case.
tolower() is a standard awk function, so this approach works the same way in BSD awk and gawk; gawk's IGNORECASE variable, by contrast, is a gawk-only extension.
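Two other portable ways to express the same filter are sketched below with the sample words: tolower() can be used directly as a pattern without an if statement, and a bracket expression can list both cases of each letter.

```shell
# tolower($0) as a bare pattern: the default action prints the line
printf '%s\n' Apple apple APPLE Banana | awk 'tolower($0) ~ /apple/'

# Bracket expressions accept either case for every letter
printf '%s\n' Apple apple APPLE Banana | awk '/[Aa][Pp][Pp][Ll][Ee]/'
```

Both commands print Apple, apple, and APPLE, just like the if version.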

Flexible Pattern Definition Using Variables

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
cherry 300
apple 150
banana 250
EOF

Command

pattern="apple|banana"
awk -v pat="$pattern" '$1 ~ pat {print $0}' input.txt

Output

apple 100
banana 200
apple 150
banana 250

Command

min=150
awk -v m="$min" '$2 >= m {print $0}' input.txt

Output

banana 200
cherry 300
apple 150
banana 250

How It Works

Element                 Content                     Description
-v pat="$pattern"       Variable passing            Passes a shell variable into AWK
$1 ~ pat                Regular expression match    Tests whether the first column matches the pattern
pattern="apple|banana"  Dynamic regular expression  Defines an OR condition through a variable
$2 >= m                 Numeric condition           A condition driven by a second variable
-v m="$min"             Dynamic condition           Numeric thresholds can also be set from outside

Explanation

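In AWK, passing variables with -v lets you change regular expressions and conditions from outside the program.
This makes flexible filtering possible without rewriting the script. Set the shell variable in its own command first: in a one-liner such as pattern="apple|banana" awk -v pat="$pattern" ..., $pattern is expanded before the assignment takes effect, so pat ends up empty and every line matches.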
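Values passed with -v go through the same escape processing as AWK string literals, so backslashes in a pattern can be altered before the regex engine sees them. A sketch of one workaround, assuming a POSIX-compatible awk: put the pattern in the environment and read it through the ENVIRON array, which returns the value unchanged. Here the prefix-assignment form is exactly what is wanted, because awk reads the variable from its own environment rather than from a shell expansion. The pattern abc\.def and the sample lines are illustrative only.

```shell
# Illustrative pattern and input. The prefix assignment puts "pattern"
# into awk's environment; ENVIRON["pattern"] returns it verbatim,
# so \. stays a literal dot and abcXdef is rejected
printf '%s\n' 'abc.def' 'abcXdef' |
  pattern='abc\.def' awk '$0 ~ ENVIRON["pattern"]'
```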

String Substitution by Combining the gsub Function with Regular Expressions

Creating the File

cat << 'EOF' > input.txt
apple 123
banana 456
cherry 789
EOF

Command

awk '{ gsub(/[0-9]+/, "NUM"); print }' input.txt

Output

apple NUM
banana NUM
cherry NUM

Command

awk '{ gsub(/a/, "A"); print }' input.txt

Output

Apple 123
bAnAnA 456
cherry 789

How It Works

Element      Content
awk          Text processing tool
gsub         Global substitution (replaces all matches)
/[0-9]+/     Regular expression matching one or more digits
/a/          Matches the character a
"NUM" / "A"  The replacement string
print        Outputs line by line

Explanation

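gsub(regex, replacement) replaces every match of the regular expression in the current line ($0 by default) and returns the number of substitutions made.
Because it rewrites $0 in place, a following print outputs the modified line, which makes per-line string transformation concise.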
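The sketch below, reusing lines from the sample file, shows three related details: sub() replaces only the first match, & in the replacement stands for the matched text, and an optional third argument restricts the substitution to one field while the return value reports how many replacements were made.

```shell
# sub(): only the first "a" is replaced
printf '%s\n' 'banana 456' | awk '{ sub(/a/, "A"); print }'
# prints: bAnana 456

# &: inserts the matched text into the replacement
printf '%s\n' 'apple 123' | awk '{ gsub(/[0-9]+/, "[&]"); print }'
# prints: apple [123]

# Third argument: change $1 only; n holds the number of replacements
printf '%s\n' 'banana 456' | awk '{ n = gsub(/a/, "A", $1); print n, $1 }'
# prints: 3 bAnAnA
```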

Techniques for Using Regular Expressions as Delimiters with the split Function

Creating the File

cat << 'EOF' > input.txt
apple,orange;banana
grape:melon
dog|cat bird:fish
EOF

Command

awk -F '[,;:| ]+' '{for(i=1;i<=NF;i++) print $i}' input.txt

Output

apple
orange
banana
grape
melon
dog
cat
bird
fish

How It Works

Element     Content
-F          Specifies the field separator
'[,;:| ]+'  One or more of , ; : | or space, treated as a single separator
NF          Number of fields (number of elements after splitting)
$i          The i-th field (the split value)
for loop    Outputs all fields in order

Explanation

In AWK, giving -F a regular expression lets you handle several different delimiters at once.
-F applies the same regex-based splitting that the split() function performs, but to every input line automatically, which keeps the command short.
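For comparison, here is a sketch of the split() function itself, applied to two lines in the same style as input.txt. It returns the number of pieces, and a separator string longer than one character is treated as a regular expression, just like -F. The loop that joins the pieces with / is only there to make the result visible on one line.

```shell
printf '%s\n' 'apple,orange;banana' 'dog|cat bird:fish' |
  awk '{
    n = split($0, parts, "[,;:| ]+")   # fills parts[1] .. parts[n]
    out = parts[1]
    for (i = 2; i <= n; i++) out = out "/" parts[i]
    print n, out
  }'
# prints: 3 apple/orange/banana
#         4 dog/cat/bird/fish
```

split() is handy when only one field needs to be broken up (for example split($2, parts, ":")) while the rest of the line keeps the default field splitting.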

Combining Logical Operators with Regular Expressions

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
apple 300
grape 50
EOF

Command

awk '$1 ~ /apple|orange/ && $2 > 120' input.txt

Output

orange 150
apple 300

How It Works

Element              Content
$1 ~ /apple|orange/  First column matches "apple" or "orange" (regular expression)
&&                   AND condition (both must be satisfied)
$2 > 120             The numeric value in the second column is greater than 120
Overall              Outputs only lines that satisfy all conditions

Explanation

In AWK, combining regular expression matches with logical operators such as && (AND), || (OR), and ! (NOT) enables flexible conditional extraction.
A single pattern can test several fields at once, mixing regex matches and numeric comparisons.
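The sketch below, run on the same sample data, adds || (OR) and ! (NOT). Since && binds more tightly than ||, parentheses are worth adding whenever the two are mixed.

```shell
# OR: the first field is exactly "grape", or the value is at least 200
printf '%s\n' 'apple 100' 'banana 200' 'orange 150' 'apple 300' 'grape 50' |
  awk '$1 ~ /^grape$/ || $2 >= 200'
# prints: banana 200, apple 300, grape 50 (one per line)

# NOT: the first field is neither apple nor orange, and the value exceeds 100
printf '%s\n' 'apple 100' 'banana 200' 'orange 150' 'apple 300' 'grape 50' |
  awk '!($1 ~ /apple|orange/) && $2 > 100'
# prints: banana 200
```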

Log Analysis and CSV Processing with Regular Expressions

Creating the File

cat << 'EOF' > input.txt
2026-05-01 10:00:00 INFO user=alice action=login
2026-05-01 10:05:23 ERROR user=bob action=upload
2026-05-01 10:10:45 INFO user=carol action=logout
2026-05-01 10:15:12 ERROR user=alice action=download
EOF

Command

awk '/ERROR/' input.txt

Output

2026-05-01 10:05:23 ERROR user=bob action=upload
2026-05-01 10:15:12 ERROR user=alice action=download

Command

awk '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^user=/)   split($i, u, "=")   # u[2] = user name
    if ($i ~ /^action=/) split($i, a, "=")   # a[2] = action
  }
  print u[2], a[2]
}' input.txt

Output

alice login
bob upload
carol logout
alice download

Command

awk '{
  user = ""; action = ""   # reset for every line
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^user=/)   { split($i, u, "="); user = u[2] }
    if ($i ~ /^action=/) { split($i, a, "="); action = a[2] }
  }
  print $1 "," $2 "," $3 "," user "," action
}' input.txt

Output

2026-05-01,10:00:00,INFO,alice,login
2026-05-01,10:05:23,ERROR,bob,upload
2026-05-01,10:10:45,INFO,carol,logout
2026-05-01,10:15:12,ERROR,alice,download

How It Works

Element          Content
$i ~ /^user=/    Regular expression match on each field
split()          Splits a key=value pair on "="
NF               Number of fields
$1,$2,$3         Date, time, and log level
Loop processing  Portable extraction that works in BSD awk as well as gawk

Explanation

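In BSD awk, match() takes only two arguments and reports the match position through RSTART and RLENGTH; the three-argument form that captures groups into an array is a gawk extension.
Parsing key=value pairs with a loop and split() works in every awk, so it is the safest choice when portability matters.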
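For completeness, the two-argument match() that BSD awk does support can still extract a value: it sets RSTART (where the match begins) and RLENGTH (its length), and substr() cuts the text out. The sketch below runs on the sample log lines and prints the user on each ERROR line; the offset 5 is simply the length of the prefix user=.

```shell
printf '%s\n' \
  '2026-05-01 10:00:00 INFO user=alice action=login' \
  '2026-05-01 10:05:23 ERROR user=bob action=upload' \
  '2026-05-01 10:10:45 INFO user=carol action=logout' \
  '2026-05-01 10:15:12 ERROR user=alice action=download' |
  awk '/ERROR/ && match($0, /user=[^ ]+/) {
    # skip the 5 characters of "user=" and keep the rest of the match
    print substr($0, RSTART + 5, RLENGTH - 5)
  }'
# prints: bob
#         alice
```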

Key Differences and Caveats in Regular Expression Specifications Between BSD AWK and GNU AWK (gawk)

Creating the File

cat << 'EOF' > input.txt
apple 123
banana 456
cherry_789
EOF

Command

awk '/[0-9]+/' input.txt

Output

apple 123
banana 456
cherry_789

Command

gawk '/\w+_[0-9]+/' input.txt

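Output (gawk is not installed on the machine used for this example)

-bash: gawk: command not found

On a system where gawk is installed, this command prints cherry_789, because gawk treats \w as a word character (a letter, digit, or underscore).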

Command

awk '/\w+_[0-9]+/' input.txt

Output

(nothing is printed: BSD awk does not recognize \w as a word-character class, so no line matches)

How It Works

Item                               BSD awk                             GNU awk (gawk)
\w                                 Not supported                       Supported (letters, digits, and _)
\d                                 Not supported                       Not supported (use [0-9] or [[:digit:]])
POSIX character class [[:alnum:]]  Supported                           Supported
Regular expression flavor          POSIX extended (ERE)                POSIX ERE plus GNU operators such as \w, \s, \y, \< and \>
Compatibility                      High (close to the POSIX standard)  Rich extensions

Explanation

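Both BSD awk and gawk use POSIX extended regular expressions; gawk adds convenient GNU-specific operators such as \w and \s on top, while BSD awk sticks closely to the standard.
If portability is important, use POSIX bracket expressions such as [[:alnum:]_], [[:digit:]], and [[:space:]] instead of the backslash shortcuts.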
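As a sketch of the portable spelling, the gawk-only pattern \w+_[0-9]+ can be rewritten with a POSIX bracket expression, where [[:alnum:]_] stands in for \w (letters, digits, and underscore), and [[:digit:]] takes the place of the unsupported \d. Run against the sample lines, these should behave the same in BSD awk and gawk.

```shell
# [[:alnum:]_] is the POSIX equivalent of gawk's \w
printf '%s\n' 'apple 123' 'banana 456' 'cherry_789' | awk '/[[:alnum:]_]+_[0-9]+/'
# prints: cherry_789

# [[:alpha:]] and [[:digit:]] instead of backslash shortcuts
printf '%s\n' 'apple 123' 'banana 456' 'cherry_789' | awk '/^[[:alpha:]]+ [[:digit:]]+$/'
# prints: apple 123
#         banana 456
```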

Summary: Understanding AWK and Regular Expressions

The combination of AWK and regular expressions may seem difficult at first, but once you grasp the basic concepts, the path forward becomes clear.

The key is to understand the meaning of each metacharacter one by one and to practice by actually writing and running commands.

Furthermore, by combining field specification and functions, you can go beyond simple text searching and achieve flexible data processing.

By staying mindful of the differences between awk implementations and gradually building hands-on experience, you can make AWK one of the most powerful tools in your toolkit.


©︎ 2025-2026 running terminal commands