Introduction
If you want to streamline text processing in Linux or Unix environments, awk and sed are indispensable commands.
In terms of basic syntax, sed excels at line-by-line editing, while awk is strong at column-based and conditional processing.
This article explains everything from the basics to practical usage of awk and sed in Linux and Unix environments in an easy-to-understand way.
Reference: GNU awk
Reference: GNU sed
Comparing the Basic Syntax of awk and sed
Create File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 300
EOF
Command
awk '{ print $1, $2 }' input.txt
Output
apple 100
banana 200
orange 300
Command
sed 's/ / : /' input.txt
Output
apple : 100
banana : 200
orange : 300
How It Works
| Item | awk | sed |
|---|---|---|
| Main Use | Column-based processing | Line-based substitution |
| Basic Syntax | awk 'pattern { action }' file | sed 'script' file |
| Data Processing | Excels at field manipulation | Excels at string conversion |
| Delimiter | Whitespace by default (changeable with -F) | No field concept; matches whole lines with regular expressions |
| Typical Example | Column reference with $1, $2 | Substitution with s/old/new/ |
Explanation
awk can handle data flexibly on a field-by-field basis, making it well-suited for CSV and log analysis.
sed can perform string substitutions quickly, so it is commonly used for editing configuration files and batch conversions.
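As a quick sketch of the difference, the same column swap can be written both ways (the sample file is recreated here so the example stands alone): awk names the fields directly, while sed needs an extended regex (-E) with capture groups.

```shell
# Recreate the sample file so this example is self-contained
printf 'apple 100\nbanana 200\norange 300\n' > input.txt

# awk: swap the two columns simply by naming the fields
awk '{ print $2, $1 }' input.txt

# sed: the same swap needs capture groups and backreferences
sed -E 's/([a-z]+) ([0-9]+)/\2 \1/' input.txt
```

Both commands print "100 apple", "200 banana", "300 orange"; the awk version is the one that scales when more columns are involved.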
How to Search and Extract Strings with awk and sed
Create File
cat << 'EOF' > input.txt
apple red
banana yellow
grape purple
apple green
orange orange
EOF
Command
awk '/apple/' input.txt
Output
apple red
apple green
Command
sed -n '/apple/p' input.txt
Output
apple red
apple green
Command
awk '{print $1}' input.txt
Output
apple
banana
grape
apple
orange
Command
sed -n '2p' input.txt
Output
banana yellow
How It Works
| Command | How It Works |
|---|---|
| awk '/apple/' input.txt | Searches for and displays lines containing apple |
| sed -n '/apple/p' input.txt | Outputs only lines matching apple |
| awk '{print $1}' input.txt | Extracts and displays only the first column |
| sed -n '2p' input.txt | Displays only the second line |
Explanation
awk excels at column-based extraction and conditional processing.
sed handles line-based searching, substitution, and extraction with concise one-liners.
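One caveat worth a sketch: a regex like /apple/ also matches longer words such as pineapple, while awk's string comparison on a field matches exactly (fruits.txt is an illustrative file name):

```shell
printf 'apple red\npineapple gold\nbanana yellow\n' > fruits.txt

# Regex match: also catches "pineapple"
awk '/apple/' fruits.txt

# Exact string comparison on field 1: only "apple"
awk '$1 == "apple"' fruits.txt
```

When exactness matters, prefer the field comparison, or anchor the regex.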
How to Delete and Filter Specific Lines with awk and sed
Create File
cat << 'EOF' > input.txt
apple
banana
orange
grape
banana
melon
EOF
Command
sed '/banana/d' input.txt
Output
apple
orange
grape
melon
Command
awk '!/banana/' input.txt
Output
apple
orange
grape
melon
Command
sed '2d' input.txt
Output
apple
orange
grape
banana
melon
Command
awk 'NR!=2' input.txt
Output
apple
orange
grape
banana
melon
How It Works
| Command | How It Works |
|---|---|
| sed '/banana/d' | Deletes lines matching banana |
| awk '!/banana/' | Displays only lines that do not match banana |
| sed '2d' | Deletes the second line |
| awk 'NR!=2' | Displays lines where the line number (NR) is not 2 |
Explanation
sed is a stream editor specialized for line editing, allowing deletion operations to be written concisely.
awk also excels at conditional branching and column processing, making it suitable for complex filtering.
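A common practical case is dropping a header or a trailing line; a minimal sketch (data.txt is an illustrative file):

```shell
printf 'header\nrow1\nrow2\nrow3\n' > data.txt

# awk: skip the first line (the header)
awk 'NR > 1' data.txt

# sed: delete the last line ($ addresses the final line)
sed '$d' data.txt
```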
How to Substitute Strings with awk and sed and When to Use Each
Create File
cat << 'EOF' > input.txt
apple orange banana
apple grape orange
banana apple melon
EOF
Command
sed 's/apple/APPLE/g' input.txt
Output
APPLE orange banana
APPLE grape orange
banana APPLE melon
Command
awk '{gsub(/orange/,"ORANGE"); print}' input.txt
Output
apple ORANGE banana
apple grape ORANGE
banana apple melon
How It Works
| Command | Use Case | Feature | Best For |
|---|---|---|---|
| sed | String substitution | Excels at stream editing | Simple batch substitution |
| awk | Pattern processing | Supports column operations and conditional branching | Complex transformation and extraction |
Explanation
sed can perform simple string substitutions quickly, making it suitable for log editing and similar tasks.
awk supports field-based processing and conditional transformations, making it powerful for data manipulation.
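Both tools can also restrict a substitution to matching lines only: sed with an address before the s command, awk with a pattern before sub(). A sketch using an illustrative sub.txt:

```shell
printf 'apple orange\nbanana orange\n' > sub.txt

# sed: substitute only on lines matching the /^apple/ address
sed '/^apple/s/orange/ORANGE/' sub.txt

# awk: same idea with a pattern-action pair; print every line
awk '/^apple/ { sub(/orange/, "ORANGE") } { print }' sub.txt
```

Both print "apple ORANGE" but leave the banana line untouched.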
How to Use Pattern Matching with Regular Expressions in awk and sed
Create File
cat << 'EOF' > input.txt
apple 100
banana 200
grape 300
pineapple 400
orange 500
EOF
Command
awk '/apple/ {print $1, $2}' input.txt
Output
apple 100
pineapple 400
Command
sed -n '/apple/p' input.txt
Output
apple 100
pineapple 400
Command
awk '/^a/ {print $1}' input.txt
Output
apple
Command
sed -n '/^a/p' input.txt
Output
apple 100
How It Works
| Command | How It Works | Description |
|---|---|---|
| awk '/apple/' | Line search with regular expression | Extracts lines containing apple |
| sed -n '/apple/p' | Display on pattern match | Outputs lines containing apple |
| awk '/^a/' | Start-of-line match with ^ | Searches for strings beginning with a |
| sed -n '/^a/p' | Line-start match | Displays lines starting with a |
Explanation
awk can process matched data on a column-by-column basis, while sed is suited for line-based substitution and extraction.
Both enable flexible text processing through the use of regular expressions.
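One thing sed cannot do is anchor a regex to a single field; awk's ~ operator can. A sketch (re.txt is an illustrative file):

```shell
printf 'apple 100\ngrape 300\npineapple 400\n' > re.txt

# awk: match a regex against field 1 only, then print that field
awk '$1 ~ /^p/ { print $1 }' re.txt

# sed: can only match against the whole line
sed -n '/^p/p' re.txt
```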
How to Process with Multiple Conditions in awk and sed
Create File
cat << 'EOF' > input.txt
Alice 85 Tokyo
Bob 72 Osaka
Charlie 90 Tokyo
David 68 Nagoya
Eve 88 Osaka
Frank 95 Tokyo
EOF
Command
awk '$2 >= 80 && $3 == "Tokyo"' input.txt
Output
Alice 85 Tokyo
Charlie 90 Tokyo
Frank 95 Tokyo
Command
awk '$2 < 80 || $3 == "Osaka"' input.txt
Output
Bob 72 Osaka
David 68 Nagoya
Eve 88 Osaka
Command
sed -n '/Tokyo/p' input.txt | sed -n '/8[5-9]/p;/9[0-9]/p'
Output
Alice 85 Tokyo
Charlie 90 Tokyo
Frank 95 Tokyo
Command
sed -n '/Osaka/p;/Nagoya/p' input.txt
Output
Bob 72 Osaka
David 68 Nagoya
Eve 88 Osaka
How It Works
| Command | Condition | How It Works |
|---|---|---|
| awk '$2 >= 80 && $3 == "Tokyo"' | AND condition | Extracts rows where column 2 is 80 or more AND column 3 is Tokyo |
| awk '$2 < 80 || $3 == "Osaka"' | OR condition | Extracts rows where column 2 is less than 80 OR column 3 is Osaka |
| sed -n '/Tokyo/p' \| sed -n '/8[5-9]/p;/9[0-9]/p' | Multiple conditions | First stage extracts Tokyo rows; second stage extracts scores 85–99 |
| sed -n '/Osaka/p;/Nagoya/p' | OR condition | Displays lines containing Osaka or Nagoya |
Explanation
awk can evaluate conditions on a column-by-column basis, allowing numeric and string comparisons to be written concisely.
sed can achieve flexible text processing by using pipes to filter data step by step.
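The two-pattern OR above can also be written as a single alternation with extended regular expressions; -E is supported by both GNU and BSD sed (people.txt is an illustrative file):

```shell
printf 'Bob 72 Osaka\nDavid 68 Nagoya\nAlice 85 Tokyo\n' > people.txt

# One alternation instead of two separate p commands
sed -En '/Osaka|Nagoya/p' people.txt
```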
How to Process CSV and TSV with Custom Delimiters in awk and sed
Create File
cat << 'EOF' > input.txt
id,name,department,salary
1,Alice,Sales,5000
2,Bob,Engineering,7000
3,Charlie,HR,4500
EOF
Create File
printf 'id\tname\tdepartment\tsalary\n1\tAlice\tSales\t5000\n2\tBob\tEngineering\t7000\n3\tCharlie\tHR\t4500\n' > input2.txt
(printf is used here so the tab delimiters survive copy-and-paste; pasted heredocs often end up with spaces instead of tabs.)
Command
awk -F',' '{print $2, $3}' input.txt
Output
name department
Alice Sales
Bob Engineering
Charlie HR
Command
awk -F'\t' '{print $2, $4}' input2.txt
Output
name salary
Alice 5000
Bob 7000
Charlie 4500
Command
sed 's/,/ - /g' input.txt
Output
id - name - department - salary
1 - Alice - Sales - 5000
2 - Bob - Engineering - 7000
3 - Charlie - HR - 4500
Command
sed 's/\t/ - /g' input2.txt
Output
id - name - department - salary
1 - Alice - Sales - 5000
2 - Bob - Engineering - 7000
3 - Charlie - HR - 4500
How It Works
| Command | How It Works |
|---|---|
| awk -F',' '{print $2, $3}' | Sets the CSV delimiter to , with -F',' and outputs columns 2 and 3 |
| awk -F'\t' '{print $2, $4}' | Sets the TSV delimiter to tab with -F'\t' and retrieves columns 2 and 4 |
| sed 's/,/ - /g' | Uses s/search/replace/g to replace all , with - |
| sed 's/\t/ - /g' | Batch-converts tab characters to - (\t in a sed pattern is a GNU extension; BSD sed needs a literal tab) |
Explanation
awk is strong at column-based extraction and aggregation, while sed is suited for string substitution.
With CSV and TSV files, specifying the delimiter appropriately enables flexible data processing.
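When awk rewrites CSV fields, the output separator must be set too, or the rebuilt record falls back to spaces. A sketch with an illustrative emp.csv (assigning to $3 forces awk to rejoin the record using OFS):

```shell
printf 'id,name,salary\n1,Alice,5000\n2,Bob,7000\n' > emp.csv

# -F',' splits input on commas; -v OFS=',' keeps the output comma-separated.
# The header line is passed through untouched with print; next.
awk -F',' -v OFS=',' 'NR == 1 { print; next } { $3 = $3 * 2; print }' emp.csv
```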
How to Directly Edit and Overwrite Files with awk and sed
Create File
cat << 'EOF' > input.txt
apple 100
banana 200
orange 300
EOF
Command
awk '{ $2=$2*2; print }' input.txt > tmp.txt && mv tmp.txt input.txt
cat input.txt
Output
apple 200
banana 400
orange 600
Command
sed -i 's/200/999/' input.txt
cat input.txt
Output
apple 999
banana 400
orange 600
How It Works
| Command | How It Works | Feature |
|---|---|---|
| awk | Outputs the processed result to a temporary file, then replaces the original with mv | Supports flexible column editing and calculations |
| sed -i | Rewrites the file in place with the -i option (GNU syntax; macOS/BSD sed requires sed -i '' instead) | String substitution can be performed concisely |
Explanation
awk excels at field-based manipulation and calculation, while sed is suited for fast string substitution.
Both are standard commands frequently used in shell scripts.
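Attaching a suffix to -i keeps a backup of the original, which is safer than overwriting outright; this attached-suffix form works on both GNU and macOS sed (prices.txt is an illustrative file):

```shell
printf 'apple 100\nbanana 200\n' > prices.txt

# -i.bak edits in place and saves the original as prices.txt.bak
sed -i.bak 's/100/111/' prices.txt

cat prices.txt      # edited copy
cat prices.txt.bak  # untouched original
```

GNU awk 4.1+ offers a comparable `-i inplace` extension, though the temporary-file-plus-mv pattern shown above remains the portable choice for awk.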
How to Analyze Log Files by Combining awk and sed
Create File
cat << 'EOF' > input.txt
2026-05-08 10:00:01 INFO User login success
2026-05-08 10:01:15 ERROR Database connection failed
2026-05-08 10:02:20 INFO File uploaded
2026-05-08 10:03:45 WARN Disk usage 85%
2026-05-08 10:04:12 ERROR Timeout while processing request
EOF
Command
awk '/ERROR/' input.txt
Output
2026-05-08 10:01:15 ERROR Database connection failed
2026-05-08 10:04:12 ERROR Timeout while processing request
Command
sed 's/ERROR/[CRITICAL]/' input.txt
Output
2026-05-08 10:00:01 INFO User login success
2026-05-08 10:01:15 [CRITICAL] Database connection failed
2026-05-08 10:02:20 INFO File uploaded
2026-05-08 10:03:45 WARN Disk usage 85%
2026-05-08 10:04:12 [CRITICAL] Timeout while processing request
Command
awk '/ERROR/' input.txt | sed 's/ERROR/[CRITICAL]/'
Output
2026-05-08 10:01:15 [CRITICAL] Database connection failed
2026-05-08 10:04:12 [CRITICAL] Timeout while processing request
How It Works
| Command | How It Works |
|---|---|
| awk '/ERROR/' input.txt | Extracts only lines containing ERROR |
| sed 's/ERROR/[CRITICAL]/' input.txt | Substitutes the string ERROR |
| awk … \| sed … | Further processes the extracted result |
Explanation
awk is strong at log extraction and conditional searching, while sed is convenient for string transformation.
Combining them allows log analysis and formatting to be automated efficiently.
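Going one step beyond extraction, awk can tally log lines per level with an associative array; app.log is an illustrative file:

```shell
printf '10:00 INFO ok\n10:01 ERROR db down\n10:02 ERROR timeout\n' > app.log

# Count log lines per level (field 2) with an associative array;
# for-in iteration order is unspecified, so sort if order matters
awk '{ count[$2]++ } END { for (lvl in count) print lvl, count[lvl] }' app.log
```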
Practical Text Processing Using Pipes with awk and sed
Create File
cat << 'EOF' > input.txt
2026-05-01 INFO user=alice action=login
2026-05-01 ERROR user=bob action=failed_login
2026-05-02 INFO user=carol action=upload
2026-05-02 ERROR user=dave action=timeout
2026-05-03 INFO user=alice action=logout
EOF
Command
cat input.txt | awk '$2=="ERROR"' | sed 's/action=/status=/'
Output
2026-05-01 ERROR user=bob status=failed_login
2026-05-02 ERROR user=dave status=timeout
Command
cat input.txt | awk '{print $3}' | sed 's/user=//'
Output
alice
bob
carol
dave
alice
Command
cat input.txt | awk '$2=="INFO" {print $1, $4}' | sed 's/action=//'
Output
2026-05-01 login
2026-05-02 upload
2026-05-03 logout
How It Works
| Command | awk's Role | sed's Role | Role of Pipe (\|) |
|---|---|---|---|
| awk '$2=="ERROR"' | Extracts ERROR lines | Replaces action= with status= | Passes awk's output to sed |
| awk '{print $3}' | Extracts column 3 (user info) | Removes user= | Links text formatting |
| awk '$2=="INFO" {print $1, $4}' | Extracts date and action columns | Removes action= | Processes the extracted result |
Explanation
awk excels at column-based extraction and conditional branching, while sed is suited for string substitution and formatting.
Combining them with pipes allows log analysis and CSV processing to be performed efficiently.
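The extracted user list combines naturally with the standard sort and uniq tools to count occurrences per user; audit.log is an illustrative file:

```shell
printf 'a INFO user=alice\nb ERROR user=bob\nc INFO user=alice\n' > audit.log

# Extract the user field, strip the prefix, then count occurrences
awk '{ print $3 }' audit.log | sed 's/user=//' | sort | uniq -c
```

uniq -c prefixes each line with a count, so alice appears with 2 and bob with 1.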
How to Batch Process Multiple Files with awk and sed
Create File
cat << 'EOF' > file1.txt
apple 100
banana 200
orange 300
EOF
Create File
cat << 'EOF' > file2.txt
apple 150
banana 250
orange 350
EOF
Create File
cat << 'EOF' > file3.txt
apple 180
banana 280
orange 380
EOF
Command
awk '{sum += $2} END {print FILENAME " Total=" sum}' file*.txt
Output
file3.txt Total=2190
Command
awk '{print FILENAME, $1, $2 * 1.1}' file*.txt
Output
file1.txt apple 110
file1.txt banana 220
file1.txt orange 330
file2.txt apple 165
file2.txt banana 275
file2.txt orange 385
file3.txt apple 198
file3.txt banana 308
file3.txt orange 418
Command
sed 's/apple/grape/g' file*.txt
Output
grape 100
banana 200
orange 300
grape 150
banana 250
orange 350
grape 180
banana 280
orange 380
Command
sed -n '/banana/p' file*.txt
Output
banana 200
banana 250
banana 280
How It Works
| Command | How It Works |
|---|---|
| awk '{sum += $2} END {print FILENAME " Total=" sum}' file*.txt | Adds up column 2 across all files; in END, FILENAME holds the last file read, which is why the label shows file3.txt |
| awk '{print FILENAME, $1, $2 * 1.1}' file*.txt | Outputs values with the filename after applying a calculation |
| sed 's/apple/grape/g' file*.txt | Substitutes content across multiple files and displays to standard output |
| sed -n '/banana/p' file*.txt | Extracts only lines matching the condition |
Explanation
awk is strong at column-based aggregation and calculation, while sed is suited for string substitution and line extraction.
Using the wildcard file*.txt allows multiple files to be processed in one batch.
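Because FILENAME in an END block refers only to the last file read, per-file totals are better collected in an array keyed by FILENAME; a sketch with illustrative f1.txt and f2.txt:

```shell
printf 'apple 100\nbanana 200\n' > f1.txt
printf 'apple 150\nbanana 250\n' > f2.txt

# Sum column 2 per file; FILENAME names the file of the current record
awk '{ sum[FILENAME] += $2 } END { for (f in sum) print f, sum[f] }' f1.txt f2.txt
```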
Built-in Variables and Address Specification in awk and sed
Create File
cat << 'EOF' > input.txt
apple 100
orange 200
banana 300
grape 400
melon 500
EOF
Command
awk 'BEGIN{FS=" "} {print NR ":" $1 "," $2} END{print "total=" NR}' input.txt
Output
1:apple,100
2:orange,200
3:banana,300
4:grape,400
5:melon,500
total=5
Command
awk '$2 >= 300 {print FNR ":" $1}' input.txt
Output
3:banana
4:grape
5:melon
Command
sed -n '2,4p' input.txt
Output
orange 200
banana 300
grape 400
Command
sed -n '/banana/,/melon/p' input.txt
Output
banana 300
grape 400
melon 500
How It Works
| Command | Built-in Variables / Address Specification | How It Works |
|---|---|---|
| awk 'BEGIN{FS=" "} ...' | FS, NR, END | FS sets the delimiter, NR retrieves the line number, and END runs aggregation at the end |
| awk '$2 >= 300 ...' | FNR, conditional expression | Extracts only rows where column 2 is 300 or more and displays the in-file line number with FNR |
| sed -n '2,4p' | Line address | Outputs only lines 2 through 4 by range specification |
| sed -n '/banana/,/melon/p' | Pattern address | Outputs the range matching from banana to melon |
Explanation
awk's built-in variables allow line numbers and delimiters to be handled flexibly.
sed can efficiently extract target ranges using line number or string pattern address specifications.
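Two more built-in variables worth knowing are NF, the number of fields on the current line, and $NF, the last field itself; fields.txt is an illustrative file:

```shell
printf 'apple 100\norange 200 extra\n' > fields.txt

# NF is the field count; $NF is the last field of each line
awk '{ print NF, $NF }' fields.txt
```

This is handy for files whose lines have a varying number of columns.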
Shell Script Automation Techniques Using awk and sed
Create File
cat << 'EOF' > input.txt
2026-05-01,Tanaka,Sales,120000
2026-05-02,Suzuki,Engineering,180000
2026-05-03,Sato,Sales,150000
2026-05-04,Takahashi,Marketing,130000
2026-05-05,Yamada,Engineering,210000
EOF
Command
awk -F',' '$3=="Engineering"{print $2 " : " $4}' input.txt
Output
Suzuki : 180000
Yamada : 210000
Command
sed 's/Sales/Business/g' input.txt
Output
2026-05-01,Tanaka,Business,120000
2026-05-02,Suzuki,Engineering,180000
2026-05-03,Sato,Business,150000
2026-05-04,Takahashi,Marketing,130000
2026-05-05,Yamada,Engineering,210000
Command
awk -F',' '{sum[$3]+=$4} END {for (d in sum) print d, sum[d]}' input.txt
Output
Marketing 130000
Sales 270000
Engineering 390000
Command
sed -n '2,4p' input.txt
Output
2026-05-02,Suzuki,Engineering,180000
2026-05-03,Sato,Sales,150000
2026-05-04,Takahashi,Marketing,130000
How It Works
| Command | How It Works | Use Case |
|---|---|---|
| awk -F',' | Splits columns by comma delimiter | CSV data analysis |
| $3=="Engineering" | Extracts only rows matching column 3 | Conditional filter |
| sum[$3]+=$4 | Aggregates amounts by department | Aggregation processing |
| sed 's/old/new/g' | Batch replaces strings | Data conversion |
| sed -n '2,4p' | Displays only the specified lines | Log extraction |
Explanation
awk excels at column-based analysis and aggregation, while sed can process string transformations and line editing quickly.
Note that for (d in sum) visits keys in an unspecified order, so pipe the output through sort when a stable order matters.
Combining both allows log analysis and automated CSV processing via shell scripts to be made more efficient.
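The pieces above can be wrapped in a small shell script that takes the department as an argument; sales.csv and report.sh are illustrative names:

```shell
printf '2026-05-01,Tanaka,Sales,120000\n2026-05-02,Suzuki,Engineering,180000\n' > sales.csv

# A tiny report script; the department name is passed as $1
# and handed to awk via -v so it can be compared against column 3
cat << 'EOF' > report.sh
#!/bin/sh
awk -F',' -v d="$1" '$3 == d { total += $4 } END { print d, total }' sales.csv
EOF

sh report.sh Engineering
```

Running it with Engineering prints that department's total; any department name in column 3 works the same way.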
Techniques for Data Aggregation and Formatting with awk and sed
Create File
cat << 'EOF' > input.txt
2026-05-01,Tokyo,Alice,120
2026-05-01,Osaka,Bob,95
2026-05-02,Tokyo,Charlie,140
2026-05-02,Osaka,Alice,110
2026-05-03,Nagoya,Bob,130
2026-05-03,Tokyo,Alice,150
EOF
Command
awk -F',' '{sum[$2]+=$4} END {for (city in sum) print city, sum[city]}' input.txt
Output
Tokyo 410
Osaka 205
Nagoya 130
Command
awk -F',' '$4 >= 120 {print $1, $2, $3, $4}' input.txt
Output
2026-05-01 Tokyo Alice 120
2026-05-02 Tokyo Charlie 140
2026-05-03 Nagoya Bob 130
2026-05-03 Tokyo Alice 150
Command
sed 's/Tokyo/TOKYO/g' input.txt
Output
2026-05-01,TOKYO,Alice,120
2026-05-01,Osaka,Bob,95
2026-05-02,TOKYO,Charlie,140
2026-05-02,Osaka,Alice,110
2026-05-03,Nagoya,Bob,130
2026-05-03,TOKYO,Alice,150
Command
sed -n '2,4p' input.txt
Output
2026-05-01,Osaka,Bob,95
2026-05-02,Tokyo,Charlie,140
2026-05-02,Osaka,Alice,110
How It Works
| Command | How It Works |
|---|---|
| awk -F',' '{sum[$2]+=$4}' | Uses column 2 as a key and totals the numeric values in column 4 |
| awk '$4 >= 120' | Extracts only rows matching the condition |
| sed 's/Tokyo/TOKYO/g' | Batch replaces strings |
| sed -n '2,4p' | Displays only the specified line range |
Explanation
awk excels at column-based aggregation and conditional extraction, making it useful for log analysis and CSV processing.
sed can perform text substitution and line editing quickly, making it convenient for pre-processing and formatting.
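Averages follow the same pattern as the sums above, with a second array tracking counts; scores.csv is an illustrative file (for-in order is unspecified, so sort the output if needed):

```shell
printf 'Tokyo,120\nOsaka,95\nTokyo,140\n' > scores.csv

# Average per city: keep both a running sum and a count per key
awk -F',' '{ sum[$1] += $2; n[$1]++ } END { for (c in sum) print c, sum[c] / n[c] }' scores.csv
```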
Saving awk and sed Commands as Script Files
Create File
cat << 'EOF' > input.txt
orange 100
apple 200
banana 150
grape 300
EOF
Create File
cat << 'EOF' > sample.awk
{
print $1 ":" $2
}
EOF
Create File
cat << 'EOF' > sample.sed
s/banana/melon/g
EOF
Command
awk -f sample.awk input.txt
Output
orange:100
apple:200
banana:150
grape:300
Command
sed -f sample.sed input.txt
Output
orange 100
apple 200
melon 150
grape 300
How It Works
| Command | How It Works |
|---|---|
| awk -f sample.awk input.txt | Reads the awk script file with -f and processes each line on a field-by-field basis |
| sed -f sample.sed input.txt | Reads the sed script file with -f and automatically applies the substitution |
Explanation
awk excels at field-based manipulation, making it well-suited for formatting column data.
sed can perform string substitution and line editing concisely, making it useful for log processing and similar tasks.
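An awk script can also be made directly executable with a shebang line; the interpreter path (/usr/bin/awk here) is an assumption that may differ per system, and format.awk is an illustrative name:

```shell
printf 'apple 100\nbanana 200\n' > data.txt

# Save an awk program as an executable script; adjust the shebang
# path if awk lives elsewhere on your system
cat << 'EOF' > format.awk
#!/usr/bin/awk -f
{ printf "%-8s %5d\n", $1, $2 }
EOF
chmod +x format.awk

./format.awk data.txt
```

The same script can always be run portably as awk -f format.awk data.txt, regardless of the shebang path.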
Key Learning Points for Mastering awk and sed
awk and sed are representative commands for streamlining text processing.
Beginners should first get comfortable with the basic syntax and regular expressions, then move on to CSV processing and log analysis.
In particular, combining these tools with pipes and shell scripts greatly expands the scope of automation.
It is important to learn practically by repeatedly trying things out with small samples.
