Introduction
For text processing in Linux and other UNIX-like environments, awk and sort are two of the most useful commands.
awk handles column extraction and conditional filtering, sort handles ordering, and combining them with uniq enables duplicate removal and count aggregation.
This article explains everything from basic text processing with awk and sort to practical log analysis, organized in a way that beginners can easily understand.
Reference: GNU awk manual (https://www.gnu.org/software/gawk/manual/)
Basic Syntax and Usage of awk and sort
Creating the File
cat << 'EOF' > input.txt
apple 300
orange 150
banana 200
grape 180
apple 120
EOF
Command
awk '{ print $1, $2 }' input.txt
Output
apple 300
orange 150
banana 200
grape 180
apple 120
Command
awk '{ sum += $2 } END { print "total =", sum }' input.txt
Output
total = 950
Command
sort input.txt
Output
apple 120
apple 300
banana 200
grape 180
orange 150
Command
sort -k2 -n input.txt
Output
apple 120
orange 150
grape 180
banana 200
apple 300
How It Works
| Command | Mechanism | Use Case |
|---|---|---|
| awk '{ print $1, $2 }' | Processes columns separated by whitespace | Column extraction |
| awk '{ sum += $2 } END { print sum }' | Adds up the second column and prints at END | Sum calculation |
| sort input.txt | Sorts in string order | Alphabetical sort |
| sort -k2 -n input.txt | Sorts the second column numerically | Numeric sort |
Explanation
awk is a powerful text processing command that can manipulate data column by column.
sort is commonly used to reorder log or CSV-style data by various conditions.
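awk conditions can also filter rows before printing; as a small sketch on the same input.txt, this keeps only the fruits whose second column exceeds 150:
Command
awk '$2 > 150 { print $1 }' input.txt
Output
apple
banana
grape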
Differences Between awk, sort, and uniq
Creating the File
cat << 'EOF' > input.txt
apple
orange
banana
apple
banana
grape
orange
apple
EOF
Command
awk '{print $1}' input.txt
Output
apple
orange
banana
apple
banana
grape
orange
apple
Command
sort input.txt
Output
apple
apple
apple
banana
banana
grape
orange
orange
Command
sort input.txt | uniq
Output
apple
banana
grape
orange
How It Works
| Command | Role | Characteristics | Primary Use |
|---|---|---|---|
| awk | Text processing | Can process data column by column | Extraction and aggregation |
| sort | Sorting | Orders data ascending or descending | Data organization |
| uniq | Duplicate removal | Only removes consecutive duplicates | Duplicate checking |
Explanation
awk is well suited for handling text column by column, making it frequently used for log analysis and CSV processing.
Combining sort and uniq allows efficient deduplication of data.
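To see why sorting first matters, try uniq on its own; since none of the duplicate lines in this file are adjacent, it passes all eight lines through unchanged:
Command
uniq input.txt
Output
apple
orange
banana
apple
banana
grape
orange
apple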
Basic Patterns for Combining awk and sort
Creating the File
cat << 'EOF' > input.txt
orange 120
apple 80
banana 200
grape 150
melon 90
EOF
Command
awk '{print $2, $1}' input.txt | sort -n
Output
80 apple
90 melon
120 orange
150 grape
200 banana
Command
awk '{print $2, $1}' input.txt | sort -nr
Output
200 banana
150 grape
120 orange
90 melon
80 apple
How It Works
| Command | Role |
|---|---|
| awk '{print $2, $1}' | Moves the second column to the front to prepare data for sorting |
| sort -n | Sorts numerically in ascending order |
| sort -nr | Sorts numerically in descending order |
| \| | Pipe that passes awk output to sort |
Explanation
The basic pattern is to reorder columns with awk and then sort with sort.
Using -n for numeric sort and combining it with -r for descending order is the standard approach.
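If you want to keep the original column order, sort can also target the key column directly with -k instead of swapping columns in awk; a sketch on the same file:
Command
sort -k2,2n input.txt
Output
apple 80
melon 90
orange 120
grape 150
banana 200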
How to Sort Numbers and Strings with awk and sort
Creating the File
cat << 'EOF' > input.txt
orange 15
apple 3
banana 27
grape 9
melon 21
EOF
Command
awk '{print $2, $1}' input.txt | sort -n
Output
3 apple
9 grape
15 orange
21 melon
27 banana
Command
awk '{print $1, $2}' input.txt | sort
Output
apple 3
banana 27
grape 9
melon 21
orange 15
How It Works
| Command | Mechanism |
|---|---|
| awk '{print $2, $1}' | Moves the numeric second column to the front for sorting |
| sort -n | Sorts numerically in ascending order |
| awk '{print $1, $2}' | Outputs the string first column as-is |
| sort | Sorts strings in alphabetical order |
Explanation
awk is convenient for swapping and extracting columns, and combining it with sort enables flexible sorting.
The basic rule is to use -n for numeric sort and no option for string sort.
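To see why -n matters, run the same pipeline without it; a plain string sort compares the numbers character by character, so "27" comes before "3":
Command
awk '{print $2, $1}' input.txt | sort
Output
15 orange
21 melon
27 banana
3 apple
9 grape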
How to Switch Between Ascending and Descending Order with awk and sort
Creating the File
cat << 'EOF' > input.txt
orange 30
apple 10
banana 20
grape 15
EOF
Command
awk '{print $2, $1}' input.txt | sort -n
Output
10 apple
15 grape
20 banana
30 orange
Command
awk '{print $2, $1}' input.txt | sort -nr
Output
30 orange
20 banana
15 grape
10 apple
How It Works
| Command | Role |
|---|---|
| awk '{print $2, $1}' | Moves the second column to the front to generate sort-ready data |
| sort -n | Sorts numerically in ascending order |
| sort -nr | Sorts numerically in descending order |
| \| | Passes awk output to sort |
Explanation
sort -n sorts numbers in ascending order, and sort -nr reverses that to descending order.
Combining awk allows flexible sorting based on any column.
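If you need the original column order back after sorting, a second awk can swap the columns again; a sketch:
Command
awk '{print $2, $1}' input.txt | sort -nr | awk '{print $2, $1}'
Output
orange 30
banana 20
grape 15
apple 10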
How to Sort by Column with awk and sort
Creating the File
cat << 'EOF' > input.txt
id name score
3 Suzuki 82
1 Tanaka 95
5 Sato 76
2 Yamada 88
4 Kobayashi 91
EOF
Command
awk 'NR > 1' input.txt | sort -k1,1n | awk 'BEGIN{print "id name score"} {print}'
Output
id name score
1 Tanaka 95
2 Yamada 88
3 Suzuki 82
4 Kobayashi 91
5 Sato 76
Command
awk 'NR > 1' input.txt | sort -k3,3nr | awk 'BEGIN{print "id name score"} {print}'
Output
id name score
1 Tanaka 95
4 Kobayashi 91
2 Yamada 88
3 Suzuki 82
5 Sato 76
How It Works
| Command | Mechanism |
|---|---|
| awk 'NR > 1' | Skips the header row and passes only the data rows to sort |
| sort -k1,1n | Sorts the first column numerically in ascending order |
| sort -k3,3nr | Sorts the third column numerically in descending order |
| awk 'BEGIN{print "id name score"}' | Re-displays the header row after sorting |
Explanation
Column-based sorting is achieved by excluding the header with awk and then sorting the target column with sort.
The n flag in sort means numeric order, and r means reverse order.
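As an alternative sketch that skips awk entirely, head and tail (standard coreutils) can split off the header before sorting:
Command
(head -n 1 input.txt; tail -n +2 input.txt | sort -k3,3nr)
Output
id name score
1 Tanaka 95
4 Kobayashi 91
2 Yamada 88
3 Suzuki 82
5 Sato 76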
How to Sort with a Custom Delimiter Using awk and sort
Creating the File
cat << 'EOF' > input.txt
orange:30
apple:10
banana:20
grape:15
EOF
Command
awk -F ':' '{print $1, $2}' input.txt
Output
orange 30
apple 10
banana 20
grape 15
Command
awk -F ':' '{print $1, $2}' input.txt | sort -k2 -n
Output
apple 10
grape 15
banana 20
orange 30
How It Works
| Command | Role |
|---|---|
| awk -F ':' | Specifies : as the delimiter |
| {print $1, $2} | Outputs the first and second columns |
| sort -k2 -n | Sorts the second column numerically in ascending order |
Explanation
The -F option in awk lets you specify a custom delimiter.
Passing that output to sort enables flexible sorting of any column.
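sort can also handle the custom delimiter by itself via its -t option, which keeps the original colon-separated format intact; a sketch:
Command
sort -t ':' -k2,2n input.txt
Output
apple:10
grape:15
banana:20
orange:30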
How to Sort a CSV File by Key Using awk and sort
Creating the File
cat << 'EOF' > input.txt
id,name,score
3,Suzuki,82
1,Tanaka,95
2,Sato,88
4,Yamada,70
EOF
Command
awk -F',' 'NR==1{print;next} {print | "sort -t, -k1,1n"}' input.txt
Output
id,name,score
1,Tanaka,95
2,Sato,88
3,Suzuki,82
4,Yamada,70
Command
awk -F',' 'NR==1{print;next} {print | "sort -t, -k3,3nr"}' input.txt
Output
id,name,score
1,Tanaka,95
2,Sato,88
3,Suzuki,82
4,Yamada,70
How It Works
| Command | Role |
|---|---|
| awk -F',' | Processes CSV using , as the delimiter |
| NR==1{print;next} | Prints the header row first |
| sort -t, | Sorts using , as the delimiter |
| -k1,1n | Sorts the first column numerically in ascending order |
| -k3,3nr | Sorts the third column numerically in descending order |
Explanation
awk prints the header row directly while piping only the data rows to sort; because sort emits nothing until awk closes the pipe at end of input, the header always appears first.
Changing the column number passed to sort is all it takes to sort the CSV by any key.
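For instance, switching the key to -k2,2 (no n, since name is a string) sorts the same CSV alphabetically by name; a sketch:
Command
awk -F',' 'NR==1{print;next} {print | "sort -t, -k2,2"}' input.txt
Output
id,name,score
2,Sato,88
3,Suzuki,82
1,Tanaka,95
4,Yamada,70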
How to Remove Duplicate Lines with awk and sort
Creating the File
cat << 'EOF' > input.txt
apple
banana
apple
orange
banana
grape
EOF
Command
awk '!seen[$0]++' input.txt
Output
apple
banana
orange
grape
Command
sort -u input.txt
Output
apple
banana
grape
orange
How It Works
| Command | Mechanism |
|---|---|
| awk '!seen[$0]++' input.txt | Records each line in the seen array and outputs it only on the first occurrence |
| sort -u input.txt | Automatically removes duplicate lines after sorting |
Explanation
awk can remove duplicates while preserving the original input order.
sort -u performs sorting and duplicate removal simultaneously, making it convenient for processing large datasets.
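As a related sketch, uniq -d inverts the question and shows only the lines that do have duplicates:
Command
sort input.txt | uniq -d
Output
apple
banana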
How to Count Unique Occurrences with awk and sort
Creating the File
cat << 'EOF' > input.txt
apple red
banana yellow
apple red
orange orange
banana yellow
apple green
EOF
Command
awk '{print $1}' input.txt | sort | uniq -c
Output
3 apple
2 banana
1 orange
Command
awk '{print $1}' input.txt | sort | uniq | wc -l
Output
3
How It Works
| Command | Role |
|---|---|
| awk '{print $1}' | Extracts only the first column |
| sort | Groups identical values together to make counting easier |
| uniq -c | Counts duplicate occurrences |
| uniq | Removes duplicates |
| wc -l | Counts lines to display the unique count |
Explanation
Extracting the needed column with awk and sorting it with sort enables duplicate detection by uniq.
This combination is frequently used for shell-based data analysis such as log analysis and CSV aggregation.
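The same idea extends to multiple columns; printing both fields makes each fruit-and-color pair the unit of counting, as in this sketch:
Command
awk '{print $1, $2}' input.txt | sort | uniq -c
Output
1 apple green
2 apple red
2 banana yellow
1 orange orange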
How to Display Aggregated Results as a Ranking with awk and sort
Creating the File
cat << 'EOF' > input.txt
apple
orange
apple
banana
orange
apple
banana
grape
orange
apple
EOF
Command
awk '{count[$1]++} END {for (word in count) print count[word], word}' input.txt
Output
4 apple
2 banana
1 grape
3 orange
Command
awk '{count[$1]++} END {for (word in count) print count[word], word}' input.txt | sort -nr
Output
4 apple
3 orange
2 banana
1 grape
How It Works
| Command | Role |
|---|---|
| awk '{count[$1]++}' | Uses the first column string as a key to count occurrences |
| END {for (word in count) ...} | Outputs the aggregated results at the end |
| sort -nr | Sorts numerically in descending order |
| count[word] | Holds occurrence counts in an awk associative array |
Explanation
Aggregating data with awk and sorting it into a ranking with sort makes log analysis and frequency surveys straightforward. Note that awk's for (word in count) loop visits keys in an unspecified order, which is exactly why piping through sort -nr is needed to produce readable results ordered by count.
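For a top-N ranking, append head to the pipeline; a sketch that keeps only the two most frequent words:
Command
awk '{count[$1]++} END {for (word in count) print count[word], word}' input.txt | sort -nr | head -n 2
Output
4 apple
3 orange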
How to Filter and Sort Specific Rows with awk and sort
Creating the File
cat << 'EOF' > input.txt
id,name,score
101,Alice,82
102,Bob,67
103,Charlie,91
104,David,75
105,Eve,88
EOF
Command
awk -F',' 'NR > 1 && $3 >= 80' input.txt | sort -t',' -k3,3nr
Output
103,Charlie,91
105,Eve,88
101,Alice,82
How It Works
| Process | Details |
|---|---|
| awk | Excludes the header row and extracts matching rows |
| NR > 1 | Excludes the first row (id,name,score) |
| $3 >= 80 | Targets only rows where the third column (score) is 80 or above |
| sort | Sorts the extracted results |
| -t',' | Specifies comma as the delimiter |
| -k3,3nr | Sorts the third column numerically in descending order |
Explanation
Adding NR > 1 allows you to exclude the header row.
Filtering with awk and sorting with sort is the standard combination.
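Conditions can be combined freely; as a sketch, this keeps scores from 70 up to but not including 90 and sorts them in ascending order:
Command
awk -F',' 'NR > 1 && $3 >= 70 && $3 < 90' input.txt | sort -t',' -k3,3n
Output
104,David,75
101,Alice,82
105,Eve,88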
How to Analyze Logs with awk and sort
Creating the File
cat << 'EOF' > input.txt
2026-05-16 10:01:25 INFO userA login
2026-05-16 10:03:11 ERROR userB timeout
2026-05-16 10:04:09 INFO userC upload
2026-05-16 10:05:44 ERROR userA disconnect
2026-05-16 10:06:30 WARN userD retry
2026-05-16 10:07:12 ERROR userC timeout
EOF
Command
awk '$3=="ERROR" {print $4, $5}' input.txt
Output
userB timeout
userA disconnect
userC timeout
Command
awk '$3=="ERROR" {print $4}' input.txt | sort
Output
userA
userB
userC
Command
awk '{count[$3]++} END {for (level in count) print level, count[level]}' input.txt | sort
Output
ERROR 3
INFO 2
WARN 1
How It Works
| Command | Mechanism |
|---|---|
| awk '$3=="ERROR"' | Extracts only rows where the third column is ERROR |
| print $4 | Displays the username in the fourth column |
| sort | Sorts the results in ascending order |
| count[$3]++ | Aggregates the count for each log level |
| END {for ...} | Outputs the aggregated results at the end |
Explanation
awk can efficiently extract log data column by column, and combining it with sort makes aggregation and organization straightforward.
These commands run fast with a small footprint even on large log volumes, making them everyday tools in server operations.
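Column conditions combine naturally with string comparisons; as a sketch, this extracts only the ERROR lines at or after 10:05:00 (the comparison works because HH:MM:SS times are zero-padded):
Command
awk '$3=="ERROR" && $2 >= "10:05:00"' input.txt
Output
2026-05-16 10:05:44 ERROR userA disconnect
2026-05-16 10:07:12 ERROR userC timeout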
How to Sort Date Data with awk and sort
Creating the File
cat << 'EOF' > input.txt
2024-12-01 Tokyo
2023-05-10 Osaka
2025-01-15 Nagoya
2022-08-20 Fukuoka
EOF
Command
awk '{print $1, $2}' input.txt | sort
Output
2022-08-20 Fukuoka
2023-05-10 Osaka
2024-12-01 Tokyo
2025-01-15 Nagoya
Command
awk '{print $1, $2}' input.txt | sort -r
Output
2025-01-15 Nagoya
2024-12-01 Tokyo
2023-05-10 Osaka
2022-08-20 Fukuoka
How It Works
| Command | Mechanism |
|---|---|
| awk '{print $1, $2}' input.txt | Extracts the date in the first column and the city name in the second column |
| sort | Sorts dates in ascending order |
| sort -r | Sorts dates in descending order |
Explanation
awk extracts only the needed columns, and sort arranges them in date order.
Dates in ISO format (YYYY-MM-DD) sort correctly even with a plain string sort.
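The same file can instead be keyed on the city name; a sketch that sorts the second column alphabetically:
Command
sort -k2,2 input.txt
Output
2022-08-20 Fukuoka
2025-01-15 Nagoya
2023-05-10 Osaka
2024-12-01 Tokyo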
How to Organize and Aggregate IP Addresses with awk and sort
Creating the File
cat << 'EOF' > input.txt
192.168.1.10 access
10.0.0.5 access
192.168.1.10 error
172.16.0.1 access
10.0.0.5 access
192.168.1.10 access
EOF
Command
awk '{print $1}' input.txt
Output
192.168.1.10
10.0.0.5
192.168.1.10
172.16.0.1
10.0.0.5
192.168.1.10
Command
awk '{print $1}' input.txt | sort
Output
10.0.0.5
10.0.0.5
172.16.0.1
192.168.1.10
192.168.1.10
192.168.1.10
Command
awk '{print $1}' input.txt | sort | uniq -c
Output
2 10.0.0.5
1 172.16.0.1
3 192.168.1.10
How It Works
| Command | Mechanism |
|---|---|
| awk '{print $1}' | Extracts only the IP address in the first column |
| sort | Sorts IP addresses in ascending order |
| uniq -c | Groups duplicates and counts occurrences |
Explanation
Extracting the needed column with awk and sorting it with sort enables aggregation via uniq -c.
This is a fundamental command combination widely used for log analysis and access counting.
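To turn those counts into a ranking, pipe the result through a second sort; a sketch:
Command
awk '{print $1}' input.txt | sort | uniq -c | sort -nr
Output
3 192.168.1.10
2 10.0.0.5
1 172.16.0.1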
Summary: Text Processing Fundamentals with awk and sort
awk and sort are representative commands for efficient text processing.
Simply learning the flow of extracting the needed data with awk and ordering it with sort will dramatically simplify CSV management and log analysis.
Start by trying awk and sort on small sample datasets and gradually build up to more practical usage patterns.
