
Mastering awk and sort: A Beginner’s Guide

updated: 2026/05/16 created: 2026/05/16

Introduction

When performing text processing in Linux or UNIX-like environments, awk and sort are extremely useful commands.

awk handles column extraction and conditional filtering, while sort handles ordering; combining them with uniq enables duplicate removal and count aggregation.

This article explains everything from basic text processing with awk and sort to practical log analysis, organized in a way that beginners can easily understand.

Reference: GNU awk

Basic Syntax and Usage of awk and sort

Creating the File

cat << 'EOF' > input.txt
apple 300
orange 150
banana 200
grape 180
apple 120
EOF

Command

awk '{ print $1, $2 }' input.txt

Output

apple 300
orange 150
banana 200
grape 180
apple 120

Command

awk '{ sum += $2 } END { print "total =", sum }' input.txt

Output

total = 950

Command

sort input.txt

Output

apple 120
apple 300
banana 200
grape 180
orange 150

Command

sort -k2 -n input.txt

Output

apple 120
orange 150
grape 180
banana 200
apple 300

How It Works

Command | Mechanism | Use Case
awk '{ print $1, $2 }' | Processes columns separated by whitespace | Column extraction
awk '{ sum += $2 } END { print sum }' | Adds up the second column and prints at END | Sum calculation
sort input.txt | Sorts in string order | Alphabetical sort
sort -k2 -n input.txt | Sorts the second column numerically | Numeric sort

Explanation

awk is a powerful text processing command that can manipulate data column by column.

sort is commonly used to reorder log or CSV-style data by various conditions.
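
The conditional filtering mentioned in the introduction works by putting a pattern before the action; a pattern with no action prints matching lines in full. A minimal sketch on the same data (the 150 threshold is just an example value):

```shell
# Same sample data as above
printf 'apple 300\norange 150\nbanana 200\ngrape 180\napple 120\n' > input.txt

# Print only the rows whose second column exceeds 150
awk '$2 > 150' input.txt
# apple 300
# banana 200
# grape 180
```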

Differences Between awk, sort, and uniq

Creating the File

cat << 'EOF' > input.txt
apple
orange
banana
apple
banana
grape
orange
apple
EOF

Command

awk '{print $1}' input.txt

Output

apple
orange
banana
apple
banana
grape
orange
apple

Command

sort input.txt

Output

apple
apple
apple
banana
banana
grape
orange
orange

Command

sort input.txt | uniq

Output

apple
banana
grape
orange

How It Works

Command | Role | Characteristics | Primary Use
awk | Text processing | Can process data column by column | Extraction and aggregation
sort | Sorting | Orders data ascending or descending | Data organization
uniq | Duplicate removal | Only removes consecutive duplicates | Duplicate checking

Explanation

awk is well suited for handling text column by column, making it frequently used for log analysis and CSV processing.

Combining sort and uniq allows efficient deduplication of data.
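
The "consecutive duplicates only" caveat is worth seeing directly: run uniq on unsorted input and non-adjacent repeats survive, which is exactly why sort comes first. A quick sketch:

```shell
# Same sample data as above: no two identical lines are adjacent
printf 'apple\norange\nbanana\napple\nbanana\ngrape\norange\napple\n' > input.txt

# uniq alone removes nothing here; all 8 lines remain
uniq input.txt | wc -l
```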

Basic Patterns for Combining awk and sort

Creating the File

cat << 'EOF' > input.txt
orange 120
apple 80
banana 200
grape 150
melon 90
EOF

Command

awk '{print $2, $1}' input.txt | sort -n

Output

80 apple
90 melon
120 orange
150 grape
200 banana

Command

awk '{print $2, $1}' input.txt | sort -nr

Output

200 banana
150 grape
120 orange
90 melon
80 apple

How It Works

Command | Role
awk '{print $2, $1}' | Moves the second column to the front to prepare data for sorting
sort -n | Sorts numerically in ascending order
sort -nr | Sorts numerically in descending order
| (pipe) | Passes awk output to sort

Explanation

The basic pattern is to reorder columns with awk and then sort with sort.
Using -n for numeric sort and combining it with -r for descending order is the standard approach.
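
When the original column order should be preserved in the output, sort can also key on the second field directly, so the awk swap is not strictly required. A variant sketch on the same data:

```shell
printf 'orange 120\napple 80\nbanana 200\ngrape 150\nmelon 90\n' > input.txt

# Sort by the second field, numerically, descending; columns stay in place
sort -k2,2nr input.txt
# banana 200
# grape 150
# orange 120
# melon 90
# apple 80
```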

How to Sort Numbers and Strings with awk and sort

Creating the File

cat << 'EOF' > input.txt
orange 15
apple 3
banana 27
grape 9
melon 21
EOF

Command

awk '{print $2, $1}' input.txt | sort -n

Output

3 apple
9 grape
15 orange
21 melon
27 banana

Command

awk '{print $1, $2}' input.txt | sort

Output

apple 3
banana 27
grape 9
melon 21
orange 15

How It Works

Command | Mechanism
awk '{print $2, $1}' | Moves the numeric second column to the front for sorting
sort -n | Sorts numerically in ascending order
awk '{print $1, $2}' | Outputs the string first column as-is
sort | Sorts strings in alphabetical order

Explanation

awk is convenient for swapping and extracting columns, and combining it with sort enables flexible sorting.
The basic rule is to use -n for numeric sort and no option for string sort.
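
The distinction matters because a plain string sort compares digit characters one by one, so multi-digit numbers land in surprising places. A quick sketch of the pitfall:

```shell
printf '15\n3\n27\n9\n21\n' > nums.txt

sort nums.txt     # string order: 15 21 27 3 9
sort -n nums.txt  # numeric order: 3 9 15 21 27
```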

How to Switch Between Ascending and Descending Order with awk and sort

Creating the File

cat << 'EOF' > input.txt
orange 30
apple 10
banana 20
grape 15
EOF

Command

awk '{print $2, $1}' input.txt | sort -n

Output

10 apple
15 grape
20 banana
30 orange

Command

awk '{print $2, $1}' input.txt | sort -nr

Output

30 orange
20 banana
15 grape
10 apple

How It Works

Command | Role
awk '{print $2, $1}' | Moves the second column to the front to generate sort-ready data
sort -n | Sorts numerically in ascending order
sort -nr | Sorts numerically in descending order
| (pipe) | Passes awk output to sort

Explanation

sort -n sorts numbers in ascending order, and sort -nr reverses that to descending order.
Combining it with awk allows flexible sorting based on any column.

How to Sort by Column with awk and sort

Creating the File

cat << 'EOF' > input.txt
id name score
3 Suzuki 82
1 Tanaka 95
5 Sato 76
2 Yamada 88
4 Kobayashi 91
EOF

Command

awk 'NR==1{header=$0; next} {print}' input.txt | sort -k1,1n | awk 'BEGIN{print "id name score"} {print}'

Output

id name score
1 Tanaka 95
2 Yamada 88
3 Suzuki 82
4 Kobayashi 91
5 Sato 76

Command

awk 'NR==1{header=$0; next} {print}' input.txt | sort -k3,3nr | awk 'BEGIN{print "id name score"} {print}'

Output

id name score
1 Tanaka 95
4 Kobayashi 91
2 Yamada 88
3 Suzuki 82
5 Sato 76

How It Works

Command | Mechanism
awk 'NR==1{header=$0; next} {print}' | Excludes the header row and outputs only the data portion
sort -k1,1n | Sorts the first column numerically in ascending order
sort -k3,3nr | Sorts the third column numerically in descending order
awk 'BEGIN{print "id name score"}' | Re-displays the header row after sorting

Explanation

Column-based sorting is achieved by excluding the header with awk and then sorting the target column with sort.
The n flag in sort means numeric order, and r means reverse order.
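
An equivalent way to keep the header in place, sketched with head and tail instead of the two awk passes:

```shell
printf 'id name score\n3 Suzuki 82\n1 Tanaka 95\n5 Sato 76\n2 Yamada 88\n4 Kobayashi 91\n' > input.txt

# Print the header first, then sort only the data rows by score, descending
{ head -n 1 input.txt; tail -n +2 input.txt | sort -k3,3nr; }
# id name score
# 1 Tanaka 95
# 4 Kobayashi 91
# 2 Yamada 88
# 3 Suzuki 82
# 5 Sato 76
```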

How to Sort with a Custom Delimiter Using awk and sort

Creating the File

cat << 'EOF' > input.txt
orange:30
apple:10
banana:20
grape:15
EOF

Command

awk -F ':' '{print $1, $2}' input.txt

Output

orange 30
apple 10
banana 20
grape 15

Command

awk -F ':' '{print $1, $2}' input.txt | sort -k2 -n

Output

apple 10
grape 15
banana 20
orange 30

How It Works

Command | Role
awk -F ':' | Specifies : as the delimiter
{print $1, $2} | Outputs the first and second columns
sort -k2 -n | Sorts the second column numerically in ascending order

Explanation

The -F option in awk lets you specify a custom delimiter.
Passing that output to sort enables flexible sorting of any column.
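
If no column manipulation is needed, sort's own -t option handles the delimiter by itself, keeping the original colon format intact:

```shell
printf 'orange:30\napple:10\nbanana:20\ngrape:15\n' > input.txt

# Sort on the second colon-separated field, numerically
sort -t ':' -k2,2n input.txt
# apple:10
# grape:15
# banana:20
# orange:30
```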

How to Sort a CSV File by Key Using awk and sort

Creating the File

cat << 'EOF' > input.txt
id,name,score
3,Suzuki,82
1,Tanaka,95
2,Sato,88
4,Yamada,70
EOF

Command

awk -F',' 'NR==1{print;next} {print | "sort -t, -k1,1n"}' input.txt

Output

id,name,score
1,Tanaka,95
2,Sato,88
3,Suzuki,82
4,Yamada,70

Command

awk -F',' 'NR==1{print;next} {print | "sort -t, -k3,3nr"}' input.txt

Output

id,name,score
1,Tanaka,95
2,Sato,88
3,Suzuki,82
4,Yamada,70

How It Works

Command | Role
awk -F',' | Processes CSV using , as the delimiter
NR==1{print;next} | Prints the header row first
sort -t, | Sorts using , as the delimiter
-k1,1n | Sorts the first column numerically in ascending order
-k3,3nr | Sorts the third column numerically in descending order

Explanation

awk preserves the header row while passing only the data portion to sort.
Changing the column number given to sort is all it takes to sort by any key.

How to Remove Duplicate Lines with awk and sort

Creating the File

cat << 'EOF' > input.txt
apple
banana
apple
orange
banana
grape
EOF

Command

awk '!seen[$0]++' input.txt

Output

apple
banana
orange
grape

Command

sort -u input.txt

Output

apple
banana
grape
orange

How It Works

Command | Mechanism
awk '!seen[$0]++' input.txt | Records each line in the seen array and outputs it only on the first occurrence
sort -u input.txt | Removes duplicate lines after sorting

Explanation

awk can remove duplicates while preserving the original input order.

sort -u performs sorting and duplicate removal simultaneously, making it convenient for processing large datasets.
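
The seen idiom also generalizes: keying the array on $1 instead of $0 keeps only the first line per value of a column. A sketch with made-up two-column data:

```shell
# Hypothetical data: fruit and color
printf 'apple red\nbanana yellow\napple green\norange orange\n' > input.txt

# Keep only the first occurrence of each first-column value
awk '!seen[$1]++' input.txt
# apple red
# banana yellow
# orange orange
```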

How to Count Unique Occurrences with awk and sort

Creating the File

cat << 'EOF' > input.txt
apple red
banana yellow
apple red
orange orange
banana yellow
apple green
EOF

Command

awk '{print $1}' input.txt | sort | uniq -c

Output

      3 apple
      2 banana
      1 orange

Command

awk '{print $1}' input.txt | sort | uniq | wc -l

Output

3

How It Works

Command | Role
awk '{print $1}' | Extracts only the first column
sort | Groups identical values together to make counting easier
uniq -c | Counts duplicate occurrences
uniq | Removes duplicates
wc -l | Counts lines to display the unique count

Explanation

Extracting the needed column with awk and sorting it with sort enables duplicate detection by uniq.
This combination is frequently used for shell-based data analysis such as log analysis and CSV aggregation.

How to Display Aggregated Results as a Ranking with awk and sort

Creating the File

cat << 'EOF' > input.txt
apple
orange
apple
banana
orange
apple
banana
grape
orange
apple
EOF

Command

awk '{count[$1]++} END {for (word in count) print count[word], word}' input.txt

Output

4 apple
2 banana
1 grape
3 orange

Command

awk '{count[$1]++} END {for (word in count) print count[word], word}' input.txt | sort -nr

Output

4 apple
3 orange
2 banana
1 grape

How It Works

Command | Role
awk '{count[$1]++}' | Uses the first-column string as a key to count occurrences
END {for (word in count) ...} | Outputs the aggregated results at the end
sort -nr | Sorts numerically in descending order
count[word] | Holds occurrence counts in an awk associative array

Explanation

Aggregating data with awk and sorting it into a ranking format with sort makes log analysis and frequency surveys straightforward. In particular, combining sort -nr lets you quickly produce easy-to-read results ordered by count.
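
One refinement worth knowing: with sort -nr alone, ties in the count column fall back to sort's last-resort whole-line comparison. Adding an explicit secondary key makes the tie order predictable. A sketch with made-up counts:

```shell
printf '3 orange\n3 apple\n1 grape\n2 banana\n' > counts.txt

# Primary key: count, descending; secondary key: word, ascending
sort -k1,1nr -k2,2 counts.txt
# 3 apple
# 3 orange
# 2 banana
# 1 grape
```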

How to Filter and Sort Specific Rows with awk and sort

Creating the File

cat << 'EOF' > input.txt
id,name,score
101,Alice,82
102,Bob,67
103,Charlie,91
104,David,75
105,Eve,88
EOF

Command

awk -F',' 'NR > 1 && $3 >= 80' input.txt | sort -t',' -k3,3nr

Output

103,Charlie,91
105,Eve,88
101,Alice,82

How It Works

Process | Details
awk | Excludes the header row and extracts matching rows
NR > 1 | Excludes the first row (id,name,score)
$3 >= 80 | Targets only rows where the third column (score) is 80 or above
sort | Sorts the extracted results
-t',' | Specifies comma as the delimiter
-k3,3nr | Sorts the third column numerically in descending order

Explanation

Adding NR > 1 allows you to exclude the header row.

Filtering with awk and sorting with sort is the standard combination.

How to Analyze Logs with awk and sort

Creating the File

cat << 'EOF' > input.txt
2026-05-16 10:01:25 INFO userA login
2026-05-16 10:03:11 ERROR userB timeout
2026-05-16 10:04:09 INFO userC upload
2026-05-16 10:05:44 ERROR userA disconnect
2026-05-16 10:06:30 WARN userD retry
2026-05-16 10:07:12 ERROR userC timeout
EOF

Command

awk '$3=="ERROR" {print $4, $5}' input.txt

Output

userB timeout
userA disconnect
userC timeout

Command

awk '$3=="ERROR" {print $4}' input.txt | sort

Output

userA
userB
userC

Command

awk '{count[$3]++} END {for (level in count) print level, count[level]}' input.txt | sort

Output

ERROR 3
INFO 2
WARN 1

How It Works

Command | Mechanism
awk '$3=="ERROR"' | Extracts only rows where the third column is ERROR
print $4 | Displays the username in the fourth column
sort | Sorts the results in ascending order
count[$3]++ | Aggregates the count for each log level
END {for ...} | Outputs the aggregated results at the end

Explanation

awk can efficiently extract log data column by column, and combining it with sort makes aggregation and organization straightforward.
It operates with a small footprint and high speed even on large log volumes, making it a common tool in server operations.

How to Sort Date Data with awk and sort

Creating the File

cat << 'EOF' > input.txt
2024-12-01 Tokyo
2023-05-10 Osaka
2025-01-15 Nagoya
2022-08-20 Fukuoka
EOF

Command

awk '{print $1, $2}' input.txt | sort

Output

2022-08-20 Fukuoka
2023-05-10 Osaka
2024-12-01 Tokyo
2025-01-15 Nagoya

Command

awk '{print $1, $2}' input.txt | sort -r

Output

2025-01-15 Nagoya
2024-12-01 Tokyo
2023-05-10 Osaka
2022-08-20 Fukuoka

How It Works

Command | Mechanism
awk '{print $1, $2}' input.txt | Extracts the date in the first column and the city name in the second column
sort | Sorts dates in ascending order
sort -r | Sorts dates in descending order

Explanation

awk extracts only the needed columns, and sort arranges them in date order.
Dates in ISO format (YYYY-MM-DD) sort correctly even with a plain string sort.

How to Organize and Aggregate IP Addresses with awk and sort

Creating the File

cat << 'EOF' > input.txt
192.168.1.10 access
10.0.0.5 access
192.168.1.10 error
172.16.0.1 access
10.0.0.5 access
192.168.1.10 access
EOF

Command

awk '{print $1}' input.txt

Output

192.168.1.10
10.0.0.5
192.168.1.10
172.16.0.1
10.0.0.5
192.168.1.10

Command

awk '{print $1}' input.txt | sort

Output

10.0.0.5
10.0.0.5
172.16.0.1
192.168.1.10
192.168.1.10
192.168.1.10

Command

awk '{print $1}' input.txt | sort | uniq -c

Output

      2 10.0.0.5
      1 172.16.0.1
      3 192.168.1.10

How It Works

Command | Mechanism
awk '{print $1}' | Extracts only the IP address in the first column
sort | Sorts IP addresses in ascending order
uniq -c | Groups duplicates and counts occurrences

Explanation

Extracting the needed column with awk and sorting it with sort enables aggregation via uniq -c.
This is a fundamental command combination widely used for log analysis and access counting.
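
One caveat: a plain string sort can misorder IP addresses whose octets have different widths. GNU sort's -V (version sort) compares each dot-separated component numerically; note that -V is a GNU coreutils extension and may be absent on other systems. A sketch with made-up addresses:

```shell
printf '9.0.0.1\n10.0.0.5\n172.16.0.1\n192.168.1.10\n' > ips.txt

sort ips.txt     # string order puts 9.0.0.1 last, after 192.168.1.10
sort -V ips.txt  # version sort: 9.0.0.1, 10.0.0.5, 172.16.0.1, 192.168.1.10
```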

Summary: Text Processing Fundamentals with awk and sort

awk and sort are representative commands for efficient text processing.
Simply learning the flow of extracting the needed data with awk and ordering it with sort will dramatically simplify CSV management and log analysis.
Start by trying awk and sort on small sample datasets and gradually build up to more practical usage patterns.
