Introduction
For text processing in Linux and other UNIX-like environments, awk and sort are two of the most useful commands.
awk handles column extraction and conditional filtering, sort handles ordering, and combining them with uniq enables duplicate removal and count aggregation.
This article explains everything from basic text processing with awk and sort to practical log analysis, organized in a way that beginners can easily understand.
Reference: GNU awk manual (https://www.gnu.org/software/gawk/manual/)
Basic Syntax and Usage of awk and sort
Creating the File
cat << 'EOF' > input.txt
apple 300
orange 150
banana 200
grape 180
apple 120
EOF
Command
awk '{ print $1, $2 }' input.txt
Output
apple 300
orange 150
banana 200
grape 180
apple 120
Command
awk '{ sum += $2 } END { print "total =", sum }' input.txt
Output
total = 950
Command
sort input.txt
Output
apple 120
apple 300
banana 200
grape 180
orange 150
Command
sort -k2 -n input.txt
Output
apple 120
orange 150
grape 180
banana 200
apple 300
How It Works
| Command | Mechanism | Use Case |
|---|---|---|
| awk '{ print $1, $2 }' | Processes columns separated by whitespace | Column extraction |
| awk '{ sum += $2 } END { print sum }' | Adds up the second column and prints at END | Sum calculation |
| sort input.txt | Sorts in string order | Alphabetical sort |
| sort -k2 -n input.txt | Sorts the second column numerically | Numeric sort |
Explanation
awk is a powerful text processing command that can manipulate data column by column.
sort is commonly used to reorder log or CSV-style data by various conditions.
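awk conditions can also filter rows before printing; as a small sketch on the same input.txt, this keeps only the fruits whose second column exceeds 150:
Command
awk '$2 > 150 { print $1 }' input.txt
Output
apple
banana
grape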
Differences Between awk, sort, and uniq
Creating the File
cat << 'EOF' > input.txt
apple
orange
banana
apple
banana
grape
orange
apple
EOF
Command
awk '{print $1}' input.txt
Output
apple
orange
banana
apple
banana
grape
orange
apple
Command
sort input.txt
Output
apple
apple
apple
banana
banana
grape
orange
orange
Command
sort input.txt | uniq
Output
apple
banana
grape
orange
How It Works
| Command | Role | Characteristics | Primary Use |
|---|---|---|---|
| awk | Text processing | Can process data column by column | Extraction and aggregation |
| sort | Sorting | Orders data ascending or descending | Data organization |
| uniq | Duplicate removal | Only removes consecutive duplicates | Duplicate checking |
Explanation
awk is well suited for handling text column by column, making it frequently used for log analysis and CSV processing.
Combining sort and uniq allows efficient deduplication of data.
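To see why sorting first matters, try uniq on its own; since none of the duplicate lines in this file are adjacent, it passes all eight lines through unchanged:
Command
uniq input.txt
Output
apple
orange
banana
apple
banana
grape
orange
apple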
Basic Patterns for Combining awk and sort
Creating the File
cat << 'EOF' > input.txt
orange 120
apple 80
banana 200
grape 150
melon 90
EOF
Command
awk '{print $2, $1}' input.txt | sort -n
Output
80 apple
90 melon
120 orange
150 grape
200 banana
Command
awk '{print $2, $1}' input.txt | sort -nr
Output
200 banana
150 grape
120 orange
90 melon
80 apple
How It Works
| Command | Role |
|---|---|
| awk '{print $2, $1}' | Moves the second column to the front to prepare data for sorting |
| sort -n | Sorts numerically in ascending order |
| sort -nr | Sorts numerically in descending order |
| \| | Pipe that passes awk output to sort |
Explanation
The basic pattern is to reorder columns with awk and then sort with sort.
Using -n for numeric sort and combining it with -r for descending order is the standard approach.
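If you want to keep the original column order, sort can also target the key column directly with -k instead of swapping columns in awk; a sketch on the same file:
Command
sort -k2,2n input.txt
Output
apple 80
melon 90
orange 120
grape 150
banana 200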
How to Sort Numbers and Strings with awk and sort
Creating the File
cat << 'EOF' > input.txt
orange 15
apple 3
banana 27
grape 9
melon 21
EOF
Command
awk '{print $2, $1}' input.txt | sort -n
Output
3 apple
9 grape
15 orange
21 melon
27 banana
Command
awk '{print $1, $2}' input.txt | sort
Output
apple 3
banana 27
grape 9
melon 21
orange 15
How It Works
| Command | Mechanism |
|---|---|
| awk '{print $2, $1}' | Moves the numeric second column to the front for sorting |
| sort -n | Sorts numerically in ascending order |
| awk '{print $1, $2}' | Outputs the string first column as-is |
| sort | Sorts strings in alphabetical order |
Explanation
awk is convenient for swapping and extracting columns, and combining it with sort enables flexible sorting.
The basic rule is to use -n for numeric sort and no option for string sort.
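To see why -n matters, run the same pipeline without it; a plain string sort compares the numbers character by character, so "27" comes before "3":
Command
awk '{print $2, $1}' input.txt | sort
Output
15 orange
21 melon
27 banana
3 apple
9 grape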
How to Switch Between Ascending and Descending Order with awk and sort
Creating the File
cat << 'EOF' > input.txt
orange 30
apple 10
banana 20
grape 15
EOF
Command
awk '{print $2, $1}' input.txt | sort -n
Output
10 apple
15 grape
20 banana
30 orange
Command
awk '{print $2, $1}' input.txt | sort -nr
Output
30 orange
20 banana
15 grape
10 apple
How It Works
| Command | Role |
|---|---|
| awk '{print $2, $1}' | Moves the second column to the front to generate sort-ready data |
| sort -n | Sorts numerically in ascending order |
| sort -nr | Sorts numerically in descending order |
| \| | Passes awk output to sort |
Explanation
sort -n sorts numbers in ascending order, and sort -nr reverses that to descending order.
Combining awk allows flexible sorting based on any column.
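If you need the original column order back after sorting, a second awk can swap the columns again; a sketch:
Command
awk '{print $2, $1}' input.txt | sort -nr | awk '{print $2, $1}'
Output
orange 30
banana 20
grape 15
apple 10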
How to Sort by Column with awk and sort
Creating the File
cat << 'EOF' > input.txt
id name score
3 Suzuki 82
1 Tanaka 95
5 Sato 76
2 Yamada 88
4 Kobayashi 91
EOF
Command
awk 'NR > 1' input.txt | sort -k1,1n | awk 'BEGIN{print "id name score"} {print}'
Output
id name score
1 Tanaka 95
2 Yamada 88
3 Suzuki 82
4 Kobayashi 91
5 Sato 76
Command
awk 'NR > 1' input.txt | sort -k3,3nr | awk 'BEGIN{print "id name score"} {print}'
Output
id name score
1 Tanaka 95
4 Kobayashi 91
2 Yamada 88
3 Suzuki 82
5 Sato 76
How It Works
| Command | Mechanism |
|---|---|
| awk 'NR > 1' | Skips the header row and passes only the data rows to sort |
| sort -k1,1n | Sorts the first column numerically in ascending order |
| sort -k3,3nr | Sorts the third column numerically in descending order |
| awk 'BEGIN{print "id name score"}' | Re-displays the header row after sorting |
Explanation
Column-based sorting is achieved by excluding the header with awk and then sorting the target column with sort.
The n flag in sort means numeric order, and r means reverse order.
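As an alternative sketch that skips awk entirely, head and tail (standard coreutils) can split off the header before sorting:
Command
(head -n 1 input.txt; tail -n +2 input.txt | sort -k3,3nr)
Output
id name score
1 Tanaka 95
4 Kobayashi 91
2 Yamada 88
3 Suzuki 82
5 Sato 76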
How to Sort with a Custom Delimiter Using awk and sort
Creating the File
cat << 'EOF' > input.txt
orange:30
apple:10
banana:20
grape:15
EOF
Command
awk -F ':' '{print $1, $2}' input.txt
Output
orange 30
apple 10
banana 20
grape 15
Command
awk -F ':' '{print $1, $2}' input.txt | sort -k2 -n
Output
apple 10
grape 15
banana 20
orange 30
How It Works
| Command | Role |
|---|---|
| awk -F ':' | Specifies : as the delimiter |
| {print $1, $2} | Outputs the first and second columns |
| sort -k2 -n | Sorts the second column numerically in ascending order |
Explanation
The -F option in awk lets you specify a custom delimiter.
Passing that output to sort enables flexible sorting of any column.
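sort can also handle the custom delimiter by itself via its -t option, which keeps the original colon-separated format intact; a sketch:
Command
sort -t ':' -k2,2n input.txt
Output
apple:10
grape:15
banana:20
orange:30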
How to Sort a CSV File by Key Using awk and sort
Creating the File
cat << 'EOF' > input.txt
id,name,score
3,Suzuki,82
1,Tanaka,95
2,Sato,88
4,Yamada,70
EOF
Command
awk -F',' 'NR==1{print;next} {print | "sort -t, -k1,1n"}' input.txt
Output
id,name,score
1,Tanaka,95
2,Sato,88
3,Suzuki,82
4,Yamada,70
Command
awk -F',' 'NR==1{print;next} {print | "sort -t, -k3,3nr"}' input.txt
Output
id,name,score
1,Tanaka,95
2,Sato,88
3,Suzuki,82
4,Yamada,70
How It Works
| Command | Role |
|---|---|
| awk -F',' | Processes CSV using , as the delimiter |
| NR==1{print;next} | Prints the header row first |
| sort -t, | Sorts using , as the delimiter |
| -k1,1n | Sorts the first column numerically in ascending order |
| -k3,3nr | Sorts the third column numerically in descending order |
Explanation
awk prints the header row directly while piping only the data rows to sort; because sort emits nothing until awk closes the pipe at end of input, the header always appears first.
Changing the column number passed to sort is all it takes to sort the CSV by any key.
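For instance, switching the key to -k2,2 (no n, since name is a string) sorts the same CSV alphabetically by name; a sketch:
Command
awk -F',' 'NR==1{print;next} {print | "sort -t, -k2,2"}' input.txt
Output
id,name,score
2,Sato,88
3,Suzuki,82
1,Tanaka,95
4,Yamada,70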
How to Remove Duplicate Lines with awk and sort
Creating the File
cat << 'EOF' > input.txt
apple
banana
apple
orange
banana
grape
EOF
Command
awk '!seen[$0]++' input.txt
Output
apple
banana
orange
grape
Command
sort -u input.txt
Output
apple
banana
grape
orange
How It Works
| Command | Mechanism |
|---|---|
| awk '!seen[$0]++' input.txt | Records each line in the seen array and outputs it only on the first occurrence |
| sort -u input.txt | Automatically removes duplicate lines after sorting |
Explanation
awk can remove duplicates while preserving the original input order.
sort -u performs sorting and duplicate removal simultaneously, making it convenient for processing large datasets.
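As a related sketch, uniq -d inverts the question and shows only the lines that do have duplicates:
Command
sort input.txt | uniq -d
Output
apple
banana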
How to Count Unique Occurrences with awk and sort
Creating the File
cat << 'EOF' > input.txt
apple red
banana yellow
apple red
orange orange
banana yellow
apple green
EOF
Command
awk '{print $1}' input.txt | sort | uniq -c
Output
3 apple
2 banana
1 orange
Command
awk '{print $1}' input.txt | sort | uniq | wc -l
Output
3
How It Works
| Command | Role |
|---|---|
| awk '{print $1}' | Extracts only the first column |
| sort | Groups identical values together to make counting easier |
| uniq -c | Counts duplicate occurrences |
| uniq | Removes duplicates |
| wc -l | Counts lines to display the unique count |
Explanation
Extracting the needed column with awk and sorting it with sort enables duplicate detection by uniq.
This combination is frequently used for shell-based data analysis such as log analysis and CSV aggregation.
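The same idea extends to multiple columns; printing both fields makes each fruit-and-color pair the unit of counting, as in this sketch:
Command
awk '{print $1, $2}' input.txt | sort | uniq -c
Output
1 apple green
2 apple red
2 banana yellow
1 orange orange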
How to Display Aggregated Results as a Ranking with awk and sort
Creating the File
cat << 'EOF' > input.txt
apple
orange
apple
banana
orange
apple
banana
grape
orange
apple
EOF
Command
awk '{count[$1]++} END {for (word in count) print count[word], word}' input.txt
Output
4 apple
2 banana
1 grape
3 orange
Command
awk '{count[$1]++} END {for (word in count) print count[word], word}' input.txt | sort -nr
Output
4 apple
3 orange
2 banana
1 grape
How It Works
| Command | Role |
|---|---|
| awk '{count[$1]++}' | Uses the first column string as a key to count occurrences |
| END {for (word in count) ...} | Outputs the aggregated results at the end |
| sort -nr | Sorts numerically in descending order |
| count[word] | Holds occurrence counts in an awk associative array |
Explanation
Aggregating data with awk and sorting it into a ranking with sort makes log analysis and frequency surveys straightforward. Note that awk's for (word in count) loop visits keys in an unspecified order, which is exactly why piping through sort -nr is needed to produce readable results ordered by count.
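For a top-N ranking, append head to the pipeline; a sketch that keeps only the two most frequent words:
Command
awk '{count[$1]++} END {for (word in count) print count[word], word}' input.txt | sort -nr | head -n 2
Output
4 apple
3 orange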
How to Filter and Sort Specific Rows with awk and sort
Creating the File
cat << 'EOF' > input.txt
id,name,score
101,Alice,82
102,Bob,67
103,Charlie,91
104,David,75
105,Eve,88
EOF
Command
awk -F',' 'NR > 1 && $3 >= 80' input.txt | sort -t',' -k3,3nr
Output
103,Charlie,91
105,Eve,88
101,Alice,82
How It Works
| Process | Details |
|---|---|
| awk | Excludes the header row and extracts matching rows |
| NR > 1 | Excludes the first row (id,name,score) |
| $3 >= 80 | Targets only rows where the third column (score) is 80 or above |
| sort | Sorts the extracted results |
| -t',' | Specifies comma as the delimiter |
| -k3,3nr | Sorts the third column numerically in descending order |
Explanation
Adding NR > 1 allows you to exclude the header row.
Filtering with awk and sorting with sort is the standard combination.
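Conditions can be combined freely; as a sketch, this keeps scores from 70 up to but not including 90 and sorts them in ascending order:
Command
awk -F',' 'NR > 1 && $3 >= 70 && $3 < 90' input.txt | sort -t',' -k3,3n
Output
104,David,75
101,Alice,82
105,Eve,88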
How to Analyze Logs with awk and sort
Creating the File
cat << 'EOF' > input.txt
2026-05-16 10:01:25 INFO userA login
2026-05-16 10:03:11 ERROR userB timeout
2026-05-16 10:04:09 INFO userC upload
2026-05-16 10:05:44 ERROR userA disconnect
2026-05-16 10:06:30 WARN userD retry
2026-05-16 10:07:12 ERROR userC timeout
EOF
Command
awk '$3=="ERROR" {print $4, $5}' input.txt
Output
userB timeout
userA disconnect
userC timeout
Command
awk '$3=="ERROR" {print $4}' input.txt | sort
Output
userA
userB
userC
Command
awk '{count[$3]++} END {for (level in count) print level, count[level]}' input.txt | sort
Output
ERROR 3
INFO 2
WARN 1
How It Works
| Command | Mechanism |
|---|---|
| awk '$3=="ERROR"' | Extracts only rows where the third column is ERROR |
| print $4 | Displays the username in the fourth column |
| sort | Sorts the results in ascending order |
| count[$3]++ | Aggregates the count for each log level |
| END {for ...} | Outputs the aggregated results at the end |
Explanation
awk can efficiently extract log data column by column, and combining it with sort makes aggregation and organization straightforward.
These commands run fast with a small footprint even on large log volumes, making them everyday tools in server operations.
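Column conditions combine naturally with string comparisons; as a sketch, this extracts only the ERROR lines at or after 10:05:00 (the comparison works because HH:MM:SS times are zero-padded):
Command
awk '$3=="ERROR" && $2 >= "10:05:00"' input.txt
Output
2026-05-16 10:05:44 ERROR userA disconnect
2026-05-16 10:07:12 ERROR userC timeout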
How to Sort Date Data with awk and sort
Creating the File
cat << 'EOF' > input.txt
2024-12-01 Tokyo
2023-05-10 Osaka
2025-01-15 Nagoya
2022-08-20 Fukuoka
EOF
Command
awk '{print $1, $2}' input.txt | sort
Output
2022-08-20 Fukuoka
2023-05-10 Osaka
2024-12-01 Tokyo
2025-01-15 Nagoya
Command
awk '{print $1, $2}' input.txt | sort -r
Output
2025-01-15 Nagoya
2024-12-01 Tokyo
2023-05-10 Osaka
2022-08-20 Fukuoka
How It Works
| Command | Mechanism |
|---|---|
| awk '{print $1, $2}' input.txt | Extracts the date in the first column and the city name in the second column |
| sort | Sorts dates in ascending order |
| sort -r | Sorts dates in descending order |
Explanation
awk extracts only the needed columns, and sort arranges them in date order.
Dates in ISO format (YYYY-MM-DD) sort correctly even with a plain string sort.
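The same file can instead be keyed on the city name; a sketch that sorts the second column alphabetically:
Command
sort -k2,2 input.txt
Output
2022-08-20 Fukuoka
2025-01-15 Nagoya
2023-05-10 Osaka
2024-12-01 Tokyo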
How to Organize and Aggregate IP Addresses with awk and sort
Creating the File
cat << 'EOF' > input.txt
192.168.1.10 access
10.0.0.5 access
192.168.1.10 error
172.16.0.1 access
10.0.0.5 access
192.168.1.10 access
EOF
Command
awk '{print $1}' input.txt
Output
192.168.1.10
10.0.0.5
192.168.1.10
172.16.0.1
10.0.0.5
192.168.1.10
Command
awk '{print $1}' input.txt | sort
Output
10.0.0.5
10.0.0.5
172.16.0.1
192.168.1.10
192.168.1.10
192.168.1.10
Command
awk '{print $1}' input.txt | sort | uniq -c
Output
2 10.0.0.5
1 172.16.0.1
3 192.168.1.10
How It Works
| Command | Mechanism |
|---|---|
| awk '{print $1}' | Extracts only the IP address in the first column |
| sort | Sorts IP addresses in ascending order |
| uniq -c | Groups duplicates and counts occurrences |
Explanation
Extracting the needed column with awk and sorting it with sort enables aggregation via uniq -c.
This is a fundamental command combination widely used for log analysis and access counting.
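To turn those counts into a ranking, pipe the result through a second sort; a sketch:
Command
awk '{print $1}' input.txt | sort | uniq -c | sort -nr
Output
3 192.168.1.10
2 10.0.0.5
1 172.16.0.1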
Summary: Text Processing Fundamentals with awk and sort
awk and sort are representative commands for efficient text processing.
Simply learning the flow of extracting the needed data with awk and ordering it with sort will dramatically simplify CSV management and log analysis.
Start by trying awk and sort on small sample datasets and gradually build up to more practical usage patterns.
