Introduction
awk is a powerful text-processing command that is widely used for log analysis and data formatting.
The first hurdle to mastering awk is understanding fields and field separators.
Beginners often get stuck on small details, so it helps to picture what awk actually does with each line as you learn.
This article walks through awk step by step, from the basics to more advanced usage.
Reference: GNU awk
Behavior and Notes on Default Delimiters (Spaces and Tabs)
Creating the File
cat << 'EOF' > input.txt
apple orange banana
cat dog mouse
one two three
EOF
※ To input a tab, press Ctrl+v and then press the Tab key.
Command
awk '{print $1, $2, $3}' input.txt
Output
apple orange banana
cat dog mouse
one two three
Command
awk '{print NF}' input.txt
Output
3
3
3
Command
awk '{print "["$2"]"}' input.txt
Output
[orange]
[dog]
[two]
How It Works
| Item | Description |
|---|---|
| Default delimiter | Spaces and tabs (FS is a single space " ") |
| Handling of consecutive delimiters | A run of spaces and tabs counts as one delimiter; leading and trailing blanks are ignored |
| Field variables | $1, $2, $3 ... |
| Number of fields | Obtainable with NF |
| Specifying delimiter | Can be changed with the -F option |
Explanation
By default FS is a single space, which awk treats specially: fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored. As a result, fields are split correctly no matter how many spaces or tabs separate them.
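The rule in the table above can be checked directly. The following is a small sketch; the sample line and the variable names (`line`, `nf`, `second`, `strict`) are made up for illustration. It builds a line with leading blanks, a run of spaces, a tab, and trailing blanks, and contrasts the default behavior with a one-space regex delimiter.

```shell
# Sample line: leading blanks, a run of four spaces, a tab, trailing blanks
line=$(printf '  apple    orange\tbanana  ')

# Default FS: runs of spaces/tabs are one delimiter, edge blanks are ignored
nf=$(printf '%s\n' "$line" | awk '{ print NF }')
second=$(printf '%s\n' "$line" | awk '{ print "[" $2 "]" }')

# For contrast: a one-space regex as FS splits on every single space
strict=$(printf 'a  b\n' | awk -F '[ ]' '{ print NF }')

echo "$nf"      # 3
echo "$second"  # [orange]
echo "$strict"  # 3 (a, an empty field, b)
```

The special "runs count as one" treatment applies only to the default single-space FS; any other delimiter, even one that matches a single space, produces empty fields between adjacent delimiters.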
Specifying Field Delimiters Using the -F Option
Creating the File
cat << 'EOF' > input.txt
name,age,city
Alice,25,Tokyo
Bob,30,Osaka
Charlie,35,Nagoya
EOF
Command
awk -F ',' '{print $1, $2}' input.txt
Output
name age
Alice 25
Bob 30
Charlie 35
Command
awk -F ',' 'NR > 1 {print $3}' input.txt
Output
Tokyo
Osaka
Nagoya
How It Works
| Element | Description |
|---|---|
| awk | A command that processes text by rows and columns |
| -F ',' | Specifies comma as the field delimiter |
| $1, $2, $3 | Represent the 1st, 2nd, and 3rd columns respectively |
| NR > 1 | Condition to exclude the first row (header) |
| {print ...} | Process to output the specified fields |
Explanation
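The -F option lets you change the field delimiter to suit the data, which makes it a basic tool for CSV-like input. Note that a plain comma delimiter does not understand quoting: a double-quoted field such as "Tokyo, Japan" would still be split in two.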
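Once the columns are split, they can also be used for calculations. As a sketch (the variable name `avg` is only for illustration), the following recreates the sample CSV, skips the header with NR > 1, and averages the age column in an END block.

```shell
# Recreate the sample CSV from this section
cat << 'EOF' > input.txt
name,age,city
Alice,25,Tokyo
Bob,30,Osaka
Charlie,35,Nagoya
EOF

# Skip the header, add up column 2, then print the average once at the end
avg=$(awk -F ',' 'NR > 1 { sum += $2; n++ } END { if (n > 0) print sum / n }' input.txt)
echo "$avg"   # 30
```

The `n > 0` guard simply avoids a division by zero if the file contains only the header.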
Defining Delimiters Within a Script Using the Built-in Variable FS
Creating the File
cat << 'EOF' > input.txt
apple,banana,orange
dog,cat,bird
EOF
Command
awk 'BEGIN { FS="," } { print $1, $2 }' input.txt
Output
apple banana
dog cat
Command
awk -F',' '{ print $1, $3 }' input.txt
Output
apple orange
dog bird
How It Works
| Element | Description |
|---|---|
| FS | Built-in variable that defines the field delimiter |
| BEGIN | A block executed only once before input processing |
| $1, $2 | Data of the 1st and 2nd columns after splitting |
| -F | Option to specify the delimiter from the command line |
Explanation
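Setting FS makes awk split each line on the given delimiter. You can set it either in a BEGIN block or with the -F option; both take effect before the first line is read. In POSIX-conforming awks such as gawk, assigning FS inside the main block instead affects only the following lines, because the current line has already been split.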
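To confirm that the two approaches are interchangeable, here is a minimal sketch that runs the same print with -F, with a BEGIN block, and with the standard -v option (the variable names `a`, `b`, and `c` are illustrative).

```shell
cat << 'EOF' > input.txt
apple,banana,orange
dog,cat,bird
EOF

# Three ways to set FS before the first line is read
a=$(awk -F ',' '{ print $1, $2 }' input.txt)
b=$(awk 'BEGIN { FS = "," } { print $1, $2 }' input.txt)
c=$(awk -v FS=',' '{ print $1, $2 }' input.txt)

# All three print "apple banana" and "dog cat"
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "identical"
```

Which one to use is mostly a matter of style: -F keeps one-liners short, while BEGIN keeps the setting inside a script file.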
Using Multiple Different Characters (Comma, Tab, Semicolon, etc.) as Delimiters Simultaneously
Creating the File
cat << 'EOF' > input.txt
apple,orange;grape	banana
dog;cat,bird	fish
EOF
※ The gap between grape and banana (and between bird and fish) is a tab. To type a literal tab in the terminal, press Ctrl+V and then the Tab key.
Command
awk -F '[,;\t]' '{print $1, $2, $3, $4}' input.txt
Output
apple orange grape banana
dog cat bird fish
How It Works
| Element | Description |
|---|---|
| -F | Specifies the field delimiter (Field Separator) |
| [,;\t] | A regex character class (specifying comma, semicolon, and tab simultaneously) |
| $1, $2... | References each field after splitting |
| awk | A command that processes text on a per-field basis |
Explanation
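Passing a regular expression to -F lets awk handle several delimiters at once, and a bracket expression such as [,;\t] is the simplest way to do it. Because each matched character ends a field, two adjacent delimiters produce an empty field; append + to treat a run of them as one delimiter.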
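The effect of adjacent delimiters is easy to see with NF. This sketch uses a made-up line in which a comma is immediately followed by a semicolon (the names `line`, `nf1`, and `nf2` are illustrative).

```shell
# A comma immediately followed by a semicolon: two delimiters in a row
line='apple,;orange'

# Without +, each delimiter character ends a field, so an empty field appears
nf1=$(printf '%s\n' "$line" | awk -F '[,;]' '{ print NF }')
# With +, the run ",;" counts as a single delimiter
nf2=$(printf '%s\n' "$line" | awk -F '[,;]+' '{ print NF }')

echo "$nf1"  # 3
echo "$nf2"  # 2
```

Keeping the empty field is sometimes what you want (for example, to preserve a missing value in a column), so the choice between the two forms depends on the data.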
Flexible Field Splitting Techniques Using Regular Expressions
Creating the File
cat << 'EOF' > input.txt
name:John, age=25; city Tokyo
name:Alice, age=30; city Osaka
name:Bob, age=22; city Nagoya
EOF
Command
awk -F '[:,=; ]+' '{print $2, $4, $6}' input.txt
Output
John 25 Tokyo
Alice 30 Osaka
Bob 22 Nagoya
Command
awk -F '[,;]' '{print $1 "|" $2 "|" $3}' input.txt
Output
name:John| age=25| city Tokyo
name:Alice| age=30| city Osaka
name:Bob| age=22| city Nagoya
How It Works
| Element | Description |
|---|---|
| -F | Specifies the field delimiter |
| [:,=; ]+ | A regex matching any run of colons, commas, equals signs, semicolons, and spaces, treated as a single delimiter |
| $1, $2 ... | References fields after splitting |
| [,;] | Splits on commas and semicolons only, so the spaces after them remain part of the fields |
| + | Treats consecutive delimiters as one |
Explanation
Using regular expressions as delimiters lets awk split data that mixes several formats, which is especially useful for parsing logs whose fields are separated by different punctuation.
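One pitfall with regex delimiters is that they also match at the very start of a line. The sketch below uses shortened versions of the sample records (the names `plain`, `indented`, `r1` to `r3` are illustrative) to show how a single leading space shifts every field, and one way to avoid it by stripping the blanks first.

```shell
plain='name:John, age=25'
indented=' name:John, age=25'

# A regex FS also matches at the start of the line, so a leading
# space produces an empty $1 and shifts every later field by one
r1=$(printf '%s\n' "$plain"    | awk -F '[:,=; ]+' '{ print NF, $2 }')
r2=$(printf '%s\n' "$indented" | awk -F '[:,=; ]+' '{ print NF, $2 }')

# Strip leading blanks first; modifying $0 makes awk re-split the fields
r3=$(printf '%s\n' "$indented" | awk -F '[:,=; ]+' '{ sub(/^ +/, ""); print NF, $2 }')

echo "$r1"  # 4 John
echo "$r2"  # 5 name
echo "$r3"  # 4 John
```

The default single-space FS does not have this problem, because it ignores leading blanks.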
Using the Built-in Variable OFS to Control Output Delimiters
Creating the File
cat << 'EOF' > input.txt
apple orange banana
dog cat mouse
EOF
Command
awk '{print $1, $2, $3}' input.txt
Output
apple orange banana
dog cat mouse
Command
awk 'BEGIN {OFS=","} {print $1, $2, $3}' input.txt
Output
apple,orange,banana
dog,cat,mouse
Command
awk 'BEGIN {OFS="\t"} {print $1, $2, $3}' input.txt
Output
apple	orange	banana
dog	cat	mouse
How It Works
| Element | Description |
|---|---|
| FS | Input field delimiter (default is a single space, meaning runs of spaces and tabs) |
| OFS | Output field delimiter (used when printing) |
| $1, $2... | Field (column) references |
| BEGIN | A block executed only once before input processing |
| print $1, $2, $3 | Outputs the fields joined by OFS |
Explanation
OFS is inserted between the comma-separated arguments of print. It is not applied to $0 as read: a plain print outputs the original line unchanged unless a field has been modified.
Combined with FS, it lets you control both the input and the output format.
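A common idiom follows from this: assigning a field to itself forces awk to rebuild $0 with OFS. The sketch below compares a plain print with the `$1 = $1` idiom (the names `line`, `r1`, and `r2` are illustrative).

```shell
line='apple orange banana'

# print with no arguments outputs $0 unchanged, so OFS is not applied
r1=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "," } { print }')
# Assigning to any field makes awk rebuild $0 using OFS
r2=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "," } { $1 = $1; print }')

echo "$r1"  # apple orange banana
echo "$r2"  # apple,orange,banana
```

This is handy for converting delimiters without listing every field, for example turning whitespace-separated lines with a varying number of columns into CSV.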
Splitting and Processing One Character at a Time Using an Empty String as a Delimiter
Creating the File
cat << 'EOF' > input.txt
Hello
EOF
Command
awk -v FS="" '{ for(i=1;i<=NF;i++) print $i }' input.txt
Output
H
e
l
l
o
How It Works
| Element | Description |
|---|---|
| FS="" | Sets the delimiter to an empty string (splits on each character) |
| NF | Number of fields (= number of characters) |
| $i | The i-th character |
| for loop | Iterates through one character at a time |
Explanation
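Setting FS to an empty string makes each character a separate field, which makes per-character processing easy. This is an extension supported by gawk and mawk, among others; POSIX leaves the behavior of an empty FS unspecified, so other awk implementations may behave differently.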
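If portability matters, the same result can be obtained without relying on an empty FS by walking the line with the standard length and substr functions. A minimal sketch (the variable name `chars` is illustrative):

```shell
# Portable alternative to FS="": step through the line one character at a time
chars=$(printf 'Hello\n' | awk '{ for (i = 1; i <= length($0); i++) print substr($0, i, 1) }')
printf '%s\n' "$chars"
```

This prints the same five lines as the FS="" version. With non-ASCII text, whether length and substr count characters or bytes depends on the awk implementation and the locale.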
Settings for Handling Fixed-width Data
Creating the File
cat << 'EOF' > input.txt
John      25Engineer
Alice     30Designer
Bob       22Student
EOF
Command
awk '{name=substr($0,1,10); age=substr($0,11,2); job=substr($0,13,9); print name, age, job}' input.txt
Output
John       25 Engineer
Alice      30 Designer
Bob        22 Student
Command
awk '{print $1, $2}' input.txt
Output
John 25Engineer
Alice 30Designer
Bob 22Student
How It Works
| Item | Description |
|---|---|
| substr($0,1,10) | Characters 1 through 10 (name) |
| substr($0,11,2) | Characters 11 through 12 (age) |
| substr($0,13,9) | Characters 13 through 21 (job) |
| $0 | The entire line as a string |
| awk '{print $1, $2}' | Default whitespace splitting; age and job have no space between them, so they end up together in $2 |
Explanation
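substr extracts text at fixed character positions, which is the basic way to handle fixed-width data: default field splitting cannot separate columns that touch each other, as 25Engineer shows. GNU awk also provides the FIELDWIDTHS variable for this purpose, but it is a gawk extension.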
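Note that substr keeps the padding, so the extracted name still ends in spaces. The following sketch trims it with sub, using the first record of the sample file and the same column layout (the names `rec` and `out` are illustrative).

```shell
# First record of the sample: name in columns 1-10, age in 11-12, job from 13
rec='John      25Engineer'

out=$(printf '%s\n' "$rec" | awk '{
  name = substr($0, 1, 10); sub(/ +$/, "", name)   # drop trailing padding
  age  = substr($0, 11, 2)
  job  = substr($0, 13)                            # everything from column 13
  print "[" name "]", age, job
}')
echo "$out"   # [John] 25 Engineer
```

Using the two-argument form substr($0, 13) takes the rest of the line, so the job column does not need a fixed maximum width.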
Extracting and Formatting Specific Columns from System Logs
Creating the File
cat << 'EOF' > input.txt
2026-05-01 INFO user1 login
2026-05-01 ERROR user2 failed
2026-05-02 INFO user3 logout
EOF
Command
awk '{print $1, $3}' input.txt
Output
2026-05-01 user1
2026-05-01 user2
2026-05-02 user3
Command
awk -F ' ' '{print $2, $4}' input.txt
Output
INFO login
ERROR failed
INFO logout
How It Works
| Element | Description |
|---|---|
| awk | A text processing tool |
| -F | Specifies the field delimiter; -F ' ' (a single space) is the same as the default whitespace splitting |
| $1, $2... | Represents each column (field) |
| print $1, $3 | Outputs the specified columns |
Explanation
awk splits each line into columns using the delimiter, so you can print just the fields you need.
This makes it well suited to pulling specific columns out of log files.
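In log analysis, column extraction is usually combined with a condition. As a sketch, placing a pattern before the action limits output to matching lines; here only records whose level field ($2) is exactly ERROR are printed (the variable name `errors` is illustrative).

```shell
cat << 'EOF' > input.txt
2026-05-01 INFO user1 login
2026-05-01 ERROR user2 failed
2026-05-02 INFO user3 logout
EOF

# Pattern before the action: only lines whose 2nd field is ERROR are printed
errors=$(awk '$2 == "ERROR" { print $1, $3 }' input.txt)
echo "$errors"   # 2026-05-01 user2
```

Comparing the field with == matches the whole field exactly, which avoids false hits that a regex such as /ERROR/ could produce elsewhere on the line.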
Summary: Key Points for Using Fields and Separators in awk
Understanding fields and separators is essential to mastering awk.
By appropriately using -F, FS, and OFS, you can flexibly handle everything from data splitting to formatting.
Furthermore, combining regular expressions makes it possible to handle complex, real-world data.
For beginners, it is important to first understand the default behavior and then gradually progress to more advanced usage.
