
AWK Variables Explained: A Beginner’s Guide to the Essentials

updated: 2026/05/05 created: 2026/05/05

Introduction

awk is a powerful text-processing command, widely used for log analysis and data formatting.

Variables are a key part of awk: how you handle them largely determines how flexible and efficient your processing can be.

Beginners, however, often run into confusing questions such as "Where do I declare variables?" and "How are types handled?"

After reading this article, you should be able to use awk not only for quick one-liners but also as a practical tool for real processing tasks.

Reference: GNU awk

Basic Concepts of Variables in AWK and Declaration Rules

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
orange 300
EOF

Command

awk '{ total += $2 } END { print total }' input.txt

Output

600

Command

awk '{ count++; sum += $2 } END { print "avg=" sum/count }' input.txt

Output

avg=200

How It Works

Element | Description | Key Point
total | Variable that holds the running total | Starts from 0 without initialization
count | Variable for counting rows | Created automatically on first use
$2 | Second field | Treated as a number in a numeric context
+= | Addition assignment | Equivalent to total = total + $2
END | Block executed after all input is processed | Used to output aggregated results

Explanation

awk variables need no declaration. A variable is created the first time it is used, and until it is assigned it acts as 0 in a numeric context and as an empty string in a string context, which keeps aggregation code short.
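A quick way to see both faces of an unassigned variable is to use it once in a string context and once in a numeric context. This is a minimal sketch assuming a POSIX-compatible awk such as gawk or mawk; the name never_set is hypothetical, and any name you have not assigned behaves the same way.

```shell
# never_set is never assigned, so awk creates it on first use
result=$(awk 'BEGIN {
  print "[" never_set "]"          # string context: empty string
  print never_set + 0              # numeric context: 0
  is_zero = (never_set == 0)       # compares equal to the number 0
  is_empty = (never_set == "")     # and also to the empty string
  print is_zero, is_empty
}')
printf '%s\n' "$result"
```

This should print [] on the first line, 0 on the second, and 1 1 on the third.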

Passing Variables from Command-Line Arguments Using the -v Option

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
orange 300
EOF

Command

awk -v threshold=150 '$2 > threshold {print $1, $2}' input.txt

Output

banana 200
orange 300

How It Works

Element | Description | Explanation
-v threshold=150 | Defines an awk variable | Passes a value from the shell to awk
$2 > threshold | Condition (pattern) | Selects rows whose second column is greater than threshold
{print $1, $2} | Action | Prints the first and second columns of matching rows
input.txt | Input file | Data to be processed

Explanation

The -v option assigns a shell-side value to an awk variable before the program starts, even before the BEGIN block runs.
Because the condition can then be changed from outside without editing the awk program, the script becomes more reusable.
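The option can be repeated to pass several values, and it is usually fed from shell variables rather than literals. A small sketch, assuming the shell variables lo and hi (hypothetical names) hold the bounds of a range:

```shell
# lo and hi are hypothetical shell variables holding the range bounds
lo=150
hi=250
# Quote the expansions so values containing spaces reach awk intact
result=$(printf 'apple 100\nbanana 200\norange 300\n' |
  awk -v lo="$lo" -v hi="$hi" '$2 >= lo && $2 <= hi { print $1, $2 }')
printf '%s\n' "$result"   # banana 200
```

One caveat: -v processes backslash escape sequences in the value the same way awk string literals do, so a value such as C:\new would have its \n turned into a newline. For raw strings, the ENVIRON array is the safer channel.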

Variable Initialization and Efficiency Using the BEGIN Block

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
orange 150
EOF

Command

awk 'BEGIN { total=0 } { total += $2 } END { print total }' input.txt

Output

450

How It Works

Block | Timing | Content | Role
BEGIN | Before any input is read | total=0 | Variable initialization (runs exactly once)
Main block | Once per input line | total += $2 | Adds the numeric value of the second column
END | After all input is processed | print total | Outputs the aggregated result

Explanation

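The BEGIN block runs exactly once, before the first input line is read, which makes it the natural place to set initial values; the same assignment placed in the main block would reset the variable on every line. Since awk creates variables automatically, total=0 is not strictly required here, but initializing explicitly makes the intent clearer, and BEGIN becomes essential when a variable needs a starting value other than 0 or an empty string.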
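The difference between initializing once and initializing on every line is easy to demonstrate. This sketch feeds the same three sample lines through both versions, assuming a POSIX-compatible awk:

```shell
data='apple 100
banana 200
orange 150'
# Mistake: total = 0 in the main block runs for every line, wiping the running sum
wrong=$(printf '%s\n' "$data" | awk '{ total = 0; total += $2 } END { print total }')
# BEGIN runs once before the first line is read, so the sum accumulates
right=$(printf '%s\n' "$data" | awk 'BEGIN { total = 0 } { total += $2 } END { print total }')
printf 'wrong=%s right=%s\n' "$wrong" "$right"   # wrong=150 right=450
```

The first version only ever keeps the last line's value (150), while the BEGIN version produces the expected 450.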

How to Use Built-in Variables (NR, NF, FS, RS) vs. User-Defined Variables

Creating the File

cat << 'EOF' > input.txt
apple 100 red
banana 200 yellow
grape 300 purple
EOF

Command

awk '{ print "NR=" NR, "NF=" NF, $0 }' input.txt

Output

NR=1 NF=3 apple 100 red
NR=2 NF=3 banana 200 yellow
NR=3 NF=3 grape 300 purple

Command

awk 'BEGIN { FS=" " } { print $1, $2 }' input.txt

Output

apple 100
banana 200
grape 300

Command

awk '{ total += $2 } END { print "Total=" total }' input.txt

Output

Total=600

Command

awk 'BEGIN { RS="\n" } { print NR ":" $0 }' input.txt

Output

1:apple 100 red
2:banana 200 yellow
3:grape 300 purple

How It Works

Type | Variable | Role | Example
Built-in variable | NR | Record number | Retrieving the row number
Built-in variable | NF | Number of fields | Retrieving the column count
Built-in variable | FS | Field separator | Splitting by spaces or commas
Built-in variable | RS | Record separator | Delimiting by newline or another character
User-defined variable | total | Holds arbitrary values | Calculating a total

Explanation

awk's built-in variables are for handling the structure of input data, while user-defined variables are used for calculations and state retention.
Separating these roles improves readability and flexibility.
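The examples above keep FS at its whitespace default. As a sketch of the comma case mentioned in the table, assuming a hypothetical comma-separated variant of the sample data, built-in and user-defined variables can work side by side like this:

```shell
# Comma-separated variant of the sample data (hypothetical input)
result=$(printf 'apple,100,red\nbanana,200,yellow\n' | awk '
  BEGIN { FS = "," }     # split fields on commas instead of whitespace
  { total += $2          # user-defined variable for the running sum
    print NR ": " $1 " (" NF " fields, last=" $NF ")" }
  END { print "Total=" total }')
printf '%s\n' "$result"
```

$NF uses the built-in NF as a field index, so it always refers to the last field of the current line regardless of how many fields it has.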

How to Reference Environment Variables in AWK Scripts (ENVIRON Array)

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
EOF

Command

export RATE=1.1
awk '{ price = $2 * ENVIRON["RATE"]; print $1, price }' input.txt

Output

apple 110
banana 220

How It Works

Element | Description | Example
ENVIRON array | References environment variables from within awk | ENVIRON["RATE"]
export | Sets an environment variable in the shell | export RATE=1.1
$1, $2 | Field references for the input row | apple, 100
Arithmetic | Calculation inside awk | $2 * ENVIRON["RATE"]
print | Outputs the result | print $1, price

Explanation

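The ENVIRON array lets awk read the shell's environment variables directly. Only variables that are actually in the environment are visible: exported ones, or ones set on the awk command line itself, as in RATE=1.1 awk ...; a plain, unexported shell variable does not appear in ENVIRON. This lets you switch values from outside without modifying the script.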
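Both points can be checked in a couple of lines. This sketch assumes a POSIX shell and awk; DEMO_LOCAL_ONLY is a hypothetical variable name chosen so it is unlikely to exist already.

```shell
# Prefixing the command sets RATE only in the environment of this awk process
result=$(printf 'apple 100\n' | RATE=1.5 awk '{ print $1, $2 * ENVIRON["RATE"] }')
printf '%s\n' "$result"   # apple 150

# A plain shell variable that is never exported does not reach ENVIRON
unset DEMO_LOCAL_ONLY
DEMO_LOCAL_ONLY=hello
missing=$(awk 'BEGIN { print "[" ENVIRON["DEMO_LOCAL_ONLY"] "]" }')
printf '%s\n' "$missing"  # []
```

The one-shot prefix form is handy when a value should apply to a single command without leaking into the rest of the shell session.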

Notes on Automatic Type Conversion Between Numbers and Strings

Creating the File

cat << 'EOF' > input.txt
10 apple
20 banana
30 cherry
EOF

Command

awk '{ total += $1; text += $2 } END { print total, text }' input.txt

Output

60 0

Command

awk '{ total += $1; text = text $2 } END { print total, text }' input.txt

Output

60 applebananacherry

How It Works

Element | Behavior | Note
total += $1 | Added as a number | Numeric total is computed correctly
text += $2 | Attempts to add as a number | Non-numeric strings are converted to 0
text = text $2 | String concatenation | Combined as a string, as expected
awk variable | Type is determined by context, without declaration | Switches between number and string depending on how it is used

Explanation

Since awk variables can be either numeric or string depending on context, using += can cause unintended numeric conversion. For string operations, it is safer to use explicit concatenation (text = text $2).
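Two common idioms make the intended type explicit: adding 0 forces a numeric value, and concatenating "" forces a string. The sketch below, assuming a POSIX-compatible awk, shows why this matters for comparisons between values that came from string constants:

```shell
result=$(awk 'BEGIN {
  a = "10"; b = "9"          # string constants keep string type
  print (a < b)              # string comparison: "10" sorts before "9", so 1
  print (a + 0 < b + 0)      # adding 0 forces numbers: 10 < 9 is false, so 0
  n = 3.0
  print n ""                 # concatenating "" forces a string: 3
  print "3abc" + 0           # the numeric prefix is used, the rest ignored: 3
}')
printf '%s\n' "$result"
```

Field values such as $1 that look like numbers are compared numerically already, so the + 0 idiom is mostly needed for values built from string constants or concatenation.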

Updating Variables Using Arithmetic and Assignment Operators

Creating the File

cat << 'EOF' > input.txt
10
20
30
EOF

Command

awk '{ sum += $1; print "Current total:", sum }' input.txt

Output

Current total: 10
Current total: 30
Current total: 60

Command

awk 'NR == 1 { product = 1 } { product *= $1; print "Current product:", product }' input.txt

Output

Current product: 10
Current product: 200
Current product: 6000

How It Works

Element | Description | Explanation
sum += $1 | Addition assignment | Adds the current value to sum
product *= $1 | Multiplication assignment | Multiplies product by the current value
$1 | Field reference | Value of the first column of each row
NR | Record number | Row number (used for the initialization check)
Variable | Created automatically within awk | Initial value is 0 or an empty string

Explanation

In awk, variables are created automatically and can be updated incrementally with operators such as += and *=.
Because awk already loops over input lines implicitly, running totals and products need no explicit loop, which is one of its strengths.
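The other assignment operators and the increment operators follow the same pattern. A compact tour, assuming a POSIX-compatible awk (the variable names are arbitrary):

```shell
result=$(awk 'BEGIN {
  x = 10
  x -= 3;  print x      # 7
  x *= 2;  print x      # 14
  x /= 4;  print x      # 3.5
  x %= 2;  print x      # 1.5 (remainder also works on non-integers)
  x ^= 2;  print x      # 2.25
  y = x++; print y, x   # post-increment returns the old value: 2.25 3.25
  z = ++x; print z      # pre-increment returns the new value: 4.25
}')
printf '%s\n' "$result"
```

All arithmetic in awk is floating point, which is why division and the remainder operator can produce fractional results.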

Dynamic Variable Manipulation in Conditional Branching (if) and Loops (for|while)

Creating the File

cat << 'EOF' > input.txt
apple 10
banana 20
orange 30
EOF

Command

awk '{ total += $2; if ($2 > 15) count++ } END { for (i = 1; i <= count; i++) printf "loop %d\n", i; print "total =", total }' input.txt

Output

loop 1
loop 2
total = 60

How It Works

Element | Description | Explanation
$2 | Numeric field | Second column of each row
total += $2 | Variable addition | Updates the running total for each row
if ($2 > 15) | Conditional branch | Acts only on values greater than 15
count++ | Counter increment | Increments when the condition is met
END | Post-processing block | Executed after all rows are processed
for | Loop | Repeats count times

Explanation

In awk, variables can be dynamically updated during record processing and then used collectively in the END block.
Combining if and for enables flexible aggregation and control.
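The heading also mentions while. A typical use is walking over every field of a line, with NF as the loop bound. A minimal sketch, assuming a POSIX-compatible awk and hypothetical numeric input:

```shell
result=$(printf '1 2 3\n10 20\n' | awk '{
  row = 0; i = 1          # reset per line, or values would carry over from the previous line
  while (i <= NF) {       # visit every field of the current line
    row += $i
    i++
  }
  print "line", NR, "sum", row
}')
printf '%s\n' "$result"
```

Because variables are global and persist across lines, resetting row and i at the start of each record is what keeps each line's sum independent.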

Data Aggregation Techniques Using Associative Arrays (Maps) as Variables

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
apple 150
orange 300
banana 50
EOF

Command

awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' input.txt | sort

Output

apple 250
banana 250
orange 300

How It Works

Element | Description
Key | $1 (product name)
Value | $2 (numeric data)
Associative array | sum[key] += value
Aggregation timing | Added up while each row is processed
Output processing | Loops through all keys in the END block

Explanation

Associative arrays create keys on the fly, which makes them very effective for category-based aggregation: grouping and summing happen in a single, simple statement. Note that for (k in sum) visits keys in an unspecified order, so pipe the output through sort when the order matters.
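Several arrays can share the same keys, for example one for sums and one for counts, which is enough to compute per-group averages. The sketch below also shows the in operator for membership tests. It assumes a POSIX-compatible awk; the names cnt and seen are illustrative.

```shell
data='apple 100
banana 200
apple 150
orange 300
banana 50'
# Two arrays share the same keys: one for sums, one for counts
summary=$(printf '%s\n' "$data" | awk '
  { sum[$1] += $2; cnt[$1]++ }
  END { for (k in sum) printf "%s count=%d avg=%.1f\n", k, cnt[k], sum[k] / cnt[k] }' | sort)
printf '%s\n' "$summary"

# "key in array" tests membership without creating the element,
# whereas merely reading seen["grape"] would add an empty entry
check=$(printf '%s\n' "$data" | awk '
  { seen[$1] = 1 }
  END { if ("grape" in seen) print "grape: yes"; else print "grape: no" }')
printf '%s\n' "$check"   # grape: no
```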

Reading Values into Variables from External Files Using the getline Function

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
cherry 300
EOF

Command

awk 'BEGIN { getline line < "input.txt"; split(line, a, " "); val=a[2] } { print $1, val }' input.txt

Output

apple 100
banana 100
cherry 100

How It Works

Element | Description
getline line < "input.txt" | Reads one line from the file and stores it in the variable line
split(line, a, " ") | Splits the line on whitespace and stores the parts in array a
a[2] | The second element (here, 100)
val=a[2] | Assigns it to the awk variable val
{ print $1, val } | Prints the first column of each row along with the fixed value val

Explanation

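getline lets you read values into variables from any file, independently of the main input stream. Here it reads the first line of input.txt itself, but the same technique works for a separate configuration or lookup file, which makes it easy to incorporate external data.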
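A more typical pattern loads an entire lookup file in BEGIN with a while loop, checking getline's return value, and then uses the table while processing the main input. This is a sketch under assumptions: rates.txt and the variable ratefile are hypothetical, and the files are created in a temporary directory so the example is self-contained.

```shell
# Hypothetical lookup file mapping a product to a price multiplier
tmp=$(mktemp -d)
printf 'apple 1.1\nbanana 0.9\n' > "$tmp/rates.txt"
printf 'apple 100\nbanana 200\ncherry 300\n' > "$tmp/input.txt"

result=$(awk -v ratefile="$tmp/rates.txt" '
  BEGIN {
    # getline returns 1 per line, 0 at end of file, -1 on error;
    # testing "> 0" stops on both end of file and errors
    while ((getline line < ratefile) > 0) {
      split(line, f, " ")
      rate[f[1]] = f[2]
    }
    close(ratefile)   # release the file handle once the table is loaded
  }
  { r = ($1 in rate) ? rate[$1] : 1    # default multiplier when no entry exists
    print $1, $2 * r }' "$tmp/input.txt")
rm -r "$tmp"
printf '%s\n' "$result"
```

Writing the loop as while (getline ... > 0) rather than while (getline ...) matters: a missing file makes getline return -1, which is true, so the shorter form would loop forever.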

Managing Local Variables in Functions and Global Variables

Creating the File

cat << 'EOF' > input.txt
apple 10
banana 20
apple 30
banana 40
EOF

Command

awk '{ total[$1] += $2; sum += $2 } END { for (k in total) print k, total[k]; print "GLOBAL_SUM", sum }' input.txt

Output

apple 40
banana 60
GLOBAL_SUM 100

How It Works

Type | Variable Name | Scope | Explanation
Local-style (per-key management) | total[$1] | Associative array (pseudo-local) | Holds a value per key (apple, banana)
Global | sum | Shared across the whole program | Holds the total of all records
Input fields | $1, $2 | Per row | Reference the data of each row
END block | - | Global post-processing | Outputs the aggregated results

Explanation

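Here the array total[$1] keeps a separate value for each key, while the plain variable sum is shared by the whole program. awk does, however, have true local variables: inside a user-defined function, any extra parameters listed in the function definition (and not passed by the caller) behave as locals. Every variable that is not a function parameter is global.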
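The extra-parameter idiom looks like this in practice. A minimal sketch assuming a POSIX-compatible awk; the function name stars and the counter calls are hypothetical:

```shell
result=$(printf 'apple 10\nbanana 20\n' | awk '
  # Extra parameters (i, out) are never passed by callers, so they act as locals.
  # The extra spaces before them are only a convention marking them as locals.
  function stars(n,    i, out) {
    calls++                         # not a parameter, so this counter is global
    out = ""
    for (i = 1; i <= n; i++) out = out "*"
    return out
  }
  { i = 99                          # global i, untouched by the i inside stars()
    print $1, stars($2 / 10), "i=" i }
  END { print "calls=" calls }')
printf '%s\n' "$result"
```

The global i still prints 99 after each call, while calls, which is not a parameter, accumulates across calls and reaches 2 in the END block.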

Outputting Variable Values to Improve Debugging Efficiency

Creating the File

cat << 'EOF' > input.txt
apple 100
banana 200
orange 300
EOF

Command

awk '{ total += $2; print "DEBUG: item=" $1 ", price=" $2 ", total=" total } END { print "SUM=" total }' input.txt

Output

DEBUG: item=apple, price=100, total=100
DEBUG: item=banana, price=200, total=300
DEBUG: item=orange, price=300, total=600
SUM=600

How It Works

Element | Description
$1 | First column (product name)
$2 | Second column (numeric value)
total | Variable defined within awk for accumulation
total += $2 | Adds to the running total
print | Outputs variable contents at each step for debugging
END | Outputs the final result after all rows are processed

Explanation

Since variables in awk can be used freely, outputting intermediate values with print improves debugging efficiency.
This is especially useful for verifying cumulative processing and conditional branching.
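Mixing debug lines into normal output can break anything that consumes that output. One option, sketched here, is to send debug messages to standard error and switch them on with a -v flag. The flag name debug is hypothetical, and the special file name "/dev/stderr" is recognized by gawk and mawk and also exists as a device on Linux and macOS.

```shell
log=$(mktemp)
# debug is a hypothetical flag; debug lines go to standard error so standard output stays clean
result=$(printf 'apple 100\nbanana 200\n' | awk -v debug=1 '
  { total += $2
    if (debug) printf "DEBUG: NR=%d item=%s total=%d\n", NR, $1, total > "/dev/stderr" }
  END { print "SUM=" total }' 2>"$log")
debug_out=$(cat "$log")
rm -f "$log"
printf 'stdout: %s\n' "$result"
printf '%s\n' "$debug_out"
```

Dropping -v debug=1 leaves debug uninitialized, which is false, so the same script then produces only SUM=300.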

Log File Aggregation and Report Generation Using Variables

Creating the File

cat << 'EOF' > input.txt
2026-05-01 INFO 120
2026-05-01 ERROR 30
2026-05-02 INFO 200
2026-05-02 ERROR 50
2026-05-03 INFO 150
2026-05-03 ERROR 20
EOF

Command

awk '{ count[$2] += $3 } END { for (level in count) print level, count[level] }' input.txt

Output

INFO 470
ERROR 100

Command

awk -v threshold=100 '{ count[$2] += $3 } END { for (level in count) if (count[level] > threshold) print level, count[level] }' input.txt

Output

INFO 470

How It Works

Element | Description
$2 | Log level (INFO / ERROR)
$3 | Numeric data (count or size)
count[$2] += $3 | Aggregates the total per log level
-v threshold=100 | Defines a variable for use in awk from outside
END | Outputs results after all rows are processed
for (level in count) | Loops through all keys in the array

Explanation

By using awk variables and associative arrays, aggregation by log level and conditional reporting can be achieved concisely. The -v option also allows for flexible condition settings.
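The same variables extend naturally to a per-day report. As a sketch using the sample log above, two arrays keyed by date (total for all levels, err for ERROR only) are enough to compute an error rate per day; the array names are illustrative.

```shell
logdata='2026-05-01 INFO 120
2026-05-01 ERROR 30
2026-05-02 INFO 200
2026-05-02 ERROR 50
2026-05-03 INFO 150
2026-05-03 ERROR 20'
report=$(printf '%s\n' "$logdata" | awk '
  { total[$1] += $3                      # all levels, per date
    if ($2 == "ERROR") err[$1] += $3 }   # ERROR only, per date
  END {
    # a date with no ERROR lines would read err[d] as 0 (and create the entry)
    for (d in total)
      printf "%s total=%d error_rate=%.1f%%\n", d, total[d], 100 * err[d] / total[d]
  }' | sort)
printf '%s\n' "$report"
```

Since for (d in total) does not guarantee any order, the output is piped through sort so that the dates appear chronologically.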

Key Points for Mastering awk Variables

Variables in awk are not merely containers for values; they play a central role in controlling the flow of processing.

A key characteristic is that they can be used without declaration, but that also means unexpected behavior can arise if you are not mindful of scope and initialization timing.

Understanding awk variables correctly and building up from small tasks is the fastest path to improving your skills.


©︎ 2025-2026 running terminal commands