Delimited text becomes easier to work with when each column, line range, or repeated value can be tested directly. The awk command in Linux reads input record by record, splits each record into fields, and runs compact pattern/action programs for filtering logs, summarizing inventories, comparing files, and formatting reports.
Use awk when the result depends on fields, totals, grouped counts, ranges, or calculated output. For simpler line matching, use grep; for stream substitutions and targeted text edits, use sed. Awk is the middle ground for column-aware logic that is too specific for line matching but too small for a full script.
Understand the awk Command in Linux
An awk program usually contains a pattern, an action, or both. The pattern decides which records match, and the action inside braces decides what to print, calculate, store, or transform. When no pattern is provided, the action runs for every record; when no action is provided, awk prints each matching record.
Basic awk Syntax
awk 'pattern { action }' file
Awk also reads from standard input, which makes it useful after commands that generate text:
command | awk 'pattern { action }'
- pattern: A condition such as NR > 1, $2 == "frontend", or /ERROR/. If the condition is true, the action runs.
- action: One or more statements inside braces, such as { print $1 }, { total += $4 }, or { printf "%s\n", $1 }.
- file: One or more input files. If no file is named, awk reads standard input.
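As a minimal sketch of the three forms described above, the following runs pattern-only, action-only, and combined programs against a few inline records. The host names and roles are made-up sample data, not one of the fixture files created later.

```shell
# Sample records supplied inline; the names are illustrative only.
records='web01 frontend
api01 backend
db01 database'

# Pattern only: awk prints every record where the pattern is true.
pattern_only=$(printf '%s\n' "$records" | awk '$2 == "backend"')

# Action only: the action runs for every record.
action_only=$(printf '%s\n' "$records" | awk '{ print $1 }')

# Pattern and action: the action runs only for matching records.
both=$(printf '%s\n' "$records" | awk 'NR > 1 { print NR, $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```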
Records, Fields, and Separators
Awk reads input one record at a time. A record is usually one line, and fields are the columns inside that line. These built-in variables and field references appear throughout most awk one-liners:
| Reference | Meaning | Common Use |
|---|---|---|
| $0 | The full current record. | Print or test the whole line. |
| $1, $2 | The first field, second field, and so on. | Select columns from structured text. |
| NF | The number of fields in the current record. | Skip blank lines or find the last field. |
| $NF | The field whose number equals NF. | Print the last field without hardcoding its position. |
| NR | The total record number across all input. | Skip headers, print line ranges, or add row numbers. |
| FNR | The record number inside the current file. | Separate first-file setup from second-file processing. |
| FS or -F | The input field separator. | Split records on colons, commas, tabs, or another pattern. |
| OFS | The output field separator. | Control spacing or delimiters between printed fields. |
By default, awk splits fields on runs of whitespace. Use -F when the input has a clearer delimiter, such as a colon in account-style data, a comma in simple CSV, or a tab in TSV files. A single-character -F value other than a space is matched literally, while a longer value is treated as an extended regular expression. Either way, plain -F',' should stay limited to rows where commas do not appear inside quoted fields.
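The two separator styles can be sketched with inline sample rows. The data here is hypothetical and only illustrates how a tab separator keeps spaces inside a field and how a bracket expression acts as a regular-expression separator.

```shell
# Tab-separated sample row; a field may contain spaces.
tab_field=$(printf 'web01\tfront end\tactive\n' | awk -F'\t' '{ print $2 }')

# A multi-character -F value is treated as a regular expression:
# here, one or more commas or semicolons separate the fields.
regex_fields=$(printf 'a,b;;c\n' | awk -F'[,;]+' '{ print NF, $3 }')

echo "$tab_field"
echo "$regex_fields"
```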
Portable scripts should start with POSIX awk features such as -F, NR, FNR, NF, BEGIN, END, arrays, printf, -v, and -f. The POSIX awk utility definition defines that baseline. The GNU Awk manual covers GNU extensions, including newer CSV handling, but portable awk programs should depend on those extensions only when the target systems are known to use GNU awk.
awk Quick Reference
Use these patterns as starting points for common awk tasks. Predictable input files make each pattern easier to adapt without mixing several awk concepts at once.
Field and Filter Patterns
| Task | Command Pattern | What It Does |
|---|---|---|
| Print one field | awk '{ print $1 }' file | Prints the first field from each input record. |
| Use a delimiter | awk -F':' '{ print $1 }' file | Splits records on colons instead of whitespace. |
| Print the last field | awk '{ print $NF }' file | Uses NF as the number of the final field. |
| Skip a header | awk 'NR > 1 { print }' file | Starts processing after the first record. |
| Filter by field | awk '$2 == "error" { print $0 }' file | Prints records where the second field matches exactly. |
| Filter by regex | awk '/WARN|ERROR/ { print }' file | Prints records matching either regular expression branch. |
| Print a line range | awk 'NR >= 10 && NR <= 20 { print }' file | Prints records by line number. |
| Skip blanks and comments | awk 'NF && $1 !~ /^#/ { print }' file | Ignores empty lines and lines whose first field starts with #. |
Reporting and Reuse Patterns
| Task | Command Pattern | What It Does |
|---|---|---|
| Add values | awk '{ total += $1 } END { print total }' file | Accumulates values and prints the final total after input ends. |
| Count values | awk '{ count[$1]++ } END { for (key in count) print key, count[key] }' file | Uses an associative array to count repeated values. |
| Remove duplicates | awk '!seen[$0]++' file | Prints the first occurrence of each unique input record. |
| Compare two files | awk 'FNR == NR { seen[$1]; next } $1 in seen { print }' file1 file2 | Builds an array from the first file, then checks the second file. |
| Format output | awk '{ printf "%-10s %s\n", $1, $2 }' file | Uses awk’s printf statement for aligned output. |
| Pass a shell value safely | awk -v limit=10 '$1 >= limit { print }' file | Passes an external value without shell-interpolating the awk program. |
| Run a script file | awk -f report.awk file | Loads awk logic from a reusable script file. |
Verify awk Availability and Build Example Files
Most general-purpose Linux distributions include awk. Confirm the command path before using the fixture commands.
command -v awk
A working system returns an executable path:
/usr/bin/awk
When the system uses GNU awk, the GNU-specific version option prints an implementation banner:
awk --version | head -1
A GNU awk version line begins with the implementation name and version:
GNU Awk 5.3.2, API 4.0, PMA Avon 8-g1, (GNU MPFR 4.2.2, GNU MP 6.3.0)
If awk --version fails, the command may still work through another implementation. A small BEGIN program proves awk can start without waiting for an input file:
awk 'BEGIN { print "awk works" }'
awk works
Create a disposable workspace for the examples. The files cover colon-delimited inventories, whitespace-delimited logs, simple CSV, two-file comparisons, section markers, comments, blank lines, and uneven spacing.
mkdir -p ~/awk-demo
cd ~/awk-demo
cat > inventory.txt <<'EOF'
hostname:role:cpu:memory:status
web01:frontend:4:8:active
web02:frontend:4:8:active
api01:backend:8:16:active
api02:backend:8:32:maintenance
db01:database:16:64:active
cache01:cache:4:16:active
EOF
cat > app.log <<'EOF'
2026-05-27T10:00:01Z INFO web01 request completed ms=42
2026-05-27T10:00:02Z WARN api01 retrying upstream ms=180
2026-05-27T10:00:03Z ERROR api01 upstream timeout ms=900
2026-05-27T10:00:04Z INFO db01 query completed ms=75
2026-05-27T10:00:05Z ERROR web02 disk full ms=310
2026-05-27T10:00:06Z WARN api02 cache stale ms=240
2026-05-27T10:00:07Z INFO cache01 request completed ms=35
EOF
cat > services.csv <<'EOF'
service,port,owner,enabled
ssh,22,ops,yes
http,80,web,yes
https,443,web,yes
postgres,5432,data,no
redis,6379,data,yes
EOF
cat > limits.txt <<'EOF'
frontend 12
backend 16
database 32
cache 8
EOF
cat > sections.conf <<'EOF'
[frontend]
web01
web02
[backend]
api01
api02
[database]
db01
[cache]
cache01
EOF
cat > settings.conf <<'EOF'
# local app settings
host = web01
port = 8080
mode = production
EOF
cat > messy.txt <<'EOF'
web01    frontend     active
api01  backend      maintenance
EOF
The fixture files keep each example focused on one awk behavior:
| File | Used For |
|---|---|
| inventory.txt | Colon-delimited fields, headers, numeric comparisons, totals, groups, and output formatting. |
| app.log | Whitespace fields, log-level filters, timing extraction, duplicates, counts, and pipelines. |
| services.csv | Simple comma-separated rows without quoted commas. |
| limits.txt | Two-file lookups with FNR and NR. |
| sections.conf | Range patterns and marker-controlled section extraction. |
| settings.conf and messy.txt | Comment skipping, blank-line handling, and whitespace normalization. |
Preview the first records with awk itself:
awk 'NR <= 4 { print }' inventory.txt
hostname:role:cpu:memory:status
web01:frontend:4:8:active
web02:frontend:4:8:active
api01:backend:8:16:active
Print and Select Fields with awk
Print Specific Fields
Use -F':' to split the inventory file on colons, then print the first two fields. Awk separates comma-separated print arguments with a space unless you set a different output separator.
awk -F':' 'NR > 1 { print $1, $2 }' inventory.txt
web01 frontend
web02 frontend
api01 backend
api02 backend
db01 database
cache01 cache
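The effect of the comma in print can be seen in a small sketch. The single input row is sample data, and the `->` separator is an arbitrary choice for illustration.

```shell
line='web01:frontend'

# A comma between print arguments inserts the output field separator (a space by default).
with_comma=$(printf '%s\n' "$line" | awk -F':' '{ print $1, $2 }')

# Without the comma, awk concatenates the two values with nothing between them.
without_comma=$(printf '%s\n' "$line" | awk -F':' '{ print $1 $2 }')

# Setting OFS changes what the comma inserts.
custom_ofs=$(printf '%s\n' "$line" | awk -F':' -v OFS=' -> ' '{ print $1, $2 }')

printf '%s\n%s\n%s\n' "$with_comma" "$without_comma" "$custom_ofs"
```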
Print the Last Field with NF
NF stores the number of fields in the current record, so $NF means the last field. This is useful when records have a stable ending column but a changing number of earlier fields.
awk -F':' 'NR > 1 { print $1, $NF }' inventory.txt
web01 active
web02 active
api01 active
api02 maintenance
db01 active
cache01 active
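Because NF is an ordinary number, it can also appear inside an expression. The following sketch, using made-up rows of different lengths, prints the second-to-last and last fields with $(NF-1) and $NF.

```shell
# Two sample rows with a different number of fields.
tail_fields=$(printf 'a b c d\nx y\n' | awk '{ print NF, $(NF-1), $NF }')

echo "$tail_fields"
```

The parentheses matter: $(NF-1) selects a field, while $NF-1 would subtract one from the value of the last field.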
Add Line Numbers to Output
NR counts every input record read so far. Subtract one after skipping a header when the output should show data-row numbers instead of physical file line numbers.
awk -F':' 'NR > 1 { print NR - 1, $1 }' inventory.txt
1 web01
2 web02
3 api01
4 api02
5 db01
6 cache01
Append Text to Printed Fields
Awk concatenates adjacent strings and variables without a separate operator. This makes labels and units easy to attach to field values.
awk -F':' 'NR > 1 { print $1, $4 " GB" }' inventory.txt
web01 8 GB
web02 8 GB
api01 16 GB
api02 32 GB
db01 64 GB
cache01 16 GB
Read Simple CSV Fields
-F',' works for simple comma-separated rows where commas never appear inside quoted fields. Use it for quick admin data, not for full CSV exports with quoted commas, embedded newlines, or escaped quotes.
awk -F',' 'NR > 1 && $4 == "yes" { print $1, $2 }' services.csv
ssh 22
http 80
https 443
redis 6379
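To see why quoted commas are a problem, this sketch feeds awk one hypothetical row whose first field contains a comma inside quotes. Plain -F',' has no notion of quoting, so it reports three fields instead of two and cuts the first field short.

```shell
# The second row has a quoted field that contains a comma (sample data).
csv_check=$(printf 'service,owner\n"ssh, legacy",ops\n' | awk -F',' '{ print NF ": " $1 }')

echo "$csv_check"
```

A mismatch in NF between the header and a data row is a quick signal that the file needs a real CSV parser.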
Filter Rows and Ranges with awk
Match an Exact Field Value
Field comparisons are one of awk’s strongest everyday uses. Print only hosts where the role field equals frontend.
awk -F':' '$2 == "frontend" { print $1 }' inventory.txt
web01
web02
Compare Numeric Fields
Numeric comparisons work directly when the field contains a number. Keep the header guard in place so text labels do not take part in the comparison.
awk -F':' 'NR > 1 && $3 >= 8 { print $1, $3 " CPUs" }' inventory.txt
api01 8 CPUs
api02 8 CPUs
db01 16 CPUs
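The header guard matters because awk falls back to a string comparison when a field does not look like a number. In this sketch, which uses a shortened sample inventory and pins the C locale so the string ordering is predictable, the label "cpu" compares as greater than "8" and the header row slips through without the guard.

```shell
rows='host:cpu
web01:4
db01:16'

# No guard: "cpu" is compared with 8 as a string, and "cpu" sorts after "8".
unguarded=$(printf '%s\n' "$rows" | LC_ALL=C awk -F':' '$2 >= 8 { print $1 }')

# With NR > 1, only real data rows reach the numeric comparison.
guarded=$(printf '%s\n' "$rows" | LC_ALL=C awk -F':' 'NR > 1 && $2 >= 8 { print $1 }')

printf 'unguarded:\n%s\nguarded:\n%s\n' "$unguarded" "$guarded"
```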
Skip Blank Lines and Comments
NF is zero on blank lines, so it can remove empty records. Pair it with a comment check when parsing simple configuration-style files.
awk 'NF && $1 !~ /^#/ { print }' settings.conf
host = web01
port = 8080
mode = production
Match Log Lines with a Regular Expression
A bare regular expression pattern checks the whole record. This form is useful when you want awk to keep only log severities or message classes before printing selected fields.
awk '/WARN|ERROR/ { print $1, $2, $3 }' app.log
2026-05-27T10:00:02Z WARN api01
2026-05-27T10:00:03Z ERROR api01
2026-05-27T10:00:05Z ERROR web02
2026-05-27T10:00:06Z WARN api02
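A whole-record match can catch a keyword that appears in the message text rather than in the severity column. As a sketch with two invented log lines, the ~ operator restricts the regular expression to one field:

```shell
log='t1 INFO web01 ERROR-page rendered
t2 ERROR api01 upstream timeout'

# Bare regex: matches anywhere in the record, including the message text.
whole_record=$(printf '%s\n' "$log" | awk '/ERROR/ { print $1 }')

# Field match: only the second field is tested, anchored at both ends.
field_only=$(printf '%s\n' "$log" | awk '$2 ~ /^(WARN|ERROR)$/ { print $1 }')

printf 'whole record: %s\n' $whole_record
printf 'field only: %s\n' "$field_only"
```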
Print Rows by Line Number
Use NR comparisons for line-number ranges. The command here prints physical input lines 3 through 5. The header counts as line 1, so line 3 is the second host row.
awk -F':' 'NR >= 3 && NR <= 5 { print NR, $1 }' inventory.txt
3 web02
4 api01
5 api02
Print Lines Between Markers
An awk range pattern is written as two patterns separated by a comma. It matches from the record that satisfies the first pattern through the next record that satisfies the second pattern, including both marker lines. That behavior is useful when the markers belong in the output.
awk '/^\[backend\]$/, /^\[database\]$/' sections.conf
[backend]
api01
api02
[database]
When you only want the records inside the section, set a flag at the start marker and clear it at the next section header:
awk '/^\[backend\]$/ { active = 1; next } /^\[/ { active = 0 } active { print }' sections.conf
api01
api02
Transform Text with awk
Extract Values with split
split() divides a string into an array. The log file stores timing as ms=value, so splitting field 6 on = extracts the numeric value without changing the original log line.
awk '$2 == "ERROR" { split($6, timing, "="); print $3, timing[2] }' app.log
api01 900
web02 310
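split() also returns the number of pieces it produced, which can guard against fields that do not have the expected key=value shape. A small sketch with two sample tokens, one well-formed and one not:

```shell
# split() returns the element count; only two-part tokens are treated as key=value.
parsed=$(printf 'ms=900\nnoequals\n' | awk '{
    n = split($1, kv, "=")
    if (n == 2)
        print kv[1], kv[2]
    else
        print "skip", $1
}')

echo "$parsed"
```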
Normalize Uneven Whitespace
Assigning to any field makes awk rebuild $0 with the output field separator. With the default OFS, this collapses uneven spacing to single spaces.
awk '{ $1 = $1; print }' messy.txt
web01 frontend active
api01 backend maintenance
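The same rebuild step applies whatever OFS holds. In this sketch the sample row has leading, trailing, and mixed whitespace, and OFS is set to a comma, so the field assignment turns the row into comma-separated output. Without the assignment, awk prints the record untouched.

```shell
# Assigning to a field rebuilds $0 with OFS between the fields.
rebuilt=$(printf '  web01   frontend \t active  \n' | awk 'BEGIN { OFS = "," } { $1 = $1; print }')

# No field assignment: $0 is printed exactly as it was read, and OFS is not applied.
untouched=$(printf 'a  b\n' | awk 'BEGIN { OFS = "," } { print }')

echo "$rebuilt"
echo "$untouched"
```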
Replace Text in One Field with gsub
gsub() replaces every match in the target string. Give it a field target, such as $1, when the replacement should not touch the whole record. Set OFS first when the edited record should keep the original delimiter.
awk -F':' 'BEGIN { OFS=":" } NR == 1 { print; next } { gsub(/^web/, "frontend-", $1); print }' inventory.txt | awk 'NR <= 3 { print }'
hostname:role:cpu:memory:status
frontend-01:frontend:4:8:active
frontend-02:frontend:4:8:active
The second awk process limits the preview to three records. Remove that final pipe when you want the complete transformed stream.
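When only the first match should change, awk's sub() function is the single-replacement counterpart of gsub(). This sketch applies both to copies of one sample string and also prints the replacement count that gsub() returns.

```shell
# sub() replaces the first match only; gsub() replaces every match and returns the count.
compared=$(printf 'web-01-a\n' | awk '{
    first = $0
    all = $0
    sub(/-/, "_", first)
    n = gsub(/-/, "_", all)
    print first, all, n
}')

echo "$compared"
```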
Calculate Totals and Reports with awk
Add Values Across Records
Awk variables are created on first use. Add each memory value to total, then print the result in an END block after awk has read every record.
awk -F':' 'NR > 1 { total += $4 } END { printf "total memory: %d GB\n", total }' inventory.txt
total memory: 144 GB
Calculate an Average
Track both the total and the number of data records when you need an average. Skipping the header keeps the count aligned with real host rows; add an END guard when a filter might match no records.
awk -F':' 'NR > 1 { total += $4; count++ } END { printf "average memory: %.1f GB\n", total / count }' inventory.txt
average memory: 24.0 GB
Sum Values by Group
Associative arrays let awk group values by any field. Add memory by role, then pipe the result to sort so the report order stays stable.
awk -F':' 'NR > 1 { memory[$2] += $4 } END { for (role in memory) print role, memory[role] }' inventory.txt | sort
backend 48
cache 16
database 64
frontend 16
Count Log Levels
The same array pattern counts repeated values. This command counts how often each severity appears in the log file.
awk '{ count[$2]++ } END { for (level in count) print level, count[level] }' app.log | sort
ERROR 2
INFO 3
WARN 2
Calculate Percentages by Group
Percentages need both the grouped subtotal and the overall total. The doubled percent sign in printf prints a literal percent character.
awk -F':' 'NR > 1 { memory[$2] += $4; total += $4 } END { for (role in memory) printf "%-10s %5.1f%%\n", role, memory[role] * 100 / total }' inventory.txt | sort
backend     33.3%
cache       11.1%
database    44.4%
frontend    11.1%
Show the Top Rows by Value
Awk can extract the sort key and label, then standard shell tools can order and limit the result. Put the numeric value first so sort -k1,1nr sorts by that column only.
awk -F':' 'NR > 1 { print $4, $1 }' inventory.txt | sort -k1,1nr | head -3
64 db01
32 api02
16 api01
Align Columns with printf
Use awk’s printf statement when spacing matters. The format string controls column width and data type, and the shell’s separate printf command uses the same formatting ideas.
awk -F':' 'BEGIN { printf "%-10s %-12s %8s\n", "HOST", "ROLE", "MEMORY" } NR > 1 { printf "%-10s %-12s %6s GB\n", $1, $2, $4 }' inventory.txt
HOST       ROLE           MEMORY
web01      frontend          8 GB
web02      frontend          8 GB
api01      backend          16 GB
api02      backend          32 GB
db01       database         64 GB
cache01    cache            16 GB
Set a Custom Output Separator
OFS controls the separator awk uses between comma-separated arguments to print. Set it in BEGIN before the first output record.
awk -F':' 'BEGIN { OFS="," } NR == 1 { print "host","role","memory_gb"; next } { print $1,$2,$4 }' inventory.txt
host,role,memory_gb
web01,frontend,8
web02,frontend,8
api01,backend,16
api02,backend,32
db01,database,64
cache01,cache,16
Compare Files, Remove Duplicates, and Use Pipelines with awk
Remove Duplicate Values While Preserving First Seen Order
The expression !seen[value]++ prints only the first time a value appears. Use a field as the array key when uniqueness should be based on one column instead of the whole line.
awk '!seen[$3]++ { print $3 }' app.log
web01
api01
db01
web02
api02
cache01
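The compact !seen[value]++ idiom can be easier to trust once it is written out in steps. The sketch below runs the short form and one possible expanded form on the same sample input. The expanded version is only an illustration of the logic, not a required style.

```shell
input='a
b
a
c
b'

# Short form: the pattern is true only while the counter is still zero.
short_form=$(printf '%s\n' "$input" | awk '!seen[$0]++')

# Expanded form: print on the first sighting, then increment the counter.
long_form=$(printf '%s\n' "$input" | awk '{
    if (seen[$0] == 0)
        print
    seen[$0]++
}')

echo "$short_form"
```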
Count Unique Values
Counting a repeated field is often more useful than removing duplicates. Sort the final output when array traversal order should not affect the report.
awk '{ hits[$3]++ } END { for (host in hits) print host, hits[host] }' app.log | sort
api01 2
api02 1
cache01 1
db01 1
web01 1
web02 1
Compare Two Files with FNR and NR
FNR == NR is true only while awk reads the first file. This lets you load lookup data into an array, then use that array while reading the second file. The field separator here matches either colons or runs of whitespace, so one command can read both fixture formats.
awk -F'[:[:space:]]+' 'FNR == NR { min[$1] = $2; next } FNR > 1 && $4 < min[$2] { print $1, $2, $4 " GB", "below", min[$2] " GB" }' limits.txt inventory.txt
web01 frontend 8 GB below 12 GB
web02 frontend 8 GB below 12 GB
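Printing both counters side by side shows why FNR == NR identifies the first file. This sketch uses two throwaway files in a temporary directory. The file names and contents are placeholders.

```shell
# Two small temporary files stand in for the lookup file and the data file.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\nd\n' > "$tmpdir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
counters=$(awk '{ print NR, FNR, $0 }' "$tmpdir/first.txt" "$tmpdir/second.txt")

rm -rf "$tmpdir"
echo "$counters"
```

One caveat: if the first file is empty, FNR == NR is also true while awk reads the second file, so the lookup-loading branch would consume those records instead.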
Use awk in a Pipeline
Pipeline input is useful when another command narrows the data first. Here, tail reads the newest log lines, then awk extracts only the timestamp, host, and timing field from error rows.
tail -n 5 app.log | awk '$2 == "ERROR" { print $1, $3, $6 }'
2026-05-27T10:00:03Z api01 ms=900
2026-05-27T10:00:05Z web02 ms=310
Reuse awk Logic with Variables and Scripts
Pass Shell Values with -v
Use -v to pass shell-controlled values into awk before awk starts reading input. This keeps shell values separate from the awk program, avoids quoting mistakes, and makes the condition easier to reuse.
awk -F':' -v min_mem=32 'NR > 1 && $4 >= min_mem { print $1, $4 }' inventory.txt
api02 32
db01 64
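In practice the value usually comes from a shell variable rather than a literal. A minimal sketch, assuming a hypothetical shell variable named threshold and a shortened two-column sample inventory:

```shell
# A shell variable holds the limit; the awk program stays in single quotes.
threshold=32

above_limit=$(printf 'web01:8\napi02:32\ndb01:64\n' |
    awk -F':' -v min_mem="$threshold" '$2 >= min_mem { print $1 }')

echo "$above_limit"
```

One caveat worth knowing: awk interprets backslash escape sequences in -v values, so a value that contains literal backslashes may need a different route, such as the ENVIRON array.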
Move Reusable Logic into an awk Script
Short one-liners are convenient, but a script file is easier to review when the logic grows past a few statements. Keep shell setup separate from the awk program so each layer remains readable, and pass runtime values with -v instead of editing the script for each run.
cat > high-memory.awk <<'EOF'
BEGIN { FS = ":" }
NR == 1 { next }
$4 >= limit { printf "%s has %d GB\n", $1, $4 }
EOF
awk -v limit=32 -f high-memory.awk inventory.txt
api02 has 32 GB
db01 has 64 GB
Use awk Safely in Shell Workflows
- Quote awk programs with single quotes so the shell does not expand $1, $2, or other awk variables before awk sees them.
- Use -v for shell-provided values instead of inserting variables directly into the awk program string.
- Remember that a multi-character -F value is a regular expression. Use simple delimiters confidently, but do not treat plain awk splitting as a full parser for quoted CSV, JSON, YAML, XML, or other structured formats.
- Do not overwrite files as the first step. Write transformed output to a separate file, inspect it, then replace the original only after the preview matches what you intended.
- Sort array-based reports when order matters. Normal awk does not promise the order of for (key in array) traversal.
Rewrite the frontend memory values into a separate file and preview only the first records. The original inventory.txt remains unchanged.
awk -F':' 'BEGIN { OFS=":" } NR == 1 { print; next } $2 == "frontend" { $4 = 12 } { print }' inventory.txt > inventory.updated
awk 'NR <= 3 { print }' inventory.updated
hostname:role:cpu:memory:status
web01:frontend:4:12:active
web02:frontend:4:12:active
Replace a real source file only after reviewing the generated output. A safer pattern is to write to a sibling file, inspect or compare it, make a backup when the source matters, and then move the replacement into place.
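That sequence might look like the following sketch. It works in a temporary directory on a three-line sample file, and the .new and .bak suffixes are arbitrary naming choices rather than a convention awk requires.

```shell
# Work on a throwaway copy so the sketch never touches real data.
workdir=$(mktemp -d)
src="$workdir/inventory.txt"
printf 'hostname:role:memory\nweb01:frontend:8\ndb01:database:64\n' > "$src"

# 1. Write the transformed data to a sibling file.
awk -F':' 'BEGIN { OFS = ":" } NR > 1 && $2 == "frontend" { $3 = 12 } { print }' "$src" > "$src.new"

# 2. Review the differences (diff exits non-zero when the files differ).
changes=$(diff "$src" "$src.new" || true)
echo "$changes"

# 3. Keep a backup, then move the replacement into place.
cp -p "$src" "$src.bak"
mv "$src.new" "$src"

result=$(cat "$src")
backup_row=$(sed -n '2p' "$src.bak")
rm -rf "$workdir"
```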
Troubleshoot Common awk Errors
awk Command Not Found
If the shell cannot find awk, confirm the command lookup first:
command -v awk
No output means the current shell cannot find awk in $PATH. On Ubuntu, Fedora, and Rocky Linux systems, GNU awk is provided by the gawk package. Install it with the distribution package manager, then open a new shell if the command path still does not refresh.
awk Version Check Does Not Work
awk --version is a GNU awk option. Another awk implementation can reject that option while still running normal awk programs. Test the interpreter with a BEGIN action before assuming awk itself is broken.
awk 'BEGIN { print "awk works" }'
awk works
Double Quotes Print the Whole Record Instead of a Field
The shell expands $1 inside double quotes before awk receives the program. In a normal shell with no first positional parameter, $1 becomes empty, so awk receives { print } and prints whole records. In a strict shell using set -u, the shell can stop earlier with an unbound-variable error.
awk -F':' "{ print $1 }" inventory.txt | head -3
hostname:role:cpu:memory:status
web01:frontend:4:8:active
web02:frontend:4:8:active
Single quotes keep awk variables intact until awk parses them:
awk -F':' '{ print $1 }' inventory.txt | head -3
hostname
web01
web02
Unexpected Newline or End of String
A missing quote or closing brace makes awk parse the program as incomplete. This broken command leaves the action unfinished:
awk '{ print $1 ' inventory.txt
Relevant GNU awk output includes:
awk: cmd. line:1: { print $1
awk: cmd. line:1: ^ unexpected newline or end of string
Close both the action brace and the shell quote before rerunning the command:
awk '{ print $1 }' inventory.txt
Fields Do Not Split Correctly
When $1, $2, or later fields do not contain what you expect, print each parsed field from one known record. This diagnoses the separator before you rewrite the main command.
awk -F':' 'NR == 2 { for (i = 1; i <= NF; i++) print i, $i }' inventory.txt
1 web01
2 frontend
3 4
4 8
5 active
If the whole line appears as field 1, the input probably uses a different delimiter than the one passed to -F. Preview one known record, confirm the real separator, then update the main command.
Averages Fail When No Rows Match
An average needs at least one matching record. Guard the END block so awk prints a clear result instead of dividing by zero or producing implementation-specific output.
awk -F':' '$2 == "worker" { total += $4; count++ } END { if (count) printf "%.1f\n", total / count; else print "no matching records" }' inventory.txt
no matching records
Grouped Output Appears in a Different Order
Awk arrays are associative, and normal awk does not promise a stable order when iterating with for (key in array). Sort the printed rows when report order matters:
awk -F':' 'NR > 1 { count[$2]++ } END { for (role in count) print role, count[role] }' inventory.txt | sort
backend 2
cache 1
database 1
frontend 2
Clean Up the awk Practice Files
Remove the disposable workspace after testing the examples. The command targets only the directory created earlier.
cd ~
rm -rf ~/awk-demo
Conclusion
Awk is ready for field-based filtering, line ranges, grouped counts, file comparisons, and formatted reports from plain text streams. Keep one-off programs single-quoted, pass shell values with -v, sort array reports when order matters, and write transformed data to a separate output file before replacing source data.

