Large logs, database exports, and backup archives get easier to move or process when you can cut them into predictable pieces without changing the original data. The split command in Linux handles that job from the terminal, using line counts, byte limits, record-aware size limits, or a fixed number of output chunks. GNU split also supports filters, custom suffixes, and round-robin distribution for workflows that need compressed pieces or balanced worker inputs.
Understand the split Command
GNU split is part of GNU coreutils and ships with most Linux distributions. Confirm that your system is using GNU split before relying on long options such as --filter or --additional-suffix:
split --version | head -n 1
Example output:
split (GNU coreutils) 9.4
The version number can differ by distribution. The important part is the GNU coreutils label; BusyBox implementations expose a smaller option set.
split Command Syntax
The basic syntax follows this pattern:
split [OPTION]... [INPUT [PREFIX]]
When INPUT is omitted, or when INPUT is -, split reads from standard input. Without options, GNU split writes 1000-line chunks, uses x as the prefix, and creates names such as xaa, xab, and xac. The GNU split manual documents the full GNU option set.
Use split when the split point is based on a line count, byte size, record size, or fixed number of output files. If the boundary depends on matching content, such as a section marker or header pattern, use csplit instead.
split Command Quick Reference
| Task | Command Pattern | What It Does |
|---|---|---|
| Split by line count | split -l 1000 access.log chunk- | Writes 1000 lines per output file. |
| Split by byte size | split -b 100M backup.tar.gz backup-part- | Writes chunks of exactly 100 MiB each, except for a smaller final chunk. |
| Keep records under a size limit | split -C 50M app.log log-part- | Keeps complete records when possible while limiting each file size. |
| Create a fixed number of byte-balanced pieces | split -n 4 dataset.csv parts- | Creates four roughly equal byte ranges, which may split lines. |
| Create a fixed number of line-safe pieces | split -n l/4 dataset.csv parts- | Creates four chunks without splitting records between files. |
| Distribute records round robin | split -n r/4 requests.log worker- | Sends alternating records to four output files. |
| Use numeric suffixes | split -d -a 4 -l 100 data.csv chunk- | Creates names such as chunk-0000 and chunk-0001. |
| Add a file extension | split -d --additional-suffix=.log -l 500 app.log chunk- | Appends an extension after each generated suffix. |
| Filter each chunk through a command | split -b 50M --filter='gzip > "$FILE.gz"' data.bin chunk- | Passes each chunk to a shell command using the $FILE variable. |
| Use a custom record separator | split -t '\0' -l 100 records.bin record- | Uses NUL instead of newline as the record separator. |
GNU size values accept suffixes such as K, M, and G for powers of 1024, plus KB, MB, and GB for powers of 1000. Binary forms such as KiB and MiB are also valid in GNU coreutils.
GNU and BusyBox split Compatibility
These examples use GNU split. Minimal containers and embedded Linux systems may provide BusyBox split instead, so check split --help before using GNU-only options:
| Feature | GNU split | BusyBox split |
|---|---|---|
| Line chunks with -l | Supported | Supported |
| Byte chunks with -b | Supported | Supported |
| Suffix length with -a | Supported | Supported |
| Fixed chunk count with -n | Supported | Not in the documented BusyBox applet |
| Record-aware byte limit with -C | Supported | Not in the documented BusyBox applet |
| Numeric or hexadecimal suffixes | -d, --numeric-suffixes, -x, --hex-suffixes | Not in the documented BusyBox applet |
| Additional suffixes | --additional-suffix | Not in the documented BusyBox applet |
| Output filters | --filter=COMMAND | Not in the documented BusyBox applet |
| Custom record separator | -t, --separator | Not in the documented BusyBox applet |
| Size suffixes | Large GNU unit set, including K, M, G, KB, MB, and GB | Documented as k and m |
If you need a script to run on both GNU and BusyBox systems, limit the command to -l, -b, -a, and a simple prefix. Use GNU split when you need --filter, -n, -C, numeric suffix controls, or custom record separators.
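A script that has to run on both kinds of systems could branch on the version banner. The following is a minimal sketch, assuming that only GNU split prints a "GNU coreutils" label for --version; the split_flavor variable, the temporary directory, and the sample input are illustrative names, not part of split itself.

```shell
# Detect whether the split on PATH is the GNU coreutils implementation.
# "split_flavor" is a hypothetical variable name used only in this sketch.
if split --version 2>/dev/null | grep -q 'GNU coreutils'; then
    split_flavor=gnu
else
    split_flavor=other
fi

workdir=$(mktemp -d)
seq 1 12 > "$workdir/input.txt"      # 12-line sample input

if [ "$split_flavor" = gnu ]; then
    # GNU-only options: numeric suffixes plus a file extension.
    split -l 5 -d --additional-suffix=.txt "$workdir/input.txt" "$workdir/chunk-"
else
    # Portable subset: -l, -a, and a simple prefix.
    split -l 5 -a 2 "$workdir/input.txt" "$workdir/chunk-"
fi

ls "$workdir"
```

Either branch writes three chunk files next to input.txt; only the naming differs.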
Practical split Command Examples
Split a File by Line Count
Line-based splitting works well for logs, CSV files, and other text files where each record occupies one line. This command writes five lines per chunk and names the outputs with the log-chunk- prefix:
split -l 5 access.log log-chunk-
The generated files use alphabetic suffixes such as log-chunk-aa, log-chunk-ab, and log-chunk-ac. Verify the line counts with wc:
wc -l log-chunk-*
For a 12-line sample input, the output looks like this:
 5 log-chunk-aa
 5 log-chunk-ab
 2 log-chunk-ac
12 total
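To reproduce that result without a real log, a throwaway sample can stand in for access.log. This sketch assumes a temporary working directory and uses seq to generate the 12 lines; both are illustrative choices.

```shell
workdir=$(mktemp -d) && cd "$workdir"   # scratch directory for the demo
seq 1 12 > access.log                   # 12-line stand-in for a real log

split -l 5 access.log log-chunk-
wc -l log-chunk-*                       # reports 5, 5, and 2 lines, then the total

# Concatenating the chunks in glob order should reproduce the input.
cat log-chunk-* | cmp - access.log && echo "chunks reassemble cleanly"
```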
For large live logs, the tail command can inspect the newest lines in the final chunk without opening the entire file.
Split a File by Byte Size
Byte-based splitting is better for archives, disk images, and backups where line boundaries do not matter. This example creates chunks of 100 MiB each, except for the final remainder file:
split -b 100M backup.tar.gz backup-part-
Check the logical size of each generated file:
stat -c '%n %s bytes' backup-part-*
For a 243 MiB input, example output is:
backup-part-aa 104857600 bytes
backup-part-ab 104857600 bytes
backup-part-ac 45088768 bytes
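The sizes above follow from integer arithmetic, so the number of pieces can be predicted before running split. A small sketch of that calculation, using the 243 MiB example; the variable names are illustrative.

```shell
size=$((243 * 1024 * 1024))             # input size in bytes (243 MiB)
chunk=$((100 * 1024 * 1024))            # what -b 100M means in bytes

full=$((size / chunk))                  # number of full-size chunks
rest=$((size % chunk))                  # bytes left for the final chunk
pieces=$(( (size + chunk - 1) / chunk ))  # ceiling division: total files

echo "$full full chunks, remainder $rest bytes, $pieces files"
# prints: 2 full chunks, remainder 45088768 bytes, 3 files
```

For a real file, `stat -c %s FILE` can supply the size value instead of the hard-coded number.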
If you are splitting files to fit a disk quota, the du command can report real storage consumption before you move the chunks.
Split a File into N Equal Parts
The -n option divides a file into a fixed number of output files. Plain -n N works by byte range, so it can cut through the middle of a text line:
split -n 4 dataset.csv parts-
List the generated files:
ls parts-*
parts-aa parts-ab parts-ac parts-ad
Use the l/N form when every output file must contain complete lines:
split -n l/4 dataset.csv parts-
The line-safe form may create chunks with slightly different byte sizes because record boundaries rarely land on exact byte divisions.
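The difference between the two forms is easy to see on a small sample. The sketch below assumes a generated 100-line file in place of a real dataset.csv and uses the hypothetical prefixes bytes- and lines- to keep the two outputs apart.

```shell
workdir=$(mktemp -d) && cd "$workdir"   # scratch directory for the demo
seq 1 100 > dataset.csv                 # stand-in for a real CSV

split -n 4   dataset.csv bytes-         # equal byte ranges, may cut a line in half
split -n l/4 dataset.csv lines-         # whole lines only, sizes vary slightly

wc -c bytes-* lines-*                   # compare the piece sizes

# Both forms reassemble to the original input.
cat bytes-* | cmp - dataset.csv && echo "byte pieces match"
cat lines-* | cmp - dataset.csv && echo "line pieces match"
```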
Keep Lines Intact When Splitting by Size
The -C option sets a maximum byte count while trying to keep each record whole. It is useful for logs and line-oriented exports that need size-limited chunks without broken records:
split -C 50M app.log app-log-
GNU split keeps complete records when possible, but a single record longer than the requested size can still be split. Choose a size larger than the longest expected line when the input must remain strictly record-safe.
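One way to confirm the size cap on a small scale is to split a sample with a tiny limit and measure the largest piece. This is a sketch under the assumption that every line is shorter than the limit; the 50-byte limit and generated input are illustrative.

```shell
workdir=$(mktemp -d) && cd "$workdir"   # scratch directory for the demo
seq 1 100 > app.log                     # short lines, well under 50 bytes each

split -C 50 app.log app-log-            # at most 50 bytes per piece, whole lines

largest=0
for f in app-log-*; do
    bytes=$(wc -c < "$f")
    if [ "$bytes" -gt "$largest" ]; then
        largest=$bytes
    fi
done
echo "largest piece: $largest bytes"
```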
Use Numeric Suffixes for Split Files
Numeric suffixes make generated names easier to scan and sort in scripts. Combine -d with -a when you expect many output files:
split -l 1000 -d -a 4 auth.log auth-
ls auth-*
For a 4000-line input split into 1000-line chunks, the names look like this:
auth-0000 auth-0001 auth-0002 auth-0003
The -a 4 option reserves four suffix characters, which gives numeric output names from 0000 through 9999. Increase the value if the input can produce more than 10,000 chunks.
Add a Custom File Extension to Split Output
By default, split output files have no extension. Add --additional-suffix when another tool or workflow expects a recognizable extension:
split -l 2000 -d --additional-suffix=.log server.log chunk-
ls chunk-*
For a 6000-line input, the output names look like this:
chunk-00.log chunk-01.log chunk-02.log
The extra suffix appears after the generated suffix. In this example, chunk-00 becomes chunk-00.log.
Use an Empty Prefix for Suffix-Only Output Names
GNU split accepts an empty string as the output prefix. This is useful only when suffix-only names are intentional, such as numbered objects inside a dedicated output directory:
split -b 10M -d --additional-suffix=.part archive.tar.gz ""
For a roughly 25 MiB input, the generated names look like this:
ls *.part
00.part 01.part 02.part
Use the empty prefix only in a clean directory or with an output filter that writes into a dedicated directory. Otherwise, short names such as 00 and 01 are easy to confuse with unrelated files.
Split and Archive a Directory with tar
The split command works on files and streams, not directories. To split a directory, create a tar stream first, then send that stream into split. The tar and gzip file guide covers the archive side in more detail.
tar -C "$HOME" -cf - project | split -b 100M -d - project-backup-
The tar -C "$HOME" -cf - project portion archives ~/project to standard output, and split writes numbered chunks such as project-backup-00 and project-backup-01. Restore the archive into a separate directory before replacing the original data:
mkdir -p ~/restore
cat project-backup-* | tar -C ~/restore -xf -
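The full round trip can be rehearsed on a throwaway directory before trusting it with real data. The sketch below assumes a small generated project tree and a 10 KiB chunk size chosen only to force several pieces.

```shell
workdir=$(mktemp -d) && cd "$workdir"   # scratch directory for the demo
mkdir -p project/src restore
seq 1 5000 > project/src/data.txt       # sample content
printf 'demo\n' > project/README

# Archive to stdout and split the stream into numbered 10 KiB pieces.
tar -cf - project | split -b 10K -d - project-backup-

# Restore into a separate directory, then compare the two trees.
cat project-backup-* | tar -C restore -xf -
diff -r project restore/project && echo "restore matches"
```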
Combine split with gzip Compression
To compress first and split the compressed stream, pipe gzip -c into split. This keeps the original file in place and writes size-limited compressed pieces:
gzip -c app.log | split -b 25M -d - app-log.gz.part-
Reassemble the pieces in suffix order before decompressing:
cat app-log.gz.part-* | gunzip > app-restored.log
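A quick rehearsal of that compress-then-split round trip might look like the following. It assumes random bytes as a stand-in for app.log, because random data barely compresses and therefore reliably produces several pieces at this chunk size.

```shell
workdir=$(mktemp -d) && cd "$workdir"   # scratch directory for the demo
head -c 200000 /dev/urandom > app.log   # placeholder data, not a real log

gzip -c app.log | split -b 64K -d - app-log.gz.part-
ls app-log.gz.part-*

cat app-log.gz.part-* | gunzip > app-restored.log
cmp app.log app-restored.log && echo "restored copy matches"
```

The pieces are fragments of one gzip stream, so they generally cannot be decompressed individually; all of them are needed, in order.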
Reassemble Split Files with cat
Reassembly uses cat because split writes byte-for-byte pieces of the original input. Shell globbing sorts padded suffixes lexicographically, which matches the order GNU split creates by default:
cat backup-part-* > backup-restored.tar.gz
Confirm the restored file matches the original:
cmp -s backup.tar.gz backup-restored.tar.gz && echo "Files match"
Files match
For checksum-based verification, compare both files with sha256sum and make sure the hashes are identical.
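A checksum comparison could be scripted along these lines. This is a sketch that generates a placeholder file in place of a real backup.tar.gz; the variable names are illustrative.

```shell
workdir=$(mktemp -d) && cd "$workdir"        # scratch directory for the demo
head -c 300000 /dev/urandom > backup.tar.gz  # placeholder bytes, not a real archive

split -b 100K backup.tar.gz backup-part-
cat backup-part-* > backup-restored.tar.gz

# Keep only the hash field so the file names do not affect the comparison.
original_sum=$(sha256sum < backup.tar.gz | cut -d ' ' -f 1)
restored_sum=$(sha256sum < backup-restored.tar.gz | cut -d ' ' -f 1)

if [ "$original_sum" = "$restored_sum" ]; then
    echo "Checksums match"
else
    echo "Checksum mismatch" >&2
fi
```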
Advanced split Techniques
Use Verbose Mode to Monitor split Progress
When a large split operation takes time, --verbose prints each output filename before GNU split opens it:
split -b 500M --verbose database-dump.sql db-part-
For a dump large enough to create four files, output looks like this:
creating file 'db-part-aa'
creating file 'db-part-ab'
creating file 'db-part-ac'
creating file 'db-part-ad'
Pipe Data Directly into split
Use - as the input name when another command produces the data. This example builds a file list with find and its -exec action, then splits the list into 50-line batches:
find ~/logs -name "*.log" -type f -exec printf '%s\n' {} \; | split -l 50 -d - filelist-batch-
Each output file contains up to 50 paths. The pipeline is useful when another script or worker process needs a manageable batch list.
Use split --filter and the FILE Variable
GNU split --filter sends each chunk through a shell command instead of writing the chunk directly. For each chunk, GNU split sets the $FILE environment variable to the output name it would have used:
split -b 50M --filter='gzip > "$FILE.gz"' large-export.csv chunk-
For a 120 MiB input, the compressed chunk names look like this:
ls chunk-*.gz
chunk-aa.gz chunk-ab.gz chunk-ac.gz
Use single quotes around the filter so your parent shell passes $FILE through unexpanded and the shell that GNU split starts for each chunk expands it. If you use double quotes, your shell can expand $FILE too early and create a wrong name such as .gz. The filter syntax uses $FILE, not {}, %f, or another placeholder.
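Because the filter runs gzip once per chunk, each output is a complete gzip file, and decompressing the chunks in order should restore the input. A minimal sketch, assuming a generated sample in place of large-export.csv and a 32 KiB chunk size picked only for the demo:

```shell
workdir=$(mktemp -d) && cd "$workdir"   # scratch directory for the demo
seq 1 20000 > large-export.csv          # sample data, roughly 106 KiB

# Single quotes keep $FILE literal until split runs the filter for each chunk.
split -b 32K --filter='gzip > "$FILE.gz"' large-export.csv chunk-
ls chunk-*.gz

# Decompress every chunk in glob order and compare with the original.
gunzip -c chunk-*.gz | cmp - large-export.csv && echo "filtered chunks match"
```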
An empty prefix pairs well with --filter when you want suffix-only values inside a controlled directory:
mkdir -p chunks
split -b 50M -d --filter='gzip > "chunks/$FILE.csv.gz"' large-export.csv ""
With the empty prefix, $FILE becomes 00, 01, 02, and so on. The filter then writes names such as chunks/00.csv.gz.
Use Environment Variables for Dynamic Splitting
Shell variables make split commands easier to reuse in scripts. Quote the variables so spaces or special characters in a prefix do not break the command line:
CHUNK_SIZE="50M"
PREFIX="export-chunk-"
split -b "$CHUNK_SIZE" -d --verbose data-export.csv "$PREFIX"
The same pattern works with -l, -C, and -n as long as the variable contains the complete value that the option expects.
Suppress Empty Files When Splitting into N Parts
When -n asks for more output files than the input can fill, GNU split may create zero-byte files. Add -e to write only non-empty chunks:
split -n 20 -e small-file.txt config-part-
For example, a five-byte input split into 20 byte ranges produces only five files with -e, instead of 20 files where most are empty.
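The effect of -e can be checked by running the same split twice and counting what each run leaves behind. This sketch assumes a five-byte sample file and the hypothetical prefixes all- and nonempty-.

```shell
workdir=$(mktemp -d) && cd "$workdir"      # scratch directory for the demo
printf 'abcde' > small-file.txt            # five bytes, no trailing newline

split -n 20 small-file.txt all-            # without -e: one file per requested chunk
split -n 20 -e small-file.txt nonempty-    # with -e: empty chunks are not written

ls all-* | wc -l                                        # files created without -e
find . -maxdepth 1 -name 'nonempty-*' -size 0 | wc -l   # empty files left by -e
```

The first count is 20 and the second is 0, and the non-empty pieces still concatenate back to the original five bytes.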
Round-Robin Distribution Across Files
The -n r/N form distributes records across N files in a round-robin pattern. The first record goes to the first file, the second to the second file, and the sequence repeats after the Nth file:
split -n r/4 requests.log worker-
Round-robin mode can balance worker inputs better than sequential splitting when line lengths or record costs vary widely.
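A small numbered input makes the distribution pattern visible. The sketch below assumes an eight-line generated file in place of requests.log.

```shell
workdir=$(mktemp -d) && cd "$workdir"   # scratch directory for the demo
seq 1 8 > requests.log                  # eight numbered records

split -n r/4 requests.log worker-

# Print each worker file with its records on one line.
for f in worker-*; do
    printf '%s: %s\n' "$f" "$(tr '\n' ' ' < "$f")"
done
```

Here worker-aa receives records 1 and 5, worker-ab receives 2 and 6, and so on. Because the records are interleaved, concatenating the pieces with cat does not restore the original order.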
Troubleshoot Common split Errors
Output File Suffixes Are Exhausted
If the suffix space is too small, GNU split stops with this error:
split: output file suffixes exhausted
This usually happens when -a sets a suffix length that cannot hold all required names. Increase the suffix length before rerunning the split:
split -l 10 -d -a 4 huge-log.txt chunk-
A four-digit numeric suffix provides 10,000 names. Use a larger value if your input and chunk size can produce more files.
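The required suffix length can be estimated from the input before splitting. The following is a sketch for line-based splits with -d, assuming a non-empty input file; the variable names and the tiny 25-line sample are illustrative.

```shell
workdir=$(mktemp -d) && cd "$workdir"   # scratch directory for the demo
seq 1 25 > huge-log.txt                 # small stand-in for a large log
lines_per_chunk=2

total_lines=$(wc -l < huge-log.txt)
chunks=$(( (total_lines + lines_per_chunk - 1) / lines_per_chunk ))  # ceiling division
last_index=$(( chunks - 1 ))            # numeric suffixes start at 0
suffix_len=${#last_index}               # digits needed for the highest suffix

split -l "$lines_per_chunk" -d -a "$suffix_len" huge-log.txt chunk-
echo "$chunks chunks, suffix length $suffix_len"
# prints: 13 chunks, suffix length 2
```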
split Cannot Open the Input File
Permission or path problems produce an error like this:
split: cannot open '/var/log/secure' for reading: Permission denied
Check that the file exists and that your user can read it:
ls -l /var/log/secure
If the input requires elevated read access but the output should stay in the current directory, read the file with sudo cat and let split run as your normal user:
sudo cat /var/log/secure | split -l 5000 - secure-chunk-
split Creates Empty Files
Empty files commonly appear when -n asks for more chunks than the input can fill. Find zero-byte output files with this check:
find . -maxdepth 1 -name 'chunk-*' -size 0 -print
Example output:
./chunk-ad
./chunk-ae
Add -e to suppress empty outputs, or choose a smaller chunk count:
split -n 10 -e small-file.txt chunk-
Reassembled File Does Not Match the Original
If a restored file differs from the original, first print the exact names that your shell will pass to cat. Replace backup-part- with your actual split prefix:
printf '%s\n' backup-part-*
Then confirm the expected number of chunk files exists:
find . -maxdepth 1 -type f -name 'backup-part-*' | wc -l
Missing chunks, renamed files, or a glob that matches unrelated files can corrupt the reassembly. Re-split from the original source when any chunk is missing or damaged.
Conclusion
With these options, split can divide text, binary data, archives, and streams into pieces that fit the job: line-safe chunks, byte-limited parts, fixed worker batches, compressed filter output, or suffix-only names. Keep GNU-only options separate from BusyBox-compatible scripts, and verify reassembled files before deleting the original input.

