The split command in Linux divides large files into smaller, more manageable pieces. It handles text files, binary data, log files, and archives, splitting by line count, byte size, or number of output files. Whether you need to break up a multi-gigabyte log for parallel processing, fit a large backup onto size-limited storage, or distribute dataset chunks across servers, split paired with cat for reassembly gives you a reliable file-splitting workflow.
Understand the split Command
The split command is part of GNU coreutils and ships pre-installed on virtually all Linux distributions. Verify it is available on your system by checking its version:
split --version
split (GNU coreutils) 9.x
split Command Syntax and Options
The basic syntax follows this pattern:
split [OPTION]... [FILE [PREFIX]]
When no FILE is specified (or when FILE is -), split reads from standard input. The default behavior splits into 1000-line chunks with alphabetic suffixes, using x as the default prefix (producing xaa, xab, xac, and so on).
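For example, assuming a file named notes.txt with roughly 2,500 lines, running split with no options produces three pieces: two of 1,000 lines and one holding the remaining 500:
split notes.txt
ls x*
xaa  xab  xac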
Quick Reference Table
| Option | Description | Example |
|---|---|---|
| -l, --lines=N | Split every N lines | split -l 1000 access.log |
| -b, --bytes=SIZE | Split into SIZE-byte chunks | split -b 100M backup.tar |
| -n, --number=CHUNKS | Split into N equal-size files | split -n 5 dataset.csv |
| -C, --line-bytes=SIZE | Max SIZE bytes per file, keeping lines intact | split -C 50M server.log |
| -d | Use numeric suffixes (00, 01, 02) | split -l 500 -d data.txt |
| -x | Use hexadecimal suffixes (00, 01, …, 0f) | split -b 10M -x archive.bin |
| -a, --suffix-length=N | Generate suffixes of length N (default 2) | split -b 1M -a 4 huge.bin |
| --additional-suffix=S | Append a file extension to output names | split -l 500 --additional-suffix=.log data.txt |
| -e, --elide-empty-files | Suppress empty output files with -n | split -n 10 -e small.txt |
| --verbose | Print a message before each file is created | split -b 50M --verbose backup.tar |
| --filter=CMD | Write output through a shell command | split -b 50M --filter='gzip > $FILE.gz' data.bin |
| -t, --separator=SEP | Use SEP instead of newline as record separator | split -t ',' -l 100 records.csv |
Size values accept suffixes: K (1024), M (1024^2), G (1024^3), or KB, MB, GB for powers of 1000.
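For example, assuming a file named data.bin, the first command below produces 2048-byte chunks while the second produces 2000-byte chunks:
split -b 2K data.bin kib-
split -b 2KB data.bin kb-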
Practical split Command Examples
Split a File by Line Count
Line-based splitting is the most common use case for text files and logs. This example divides a web server access log into chunks of 5000 lines each:
split -l 5000 /var/log/nginx/access.log log-chunk-
The -l 5000 flag sets the line count per file. The log-chunk- argument is the output prefix, producing files named log-chunk-aa, log-chunk-ab, and so on. Spot-check a chunk with tail or verify the line counts with wc:
wc -l log-chunk-*
 5000 log-chunk-aa
 5000 log-chunk-ab
 5000 log-chunk-ac
 2340 log-chunk-ad
17340 total
Split a File by Byte Size
When transferring a large backup over a network or uploading to a service with file size limits, splitting by byte size is more practical than splitting by lines:
split -b 100M /home/user/backup.tar.gz backup-part-
Each output file is exactly 100 MiB (100M means 100 × 1024² bytes), except the last, which contains the remainder. Check the resulting sizes with du or ls:
ls -lh backup-part-*
-rw-r--r-- 1 user user 100M Feb 10 09:15 backup-part-aa
-rw-r--r-- 1 user user 100M Feb 10 09:15 backup-part-ab
-rw-r--r-- 1 user user  43M Feb 10 09:15 backup-part-ac
Split a File into N Equal Parts
The -n option divides a file into a specific number of roughly equal-sized parts, which is useful for distributing data across worker processes:
split -n 4 dataset.csv parts-
This creates exactly four output files (parts-aa through parts-ad), each containing approximately one quarter of the original bytes. Since -n splits by byte offset, it may cut lines in the middle. To avoid splitting mid-line, use l/N instead:
split -n l/4 dataset.csv parts-
The l/4 form produces four sequential chunks of roughly equal size while keeping every line whole. The sizes may vary slightly because line boundaries rarely align with exact byte divisions.
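As a quick check, count the lines per part; assuming dataset.csv holds around 10,000 lines, each part ends up with roughly 2,500:
wc -l parts-*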
Keep Lines Intact When Splitting by Size
When splitting structured data like CSV or JSON-lines files, broken lines cause parsing errors. The -C (line-bytes) option splits by size while keeping each line whole:
split -C 50M /var/log/syslog syslog-chunk-
Each output file stays under 50 MB, but no line is split between two files. If a single line exceeds 50 MB, split places it in its own file. This is the safest option for log rotation and dataset partitioning where line integrity matters.
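A simple sanity check is to confirm that the chunks together contain the same number of lines as the original (assuming the source file is still in place):
cat syslog-chunk-* | wc -l
wc -l < /var/log/syslog
Both commands should print the same number.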
Use Numeric Suffixes for Split Files
The default alphabetic suffixes (aa, ab) can be confusing when many files are generated. Add -d for numeric suffixes that sort naturally:
split -l 1000 -d /var/log/auth.log auth-
auth-00 auth-01 auth-02 auth-03
For files with more than 100 parts, increase the suffix length with -a to avoid running out of names:
split -l 100 -d -a 4 large-dataset.csv chunk-
This produces chunk-0000, chunk-0001, and so on, supporting up to 10,000 output files.
Add a Custom File Extension to Split Output
By default, split output files have no file extension, which makes them harder to identify. The --additional-suffix option appends an extension to every output file:
split -l 2000 --additional-suffix=.log -d server.log chunk-
ls chunk-*
chunk-00.log chunk-01.log chunk-02.log
Combining -d with --additional-suffix produces clean, identifiable output names that are easy to work with in scripts.
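For example, a short loop can work through each chunk in turn (a sketch; analyze-log.sh stands in for whatever processing script you use):
for f in chunk-*.log; do
    ./analyze-log.sh "$f"
done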
Split and Archive a Directory with tar
The split command works on files, not directories. To split a directory, archive it with tar and pipe the stream into split:
tar cf - /home/user/project/ | split -b 100M -d - project-backup-
The tar cf - command streams the archive to stdout, and split reads it from stdin (the - argument). This creates 100 MB chunks named project-backup-00, project-backup-01, and so on. To restore the original directory later:
cat project-backup-* | tar xf -
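To restore into a different location instead of the current directory, add tar's -C option (the /tmp/project-restore path below is just an example; the target directory must exist first):
mkdir -p /tmp/project-restore
cat project-backup-* | tar xf - -C /tmp/project-restore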
Combine split with gzip Compression
For large files that need both compression and splitting, pipe gzip output directly into split:
gzip -c /var/log/syslog | split -b 25M -d - syslog-compressed-
The gzip -c flag writes compressed data to stdout instead of replacing the original file. Each output chunk is 25 MB of compressed data. Reassemble and decompress with:
cat syslog-compressed-* | gunzip > /var/log/syslog-restored
Reassemble Split Files with cat
The cat command reassembles split files in alphabetical or numeric order. Shell globbing (*) handles the sorting automatically for both alphabetic and numeric suffixes:
cat backup-part-* > backup-restored.tar.gz
Verify the reassembled file matches the original by comparing checksums:
md5sum backup.tar.gz backup-restored.tar.gz
d41d8cd98f00b204e9800998ecf8427e  backup.tar.gz
d41d8cd98f00b204e9800998ecf8427e  backup-restored.tar.gz
Matching checksums confirm a lossless split and merge cycle. This works identically for text files, binary files, and compressed archives.
Advanced split Techniques
Use Verbose Mode to Monitor split Progress
When splitting very large files, the --verbose flag prints a message each time a new output file is created, so you can monitor progress:
split -b 500M --verbose /home/user/database-dump.sql db-part-
creating file 'db-part-aa'
creating file 'db-part-ab'
creating file 'db-part-ac'
creating file 'db-part-ad'
Pipe Data Directly into split
The split command can read from standard input by using - as the filename, which is useful for piping output from other commands. For example, pipe a find file listing into split to break it into batches:
find /var/log -name "*.log" -type f | split -l 50 -d - filelist-batch-
Each output file contains 50 file paths, ready for parallel processing by other tools or scripts.
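As a sketch of the next step, each batch can be handed to a worker in parallel with xargs; compress-logs.sh here is a hypothetical script that takes one batch file of paths as its argument:
ls filelist-batch-* | xargs -n 1 -P 4 ./compress-logs.sh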
Filter split Output Through a Command
The --filter option passes each output chunk through a shell command before writing. The $FILE variable holds the output filename. This example compresses each chunk individually:
split -b 50M --filter='gzip > $FILE.gz' /home/user/large-export.csv chunk-
Each chunk is compressed independently, producing chunk-aa.gz, chunk-ab.gz, and so on. This is more space-efficient than splitting first and compressing separately, since split never writes uncompressed data to disk.
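To reassemble later, concatenate the compressed chunks and decompress them in a single pass, which works because gunzip accepts a stream of concatenated gzip members:
cat chunk-*.gz | gunzip > large-export-restored.csv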
Use Environment Variables for Dynamic Splitting
In shell scripts, you can parameterize the split size using variables. This makes the script adaptable without editing hardcoded values:
CHUNK_SIZE="50M"
PREFIX="export-chunk-"
split -b "$CHUNK_SIZE" -d --verbose data-export.csv "$PREFIX"
Quoting "$CHUNK_SIZE" and "$PREFIX" prevents unexpected behavior if the variables contain spaces or special characters.
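A minimal script sketch along these lines, assuming data-export.csv sits in the current directory and taking the chunk size as an optional first argument:
#!/bin/bash
# Split data-export.csv into fixed-size, numerically suffixed chunks.
set -euo pipefail

CHUNK_SIZE="${1:-50M}"   # default to 50M unless a size is passed as the first argument
PREFIX="export-chunk-"

split -b "$CHUNK_SIZE" -d --verbose data-export.csv "$PREFIX"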
Suppress Empty Files When Splitting into N Parts
When using -n to split into a specific number of parts, small files may produce empty output files if the piece count exceeds the number of bytes in the file. The -e (--elide-empty-files) flag prevents this:
split -n 20 -e small-config.txt config-part-
Without -e, splitting a 5-byte file into 20 parts would create 15 empty files. With -e, only non-empty parts are written.
Round-Robin Distribution Across Files
The -n r/N form distributes lines across N files using round-robin assignment instead of sequential blocks. The first line goes to file 1, the second to file 2, and so on, cycling back after reaching N:
split -n r/4 requests.log worker-
This creates four files with an interleaved distribution of lines, which can produce more evenly balanced workloads than sequential splitting when line lengths vary significantly.
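A minimal sketch of fanning those files out to parallel workers, where process-requests.sh is a hypothetical per-file worker script:
for f in worker-*; do
    ./process-requests.sh "$f" &
done
wait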
Troubleshoot Common split Errors
Output File Limit Exceeded
If a split operation generates more files than the suffix length can support, you see this error:
split: output file suffixes exhausted
The default suffix length of 2 supports 676 alphabetic files (aa through zz) or 100 numeric files (00 through 99). Increase the suffix length with -a:
split -l 10 -d -a 4 huge-log.txt chunk-
A suffix length of 4 with -d supports up to 10,000 output files.
Cannot Open Input File
Permission or path errors produce messages like:
split: cannot open '/var/log/secure' for reading: Permission denied
Verify the file exists and your user has read access:
ls -la /var/log/secure
If the file requires root access, prefix the command with sudo. Alternatively, read the file with sudo and pipe it into split so the output chunks are created with your own user's permissions:
sudo cat /var/log/secure | split -l 5000 - secure-chunk-
Split Files Appear Empty
Empty output files typically occur when using -n with a chunk count that exceeds the number of lines or bytes in the source file:
-rw-r--r-- 1 user user 0 Feb 10 09:30 chunk-ad
-rw-r--r-- 1 user user 0 Feb 10 09:30 chunk-ae
Add the -e flag to suppress empty files, or reduce the chunk count to match the file size:
split -n 10 -e small-file.txt chunk-
Reassembled File Does Not Match Original
If the checksums of the original and reassembled file differ, the most common cause is incorrect file ordering during reassembly. Verify the glob expansion order matches the split order:
ls -1 chunk-* | sort
If the order looks correct, confirm no files are missing by comparing the expected count against the actual count:
ls chunk-* | wc -l
Missing or corrupted chunks require re-splitting from the original source.
Frequently Asked Questions About the split Command
How do I split a file by a number of lines?
Use split -l N filename prefix- where N is the number of lines per output file. For example, split -l 1000 access.log chunk- creates files of 1000 lines each named chunk-aa, chunk-ab, and so on.
How do I split a file into a fixed number of parts?
Use split -n N filename prefix- to divide a file into N roughly equal-sized parts by byte count. To avoid breaking lines, use split -n l/N filename prefix-, which splits at line boundaries instead.
How do I reassemble split files?
Use the cat command with a glob pattern: cat prefix-* > restored-file. Shell globbing sorts the files alphabetically or numerically, which matches the order split created them. Verify integrity by comparing checksums with md5sum or sha256sum.
Does split work on binary files?
Yes. The split command handles binary files the same way it handles text files when using byte-based splitting (-b option). Use cat to reassemble binary files, and verify with checksums. Avoid line-based splitting (-l) on binary files since binary data does not have meaningful line boundaries.
Conclusion
The split command divides files by line count (-l), byte size (-b), or number of parts (-n), while -C keeps lines intact during size-based splits. Combine it with tar for directory archiving and gzip for compressed chunks. Reassemble the pieces with cat prefix-* and verify the result with md5sum to confirm a lossless round trip.