The split command in Linux divides large files into smaller, more manageable pieces. It handles text files, binary data, log files, and archives, splitting by line count, byte size, or number of output files. Whether you need to break up a multi-gigabyte log for parallel processing, fit a large backup onto size-limited storage, or distribute dataset chunks across servers, split paired with cat for reassembly gives you a reliable file-splitting workflow.
Understand the split Command
The split command is part of GNU coreutils and ships pre-installed on virtually all Linux distributions. Verify it is available on your system by checking its version:
split --version
split (GNU coreutils) 9.x
split Command Syntax and Options
The basic syntax follows this pattern:
split [OPTION]... [FILE [PREFIX]]
When no FILE is specified (or when FILE is -), split reads from standard input. The default behavior splits into 1000-line chunks with alphabetic suffixes, using x as the default prefix (producing xaa, xab, xac, and so on).
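For example, assuming a file named notes.txt with roughly 2,500 lines, running split with no options produces three pieces: two of 1,000 lines and one holding the remaining 500:
split notes.txt
ls x*
xaa  xab  xac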
Quick Reference Table
| Option | Description | Example |
|---|---|---|
| -l, --lines=N | Split every N lines | split -l 1000 access.log |
| -b, --bytes=SIZE | Split into SIZE-byte chunks | split -b 100M backup.tar |
| -n, --number=CHUNKS | Split into N equal-size files | split -n 5 dataset.csv |
| -C, --line-bytes=SIZE | Max SIZE bytes per file, keeping lines intact | split -C 50M server.log |
| -d | Use numeric suffixes (00, 01, 02) | split -l 500 -d data.txt |
| -x | Use hexadecimal suffixes (00, 01, …, 0f) | split -b 10M -x archive.bin |
| -a, --suffix-length=N | Generate suffixes of length N (default 2) | split -b 1M -a 4 huge.bin |
| --additional-suffix=S | Append a file extension to output names | split -l 500 --additional-suffix=.log data.txt |
| -e, --elide-empty-files | Suppress empty output files with -n | split -n 10 -e small.txt |
| --verbose | Print a message before each file is created | split -b 50M --verbose backup.tar |
| --filter=CMD | Write output through a shell command | split -b 50M --filter='gzip > $FILE.gz' data.bin |
| -t, --separator=SEP | Use SEP instead of newline as record separator | split -t ',' -l 100 records.csv |
Size values accept suffixes: K (1024), M (1024^2), G (1024^3), or KB, MB, GB for powers of 1000.
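For example, assuming a file named data.bin, the first command below produces 2048-byte chunks while the second produces 2000-byte chunks:
split -b 2K data.bin kib-
split -b 2KB data.bin kb-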
Practical split Command Examples
Split a File by Line Count
Line-based splitting is the most common use case for text files and logs. This example divides a web server access log into chunks of 5000 lines each:
split -l 5000 /var/log/nginx/access.log log-chunk-
The -l 5000 flag sets the line count per file. The log-chunk- argument is the output prefix, producing files named log-chunk-aa, log-chunk-ab, and so on. Spot-check a chunk with tail or verify the line counts with wc:
wc -l log-chunk-*
 5000 log-chunk-aa
 5000 log-chunk-ab
 5000 log-chunk-ac
 2340 log-chunk-ad
17340 total
Split a File by Byte Size
When transferring a large backup over a network or uploading to a service with file size limits, splitting by byte size is more practical than splitting by lines:
split -b 100M /home/user/backup.tar.gz backup-part-
Each output file is exactly 100 MiB (100M means 100 × 1024² bytes), except the last, which contains the remainder. Check the resulting sizes with du or ls:
ls -lh backup-part-*
-rw-r--r-- 1 user user 100M Feb 10 09:15 backup-part-aa
-rw-r--r-- 1 user user 100M Feb 10 09:15 backup-part-ab
-rw-r--r-- 1 user user  43M Feb 10 09:15 backup-part-ac
Split a File into N Equal Parts
The -n option divides a file into a specific number of roughly equal-sized parts, which is useful for distributing data across worker processes:
split -n 4 dataset.csv parts-
This creates exactly four output files (parts-aa through parts-ad), each containing approximately one quarter of the original bytes. Since -n splits by byte offset, it may cut lines in the middle. To avoid splitting mid-line, use l/N instead:
split -n l/4 dataset.csv parts-
The l/4 form produces four sequential chunks of roughly equal size while keeping every line whole. The sizes may vary slightly because line boundaries rarely align with exact byte divisions.
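As a quick check, count the lines per part; assuming dataset.csv holds around 10,000 lines, each part ends up with roughly 2,500:
wc -l parts-*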
Keep Lines Intact When Splitting by Size
When splitting structured data like CSV or JSON-lines files, broken lines cause parsing errors. The -C (line-bytes) option splits by size while keeping each line whole:
split -C 50M /var/log/syslog syslog-chunk-
Each output file stays under 50 MB, but no line is split between two files. If a single line exceeds 50 MB, split places it in its own file. This is the safest option for log rotation and dataset partitioning where line integrity matters.
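A simple sanity check is to confirm that the chunks together contain the same number of lines as the original (assuming the source file is still in place):
cat syslog-chunk-* | wc -l
wc -l < /var/log/syslog
Both commands should print the same number.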
Use Numeric Suffixes for Split Files
The default alphabetic suffixes (aa, ab) can be confusing when many files are generated. Add -d for numeric suffixes that sort naturally:
split -l 1000 -d /var/log/auth.log auth-
auth-00 auth-01 auth-02 auth-03
For files with more than 100 parts, increase the suffix length with -a to avoid running out of names:
split -l 100 -d -a 4 large-dataset.csv chunk-
This produces chunk-0000, chunk-0001, and so on, supporting up to 10,000 output files.
Add a Custom File Extension to Split Output
By default, split output files have no file extension, which makes them harder to identify. The --additional-suffix option appends an extension to every output file:
split -l 2000 --additional-suffix=.log -d server.log chunk-
ls chunk-*
chunk-00.log chunk-01.log chunk-02.log
Combining -d with --additional-suffix produces clean, identifiable output names that are easy to work with in scripts.
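For example, a short loop can work through each chunk in turn (a sketch; analyze-log.sh stands in for whatever processing script you use):
for f in chunk-*.log; do
    ./analyze-log.sh "$f"
done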
Split and Archive a Directory with tar
The split command works on files, not directories. To split a directory, archive it with tar and pipe the stream into split:
tar cf - /home/user/project/ | split -b 100M -d - project-backup-
The tar cf - command streams the archive to stdout, and split reads it from stdin (the - argument). This creates 100 MB chunks named project-backup-00, project-backup-01, and so on. To restore the original directory later:
cat project-backup-* | tar xf -
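To restore into a different location instead of the current directory, add tar's -C option (the /tmp/project-restore path below is just an example; the target directory must exist first):
mkdir -p /tmp/project-restore
cat project-backup-* | tar xf - -C /tmp/project-restore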
Combine split with gzip Compression
For large files that need both compression and splitting, pipe gzip output directly into split:
gzip -c /var/log/syslog | split -b 25M -d - syslog-compressed-
The gzip -c flag writes compressed data to stdout instead of replacing the original file. Each output chunk is 25 MB of compressed data. Reassemble and decompress with:
cat syslog-compressed-* | gunzip > /var/log/syslog-restored
Reassemble Split Files with cat
The cat command reassembles split files in alphabetical or numeric order. Shell globbing (*) handles the sorting automatically for both alphabetic and numeric suffixes:
cat backup-part-* > backup-restored.tar.gz
Verify the reassembled file matches the original by comparing checksums:
md5sum backup.tar.gz backup-restored.tar.gz
d41d8cd98f00b204e9800998ecf8427e  backup.tar.gz
d41d8cd98f00b204e9800998ecf8427e  backup-restored.tar.gz
Matching checksums confirm a lossless split and merge cycle. This works identically for text files, binary files, and compressed archives.
Advanced split Techniques
Use Verbose Mode to Monitor split Progress
When splitting very large files, the --verbose flag prints a message each time a new output file is created, so you can monitor progress:
split -b 500M --verbose /home/user/database-dump.sql db-part-
creating file 'db-part-aa'
creating file 'db-part-ab'
creating file 'db-part-ac'
creating file 'db-part-ad'
Pipe Data Directly into split
The split command can read from standard input by using - as the filename, which is useful for piping output from other commands. For example, pipe a find file listing into split to break it into batches:
find /var/log -name "*.log" -type f | split -l 50 -d - filelist-batch-
Each output file contains 50 file paths, ready for parallel processing by other tools or scripts.
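As a sketch of the next step, each batch can be handed to a worker in parallel with xargs; compress-logs.sh here is a hypothetical script that takes one batch file of paths as its argument:
ls filelist-batch-* | xargs -n 1 -P 4 ./compress-logs.sh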
Filter split Output Through a Command
The --filter option passes each output chunk through a shell command before writing. The $FILE variable holds the output filename. This example compresses each chunk individually:
split -b 50M --filter='gzip > $FILE.gz' /home/user/large-export.csv chunk-
Each chunk is compressed independently, producing chunk-aa.gz, chunk-ab.gz, and so on. This is more space-efficient than splitting first and compressing separately, since split never writes uncompressed data to disk.
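To reassemble later, concatenate the compressed chunks and decompress them in a single pass, which works because gunzip accepts a stream of concatenated gzip members:
cat chunk-*.gz | gunzip > large-export-restored.csv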
Use Environment Variables for Dynamic Splitting
In shell scripts, you can parameterize the split size using variables. This makes the script adaptable without editing hardcoded values:
CHUNK_SIZE="50M"
PREFIX="export-chunk-"
split -b "$CHUNK_SIZE" -d --verbose data-export.csv "$PREFIX"
Quoting "$CHUNK_SIZE" and "$PREFIX" prevents unexpected behavior if the variables contain spaces or special characters.
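A minimal script sketch along these lines, assuming data-export.csv sits in the current directory and taking the chunk size as an optional first argument:
#!/bin/bash
# Split data-export.csv into fixed-size, numerically suffixed chunks.
set -euo pipefail

CHUNK_SIZE="${1:-50M}"   # default to 50M unless a size is passed as the first argument
PREFIX="export-chunk-"

split -b "$CHUNK_SIZE" -d --verbose data-export.csv "$PREFIX"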
Suppress Empty Files When Splitting into N Parts
When using -n to split into a specific number of parts, small files may produce empty output files if the piece count exceeds the number of bytes in the file. The -e (--elide-empty-files) flag prevents this:
split -n 20 -e small-config.txt config-part-
Without -e, splitting a 5-byte file into 20 parts would create 15 empty files. With -e, only non-empty parts are written.
Round-Robin Distribution Across Files
The -n r/N form distributes lines across N files using round-robin assignment instead of sequential blocks. The first line goes to file 1, the second to file 2, and so on, cycling back after reaching N:
split -n r/4 requests.log worker-
This creates four files with an interleaved distribution of lines, which can produce more evenly balanced workloads than sequential splitting when line lengths vary significantly.
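A minimal sketch of fanning those files out to parallel workers, where process-requests.sh is a hypothetical per-file worker script:
for f in worker-*; do
    ./process-requests.sh "$f" &
done
wait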
Troubleshoot Common split Errors
Output File Limit Exceeded
If a split operation generates more files than the suffix length can support, you see this error:
split: output file suffixes exhausted
The default suffix length of 2 supports 676 alphabetic files (aa through zz) or 100 numeric files (00 through 99). Increase the suffix length with -a:
split -l 10 -d -a 4 huge-log.txt chunk-
A suffix length of 4 with -d supports up to 10,000 output files.
Cannot Open Input File
Permission or path errors produce messages like:
split: cannot open '/var/log/secure' for reading: Permission denied
Verify the file exists and your user has read access:
ls -la /var/log/secure
If the file requires root access, prefix the command with sudo. Alternatively, read the file with sudo and pipe it into split so the output chunks are created with your own user's permissions:
sudo cat /var/log/secure | split -l 5000 - secure-chunk-
Split Files Appear Empty
Empty output files typically occur when using -n with a chunk count that exceeds the number of lines or bytes in the source file:
-rw-r--r-- 1 user user 0 Feb 10 09:30 chunk-ad
-rw-r--r-- 1 user user 0 Feb 10 09:30 chunk-ae
Add the -e flag to suppress empty files, or reduce the chunk count to match the file size:
split -n 10 -e small-file.txt chunk-
Reassembled File Does Not Match Original
If the checksums of the original and reassembled file differ, the most common cause is incorrect file ordering during reassembly. Verify the glob expansion order matches the split order:
ls -1 chunk-* | sort
If the order looks correct, confirm no files are missing by comparing the expected count against the actual count:
ls chunk-* | wc -l
Missing or corrupted chunks require re-splitting from the original source.
Frequently Asked Questions About the split Command
How do I split a file by a number of lines?
Use split -l N filename prefix- where N is the number of lines per output file. For example, split -l 1000 access.log chunk- creates files of 1000 lines each named chunk-aa, chunk-ab, and so on.
How do I split a file into a fixed number of parts?
Use split -n N filename prefix- to divide a file into N roughly equal-sized parts by byte count. To avoid breaking lines, use split -n l/N filename prefix-, which splits at line boundaries instead.
How do I reassemble split files?
Use the cat command with a glob pattern: cat prefix-* > restored-file. Shell globbing sorts the files alphabetically or numerically, which matches the order split created them. Verify integrity by comparing checksums with md5sum or sha256sum.
Does split work on binary files?
Yes. The split command handles binary files the same way it handles text files when using byte-based splitting (-b option). Use cat to reassemble binary files, and verify with checksums. Avoid line-based splitting (-l) on binary files since binary data does not have meaningful line boundaries.
Conclusion
The split command divides files by line count (-l), byte size (-b), or number of parts (-n), while -C keeps lines intact during size-based splits. Combine it with tar for directory archiving and gzip for compressed chunks. Reassemble the pieces with cat prefix-* and verify the result with md5sum to confirm a lossless round trip.