
Module 28
csplit (SV)


The external csplit command divides an input into smaller files based on context or line numbers. The context splitting is performed based on words and patterns you specify on the command line. The line numbers must also be specified as context strings. The input file remains unaltered.


Following is the general format of the csplit command.

     csplit [ -ks ] [ -fprefix ] - arg1 [ arg2 ... ]
     csplit [ -ks ] [ -fprefix ] file arg1 [ arg2 ... ]


The following options may be used to control how csplit operates.

-k Keeps the output files already created if an error occurs. Normally csplit removes all the files it has created when an error occurs.
-s Suppresses the display of character counts. Normally csplit displays the character count of each file it creates.
-fprefix Replaces the default xx prefix used for output filenames. The prefix must be 12 characters or fewer, because the two-digit suffix csplit appends would otherwise push the name past the 14-character filename limit of traditional System V file systems. Some System V systems implement BSD file systems, which allow longer filenames.
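The three options can be tried on a throwaway file. The following is a minimal sketch, assuming a POSIX-style csplit (such as GNU csplit) plus the mktemp and awk commands; the file name sample.txt and the prefixes full, quiet, gone, and kept are made up for the example.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Build a ten-line sample file ("line 1" through "line 10").
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > sample.txt

# Default behaviour: csplit prints the size of each file it writes.
csplit -f full sample.txt 4

# -s: the same split, but nothing is printed.
csplit -s -f quiet sample.txt 4

# Line 20 does not exist, so the second split fails.
# Without -k, csplit deletes the gone* files before exiting.
csplit -s -f gone sample.txt 4 20 || echo "csplit failed; gone* files removed"

# With -k, the files written before the error are kept.
csplit -s -k -f kept sample.txt 4 20 || echo "csplit failed; kept* files kept"
ls kept*
```

The || echo after each failing command only reports the error; it also keeps the script running if it is executed under set -e.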


The following list describes the arguments that may be passed to the csplit command.

- Forces csplit to read from the standard input. Useful if you need csplit to read the output of a pipe.
file The input file that is read by csplit and divided into smaller files.
arg1 [ arg2 ... ] The context strings csplit uses to divide the input into smaller, more manageable output files. They are described in the list of context arguments that follows.
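Reading from a pipe with the - argument can be sketched as follows, assuming a csplit that accepts - (GNU csplit does); the word list and the prefix piped are invented for illustration.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# "-" tells csplit to read the words from the pipe instead of a file.
# Line 3 ("gamma") starts the second output file.
printf '%s\n' alpha beta gamma delta epsilon | csplit -s -f piped - 3

cat piped00    # alpha, beta
cat piped01    # gamma, delta, epsilon
```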


csplit offers several ways to divide its input into smaller files. Each argument on the command line may be any of the following:

Enclose all regular expressions in quotes to prevent interpretation by the shell.

/expr/ Creates a file containing the lines from the current line up to, but not including, the next line that matches the regular expression expr. You may follow the closing slash with +n or -n to place the split point n lines after (+n) or before (-n) the matching line; no spaces are allowed between the expression and this offset. The split point then becomes the new current line. csplit recognizes the same regular expressions as ed.
%expr% Same as /expr/, except that no file is created: the lines from the current line up to the split point are skipped.
lineno Creates a file containing the lines from the current line up to, but not including, line number lineno. lineno must be an integer.
{num} Repeats the preceding argument num more times. You use this argument in conjunction with any of the above arguments. If {num} follows an /expr/ or %expr% argument, csplit applies that expression num more times. If {num} follows a lineno, the input is split every lineno lines, num more times. For example, csplit file 100 {10} splits the file before lines 100, 200, and so on through 1100, so every output file after the first holds 100 lines.
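The context arguments can be compared on a small file. This is a sketch assuming GNU or another POSIX csplit and the mktemp command; the nine-line file doc.txt and the prefixes sec, plus, and minus are made up.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Nine lines: header, three BEGIN sections with body lines.
printf '%s\n' header 'BEGIN one' a b 'BEGIN two' c d 'BEGIN three' e > doc.txt

# %expr% skips the "header" line without writing it anywhere;
# /expr/ plus {1} then cuts before the next two BEGIN lines.
csplit -s -f sec doc.txt '%^BEGIN%' '/^BEGIN/' '{1}'
# sec00: BEGIN one, a, b
# sec01: BEGIN two, c, d
# sec02: BEGIN three, e

# An offset moves the split point relative to the matched line.
csplit -s -f plus doc.txt '/^BEGIN two/+1'    # plus00: lines 1-5
csplit -s -f minus doc.txt '/^BEGIN two/-1'   # minus00: lines 1-3
```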


The csplit command is an enhanced split command. It divides a file according to criteria you specify, so you are not restricted to an equal number of lines per output file as with the split command. By specifying regular expression strings as arguments, you can divide the input into files based on meaningful sections. The input can also be split on line numbers. Thus csplit can perform both fixed-size splitting and context splitting.
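Fixed-size splitting with a line number and {num} might look like this. It is a sketch assuming a POSIX csplit; the file lines.txt and prefix chunk are made up. Because each file stops just before the given line number, the first file is one line shorter than the later ones.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# A 25-line file containing the numbers 1 through 25.
awk 'BEGIN { for (i = 1; i <= 25; i++) print i }' > lines.txt

# Split before line 10, then once more 10 lines later (before line 20).
csplit -s -f chunk lines.txt 10 '{1}'

wc -l chunk00 chunk01 chunk02
# chunk00: lines 1-9, chunk01: lines 10-19, chunk02: lines 20-25
```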

The output is written to files whose names consist of a prefix followed by a two-digit suffix. You can specify the prefix using the -fprefix option; if you do not, csplit uses the default prefix xx. The suffixes begin at 00 and continue up to 99. The input is divided among these files as follows:

xx00 First file contains all lines from the beginning of the file up to but not including the line referenced by the first argument.
xx01 Contains all lines from arg1 (argument one) up to but not including arg2.
xxNN Contains all lines from the line referenced by the last argument to the end of the file, where NN is the number of arguments you specified on the command line (each repetition requested with {num} counts as an argument).
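The naming scheme can be checked with two line-number arguments. This sketch assumes a POSIX csplit; the file input.txt and the prefix chap are invented.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
awk 'BEGIN { for (i = 1; i <= 12; i++) print "line " i }' > input.txt

# Two arguments give three files: xx00 (lines 1-3),
# xx01 (lines 4-8) and xx02 (lines 9-12).
csplit -s input.txt 4 9
ls xx*

# The same split with -f names the files chap00, chap01 and chap02.
csplit -s -f chap input.txt 4 9
ls chap*
```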

One useful application of csplit is dividing a large troff or nroff document into smaller sections, for example by placing each level 1 heading in a file of its own. For instance, you might try the following command:

     cj> csplit -k document '/^.H 1/' {99}

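This splits a file containing troff (nroff) level 1 headings into files that each contain one entire level 1 section; xx00 holds whatever text precedes the first heading. The large repeat count simply means "as many as there are." When the document has fewer headings than that, csplit stops with an error, and the -k option keeps it from removing the files it has already created.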
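The heading example can be tried on a tiny invented document. This sketch assumes GNU or another POSIX csplit; the pattern is written as '^\.H 1' so that the dot matches only a literal period (the '^.H 1' form above also works, because . matches any character).

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# A made-up mm-style document with three level 1 headings.
cat > document <<'EOF'
.TL
Sample Report
.H 1 "Introduction"
Intro text.
.H 1 "Method"
Method text.
.H 2 "Details"
More detail.
.H 1 "Results"
Results text.
EOF

# The pattern runs out of matches long before 99 repetitions,
# so csplit exits with an error; -k keeps the files it wrote.
csplit -k document '/^\.H 1/' '{99}' || echo "csplit stopped early; files kept"
ls xx*
# xx00: title lines, xx01: Introduction, xx02: Method (with its .H 2),
# xx03: Results
```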


One of the error messages returned by csplit is not very clear:

arg - out of range This message informs you that one of the arguments you supplied did not locate a position within the allowed range, which runs from the current line to the end of the file. For example, a pattern that matches only lines above the current line, or a line number beyond the last line of the input, produces this error.
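The allowed-range rule shows up when a pattern matches only above the current line. This is a sketch assuming GNU or another POSIX csplit; newer implementations may word the error differently from the System V text, and the file input.txt and prefix kept are invented.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
awk 'BEGIN { for (i = 1; i <= 8; i++) print "line " i }' > input.txt

# After the split before line 5, the search for "line 2" starts
# at line 5, covers only lines 5-8, and finds nothing.
if csplit -s input.txt 5 '/^line 2$/'; then
    echo "unexpected: csplit succeeded"
else
    echo "csplit reported an error and removed the xx files it created"
fi

# Repeating the command with -k keeps the partial results:
# kept00 holds lines 1-4 and kept01 holds lines 5-8.
csplit -s -k -f kept input.txt 5 '/^line 2$/' || echo "error again, files kept"
ls kept*
```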


Refer to the split command described in Module 124. Also see Module 39 on ed.


By default, the csplit command writes its output to files named with the prefix xx and the suffixes 00, 01, and so on up to 99; the -fprefix option replaces the xx prefix.


Using csplit, you can divide a file into smaller files at line numbers, at strings contained in the input file, or both. This makes it easy to break a large file into smaller, more manageable files.


In this activity you use the csplit command to split the names of users into three groups based on their login names. Begin at the shell prompt.

1.  Type cut -f1 -d: /etc/passwd | sort | csplit - '/^k/' '/^t/' and press Return. This divides the login names from your system's passwd file into three smaller files named xx00, xx01, and xx02. The xx00 file contains login names beginning with a through j, xx01 has login names beginning with k through s, and xx02 has t through z. If your csplit does not support the - argument, it displays an error such as "cannot open. . ."; in that case, cut and sort the user names into a file, then run csplit with that filename as its file argument.
2.  If you wish to view any of the xxNN files created by csplit, type cat xx00 and press Return. Replace 00 with 01 or 02 to display the other files.
3.  Remove all files beginning with xx by typing rm xx* and pressing Return.
4.  Turn to Module 69 to continue the learning sequence.
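Step 1 can be rehearsed without touching /etc/passwd. This sketch assumes a POSIX csplit that accepts -; the list of login names is invented and stands in for the output of cut -f1 -d: /etc/passwd.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical login names, in no particular order.
printf '%s\n' root tom adm kim bin uucp lp sys guest > names

# Same pipeline shape as step 1, reading the sorted names through "-".
# Without -s, csplit prints each file's size in bytes: 14, 16 and 9.
sort names | csplit - '/^k/' '/^t/'

cat xx00    # adm, bin, guest
cat xx01    # kim, lp, root, sys
cat xx02    # tom, uucp
```

If your csplit rejects -, the equivalent two-step form is sort names > sorted followed by csplit sorted '/^k/' '/^t/'.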
