

Module 122
sort

DESCRIPTION

The external sort command sorts ASCII data. The input can be from files or the standard input. The output can be a file or the standard output. If multiple input files are given, the data from each file is merged during the sort.

COMMAND FORMAT

Following is the general format of the sort command.

     sort [ - ] [ -cmu ] [ -ooutput ] [ -ykmem ] [ -zrecsz ] [ -dfiMnr ] \
          [ -btx ] [ +pos1 [ -pos2 ] ] [ file_list ]

BSD (Berkeley)
     sort [ - ] [ -cmu ] [ -ooutput ] [ -dfinr ] [ -btx ] \
          [ +pos1 [ -pos2 ] ] [ -Tdirectory ] [ file_list ]

Options

The following list describes the options and their arguments that may be used to control how sort functions.

- Forces sort to read from the standard input. Useful for reading from pipes and files simultaneously.
-c Verifies that the input is sorted according to the other options specified on the command line. If the input is sorted correctly, no output is produced. If the input is not sorted, sort reports the disorder with a message resembling the following.
     sort: disorder: This line not in sorted order.
-m Merges the sorted input. Normally sort merges input as it sorts. This option informs sort that each input is already sorted, so the sort phase is skipped and sort runs much faster.
-ooutput Sends the output to file output instead of the standard output. The output file may have the same name as one of the input files.
-u Suppresses all but one occurrence of matching keys. Normally, the entire line is the key. If field or character keys are specified, lines are suppressed based on those keys.
-ykmem Starts the sort with kmem kilobytes of main memory. If more memory is needed, sort automatically requests it from the operating system. The amount of memory allocated significantly affects the speed of the sort. If no kmem is specified, sort starts with the default amount of memory (usually 32K). Up to the maximum amount of memory (usually 1 megabyte) may be allocated if needed. If 0 is specified for kmem, the minimum amount of memory (usually 16K) is allocated.
-zrecsz Specifies the record size used to store each line. Normally recsz is set to the longest line read during the sort phase. If the -c or -m option is specified, the sort phase is not performed and the record size defaults to a system default size. If this default size is not large enough, sort may abort during the merge phase. To avoid this problem, specify a recsz large enough for the merge phase to run without aborting.
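
The -c, -m, -u, and -o options can be tried on small files. The following is a minimal sketch, not a definitive recipe: the file names and sample data are invented, mktemp is assumed to be available for the scratch directory, and LC_ALL=C is set so that comparisons follow ASCII order. It also relies on sort returning a zero exit status from -c when the input is in order. The -y and -z options are left out because they are not accepted with the same meaning by every version of sort.

```shell
#!/bin/sh
# Sketch only: file names and sample data are invented for illustration.
LC_ALL=C; export LC_ALL           # compare bytes, so the order follows ASCII
tmpdir=$(mktemp -d)               # scratch directory; mktemp is assumed to exist

printf 'apple\ncherry\n'        > "$tmpdir/a.sorted"
printf 'banana\ncherry\ndate\n' > "$tmpdir/b.sorted"
printf 'pear\napple\n'          > "$tmpdir/unsorted"

# -c only checks the order; the exit status is 0 when the input is sorted.
if sort -c "$tmpdir/a.sorted" 2>/dev/null; then check_sorted=yes; else check_sorted=no; fi
if sort -c "$tmpdir/unsorted" 2>/dev/null; then check_unsorted=yes; else check_unsorted=no; fi

# -m merges inputs that are already sorted; -u keeps one line per key,
# so "cherry" appears only once in the result.
merged=$(sort -m -u "$tmpdir/a.sorted" "$tmpdir/b.sorted")

# -o may name one of the input files.
sort -o "$tmpdir/unsorted" "$tmpdir/unsorted"
in_place=$(cat "$tmpdir/unsorted")

rm -rf "$tmpdir"
printf '%s\n' "$check_sorted" "$check_unsorted" "$merged" "$in_place"
```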

Ordering Rule Options

-d Specifies dictionary sort. Only blanks, digits, and alphabetic characters are significant in the comparison.
-f Folds lowercase letters to uppercase, so that uppercase and lowercase ASCII characters compare as equal.
-i Ignores characters outside the ASCII range 040 (octal) through 0176 (octal). Only printable characters (alphabetic characters, blanks, digits, and punctuation) are used for comparison. Control characters are ignored. This is only valid for nonnumeric sorts.
-M Compares fields as months. The first three nonblank characters are folded (see the -f option) to uppercase and compared. Thus January is compared as JAN. JAN precedes FEB, and fields not containing months precede JAN. The -b option is placed in effect automatically.
-n Sorts the input numerically. The comparison is based on numerical value instead of alphabetic value. The number field used for comparison can contain optional blanks, an optional minus sign, an optional decimal point, and zero or more digits. The -b option is placed in effect automatically. Exponential numbers are not sorted correctly.
-r Reverses the order of the output.
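
The effect of each ordering rule is easiest to see on a few lines of input. The sketch below is illustrative only: the sample data is invented, LC_ALL=C is set so the default order is ASCII order, and the -M line assumes a sort that recognizes English month abbreviations in that locale.

```shell
#!/bin/sh
# Sketch only: the input lines are invented for illustration.
LC_ALL=C; export LC_ALL

by_char=$(printf '10\n9\n100\n' | sort)                 # compared as characters
by_number=$(printf '10\n9\n100\n' | sort -n)            # compared as numbers
reversed=$(printf '10\n9\n100\n' | sort -nr)            # numeric, largest first
plain=$(printf 'apple\nBanana\ncherry\n' | sort)        # uppercase sorts first
folded=$(printf 'apple\nBanana\ncherry\n' | sort -f)    # case is ignored
dictionary=$(printf 'a-c\nabd\na.b\n' | sort -d)        # punctuation is ignored
months=$(printf 'MAR\nJAN\nFEB\n' | sort -M)            # calendar order

# Unquoted expansion joins each result onto one line for display.
echo "default:    " $by_char
echo "-n:         " $by_number
echo "-nr:        " $reversed
echo "no folding: " $plain
echo "-f:         " $folded
echo "-d:         " $dictionary
echo "-M:         " $months
```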

Restricted Sort Key Options

+pos1 Specifies the beginning position of the input line used for field comparison. If pos1 is not specified then comparison begins at the beginning of the line. The pos1 position has the notation of f.c. The f specifies the number of fields to skip. The c specifies the number of characters to skip. For example, 3.2 is interpreted as skip three fields and two characters before performing comparisons. Omitting the .c portion is equivalent to specifying .0. Field one is referred to as position 0. If f is set to 0 then character positions are used for comparison.
-pos2 Specifies the ending position of the input line used for field comparison. If pos2 is not specified then comparison is done through the end of the line. The pos2 position has the notation of f.c. The f specifies to compare through field f. The c specifies the number of characters to compare through after field f. For example, -4.3 is interpreted as compare through three characters after the end of field four. Omitting the .c portion is equivalent to specifying .0.
-b Ignores leading blanks when using the restricted sort key positions (+pos1 and -pos2). If the -b option is placed before the first +pos1 argument, then it applies to all +pos1 arguments. The -b option can be attached to each pos string to affect only that field.
-tc Uses character c as the field separator. Each occurrence of c is significant, so consecutive c's delimit empty fields.

The b option and any of the ordering rule options (d, f, i, M, n, and r) can be attached to position arguments. For example,

     cj> sort -t: +2.0n -3.0 +0.0bdf -1.0 /etc/passwd

This command sorts the passwd file with the primary key field being field 3. The secondary key is based on field 1. Field 3 is sorted as numbers, while field 1 is sorted in dictionary order, ignoring uppercase versus lowercase characters.
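
Many current versions of sort no longer accept the +pos1 -pos2 notation and expect the POSIX -k option instead, in which fields and characters are counted from 1. The sketch below assumes such a sort and restates the command above in -k form. The passwd-style records are invented, and cut is used only to shorten the output to the login names.

```shell
#!/bin/sh
# Sketch only: invented passwd-style records; only the field layout matters.
LC_ALL=C; export LC_ALL

sample='root:x:0:0:Root:/root:/bin/sh
daemon:x:1:1:Daemon:/usr/sbin:/bin/sh
carol:x:100:100:Carol:/home/carol:/bin/sh
alice:x:100:100:Alice:/home/alice:/bin/sh
bob:x:20:20:Bob:/home/bob:/bin/sh'

# -k 3n,3    corresponds to  +2.0n -3.0    (field 3, compared as numbers)
# -k 1bdf,1  corresponds to  +0.0bdf -1.0  (field 1, dictionary order, case folded)
logins=$(printf '%s\n' "$sample" | sort -t: -k 3n,3 -k 1bdf,1 | cut -d: -f1)

echo $logins
```

The two records with user ID 100 tie on the primary key, so the secondary key places alice before carol.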


BSD (Berkeley)
-Tdirectory Specifies the directory to store temporary files.

Arguments

The following arguments may be passed to the sort command.

file_list One or more input files to be read and sorted.
If no files are specified the standard input is read.

FURTHER DISCUSSION

The sort command can read from the standard input and write to the standard output. Thus it can be used in pipes or have its output redirected as desired. For example, sort can be used in the following combinations of commands.

     sort infile > outfile
     sort infile | lp
     who | sort
     sort infile -o infile

In the first command sort reads and sorts infile and displays the result on standard output, which has been redirected to outfile. In the second command sort reads and sorts infile and displays the results on standard output, which is piped to the lp command. The third command pipes the output of the who command to the input of the sort command. The sort then displays its output on the standard output (your screen). In the last command sort reads from the file infile and writes the sorted output to the file infile.
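
These combinations, together with the - option for mixing piped input with a named file, can be sketched as follows. The file names and data are invented, lp is left out so that nothing is printed, and mktemp is assumed to be available for the scratch directory.

```shell
#!/bin/sh
# Sketch only: file names and data are invented for illustration.
LC_ALL=C; export LC_ALL
tmpdir=$(mktemp -d)               # scratch directory; mktemp is assumed to exist
printf 'delta\nalpha\n' > "$tmpdir/infile"

# Standard output redirected to a file.
sort "$tmpdir/infile" > "$tmpdir/outfile"
redirected=$(cat "$tmpdir/outfile")

# Standard input supplied by a pipe.
piped=$(printf 'charlie\nbravo\n' | sort)

# "-" stands for the standard input, so the piped lines and the
# named file are sorted together.
combined=$(printf 'charlie\nbravo\n' | sort - "$tmpdir/infile")

rm -rf "$tmpdir"
echo "redirected:" $redirected
echo "piped:     " $piped
echo "combined:  " $combined
```

Writing back to an input file should be done with -o rather than with > redirection, because the shell empties the redirected file before sort has read it.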

By default the sort command sorts based on entire lines. It compares the first character of two lines. If the first character is the same, then the second character is compared, and so on until the entire line is compared or the correct order is determined. Leading blanks (spaces and tabs) are considered valid characters to compare. Thus a line beginning with a space precedes a line beginning with the letter A. This is because the space is octal 040 in the ASCII sequence and an A is octal 101.
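
A short sketch of this default ordering follows. The input lines are invented, and LC_ALL=C is set on the assumption that the local sort would otherwise apply locale-specific collation instead of ASCII order. The sed command only adds a visible left margin so the leading space can be seen.

```shell
#!/bin/sh
# Sketch only: the input lines are invented for illustration.
LC_ALL=C; export LC_ALL

# One line begins with a space (octal 040), which sorts ahead of
# "A" (octal 101), which in turn sorts ahead of "a" (octal 141).
ordered=$(printf 'Apple\n apple\nBanana\napple\n' | sort)

# Mark the start of each line so the leading blank is visible.
printf '%s\n' "$ordered" | sed 's/^/|/'
```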

If the ordering rule options precede the sort key options, they are globally applied to all sort keys. For example,

     sort -r +2 -3 infile

sorts in reverse order based on field 3. If ordering rule options are attached to a sort key option, they override the global ordering options for that sort key only. For example,

     sort -r +2d -3d infile

sorts field 3 by dictionary comparison but sorts the rest of the line using reverse comparison.
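
The difference between global and attached ordering options can be sketched with the POSIX -k notation, which is assumed here because many current versions of sort do not accept +pos1 -pos2. The records are invented and are colon-separated only to keep the field boundaries unambiguous. cut shortens the output to the first field.

```shell
#!/bin/sh
# Sketch only: invented records of the form id:number:letter.
LC_ALL=C; export LC_ALL

sample='x:1:b
y:2:a
z:3:c
w:9:b'

# Global -r applies to the key: field 3 runs c, b, b, a.
global_reverse=$(printf '%s\n' "$sample" | sort -t: -r -k 3,3 | cut -d: -f1)

# The attached d overrides the global options for the key, so field 3
# runs a, b, b, c. The two lines that tie on "b" are still separated by
# comparing the whole line in reverse order (x before w).
attached=$(printf '%s\n' "$sample" | sort -t: -r -k 3d,3 | cut -d: -f1)

echo "sort -r -k 3,3  :" $global_reverse
echo "sort -r -k 3d,3 :" $attached
```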

Multiple sort keys may be used on the same command line. The first key is used to sort the lines. If two lines have the same value in the same field, sort uses the next set of sort keys to resolve the equal comparison. For example,

     sort +4 -5 +1 -2 infile

means to sort based on field 5 (+4 -5). If two lines have the same value in field 5, sort those two lines based on field 2 (+1 -2). If all the sort keys match, then the rest of the line is used for comparison.
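
A two-key sort of this kind can be sketched as follows. The example assumes a sort that takes the POSIX -k option, so +4 -5 +1 -2 is written as -k 5,5 -k 2,2. The records are invented, and a colon separator is used only to make the fields easy to see.

```shell
#!/bin/sh
# Sketch only: invented five-field records (id:name:x:y:region).
LC_ALL=C; export LC_ALL

sample='r1:smith:a:b:west
r2:jones:a:b:east
r3:adams:a:b:west
r4:baker:a:b:east'

# Primary key: field 5 (region). Secondary key: field 2 (name).
by_region_then_name=$(printf '%s\n' "$sample" | sort -t: -k 5,5 -k 2,2 | cut -d: -f1)

echo $by_region_then_name
```

Both east records come first, ordered by name (baker, jones), followed by both west records (adams, smith).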

DIAGNOSTICS AND BUGS

If lines have equal keys, sort does not necessarily preserve the relative order in which those lines appeared in the input.

RELATED COMMANDS

Refer to the comm, join, and uniq commands described in modules 22, 69, and 149.

RELATED FILES

The sort command uses a temporary file /usr/tmp/stm???.

APPLICATIONS

You use the sort command to sort data alphabetically or numerically, in ascending or descending order. You can sort based on entire lines, fields, or character columns. You can merge files using sort and remove duplicate lines with it.

TYPICAL OPERATION

In this activity you use the sort command to sort the /etc/passwd file using various sort keys and ordering rules. Begin at the shell prompt.

1.  Type sort -t: +2 /etc/passwd and press Return. Notice the order of the output is based on the third field of the passwd file, the user IDs. Notice, however, that the numbers are not in numerical order, because the field is compared as characters.
2.  Type sort -rn -t: +2 /etc/passwd and press Return. Notice the numbers are now ordered by value, but in reverse order. The -n option informs sort to compare the specified field as numbers, not ASCII characters. The -r option reverses the order of the sort.
3.  Type sort -t: +5 -6 +0 -1 /etc/passwd and press Return. The output is now sorted by field 6, then by field 1 if necessary.
4.  Turn to Module 149 to continue the learning sequence.
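
If your version of sort rejects the +pos notation, steps 1 through 3 can be approximated with the POSIX -k option, as in the following sketch. It runs on invented passwd-style records rather than the real /etc/passwd so that the results are predictable, and it uses cut to show only the field of interest.

```shell
#!/bin/sh
# Sketch only: invented passwd-style records stand in for /etc/passwd.
LC_ALL=C; export LC_ALL

sample='root:x:0:0:Root:/root:/bin/sh
bob:x:20:20:Bob:/home/shared:/bin/sh
alice:x:100:100:Alice:/home/alice:/bin/sh
carol:x:5:5:Carol:/home/carol:/bin/sh
dave:x:7:7:Dave:/home/shared:/bin/sh'

# Step 1 (+2 becomes -k 3): user IDs compared as characters.
step1=$(printf '%s\n' "$sample" | sort -t: -k 3 | cut -d: -f3)

# Step 2: user IDs compared as numbers, largest first.
step2=$(printf '%s\n' "$sample" | sort -rn -t: -k 3 | cut -d: -f3)

# Step 3 (+5 -6 +0 -1 becomes -k 6,6 -k 1,1): home directory, then login name.
step3=$(printf '%s\n' "$sample" | sort -t: -k 6,6 -k 1,1 | cut -d: -f1)

echo "step 1:" $step1
echo "step 2:" $step2
echo "step 3:" $step3
```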

