

If an example is given for syntax reasons only, the filename infile is used.

PROGRAMS

The nawk command interprets a set of commands known as the nawk program. The program can be specified as an argument on the shell command line or stored in a file. In either case the program consists of one or more pattern-action statements. The general form of the pattern-action statement is:

   pattern { action }

For example,

   nawk 'NF < 5 { print $1 }' fruits

prints the first field of each line that has fewer than five fields. By default, fields are words separated by spaces or tabs. The built-in variable NF contains the number of fields in the current line.
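Here is a minimal runnable sketch of the example above. The contents of fruits are never shown, so the sample lines below are hypothetical. The first sample line uses a tab between two of its fields to show that tabs and spaces both separate fields. If no nawk binary is installed, the snippet falls back to the system awk, on the assumption that it accepts the same program.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Hypothetical stand-in for the fruits file; the first line mixes a tab and spaces.
data=$(printf 'apple\tred Washington 120\nbanana yellow Ecuador 300 bunches\nkiwi brown California')

# Print the first field of every line that has fewer than five fields.
out=$(printf '%s\n' "$data" | nawk 'NF < 5 { print $1 }')
printf '%s\n' "$out"    # apple, then kiwi (the banana line has five fields)
```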

Nawk reads each line of input automatically. If the pattern matches the input line (record), in part or in whole, the action that follows it is performed. If no pattern is specified, the action is performed for every input record. An action may consist of several programming statements, which are discussed later, and it must be enclosed in braces ({ }). The general format is:

   { action }

For example,

   nawk '{ print }' fruits

prints every line of the file fruits. In effect, you have written a very simple version of the cat command. A bare print is also the action nawk supplies by default when a pattern is given without one.
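The following sketch checks the comparison with cat. It uses hypothetical sample lines and the same assumed fallback to awk: the output of { print } should be identical to what cat produces.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Hypothetical sample input.
data='apple red Washington
kiwi brown California'

# No pattern, so the action runs for every record; a bare print writes the whole record.
out=$(printf '%s\n' "$data" | nawk '{ print }')
# cat copies its input unchanged, for comparison.
via_cat=$(printf '%s\n' "$data" | cat)
[ "$out" = "$via_cat" ] && echo "same output as cat"
```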

If no action is specified, each matching input record is displayed on the standard output. The general format is:

   pattern

For example,

   nawk 'NF == 4' fruits

prints each line consisting of four fields.
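This sketch uses hypothetical data and the assumed awk fallback. It compares the pattern-only program with one that spells out the print action explicitly, and both should select the same lines.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Hypothetical sample input.
data='apple red Washington 120
banana yellow Ecuador 300 bunches
pear green Oregon 75'

# Pattern only: the default action prints each matching record.
out=$(printf '%s\n' "$data" | nawk 'NF == 4')
# The same program with the action written out.
explicit=$(printf '%s\n' "$data" | nawk 'NF == 4 { print }')
printf '%s\n' "$out"    # the apple and pear lines
```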

WRITING PROGRAMS

It is very common to write small nawk programs in-line; that is, on the command line or even spread over a couple of lines. For example,

   nawk 'BEGIN { print "Annual Fruit Report" }
        { print }
        END { print "More fruit to Grow" }' fruits

prints the banner line "Annual Fruit Report", every line of the fruits file, and the trailer line "More fruit to Grow". The command can be typed at your terminal or placed in a shell script.
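One way to place the program in a shell script is to wrap it in a function. The function name fruit_report is hypothetical, and so is the sample input. The fallback to awk is the same assumption as before.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Hypothetical wrapper around the in-line program.
# "$@" passes any input filenames through; with none, standard input is read.
fruit_report() {
    nawk 'BEGIN { print "Annual Fruit Report" }
          { print }
          END { print "More fruit to Grow" }' "$@"
}

# Hypothetical sample input in place of the fruits file.
out=$(printf 'apple\nkiwi\n' | fruit_report)
printf '%s\n' "$out"
```

Called as fruit_report fruits, the function reads the named file instead of standard input.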


NOTE:  
In-line nawk programs are limited in length by the size of your system's command-line buffer. This size varies: some systems allow only 512 characters, while others allow 10240. As a rule of thumb, an in-line program should never exceed 2000 characters (approximately one screenful).



If the same program is stored in a file, it is invoked with the -f option, which names the program file:

   nawk -f fruitprog fruits

The fruitprog file would contain the following lines.

   BEGIN { print "Annual Fruit Report" }
   { print }
   END { print "More fruit to Grow" }

Notice that the single quotes are gone. The file holds only the program itself; the nawk command and the input filenames stay on the command line.
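The sketch below creates the fruitprog file in a temporary directory and runs it with -f. The sample fruits data is hypothetical. The sketch assumes mktemp -d is available, which is true on most current systems although POSIX does not require it.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

dir=$(mktemp -d)

# The program file: no quotes, no nawk command, no input filenames.
cat > "$dir/fruitprog" <<'EOF'
BEGIN { print "Annual Fruit Report" }
{ print }
END { print "More fruit to Grow" }
EOF

printf 'apple\nkiwi\n' > "$dir/fruits"    # hypothetical sample data

out=$(nawk -f "$dir/fruitprog" "$dir/fruits")
printf '%s\n' "$out"
rm -r "$dir"
```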

PATTERNS

Patterns are used to select lines from the input. If a pattern matches a string on an input line, the associated action is performed. It is possible to write an nawk program that does not have a pattern, such as

   nawk '{ print $1 }' fruits

Because no pattern is given, the default pattern selects every input line, so this program prints the first field of each line.
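This sketch uses hypothetical sample lines and the assumed awk fallback. Because there is no pattern, every line contributes its first field, however many fields it has.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Hypothetical sample input with lines of different lengths.
data='apple red Washington 120
banana yellow Ecuador 300 bunches
kiwi brown California'

# No pattern: the action runs for every line, whatever its field count.
out=$(printf '%s\n' "$data" | nawk '{ print $1 }')
printf '%s\n' "$out"    # apple, banana, kiwi
```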


NOTE:  
The BEGIN and END patterns both require an action.



The BEGIN Pattern

The BEGIN pattern does not match any input. Instead, the action associated with the BEGIN pattern is executed before nawk reads any input. This makes it the place to initialize variables, print headers, and do any other work that must be done before the first line of input is read. The general format is:

   nawk 'BEGIN { initialization code; headers; etc. }' infile

The following example prints a header line.

   nawk 'BEGIN { print "Fruit Shipped From Primary State" }' fruits
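The sketch below shows two properties of BEGIN. It uses hypothetical input and the assumed awk fallback. First, a program consisting only of a BEGIN action runs without reading any input. Second, when input is present, the BEGIN output appears before the first record.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Only a BEGIN action: nawk prints the header and exits without reading input,
# so no input file is named.
header=$(nawk 'BEGIN { print "Fruit Shipped From Primary State" }')

# With input present, the BEGIN output comes before the first record.
# The two input lines are hypothetical.
out=$(printf 'apple Washington\nkiwi California\n' |
      nawk 'BEGIN { print "Fruit Shipped From Primary State" } { print }')
printf '%s\n' "$out"
```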

The END Pattern

The END pattern does not match any input. Instead, the action associated with the END pattern is executed after all of the input has been read by nawk. The general format is:

   nawk 'END { wrap up code }' infile

The following example prints the total number of lines read by nawk.

   nawk 'END { print NR }' infile

The built-in variable NR contains the Number of Records read.
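The sketch below confirms that NR counts records by comparing the END output with wc -l on the same hypothetical three-line input. The tr step is needed because some versions of wc pad the count with spaces. The awk fallback is assumed as before.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Hypothetical sample input.
data='apple
banana
kiwi'

# NR holds the number of records read so far; in the END action that is the total.
count=$(printf '%s\n' "$data" | nawk 'END { print NR }')
# wc -l counts the same newline-terminated lines.
lines=$(printf '%s\n' "$data" | wc -l | tr -d ' ')
printf 'nawk: %s  wc -l: %s\n' "$count" "$lines"    # nawk: 3  wc -l: 3
```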

Regular Expression Patterns

The following table lists each regular expression element and what it matches when used inside a pattern. In UNIX terminology, regular expressions are often called REs, and this section uses that abbreviation for brevity.

Metacharacters

Metacharacters are characters that have special meanings in regular expression patterns. They are often called special or magic characters.

Regular expression patterns must be enclosed in slashes. For example, to match any record (line) containing the string cpu, you would use /cpu/ as the pattern. The string inside the slashes may be any valid combination of the elements in the following table.
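Here is a sketch of the /cpu/ pattern applied to hypothetical lines, with the assumed awk fallback. The match can occur anywhere in the line, not only at its start.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Hypothetical sample input.
data='cpu0 load 0.75
disk0 busy
total cpu time 12s'

# /cpu/ selects every record containing the string cpu anywhere in the line.
out=$(printf '%s\n' "$data" | nawk '/cpu/')
printf '%s\n' "$out"    # the first and third lines
```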


Special RE    Description

c             Matches the character c if c is not a special regular expression character.
\             Escapes the meaning of a metacharacter.
^             Matches the beginning of the line.
$             Matches the end of the line.
.             Matches any single character other than the new-line.
[class]       A character class. Matches any one character in the class.
[c1-c2]       Matches any one of the ASCII characters in the range c1 through c2.
[^class]      Matches any one character that is NOT in the class. Ranges may be specified.
RE1|RE2       Alternation. Matches either RE1 or RE2.
(RE1)(RE2)    Concatenation. Matches RE1 followed immediately by RE2. The parentheses are normally omitted (RE1RE2); use them to control precedence, as in (ab)+, which repeats the whole group ab.
RE*           Matches zero or more occurrences of the preceding regular expression.
RE+           Matches one or more occurrences of the preceding regular expression.
RE?           Matches zero or one occurrence of the preceding regular expression.
//            The null RE. In nawk it matches every line (unlike ed and sed, where an empty RE refers to the last RE used).
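The snippet below tests several entries of the table against a hypothetical list of fruit names. The helper name sel is hypothetical, and the awk fallback is assumed as before. Each comment lists the lines the pattern should select.

```shell
# Assumption: fall back to the system awk when no nawk binary is installed.
if ! command -v nawk >/dev/null 2>&1; then
    nawk() { awk "$@"; }
fi

# Hypothetical sample data, one fruit name per line.
data='apple
apricot
banana
cherry
grape'

# Hypothetical helper: run the nawk program given in $1 over the sample data.
sel() { printf '%s\n' "$data" | nawk "$1"; }

start=$(sel '/^ap/')          # ^ anchors at line start: apple, apricot
ends=$(sel '/e$/')            # $ anchors at line end: apple, grape
cls=$(sel '/^[bc]/')          # class, first char b or c: banana, cherry
neg=$(sel '/^[^ab]/')         # negated class, first char not a or b: cherry, grape
alt=$(sel '/cherry|grape/')   # alternation: cherry, grape
grp=$(sel '/b(an)+a/')        # group (an) repeated by +: banana
opt=$(sel '/^app?le$/')       # ? makes the second p optional: apple
printf '%s\n' "$start"
```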

