
Module 6


Awk is an external program that implements an interpreted programming language. Interpreted means that awk executes a program as it reads the code, unlike a C program, which must be compiled before it can be executed. Awk is something of a cross between the egrep command and the C programming language. Like egrep, it searches the input for regular expressions. Like C, it can be programmed to perform almost any kind of data manipulation.

The new version of awk is named nawk (new awk). Some vendors provide only the old version of awk, some provide nawk under the name awk, and others provide both awk and nawk. System V Release 4.0 provides both. Use nawk unless you need portability to systems that do not support it. This module describes the nawk program.

Nawk lends itself well to generating reports, transforming data, retrieving information, and validating data. It is designed to handle data in rows and columns, but it is also useful for locating specific data within free-format data streams. Its features include:
*  Structures, operators, and syntax resembling the C programming language
*  Regular expression pattern comparison
*  Multisource input. It can read from files, pipes, and the keyboard in the same program. It can be interactive.
*  Multidestination output. It can write to multiple files, pipes, and the terminal.
*  Formatted output using the C printf function
*  Automatic declaration of typeless variables (a variable may hold a string or a number)
*  Automatic parsing of input into records (lines) and fields (columns)
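
As a rough illustration of several of these features together (regular expression matching, automatic field splitting, an undeclared variable, printf, and output to more than one destination), the sketch below feeds three made-up records to a short program. It is written with the command name awk on the assumption that your awk has the nawk features; substitute nawk where it does not. The file name california.txt is hypothetical.

```shell
# Three made-up records are supplied on standard input.
printf 'Pears California 12.89\nCherries Missouri 17.49\nRaisins California 15.00\n' |
awk '
    /California/ {                        # regular expression selects records
        total += $3                       # total is never declared; $3 is field 3
        printf("%-8s %6.2f\n", $1, $3)    # C-style formatted output to the terminal
        print $1 > "california.txt"       # second destination: a file
    }
    END { printf("Total    %6.2f\n", total) }
'
```

Only the two California records reach the action, so the formatted lines go to the terminal while the matching names are also collected in california.txt.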


Following is the general format of the nawk command.

   nawk [ -Fs ] program [ VAR=VAL ] [ file_list ] [ - ]
   nawk [ -Fs ] -f prog_file [ VAR=VAL ] [ file_list ] [ - ]


The following list describes the options and their arguments that may be used to control how nawk functions.

-Fs Set the field separator to the regular expression s. The default is white space (spaces and/or tabs).
-f prog_file Read the nawk program (commands) from the file prog_file instead of from the command line. This allows you to write large nawk programs without exceeding the shell's or nawk's input buffer limit on the command line. The input buffer on most systems is between 512 and 10240 bytes. To find out your system's limit, type grep "ARG_MAX" /usr/include/limits.h and press Return. On BSD systems, type egrep "NCARGS" /usr/include/sys/param.h and press Return.
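
The two options can be sketched as follows. The data and the file names users.txt, users.awk, and users.out are hypothetical and exist only for this illustration; the command is written as awk, so substitute nawk if that is the name your system uses.

```shell
# Illustrative colon-separated data.
printf 'root:x:0\ndaemon:x:1\n' > users.txt

# -F: sets the field separator to a colon, so $1 is the name and $3 the ID.
awk -F: '{ print $1, $3 }' users.txt

# The same program kept in a file and loaded with -f.
echo '{ print $1, $3 }' > users.awk
awk -F: -f users.awk users.txt > users.out
cat users.out
```

Both invocations run the same program, so they should print the same two lines. The -f form becomes the more convenient one as the program grows.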


The following list describes the arguments that may be passed to the nawk command.

'program' A single argument containing the nawk program (commands). The argument should be enclosed in single quotes to keep the shell from interpreting it.
VAR=VAL Command line variable assignments. Each assignment sets the nawk variable VAR to the value VAL for use inside the nawk program.
file_list One or more files containing data to be scanned and processed by nawk.
- Read the standard input. This argument may be intermixed with the files in the file_list.
If no files are specified for input, nawk reads from the standard input.
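
A minimal sketch of these arguments in use, assuming a hypothetical data file prices.txt and a hypothetical variable named rate (neither comes from this module). The command is written as awk; substitute nawk where needed.

```shell
# Illustrative data file.
printf 'Apples 10.59\nPears 12.89\n' > prices.txt

# rate=2 is a command line variable; it is set before prices.txt is read.
awk '{ print $1, $2 * rate }' rate=2 prices.txt

# "-" stands for the standard input, read here after prices.txt.
echo 'Cherries 17.49' | awk '{ print NR, $1 }' prices.txt -
```

In the second command NR keeps counting across both inputs, so the line arriving on the standard input is numbered after the two lines from the file.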


The examples throughout this module assume you have a file named fruits in your HOME directory. The file should contain the following data.

   Apples     Washington  6,700,000   10.59
   Oranges    Florida     5,900,000   11.69
   Peaches    Texas       3,600,000   13.79    Incomplete
   Pears      California  3,100,000   12.89
   Raisins    California  2,300,000   15.00
   Cherries   Missouri    2,100,000   17.49
   Pineapple  Hawaii      3,900,000   12.99
   Coconuts   Hawaii      2,600,000   14.39    Incomplete
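
One way to create the file is with a shell here-document, as sketched below. The check at the end is an addition for illustration; it is written as awk, so substitute nawk if that is the name your system uses.

```shell
# Create the sample data file in the HOME directory.
cat > "$HOME/fruits" <<'EOF'
Apples     Washington  6,700,000   10.59
Oranges    Florida     5,900,000   11.69
Peaches    Texas       3,600,000   13.79    Incomplete
Pears      California  3,100,000   12.89
Raisins    California  2,300,000   15.00
Cherries   Missouri    2,100,000   17.49
Pineapple  Hawaii      3,900,000   12.99
Coconuts   Hawaii      2,600,000   14.39    Incomplete
EOF

# Quick check: print the first and fourth fields, then the record count.
awk '{ print $1, $4 } END { print NR, "records" }' "$HOME/fruits"
```

The fields are separated by runs of spaces, which the default field separator handles. Two records carry a fifth field, Incomplete, so the number of fields is not the same on every line.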
