UNIX: Scripting

Table of Contents

What is Scripting?

Interpreters

Basic Script Creation

C Shell Constructs

Basic Output

Variables

Program Control

Making Your Scripts Interactive – Input and Arguments

Selecting from a List of Possibilities – switch…case

Making Your Code Readable – Adding Remarks

What is Scripting?

UNIX is popular for a variety of reasons. One of these is its “portability”: UNIX can be run on a wider range of platforms than almost any other operating system. Another is its support for automation. When people work extensively in UNIX, they often find themselves performing certain tasks over and over again, tasks which may involve multiple commands and complicated syntax. Fortunately, the UNIX shells all offer extensive capabilities for automating repetitive tasks, including the ability to schedule jobs to run even after the user has logged out, to assign strings of commands to an alias, and to write custom programs using shell commands.

Automating Tasks

There are several ways to simplify the performance of complicated tasks. You may find yourself executing a set sequence of commands which all work as part of a larger process, a sequence which you need to execute repeatedly. In this case you may wish to create a shell script. A shell script is simply a text file containing a series of shell commands. UNIX allows you to enter the name of the script at the command prompt, which then causes each command in the script to be executed in sequence. In addition to writing shell programs, we will also look at several other ways of automating jobs, including running command loops directly from the prompt, aliases, and scheduling jobs to be executed automatically.

Interpreters

Each shell has its own “language”, the set of commands which the shell can understand. Some commands work regardless of the shell, but others may only be executed within a specific shell. You may, when writing a shell script, wish to use a construct available only in a shell different from the one you are currently using. In this instance the first line of your script must specify the interpreter to be used. An example of this is the following:

#!/bin/ksh

When a script is executed with this as its first line, the system reads the #! as meaning, “use the following shell or interpreter to execute the rest of the script.” In this instance, I have specified the Korn shell. This mechanism allows users to use commands from, for example, the Korn shell even when they are working in the C shell. One note: if you specify an interpreter such as the Korn shell in the first line of your script, then the rest of the script can only use commands which the Korn shell is capable of interpreting.

Types of Interpreters

There are several interpreters available on almost every UNIX system. Virtually all systems will have the C shell (/bin/csh), Bourne shell (usually /bin/sh), and Korn shell (/bin/ksh). Another interpreter which has become popular is known as Perl (/usr/local/bin/perl or /usr/bin/perl). Each of these interpreters has its own strengths and weaknesses: you will have to decide each time which one best fits the task at hand. In this handout we will deal almost exclusively with the C shell.

Basic Script Creation

Before we look at the types of constructs available in the C shell, let’s look at the basic method for creating a script. First, start up your favorite editor (you don’t need to be an emacs wizard to write scripts; pico is more than adequate for our purposes). Remember to specify the interpreter on the first line. After that, simply type in, one to a line, each command in the order in which you want them to execute. When you are finished, exit the editor, saving your script to a file. The last step you need to perform before running your program is turning the execute permissions on for the file, using the chmod command. Then type the name of the file at the prompt and hit the <return> key. If the current directory is not in your path, type ./ in front of the filename.

C Shell Constructs

The C shell provides an extensive command language similar to the C programming language. The C shell language contains constructs for input and output, conditional operations, file management, and variable definitions, among others. If you are already familiar with a low-level programming language, then shell programming will probably seem relatively simple. If you are not, you will nonetheless find it easy to simplify a wide range of different tasks. The most basic constructs used in shell scripts are ones with which you are already familiar. Why? Because virtually any command you can type in directly from the prompt may also be included in a shell script. This means that you can write scripts which move files, remove them, create directories, or even change file modes. In addition to these, we will examine a number of commands which will make your scripts even more powerful.

Basic Output

The basic command for printing output to the screen is the echo command. Its syntax is quite simple. Echo will simply print out whatever string follows it to the standard output (usually your monitor). By default, echo will end its output with a newline character. You can prevent the appending of the newline by adding a -n flag between echo and its argument. For example, the script

#!/bin/csh
echo "Hello World!"
echo "How are you today?"

will produce the output

Hello World!

How are you today?

But the script

#!/bin/csh
echo -n "Good morning, "
echo "Beatrice."

produces

Good morning, Beatrice.

As we shall see later, this is quite useful in cases where we want to prompt the user for input.

Variables

In the preceding examples, we followed the echo command with a quoted string. The string was printed exactly as it appeared between the quotes. This is not always the case. If we precede a word with a $, then the shell will interpret the word as the name of a variable, a name which is assigned a value, but whose value need not remain constant. For example, we could declare a variable name to stand for the value “Frank.” Later in our script, everywhere we would normally type “Frank,” we could instead insert $name. The shell would perform variable expansion, arriving at the value “Frank.” One of the principal advantages of using variables is the ease of alteration they lend your code. To change the program to use “Mary” instead of “Frank,” we need only change the one line where we originally defined the variable.

Types of Variables

For our purposes, there are three main types of variable definitions. The most basic and most common is a string variable, one whose value is simply some string of characters. An example would be name in the code above. Sometimes it is necessary to have a single variable which can contain multiple separate values. Suppose you wanted to store a list of users; it would be confusing and inefficient to have a list of variables such as user1, user2, and so forth. Each time you added a user you would have to add a new variable. In cases such as this you may wish to use an array or wordlist variable. A variable of this type may contain several values; each one is accessed by using the variable name plus a bracketed index, e.g. $users[1]. There is yet a third type of variable which is important for us, the integer variable. These contain integer data. This allows for a variety of operations to be performed involving such variables which may not be performed with string variables, although string variables may contain numeric data.

Initializing Variables

The basic command for declaring a C shell variable is the set command. For example:

set name = "Henri"

will initialize the string variable name to contain the value “Henri.” The method for initializing a wordlist variable is slightly different. The set command is used, but the list of values is enclosed in parentheses, as in

set users = (George Frank Mary Heloise Hartsell)

We can now access each of these values individually by their indices. The syntax for initializing an integer variable is different yet again. Instead of using set, the declaration begins with the @ character:

@ count = 0

Note: the set command is only used for shell variables. Environment variables, such as DISPLAY and EDITOR, must be declared using the setenv command.

Program Control

Included in the C shell language are several constructs which allow you to control the execution of instructions. These allow execution to loop, to be made conditional (execution only takes place if certain criteria are met), or to cycle through a list of files, among other things.

if…then

One of the most important and most powerful constructs in the C shell is the if…then construct. This allows the user to force a command or group of commands to execute only if certain conditions are met. The basic syntax for this construct is as follows:

if (condition(s)) then

command(s)

endif

The condition in parentheses is first evaluated, returning a value of either 0 (false) or 1 (true). Only when the condition is true does the following command or sequence of commands execute. Each if routine must finish with an endif keyword, which has to appear on a line by itself.

if…then…else

Use of the if…then…else construction allows the user to specify a “default” command group, which will only execute if the condition after the if keyword is false; otherwise, as in the description above, the command group between the if condition and the else keyword will execute.

if (condition(s)) then

command group 1

else

command group 2

endif

Note that the else statement and default command set precede the concluding endif keyword.

Multiple Conditions – Logical AND/OR

if…then…else constructions can test for multiple criteria, linking them with logical AND and OR operators. The AND operator is &&; a compound condition incorporating it will only be evaluated as true if both (or all) the simple conditions thus linked are individually true. The OR operator is ||, and compound OR conditions will be true if either of the simple conditions is true.

Adding Punch to Conditions – File Inquiry Operators

Most conditions perform numeric or string evaluations – they check to see if a certain variable or numeric expression does indeed contain the value it is supposed to. File inquiry operators allow the user to expand the range of conditions by making easy certain types of inquiries about a file’s status. Each file inquiry operator takes the form of a hyphen followed by a single character, e.g. -z. A condition containing a file inquiry operator takes this form: (operator filename). For example, to run commands based on whether a file “mail.log” exists or not, one might use the following code:

if (-e mail.log) then

cat new.log >> mail.log

endif

In using file inquiry operators, remember that you may use a variable in place of the filename, as in

set file_to_remove = .pine-interrupted-mail

if (-z $file_to_remove) then

rm $file_to_remove

endif

This routine will check to see if the filename contained in the variable has a length of zero (is a null file), and if it does, the file is removed. Here is a partial listing of the file inquiry operators supported by the C shell:

File Inquiry Operator    Checks to See if:
-d                       file is a directory
-e                       file exists
-f                       file is a plain file
-o                       user owns file
-r                       user has read permission
-w                       user has write permission
-x                       user has execute permission
-z                       file has length of zero

To reverse the value of any of these operators, i.e. check to see if the condition is false, precede the operator with ! within the parentheses, such as: (! -z filename).

Looping Execution – The foreach Statement

The foreach statement allows you to execute a command or set of commands once for each item in a list, most often the list of files whose names match a pattern. An example would be a script to delete zero-length files and core dump files from your home directory. Each file’s path might be /home/users1/hansel/filename. The code for this script might be as follows:

#!/bin/csh
foreach dudfile (/home/users1/hansel/*)
    if (-z $dudfile || $dudfile:t == "core") then   # :t keeps only the filename
        rm $dudfile
    endif
end

The first line of this script is the interpreter specification. The next line contains the foreach keyword, the loop variable dudfile, and a parenthesized pattern: the shell expands /home/users1/hansel/* into the list of files immediately inside that directory, and dudfile takes each full pathname in turn. All the commands between the foreach statement and the end keyword will be executed once for each matching filename. Within these, the if statement checks each file for zero length (the -z operator) or a filename of “core”. Because $dudfile holds the full path, the :t modifier is used to strip the leading directories before the comparison. When either of these conditions is met, the file is removed.

Conditional Loops – the while Statement

At times you may need to execute a command set repeatedly until certain conditions are met. An if…then construction will not suffice, because its condition is only evaluated once. Placing such a condition inside a foreach loop is also insufficient, or at least clumsy, because there is no finite number of iterations – we simply want the loop to cycle indefinitely until a condition evaluates as false. In cases like this, we use a while statement. These take the form of

while (condition)

statements

end

On each repetition of the construction the condition in parentheses is evaluated. If the condition is true, then the statements inside the construction (up to the end statement) execute. The condition is then reevaluated. If the condition is false, then the loop exits and the statements within do not execute. Program execution will continue with the statement immediately following the end statement. One note: if the condition is initially false, i.e. it is never true, then the loop immediately exits without executing the internal statements.

Making Your Scripts Interactive – Input and Arguments

We have already examined the echo command, which prints lines to the standard output, allowing your scripts to communicate messages to the user. It stands to reason that there should be some mechanism for the user to communicate with the program, right? In the C shell, there are two which we will look at. The first of these involves getting input from the user during execution. The uses for this type of construct are numerous – examples include inputting a choice from a menu or getting system information during an install script. The second mechanism is the argument variable. This is a special variable mechanism which provides a fairly easy way to allow a script to take command line arguments, input at the time the script is called.

User Input During Execution

If you want the user to be able to answer queries from the program while it runs, you will need to initialize variables to hold the input. This is just how the input mechanism works. Instead of initializing a variable to a string, initialize it to hold the special variable $<, e.g.

set uinput = $<

This will assign everything the user types up to hitting <return> to the variable uinput, and we can then refer to this value later in the script. For example:

#!/bin/csh
echo "Please input your name: "
set uname = $<
echo "Why, Good Morning, $uname!"

If you are writing a menu program, when you write the echo statement prompting the user for his or her choice, it will often make the interface more elegant to use the echo -n option, which does not send the terminating newline after the output string; this will leave the input prompt on the same line as the output string. One more note: if you will need to perform certain types of numeric operations on the input value later in the program, such as incrementing, remember to initialize it with the @ construction, which will declare the variable as an integer.

Command Line Arguments

If you want your scripts to behave like many other UNIX commands, allowing the user to pass filenames or other strings as arguments from the initial command line for manipulation within the program, then you will need to make use of the C shell argument capability. Within the C shell there is the special variable argv. This variable is a predefined variable of wordlist type; each successive word on the command line (each element separated by whitespace) is one element in the array. The argv variable is special in certain respects. Let’s suppose we have written the script wrap, which allows us to perform certain functions on files to ready them for mailing (gzipping, uuencoding). We call the script with the syntax

% wrap infile outfile

In this script, the value of argv[1] will be the filename of infile, and the value of argv[2] will be the filename of outfile. We can access these values through the normal variable expansion mechanism, as in

#!/bin/csh
if (! -e $argv[1]) then
    echo "Error: file $argv[1] does not exist."
    exit 2
endif

This checks to make certain we are attempting to wrap a file which actually exists. Most arrays begin their indexing sequences with the index 1; but argv is different, in that there is always an argv[0]. This element of the array will contain the first word on the command line, which will always be the name of the command itself. This would allow you to write, for example, boilerplate code for generating error messages, which would print out the name of the command with its proper usage. But there is a catch to using this particular value: on many systems (most of the ones I’ve used fall into this category) the following is not legal syntax:

echo $argv[0]

So how, then, does one access this value? Fortunately, there is yet another special characteristic of argv. Anywhere one would normally use $argv[1], $argv[2]…$argv[n], it is perfectly legal to use $1, $2…$n, where n is simply the element index within the array. Using this shorthand, the command name is written $0; this is in fact the only legal way to access the value of $argv[0].

Selecting from a List of Possibilities – switch…case

Let’s suppose that you are writing a menu program. The user is presented with a menu of possible selections, numbered from one to six (1 to 6 in the actual program). The user makes a selection, and the program, depending on the selection, performs some action. How to accomplish this last task? We could use a series of if…then…else if…then…else if…then… statements, but that would be clumsy and difficult to read if we ever had to debug the program. Instead, we might use the much more elegant (and easy to debug) switch…case construct. Basically, this type of construction uses the switch keyword to specify the criterion to be evaluated; one case statement is then provided for each possibility. In case the value input is not among those specified as one of the cases, this construction also allows the specification of one command set as the default, and the keyword which marks it as such is indeed default. Here is an example of such a construction:

#!/bin/csh
echo -n "Please enter your first name: "
set uname = $<
switch ($uname)
case [Gg]eorge:
    cat /messages/George
    breaksw
case [Mm]ary:
    cat /messages/Mary
    breaksw
case [Ss]andy:
    cat /messages/Sandy
    breaksw
default:
    cat /messages/Goodbye
    exit 1
endsw

This script will take as input a user’s name, and then print a personal message. In line four is the switch statement, which sets the input variable uname as the test value; each subsequent case statement then checks for a specific possible value. Note that one can use a pattern after the case keyword: this allows the user to input his or her name with an upper- or lowercase first character. It is also very important to note that each case routine must end with the breaksw keyword. If this is left out, then execution will continue through the commands following the next case statement without stopping until a breaksw is encountered. The entire construction, as with others we have already looked at, must end with a keyword, in this case endsw.

Making Your Code Readable – Adding Remarks

If you ever plan to revise your programs or share them with other users, you will rapidly discover the difficulty of trying to read undocumented code. While this is not so much of a burden in twenty-line shell scripts as it is in a seven-hundred-line file of C source code, it is never much fun. It is always considered good programming practice (regardless of the language used) to liberally document your code with remark statements. These are statements which will never be executed; they simply serve to explain what each line or routine is supposed to do and how it does it. To insert a remark statement, simply precede the remark with the # symbol. Everything following the # on the same line will be considered part of the remark, regardless of whether it contains legitimate shell commands or not. The # does not need to occur at the beginning of the line – it is perfectly good practice to write code like this:

if (-e $file_to_remove) then # checks to see if the file exists

The first part of the line is an actual instruction which will execute; the second part is simply a remark telling what the first part does. You should include a remark at any point in your program where it is not absolutely clear what the code is supposed to do.