Copyright © 2002 Paul Sheer.
This chapter introduces you to the concept of computer programming. So far, you have entered commands one at a time. Computer programming is merely the idea of getting a number of commands to be executed that, in combination, perform some more powerful function.
To execute a number of commands in sequence, create a file with a .sh extension (myfile.sh in this chapter), into which you will enter your commands. The .sh extension is not strictly necessary but serves as a reminder that the file contains special text called a shell script. From now on, the word script will be used to describe any sequence of commands placed in a text file. Now do a
    chmod 0755 myfile.sh
which makes the file executable, so that it can be run like any other command, as shown below.
Edit the file using your favorite text editor. The first line should be as follows, with no whitespace. [Whitespace means tabs and spaces and, in some contexts, newline (end of line) characters.]
    #!/bin/sh
The line dictates that the following program is a shell script, meaning that it accepts the same sort of commands that you have normally been typing at the prompt. Now enter a number of commands that you would like to be executed. You can start with
    echo "Hi there"
    echo "what is your name? (Type your name here and press Enter)"
    read NM
    echo "Hello $NM"
Now, exit from your editor and type ./myfile.sh. This will execute [Cause the computer to read and act on your list of commands, also called running the program. ] the file. Note that typing ./myfile.sh is no different from typing any other command at the shell prompt. Your file myfile.sh has in fact become a new UNIX command all of its own.
Note what the read command is doing. It creates a pigeonhole called NM, and then inserts text read from the keyboard into that pigeonhole. Thereafter, whenever the shell encounters NM, its contents are written out instead of the letters NM (provided you write a $ in front of it). We say that NM is a variable because its contents can vary.
You can use shell scripts like a calculator. Try
    echo "I will work out X*Y"
    echo "Enter X"
    read X
    echo "Enter Y"
    read Y
    echo "X*Y = $X*$Y = $[X*Y]"
The $[ and ] mean that everything between them must be evaluated [Substituted, worked out, or reduced to some simplified form.] as a numerical expression [Sequence of numbers with +, -, *, etc. between them.]. You can, in fact, do a calculation at any time by typing at the prompt
    echo $[3*6+2*8+9]
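[Note that the shell that you are using allows such $[ ] notation, but it is a nonstandard extension that the bash documentation now describes as deprecated. The standard form is $(( )), as in echo $((3*6+2*8+9)). On some older UNIX systems you will have to use the expr command to get the same effect.]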
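As a rough sketch of the portable alternatives, the script below works out the same kind of result with the standard $(( )) form and with expr. The values 6 and 7 are assumed sample numbers standing in for what you would type at the read prompts, and the names PRODUCT and SUM are made up for illustration. The backward quotes around expr capture its output; they are covered later in this chapter.

```shell
#!/bin/sh
# Assumed sample values; in the calculator script they come from read.
X=6
Y=7

# $(( )) is the standard arithmetic expansion.
PRODUCT=$((X * Y))

# expr does the same job as a separate command. Each * is quoted so
# that the shell does not expand it into file names.
SUM=`expr 3 '*' 6 + 2 '*' 8 + 9`

echo "X*Y = $X*$Y = $PRODUCT"
echo "3*6+2*8+9 = $SUM"
```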
The shell reads each line in succession from top to bottom: this is called program flow. Now suppose you would like a command to be executed more than once--you would like to alter the program flow so that the shell reads particular commands repeatedly. The while command executes a sequence of commands many times. Here is an example ( -le stands for less than or equal):
    N=1
    while test "$N" -le "10"
    do
        echo "Number $N"
        N=$[N+1]
    done
The N=1 creates a variable called N and places the number 1 into it. The while command executes all the commands between the do and the done repetitively until the test condition is no longer true (i.e., until N is greater than 10). The -le stands for less than or equal to. See test(1) (that is, run man 1 test) to learn about the other types of tests you can do on variables. Also be aware of how N is replaced with a new value that becomes 1 greater with each repetition of the while loop.
You should note here that each line is a distinct command--the commands are newline-separated. You can also have more than one command on a line by separating them with a semicolon as follows:
    N=1 ; while test "$N" -le "10"; do echo "Number $N"; N=$[N+1] ; done
(Try counting down from 10 with -ge (greater than or equal).) It is easy to see that shell scripts are extremely powerful, because any kind of command can be executed with conditions and loops.
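One possible answer to the counting-down exercise is sketched here. It uses the standard $(( )) arithmetic form in place of $[ ] (both should work under bash), and COUNT is a made-up variable that simply tallies the repetitions.

```shell
#!/bin/sh
N=10
COUNT=0    # hypothetical helper: how many times the loop body ran

# Keep going while N is greater than or equal to 1.
while test "$N" -ge "1"
do
    echo "Number $N"
    N=$((N - 1))
    COUNT=$((COUNT + 1))
done

echo "The loop ran $COUNT times and N ended up as $N"
```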
The until statement is identical to while except that the reverse logic is applied. The same functionality can be achieved with -gt (greater than):
    N=1 ; until test "$N" -gt "10"; do echo "Number $N"; N=$[N+1] ; done
The for command also allows execution of commands multiple times. It works like this:
    for i in cows sheep chickens pigs
    do
        echo "$i is a farm animal"
    done
    echo -e "but\nGNUs are not farm animals"
The for command takes each string after the in, and executes the lines between do and done with that string substituted for $i. The strings can be anything (even numbers) but are often file names.
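Since the strings may be numbers, a for loop can also drive a calculation. This is only a small sketch: the list 1 to 5 and the variable TOTAL are chosen for illustration, and the arithmetic uses the standard $(( )) form.

```shell
#!/bin/sh
TOTAL=0

# i takes the values 1, 2, 3, 4 and 5 in turn.
for i in 1 2 3 4 5
do
    TOTAL=$((TOTAL + i))
    echo "added $i, total is now $TOTAL"
done
```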
The if command executes a number of commands if a condition is met (-gt stands for greater than, -lt stands for less than). When the condition is true, it executes all the lines between the then and the fi ("if" spelled backwards).
    X=10
    Y=5
    if test "$X" -gt "$Y" ; then
        echo "$X is greater than $Y"
    fi
The if command in its full form can contain as much as:
    X=10
    Y=5
    if test "$X" -gt "$Y" ; then
        echo "$X is greater than $Y"
    elif test "$X" -lt "$Y" ; then
        echo "$X is less than $Y"
    else
        echo "$X is equal to $Y"
    fi
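To see all three branches taken, the full form of if can be placed inside a for loop. In this sketch the values 5, 10, and 15 for Y and the variables WORD and RESULT are assumptions made for the sake of the example.

```shell
#!/bin/sh
X=10
RESULT=""    # hypothetical: collects the outcome of each comparison

for Y in 5 10 15
do
    if test "$X" -gt "$Y" ; then
        WORD="greater"
    elif test "$X" -lt "$Y" ; then
        WORD="less"
    else
        WORD="equal"
    fi
    echo "$X compared with $Y: $WORD"
    RESULT="$RESULT $WORD"
done
```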
Now let us create a script that interprets its arguments. Create a new script called backup-lots.sh, containing:
    #!/bin/sh
    # Make ten copies of the file named by the first argument.
    for i in 0 1 2 3 4 5 6 7 8 9 ; do
        cp "$1" "$1.BAK-$i"
    done
Now create a file important_data with anything in it and then run ./backup-lots.sh important_data, which will copy the file 10 times with 10 different extensions. As you can see, the variable $1 has a special meaning: it is the first argument on the command line. Now let's get a little bit more sophisticated (-e tests whether the file exists):
    #!/bin/sh
    # Refuse to run without a file name argument.
    if test "$1" = "" ; then
        echo "Usage: backup-lots.sh <filename>"
        exit 1
    fi
    for i in 0 1 2 3 4 5 6 7 8 9 ; do
        NEW_FILE="$1.BAK-$i"
        if test -e "$NEW_FILE" ; then
            echo "backup-lots.sh: **warning** $NEW_FILE"
            echo " already exists - skipping"
        else
            cp "$1" "$NEW_FILE"
        fi
    done
A loop that requires premature termination can include the break statement within it:
    #!/bin/sh
    for i in 0 1 2 3 4 5 6 7 8 9 ; do
        NEW_FILE="$1.BAK-$i"
        if test -e "$NEW_FILE" ; then
            echo "backup-lots.sh: **error** $NEW_FILE"
            echo " already exists - exiting"
            break
        else
            cp "$1" "$NEW_FILE"
        fi
    done
which causes program execution to continue on the line after the done. If two loops are nested within each other, then the command break 2 causes program execution to break out of both loops; and so on for values above 2.
The continue statement is also useful for terminating the current iteration of the loop. This means that if a continue statement is encountered, execution will immediately continue from the top of the loop, thus ignoring the remainder of the body of the loop:
    #!/bin/sh
    for i in 0 1 2 3 4 5 6 7 8 9 ; do
        NEW_FILE="$1.BAK-$i"
        if test -e "$NEW_FILE" ; then
            echo "backup-lots.sh: **warning** $NEW_FILE"
            echo " already exists - skipping"
            continue
        fi
        cp "$1" "$NEW_FILE"
    done
Note that both break and continue work inside for, while, and until loops.
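The following sketch shows continue and break 2 together in a pair of nested loops. The loop values and the variable FOUND are invented for the example; the comments state what each statement is expected to do.

```shell
#!/bin/sh
FOUND=""    # hypothetical: records every i-j pair that was reached

for i in 1 2 3
do
    for j in 1 2 3
    do
        # Skip the rest of the inner body whenever j is 2.
        if test "$j" -eq "2" ; then
            continue
        fi
        # Leave both loops as soon as i reaches 2.
        if test "$i" -eq "2" ; then
            break 2
        fi
        FOUND="$FOUND $i-$j"
    done
done

echo "pairs reached:$FOUND"
```

Only the pairs 1-1 and 1-3 are recorded: the pair 1-2 is skipped by continue, and break 2 ends both loops before any pair with i equal to 2 is stored.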
We know that the shell can expand file names when given wildcards. For instance, we can type ls *.txt to list all files ending with .txt. This applies equally well in any situation, for instance:
    #!/bin/sh
    for i in *.txt ; do
        echo "found a file:" $i
    done
The *.txt is expanded to all matching files. These files are searched for in the current directory. If you include an absolute path then the shell will search in that directory:
    #!/bin/sh
    for i in /usr/doc/*/*.txt ; do
        echo "found a file:" $i
    done
This example demonstrates the shell's ability to search for matching files and expand an absolute path.
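One detail worth knowing is that, in the shell's default configuration, a wildcard that matches nothing is left in place unexpanded, so the loop body runs once with the pattern itself. The sketch below guards against this with test -e. It assumes the mktemp command is available to create a scratch directory, and the file names a.txt and b.txt, the pattern *.nomatch, and the variables COUNT and MISSING are all made up for the demonstration.

```shell
#!/bin/sh
# Assumption: mktemp -d exists on this system and prints the name of a
# newly created, empty directory.
DIR=`mktemp -d`
touch "$DIR/a.txt" "$DIR/b.txt"

COUNT=0      # patterns that expanded to a real file
MISSING=0    # patterns that were left unexpanded

for i in "$DIR"/*.txt "$DIR"/*.nomatch ; do
    if test -e "$i" ; then
        echo "found a file: $i"
        COUNT=$((COUNT + 1))
    else
        echo "nothing matched: $i"
        MISSING=$((MISSING + 1))
    fi
done

# Remove the scratch directory again.
rm -r "$DIR"
```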
The case statement can make a potentially complicated program very short. It is best explained with an example.
    #!/bin/sh
    case $1 in
        --test|-t)
            echo "you used the --test option"
            exit 0
            ;;
        --help|-h)
            echo "Usage:"
            echo " myprog.sh [--test|--help|--version]"
            exit 0
            ;;
        --version|-v)
            echo "myprog.sh version 0.0.1"
            exit 0
            ;;
        -*)
            echo "No such option $1"
            echo "Usage:"
            echo " myprog.sh [--test|--help|--version]"
            exit 1
            ;;
    esac
    echo "You typed \"$1\" on the command-line"
Above you can see that we are trying to process the first argument to a program. It can be one of several options, so using if statements will result in a long program. The case statement allows us to specify several possible statement blocks depending on the value of a variable. Note how each statement block is separated by ;;. The strings before the ) are glob expression matches. The first successful match causes that block to be executed. The | symbol enables us to enter several possible glob expressions.
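Because the strings before each ) are glob expressions, case can sort words by their shape as well as by exact value. This sketch classifies a few sample words; the words themselves and the variables KIND and KINDS are assumptions for illustration. Note that the order of the patterns matters, since the first match wins.

```shell
#!/bin/sh
KINDS=""    # hypothetical: collects the classification of each word

for ARG in --help -x notes.txt photo.JPG
do
    case $ARG in
        --help|-h)
            KIND="help"
            ;;
        -*)
            # Anything else that starts with a dash.
            KIND="unknown-option"
            ;;
        *.txt)
            KIND="text"
            ;;
        *)
            # Matches whatever the earlier patterns did not.
            KIND="other"
            ;;
    esac
    echo "$ARG is classified as $KIND"
    KINDS="$KINDS $KIND"
done
```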
So far, our programs execute mostly from top to bottom. Often, code needs to be repeated, but it is considered bad programming practice to repeat groups of statements that have the same functionality. Function definitions provide a way to group statement blocks into one. A function groups a list of commands and assigns it a name. For example:
    #!/bin/sh

    # Print the usage message; defined once and used twice below.
    usage () {
        echo "Usage:"
        echo " myprog.sh [--test|--help|--version]"
    }

    case $1 in
        --test|-t)
            echo "you used the --test option"
            exit 0
            ;;
        --help|-h)
            usage
            exit 0
            ;;
        --version|-v)
            echo "myprog.sh version 0.0.2"
            exit 0
            ;;
        -*)
            echo "Error: no such option $1"
            usage
            exit 1
            ;;
    esac
    echo "You typed \"$1\" on the command-line"
Wherever the usage keyword appears, the two lines inside the { and } are effectively substituted for it. There are obvious advantages to this approach: if you would like to change the program usage description, you only need to change it in one place in the code. Good programs use functions so liberally that they never have more than 50 lines of program code in a row.
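A function can also be handed its own arguments, which it sees as $1, $2, and so on while it runs. As a small sketch, the hypothetical warn function below prints a message in the style of the backup-lots.sh warnings and keeps a tally in WARNINGS; both names are invented for this example.

```shell
#!/bin/sh
WARNINGS=0

# Inside the function, $1 is the first argument given to warn itself.
warn () {
    echo "backup-lots.sh: **warning** $1"
    WARNINGS=$((WARNINGS + 1))
}

warn "important_data.BAK-0 already exists - skipping"
warn "important_data.BAK-1 already exists - skipping"
echo "$WARNINGS warnings were printed"
```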
Most programs we have seen can take many command-line arguments, sometimes in any order. Here is how we can make our own shell scripts with this functionality. The command-line arguments can be reached with $1, $2, etc. The script,
    #!/bin/sh
    echo "The first argument is: $1, second argument is: $2, third argument is: $3"
can be run with
    ./myfile.sh dogs cats birds
and prints
    The first argument is: dogs, second argument is: cats, third argument is: birds
Now we need to loop through each argument and decide what to do with it. A script like
    for i in $1 $2 $3 $4 ; do
        <statements>
    done
doesn't give us much flexibility. The shift keyword is meant to make things easier. It shifts all the arguments along by one place so that $1 gets the value of $2, $2 gets the value of $3, and so on. (!= tests that "$1" is not equal to ""; once "$1" is empty, we are past the last argument.) Try
    while test "$1" != "" ; do
        echo "$1"
        shift
    done
and run the program with lots of arguments.
Now we can put any sort of condition statements within the loop to process the arguments in turn:
    #!/bin/sh

    usage () {
        echo "Usage:"
        echo " myprog.sh [--test|--help|--version] [--echo <text>]"
    }

    while test "$1" != "" ; do
        case $1 in
            --echo|-e)
                echo "$2"
                # Extra shift to consume the text that follows --echo.
                shift
                ;;
            --test|-t)
                echo "you used the --test option"
                ;;
            --help|-h)
                usage
                exit 0
                ;;
            --version|-v)
                echo "myprog.sh version 0.0.3"
                exit 0
                ;;
            -*)
                echo "Error: no such option $1"
                usage
                exit 1
                ;;
        esac
        shift
    done
myprog.sh can now run with multiple arguments on the command-line.
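To watch such a loop consume its arguments without retyping a command line, you can set the arguments from inside the script with set --. This is a sketch only: the sample arguments and the variables SEEN_TEST, ECHO_TEXT, and LEFTOVER are assumptions, and the loop stores what it finds instead of printing or exiting.

```shell
#!/bin/sh
# Pretend the script was run as:
#   myprog.sh --test --echo "hello world" extra
set -- --test --echo "hello world" extra

SEEN_TEST=no
ECHO_TEXT=""
LEFTOVER=""

while test "$1" != "" ; do
    case $1 in
        --echo|-e)
            ECHO_TEXT="$2"
            shift          # consume the text that follows --echo
            ;;
        --test|-t)
            SEEN_TEST=yes
            ;;
        *)
            LEFTOVER="$LEFTOVER $1"
            ;;
    esac
    shift                  # move on to the next argument
done

echo "test option given: $SEEN_TEST"
echo "text to echo: $ECHO_TEXT"
echo "other arguments:$LEFTOVER"
echo "arguments still unprocessed: $#"
```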
Whereas $1, $2, $3, etc. expand to the individual arguments passed to the program, $@ expands to all arguments. This behavior is useful for passing all remaining arguments on to a second command. For instance,
    if test "$1" = "--special" ; then
        shift
        myprog2.sh "$@"
    fi
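The double quotes around $@ matter when an argument contains spaces. In this sketch, count_args is a made-up function that records how many arguments it received in NARGS, and set -- supplies two sample arguments, the first of which contains a space.

```shell
#!/bin/sh
# Hypothetical helper: remember how many arguments were passed in.
count_args () {
    NARGS=$#
}

# Two arguments; the first one contains a space.
set -- "henry john" mary

count_args "$@"
QUOTED=$NARGS      # each original argument is kept whole

count_args $@
UNQUOTED=$NARGS    # the shell splits "henry john" into two words

echo "with quotes: $QUOTED arguments, without: $UNQUOTED arguments"
```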
$0 means the name of the program itself and not any command-line argument. It is the command used to invoke the current program. In the above cases, it is ./myprog.sh. Note that $0 is immune to shift operations.
Single forward quotes ' protect the enclosed text from the shell. In other words, you can place any odd characters inside forward quotes, and the shell will treat them literally and reproduce your text exactly. For instance, you may want to echo an actual $ to the screen to produce an output like costs $1000. You can use echo 'costs $1000' instead of echo "costs $1000", in which the shell would replace the $1 with the first command-line argument.
Double quotes " are less strict than single quotes. They still allow variables (and the backward quotes described below) to be interpreted inside them, although wildcards are not expanded. The main reason they are used is to group text containing whitespace into a single word, because the shell will usually break up text along whitespace boundaries. Try,
    for i in "henry john mary sue" ; do
        echo "$i is a person"
    done
compared to
    for i in henry john mary sue ; do
        echo $i is a person
    done
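The difference between the two kinds of quotes, and the grouping effect of double quotes, can be checked with a short script. This is just a sketch; the value daisy and the variables SINGLE, DOUBLE, ONE, and FOUR are made up for the demonstration.

```shell
#!/bin/sh
NM="daisy"

SINGLE='Hello $NM'    # single quotes: the $NM is kept literally
DOUBLE="Hello $NM"    # double quotes: $NM is replaced by its contents
echo "$SINGLE"
echo "$DOUBLE"

# Quoted, the four names form one word, so the loop runs once.
ONE=0
for i in "henry john mary sue" ; do
    ONE=$((ONE + 1))
done

# Unquoted, the shell splits the names, so the loop runs four times.
FOUR=0
for i in henry john mary sue ; do
    FOUR=$((FOUR + 1))
done

echo "quoted list: $ONE pass, unquoted list: $FOUR passes"
```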
Backward quotes ` have a special meaning to the shell. When a command is inside backward quotes it means that the command should be run and its output substituted in place of the backquotes. Take, for example, the cat command. Create a small file, to_be_catted, with only the text daisy inside it. Create a shell script
    X=`cat to_be_catted`
    echo $X
The value of X is set to the output of the cat command, which in this case is the word daisy. This is a powerful tool. Consider the expr command:
    X=`expr 100 + 50 '*' 3`
    echo $X
Hence we can use expr and backquotes to do mathematics inside our shell script. Here is a function to calculate factorials. Note how we enclose the * in forward quotes. They prevent the shell from expanding the * into matching file names:
    factorial () {
        N=$1
        A=1
        # Multiply A by N, N-1, ... down to 1.
        while test "$N" -gt 0 ; do
            A=`expr $A '*' $N`
            N=`expr $N - 1`
        done
        echo $A
    }
We can see that the square brackets used further above can actually suffice in most places where we would otherwise use expr. (However, $[] notation is an extension of the GNU shells and is not a standard feature on all variants of UNIX.) We can now run factorial 20 and see the output. If we want to assign the output to a variable, we can do this with X=`factorial 20`.
Note that another notation which gives the effect of a backward quote is $(command), which is identical to `command`. Here, I will always use the older backward quote style.
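For comparison, here is a sketch of the factorial function written with the $( ) notation and the standard $(( )) arithmetic form instead of backward quotes and expr. It is an alternative phrasing, not a replacement for the version above. One practical difference is that $( ) can be nested without any escaping, which the last assignment illustrates; the variable NESTED is made up for the example.

```shell
#!/bin/sh
factorial () {
    N=$1
    A=1
    # Multiply A by N, N-1, ... down to 1.
    while test "$N" -gt 0 ; do
        A=$((A * N))
        N=$((N - 1))
    done
    echo "$A"
}

# Capture the function's output with $( ) rather than backward quotes.
X=$(factorial 5)
echo "5! = $X"

# One $( ) inside another: the inner expr runs first.
NESTED=$(expr $(expr 2 + 3) '*' 4)
echo "(2+3)*4 = $NESTED"
```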