Chapter 6 : Miscellany

We now fill in a variety of missing details before moving on to further, fundamentally new facilities of C++. Some of these extra details are included just for completeness, and will not be studied in depth.

6.1. Program layout

6.2. Basic types

6.3. Lazy programming

6.4. Lazy operators

6.5. Bit-wise operators (keenies only)

6.6. Declarations

6.7. Gotos and assignments

6.8. Error output

6.9. Strings

6.10. The "system" function

6.11. File inclusion

6.12. Formatted output

6.1. Program layout

It is vitally important that programs are laid out carefully, since one of the most important considerations in programming is readability. In industry and commerce one would want a readable style, and each company will have its own style so that programs written in one department can be read by users in another. There are programs ("program formatters" or "beautifiers") which can lay out a given program to comply with certain rules. Such programs cannot, of course, insert comments, or ensure that identifiers are meaningful, but they can ensure that the indentation is correct, and that there is a reasonable amount of spacing.

There are also "program editors" or "structure editors" which enforce program layout as you create the program. They operate in different ways; I may describe some in the lecture.

A number of ordinary text editors such as "vi" and "emacs" have optional facilities for automatically laying out programs.

When marking programs, we will look for

In the content of the program, we will look for

For the moment (see later in this document) we also recommend

6.2. Basic types

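A slightly longer list of basic variable types in C++ is given below, together with their sizes in bytes on some implementations. These sizes are not fixed by the language and vary between compilers and machines; the "sizeof" operator described below tells you the sizes on your own system.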

type             PDP   68000   VAX/VMS

char              1      1       1
int               2      4       4
long int          4      8       4
short int         2      2       2
float             4      4       4
double            8      8       8
unsigned int      2
unsigned long     4
unsigned short    2
unsigned char     1
(pointers         2      4       4)

The "unsigned" types are exactly what they say. They are used to store non-negative quantities, and can (in general) store a maximum value about twice as large as the corresponding signed type. You will be aware of the techniques used for the binary representations of integers. Using the usual two's complement representation, a 2-byte (16-bit) "int" variable would be able to store values ranging from $-2^{15} = -32768$ (approx -30,000) to $2^{15} - 1 = 32767$ (approx 30,000). A 2-byte "unsigned int" will be able to store values from 0 to $2^{16} - 1 = 65535$ (approx 60,000). Many quantities in real applications are known to be non-negative (the number of items in stock, the address of a block on disc, the length of a queue) so that unsigned variables are ideal.

sizeof

The compile-time operator "sizeof" gives the size in bytes of a supplied object (or, written with parentheses, of a type, as in "sizeof (int)"), as in the example

#include <iostream>
using namespace std;

int main() {
  int i;
  cout << "Size of an int is "
    << sizeof i << "\n";
} // end main

By "compile-time" we mean that the calculated value is substituted into the program at the appropriate point by the compiler, not calculated at run-time.

Further, if in a program we had the expression

20 * sizeof i

(an operator with fixed values as its operands) we would expect the compiler to perform the multiplication once only at compile time; there would be no multiplications in the run-time of the program.

6.3. Lazy programming

In C++ you may use a numeric value wherever a logical (Boolean) condition would be expected, because "zero" is always interpreted as FALSE, and "non-zero" as TRUE. If "n" is an "int", then the code

if ( n ) ...
while ( ! n ) ...

is exactly equivalent to

if ( n != 0 ) ...
while ( n == 0 ) ...

You will often see this terse version in C++ code. The two versions should be exactly equivalent when compiled.

6.4. Lazy operators

The word "lazy" has a precise meaning in the compilation of programs. The "and" and "or" logical operators ("&&" and "||") are lazy (also called "short-circuit" operators): the operands are evaluated from left to right, and evaluation stops as soon as the result is determined.

i >= 0 && i < 10 && funct( i ) == 0
// evaluated left to right "lazy"
// stop as soon as "false" is encountered

ok( "Overwrite file?" ) && ok( "Sure?" )
// Second prompt only if first ok

x < 0 || x >= 10 || ...

The "and" operator stops as soon as a false expression is encountered; the "or" stops as soon as a true expression is encountered.

6.5. Bit-wise operators (keenies only)

This facility is just for the freaks; all normal mortals can ignore it. The following operators act upon a variable considered as a pattern of bits. (A constant with a leading zero, such as 0177, is written in octal.)

a << 2	  // shifted left 2 places
a >> 3	  // shifted right 3 places
a & 0177  // bit-by-bit AND
a | 0100  // bit-by-bit OR
a ^ 0100  // bit-by-bit exclusive OR
~ a	  // bit-by-bit negation

Such operations are normally performed on "unsigned" objects, because shifting negative signed values has not been portable between implementations.

6.6. Declarations

In C programs (before the 1999 C standard), declarations within a main program or function had to be placed at the start of the program/function, before any executable statements. That is the pattern I have followed in all my examples so far. This restriction has been relaxed in C++.

We refer to any section of the program or of a function from an opening curly bracket to the corresponding closing curly bracket as a "block".

C++ allows declarations to occur (almost) anywhere; each declaration is valid until the closing curly bracket that ends the innermost block containing it, i.e. to the end of the current block.

For example

#include <iostream>
using namespace std;

int main() {
// start outer block
    int counter = 0;
    cin >> counter;
    float recip = 1.0 / counter;
    float total = 0.0;
    while ( counter > 0 ) {
// start new block
      float x = recip / counter;
      total += x;
      cout << counter << " " << x << " " << total << "\n";
      counter--;
// x exists up to here
    } // end while
    cout << total << "\n";
// other variables exist up to here
} // end main

Each declaration is valid from the declaration until the closing curly bracket ending the block in which it was declared. By the word "valid" above we mean two things.

(i) The program can refer to the variable within this part of the code; and
(ii) the variable will exist at run-time until the enclosing block is left.

If an identifier is declared which is the same as an identifier in any outer enclosing block, the outer declaration becomes temporarily inaccessible from the inner declaration until the end of the inner block.

The declarations of "recip" and "x" above can be made "const" declarations, since neither variable is reassigned a different value once declared (each time round the loop a new "x" is created). The declaration of "total" cannot be made constant, since its value is reassigned each time round the loop. If a value is not going to be reassigned, it is good practice for safety to use a "const" declaration to avoid it being accidentally overwritten. If you do this, you cannot accidentally write

if ( x = value )

instead of "if ( x == value )", since the compiler would not permit assignment to "x".

Global declarations

You can also declare variables outside the main program and all of the functions. Such "global" declarations are valid from the declaration right through to the end of the file: they can be referred to from anywhere in the program or functions after the declaration, and they exist for the whole duration of the program.

If any identifier is re-declared in the program or a function (or in an inner block) the outer declaration again becomes inaccessible for that block.

Global variables are a convenient but dangerous way of communicating between functions, or between a function and the main program. For the moment, we recommend against their use. All information should be passed into and out of functions through their parameters.

6.7. Gotos and assignments considered harmful

This paragraph and the next can be ignored by those who have not done programming before, and/or those who do not know what the word "GOTO" means (I certainly haven't mentioned it); such readers should skip to the paragraph beginning "All readers start here!".

The use of "const" declarations in the previous section brings us to an interesting challenge. It has long been accepted that, generally speaking, a "goto" type of instruction is messy. I have not mentioned that C++ has a "goto" statement, and even if you realise that it has, you are not permitted to use it! There may, however, be a few occasions where it is helpful (it occurs several times in the huge body of UNIX source code), but generally it encourages messy and tangled programming. The appearance of "goto"s generally indicates a poorly thought-out program, and their use certainly makes the theoretical analysis of programs more difficult. For more discussion see Dijkstra's letter to the Editor of the Communications of the ACM, published in March 1968, entitled "GO TO Statement Considered Harmful".

Dijkstra's letter spawned other articles such as "The Case Against the GOTO" by W A Wulf (Proceedings of the 25th National ACM Conference, 1972, pp 791--), and "The Case for the GOTO" by M E Hopkins at the same conference. Most modern computer languages are designed to minimise the use of GOTO.

All readers start here! There are those in the real world who suggest that the use of assignment is equally harmful. You can perhaps appreciate that assignment makes proving things about programs more difficult, since the value of a given variable may change as we proceed.

The C++ features that let us put declarations anywhere and make them constant encourage a healthy line of thought: writing complete programs using only constant declarations. The only other form of assignment permitted will be the use of "return" in a function; and, of course, the initialisation of constants is really a form of assignment.

The "Halberstam Function" of an integer value is defined as follows. Take the value; if it is even, halve it; if it is odd, multiply it by 3 and add 1.

The Halberstam value of an integer value "n" is the number of times this operation needs to be repeated before the value 1 (one) is reached. It is widely believed that you will always eventually reach the value 1, but nobody has ever proved it (this is the famous unsolved "3n+1" problem, also known as the Collatz conjecture). If we start with the value 3, for example, the sequence goes 10, 5, 16, 8, 4, 2, 1; 7 steps. If we start with the value 7, the sequence goes 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1; 16 steps.

The old-fashioned solution declared as a function might typically be something like this:

int halberstam ( int n ) {
  int value = n, count = 0;
  while ( value > 1 ) { // "> 1" rather than "!= 1", so n <= 0 cannot loop forever
    if ( value % 2 == 0 ) { // Even
      value = value / 2;
    } else { // "value" is odd
      value = 3 * value + 1;
    }
// Count the times round the loop
    count++;
  } // while value not yet 1
  return count;
} // end halberstam

This has two internal variables, "value" and "count", whose values are re-assigned each time round the loop.

A cleaner way, using no assignments at all, might be on the following lines:

int halberstam( int n ) {
  if ( n <= 1 ) {
    return 0;
  }
  if ( n % 2 == 0 ) {
    return 1 + halberstam( n / 2 );
  }
  return 1 + halberstam( 3 * n + 1 );
} // end halberstam

The above leads naturally to the concept of "recursion". "Recursion" occurs when you use a function within its own definition. Obviously you must be careful not to let the definition recurse indefinitely (infinitely), so a recursive definition will usually have an "escape clause" such as the "if ( n <= 1 ) ... " above. We may study the ideas of recursion in more detail later.

6.8. Error output

In ordinary UNIX commands, there are two distinct output streams for each command, the "standard output" and the "standard error" streams. When you divert output using redirection to a file or down a pipe, the error stream is not diverted; this ensures that error messages still come to the screen, to make sure the user is aware of the error.

In C++ programs, any output sent to "cout" is standard output, any sent to "cerr" is error output.

int total = 0, number;
while (
  cin >> number      // stops at end of input or on bad data
  && number != 0
) {
  if ( number < 0 ) {
    cerr << "Invalid data "
      << number << "\n";
    continue;
  }
  total += number;
} // end while number non zero
cout << "Total of valid data "
  << total << "\n";

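Both "cout" and "cerr" are declared in the standard header file "iostream" (called "iostream.h" on older compilers). With a modern compiler, write "#include <iostream>" and either refer to them as "std::cout" and "std::cerr" or add the line "using namespace std;".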

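Some later Ceilidh exercises may require that you print certain messages on the error stream, and others on the standard output. Don't forget to end "cerr" messages with a newline character. Output (on "cout" or "cerr") that lacks a terminating newline may run on into whatever is printed next, such as the shell prompt, and "cout" output that has not yet been flushed may be lost altogether if the program crashes.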

6.9. Strings

Although we have not yet studied enough about arrays (collections of many objects of identical type, vectors) and pointers (addresses of objects), it will be useful to describe briefly how to use strings. When we study arrays later, all of this should make more sense.

Declaration

The declaration

char name[] = "Eric Foxley";

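gives an object called "name" initialised to the value given between quotes; the compiler makes it just long enough to hold the 11 characters plus a terminating null character (see below). You might print it using its identifier as in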

cout << "My name is " << name << "\n";

If you declare an uninitialised string you must give it a size as in

char name[50];

which allows for a string of maximum length 49 characters; the fiftieth is used to contain a terminating null character. You can now read a string in using

cin >> name;

This will skip any leading white space, then read characters from the input into the named string until white space (a space or tab or newline) is encountered. No checks are made to confirm that the length of the declared string is sufficient, so beware of chaos if the string read in is longer than the declared length.

The string handling library

There are library functions to do most of the things you might wish to do with strings. A few of the most useful are as follows.

strlen
This gives the length of a string given as parameter, not counting the terminating null character.

// declared in <string.h> as  size_t strlen( const char * );
char name[50];
cin >> name;
cout << "length " << strlen(name) << "\n";

strcmp
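The function "strcmp" is for comparing strings. It is given two strings as arguments, and returns zero if they are identical. If the strings are not identical, it returns a positive value if the second string comes alphabetically before the first, and a negative value if it comes after it. (Strictly, "alphabetical" here means the order of the character codes, so in ASCII every upper-case letter comes before every lower-case one.)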

// declared in <string.h> as  int strcmp( const char *, const char * );
char command[50];
while (
  cout << "Type a command: ",
  cin >> command     // loop ends at end of input
) {
  if (
      strcmp( command, "edit" ) == 0
  ) {
    // ... call editor ...
  } else if (
      strcmp( command, "print" ) == 0
  ) {
    // ... call printer ...
  } else if (
      strcmp( command, "quit" ) == 0
  ) {
    cout << "Bye!\n";
    exit( 0 );       // exit() is declared in <stdlib.h>
  }
}

If you wish to look at just the leading characters of the strings, use "strncmp" to limit the number of characters which are compared, as in the following example.

// declared in <string.h> as  int strncmp( const char *, const char *, size_t );
if (
  strncmp( command, "edit", 2 ) == 0
) {
    // ... call editor ...
} else if (
  strncmp( command, "quit", 1 ) == 0
) {
    // ....
}

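The additional parameter limits the number of characters which will be compared, so here "ed", "edit" and "editor" are all accepted as the edit command, and anything beginning with "q" as quit. There is no need to put the full string into the program, but it helps readability.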

strcpy
This copies one string to another, as in

char one[50], two[50];
cin >> one;
// copy string "one" into "two"
strcpy( two, one );

It copies from the second parameter to the first.

strcat
This is similar to strcpy, but appends (concatenates) the second parameter string to the end of whatever is already in the first parameter. The first string must be declared long enough to hold the resulting string, including its terminating null character.

For details of all such string functions, type

man string

at your terminal.

The compiler needs to know the declarations (as opposed to the definitions) of any string library functions which you use. Instead of inserting these yourself, you should add the line

#include <string.h>

at the head of your program. (Standard C++ also provides the same functions through the header <cstring>.)

Strings as function parameters

To pass strings to a function, the function declaration might be

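void error( const char[], int );

and the corresponding definition

void error( const char message[], int number ) {
  cerr << "Error " << message
    << " number " << number << "\n";
  exit( 1 );   // exit() is declared in <stdlib.h>
} // end error

This function would be called as

error( "Negative number", 5 );

The parameter type given in the declaration is "const char[]", and in the definition the parameter is specified as "const char <name> []". The "const" promises that the function will not change the string; standard C++ requires it if you want to pass a quoted string constant such as "Negative number".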

6.10. The "system" function

For keenies only, and please do not abuse it! You can use the "system" function (declared in <stdlib.h>) in C++ to execute any UNIX command while your program is executing. If you write

system( "who" );

the program will run the "who" command at this point, just as if you had typed it at the keyboard, and then carry on with the next statement.

The parameter can, of course, be the identifier of a string variable in which you have stored an appropriate command.

6.11. File inclusion

This is a feature which we will need in the near future.

The "#include" line which you already use is a general facility. Before compilation proper, the complete contents of the named file are substituted into your program in place of the "#include" line.

If the filename given is enclosed in "<" and ">" signs, the named file is searched for in the special system directories associated with the compiler and its function libraries (on UNIX, typically "/usr/include"). If the filename is enclosed in quote symbols, as in

#include "myheader.h"

then the named file is looked for first in your own directory (strictly, the directory containing the file being compiled), and its contents are inserted in the text at this point in your program. The name can include a full path, as in

#include "/staff/eric/ceilidh.h"

6.12. Formatted output

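To print your output more neatly, you may wish to use formatted output techniques. This corresponds to the use of "printf" in C. The function "form" takes as arguments a format string, and the values of expressions to be inserted in it. It is simplest to explain by example. ("form" belonged to the early C++ stream library and is not part of standard C++; with a modern compiler you can call "printf", declared in <stdio.h>, with the same format strings.)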

To print decimal values:

int i, j;
cout << form( "%d times %d is %d\n", i, j, i*j );

The values following the format string are picked up and inserted for each "%" in order in the format. The rest of the format string is printed as it stands. The letter after the "%" specifies the type of format required.

To print floating values:

int i;
float x;
cout << form( "%d times %f is %f\n", i, x, i*x );

To print characters as such:

char ch;
cout << form( "Char %c has ASCII code %d\n", ch, ch );

To print an integral value in octal:

int i;
cout << form( "%d in octal is %o\n", i, i );

Possible other field definitions are

%3d     // decimal, at least 3 columns, right justified
%03d    // as %3d, but zero fill on left
%-5d    // allow 5 columns but left justify
%10.3f  // float, width 10, 3 decimal places
%s      // string, see later
%30s    // string, padded to 30 columns, right justified
%-30s   // string, padded to 30 columns, left justified

Further examples:

cout << form(
    "hours %d\nmins %d\nsecs %d\n",
    h, m, s );

To print the time nicely:

cout << form( "time %2d:%02d:%02d\n",
    h, m, s );