There are many programming applications in which we have to read text a character at a time to analyse it in some way.
spell -b file
to check the words in the named file against British spelling.
The simple use of the standard input "cin" to read characters as in
char ch; while( cin >> ch, ch != '.' ) { ...
has the logical but inconvenient habit that all white space is ignored. Any spaces, tab characters or newlines in the input are completely invisible to the program. The loop in the above example reads characters from the input until a specific character (a full stop / period) is encountered.
In all of the applications discussed above, we need to be aware of white space characters. We must use the function
cin.get( ch )
to read each character, and we need a new technique to detect the end of the data. To indicate the end of data from a terminal, you type "control-D". This is referred to as "end of file" or "EOF" whether we are reading from the terminal (and encounter a control-D) or from a genuine file and encounter the end of the file. The C++ feature for this purpose is the function
cin.eof()
which delivers TRUE if we are at the end of the data, and FALSE otherwise. This detects end-of-file when reading from "cin" as in
prog99 < data_file
If you are reading from a file which you have opened from within the program, use the corresponding function of that stream instead, for example "fin.eof()" for a stream called "fin".
We thus have the program outline
    while ( ! cin.eof() ) {
        cin.get( ch );
        if ( ch == ' '
          || ch == '\t'     // tab
          || ch == '\n'     // newline
           ) {
            ... ....;
        } // end if white space
        ....;
    } // end while not EOF
or
    while ( ! cin.eof() ) {
        cin.get( ch );
        switch( ch ) {
            case 'p' : ....; break;
            case 'q' : ....; break;
            default  : ....;
        } // end switch
        ....;
    } // end while not EOF
or
    while ( ! cin.eof() ) {
        cin.get( ch );
        if ( ch >= 'a' && ch <= 'z' ) {
            ....;
        }
        ....;
    } // end while not EOF
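One caveat with these outlines: "eof()" normally becomes true only after a read has failed, so the final pass through the loop can act on a character that was not actually read. A common alternative is to test the result of "get" itself. The following is a minimal sketch of that idiom, not part of the original outlines; the function name "count_white_space" is invented for illustration, and it takes any input stream so that it can be tried on "cin" or on a string.

```cpp
#include <iostream>
#include <sstream>   // std::istringstream, handy for trying the function out

// Hypothetical helper: counts spaces, tabs and newlines on an input stream.
int count_white_space( std::istream &in )
{
    int count = 0;
    char ch;
    // in.get( ch ) returns the stream, which tests false once a read fails,
    // so the loop body only ever sees characters that were really read.
    while ( in.get( ch ) )
    {
        if ( ch == ' ' || ch == '\t' || ch == '\n' )
        {
            count++;
        }
    }
    return count;
}
```

From a main program this could be called as count_white_space( cin ).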
While reading characters, many of the applications above require us to split the input text into words. If we are word processing (laying out text on a page), each word includes its terminating punctuation. If a word and its punctuation cannot be fitted onto a line, it must all be moved to the start of the next line. We determine the end-of-word by looking for white space.
If we are writing a spelling checker, we split the text into words and compare them with words in a dictionary. In this case, the terminating punctuation is not part of the word. We determine end-of-word by finding a non-alpha character, i.e. one which is not a letter.
The definition of a word thus depends on the application.
Note the possible codings for reading a word at a time from the data. One possibility often seen is
    while( ! cin.eof() ) {
        while( get char, it isn't a letter ) {
            skip it;
        }
        while( get char, it is a letter ) {
            store it;
        }
        process the word just read;
    } // end while ! eof loop
This is unsafe as written: both inner loops would also need to check for end of file. In any case, the problem does not need nested loops.
A much better and safer solution is
    while( ! cin.eof() ) {
        cin.get( ch );
        if ( ch is a letter ) {
            store it;
        } else if ( ch is the first non-letter ) {
            process the word now stored;
        } else {
            skip it;
        }
    } // end while ! eof loop
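The single-loop outline might be turned into real code along the following lines. This is only a sketch using the spelling-checker definition of a word (a run of letters). The name "split_words" is invented here, "process the word" is taken to mean storing it in a vector, and the loop tests the result of "get" rather than calling "eof()".

```cpp
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the words (runs of letters) found on a stream.
std::vector<std::string> split_words( std::istream &in )
{
    std::vector<std::string> words;
    std::string current;                   // letters of the word being built
    char ch;
    while ( in.get( ch ) )
    {
        if ( std::isalpha( static_cast<unsigned char>( ch ) ) )
        {
            current += ch;                 // a letter: store it
        }
        else if ( !current.empty() )
        {
            words.push_back( current );    // first non-letter: process the word
            current.clear();
        }
        // any further non-letters are simply skipped
    }
    if ( !current.empty() )
    {
        words.push_back( current );        // the input ended in mid-word
    }
    return words;
}
```

Note that an apostrophe counts as a non-letter here, so "It's" is split into "It" and "s"; a real spelling checker would need a more careful rule.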
It is often useful to have a number of strings stored as an array, so that we can print the i-th string of a set, or search for a command name among a set of alternatives.
We declare (globally, because it is an initialised array)
const char *months[] = { "January", "February", "March", "April", "" };
This gives us
months[0] | is the string | "January" |
months[1] | is the string | "February" |
months[2] | is the string | "March" |
months[3] | is the string | "April" |
months[4] | is the string | "" |
We could thus print the name of the i-th month using
cout << months[i] << "\n";
The first character of the name of the i-th month is
months[i][0]
We could search through the strings for a particular string
    char word[20];
    cin >> word;
    for( i = 0; months[i][0]; i++ ) {
        if ( strcmp( word, months[i] ) == 0 ) {
            // found it
            ...
        }
    }
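The search can be packaged as a function. The sketch below assumes the same four-entry table ending in an empty string; the name "find_month" and the convention of returning -1 when nothing matches are choices made for this example.

```cpp
#include <cstring>

// The table from the text; the empty string marks the end of the list.
const char *months[] = { "January", "February", "March", "April", "" };

// Hypothetical helper: index of "word" in the table, or -1 if it is absent.
int find_month( const char *word )
{
    // Stop when the first character of an entry is the terminating zero.
    for ( int i = 0; months[i][0] != '\0'; i++ )
    {
        if ( std::strcmp( word, months[i] ) == 0 )
        {
            return i;       // found it
        }
    }
    return -1;              // reached the end marker without a match
}
```

The comparison is case sensitive, so "march" is not found.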
This topic properly belongs with pointers (covered fully in the next course, and summarised without exercises in an extra unit at the end of this one), but some useful applications are described below.
When you type a UNIX command as in
prog99 this that other
the system generates two arguments which the program can access if it wishes. To access the given parameters, the main program should start
int main( int argc, char *argv[] ) { ....
The program is then supplied by the system on startup with two arguments.
Thus in the above example, the arguments are set up as if we had included
int argc = 4; char *argv[] = { "prog99", "this", "that", "other", 0 };
The convention is to use "argc" (argument count) and "argv" (argument values) as the argument names, although such names are purely local to your main program.
Thus we now have the value of argv[0] as the string "prog99", of argv[1] as the string "this", etc.
We can now check that there is at least one argument using
if ( argc > 1 ) { ...
(the "cp" command for example always checks that it has at least two arguments) and we can access its value by
cout << argv[1] ...
Each of the strings in the array will have a terminating zero on it; and the array of strings finishes with a zero.
The program can loop through all the parameters in turn with
int argno; for( argno = 1; argno < argc; argno++ ) { cout << argv[ argno ] << " "; } cout << "\n";
This will print the arguments on a single line separated by spaces. This corresponds to the "echo" command in UNIX.
Note that these are not global variables; they are parameters to the main program. They can therefore be accessed only from within the main program, or by being passed as parameters to other functions.
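As a small illustration of passing the arguments on to another function, the sketch below builds the "echo" line as a string. The function name "join_arguments" is invented for this example, and unlike the loop shown earlier it puts spaces only between the arguments, not after the last one.

```cpp
#include <string>

// Hypothetical helper: receives argc and argv as ordinary parameters and
// returns argv[1] .. argv[argc-1] joined by single spaces.
std::string join_arguments( int argc, char *argv[] )
{
    std::string line;
    for ( int argno = 1; argno < argc; argno++ )
    {
        if ( argno > 1 )
        {
            line += " ";        // separator between arguments
        }
        line += argv[argno];
    }
    return line;
}
```

The main program would call it as cout << join_arguments( argc, argv ) << "\n";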
Note that if you type, for example
prog99 *.C
then the UNIX shell first expands the "*.C" into the names of all files in the current directory ending ".C", and passes all of these over to the program. The program may thus find large numbers of parameters being passed to it. The asterisk generally will not appear in the program's parameters. Thus the command
echo *.C
echoes the names of all files ending ".C".
If there is NO file matching the requested pattern, the Bourne shell and its derivatives pass over the parameter as a string containing the asterisk.
A program can detect flag arguments (arguments starting with a '-') by a construct such as
    int argno;
    for ( argno = 1; argno < argc && argv[ argno ][ 0 ] == '-'; argno++ ) {
        // argument "argno" is a flag
        switch( argv[ argno ][ 1 ] ) {
            case 'l' : ...; break;
            case 't' : ...; break;
            ....;
        } // end switch
    } // for all flag arguments
The arguments starting with a '-' are examined in turn, and actions taken depending on the letter following the '-'. We leave the loop with "argno" indicating the first argument NOT starting with a '-'.
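The flag loop can be written as a function, as in the sketch below. It includes an explicit "argno < argc" test so that the loop stops cleanly when every argument is a flag. The function name "parse_flags" and the two boolean results are assumptions made for the example; only the flag letters 'l' and 't' come from the text.

```cpp
// Hypothetical helper: records which of the flags -l and -t were given and
// returns the index of the first argument that does not start with '-'
// (this equals argc if there is no such argument).
int parse_flags( int argc, char *argv[], bool &l_flag, bool &t_flag )
{
    int argno;
    for ( argno = 1; argno < argc && argv[argno][0] == '-'; argno++ )
    {
        switch ( argv[argno][1] )
        {
            case 'l' : l_flag = true; break;
            case 't' : t_flag = true; break;
            default  : break;   // unknown flag: ignored in this sketch
        }
    }
    return argno;
}
```

A real program would probably report an unknown flag rather than ignore it.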
It is possible to read numeric values from arguments. For example, you may wish to give the rate of pay and hours worked as integer arguments, and type
prog32 152 45
instead of typing the values as data. The above arrangements would set "argv[1]" to the string "152", and "argv[2]" to the string "45".
The program could then use the library function "atoi" (ASCII to integer) and write
    if ( argc > 2 ) {
        rate = atoi( argv[1] );
        hours = atoi( argv[2] );
    }
There is a similar function, "atof", which delivers floating point values.
Always check that there are enough arguments before trying to access them.
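One limitation of "atoi" is that it gives no indication of an error: an argument such as "abc" simply yields 0. Where that matters, the library function "strtol" can be used instead, since it reports where conversion stopped. The sketch below is one possible wrapper; the name "parse_int" is invented for the example, and it does not check for values too large to fit in a long.

```cpp
#include <cstdlib>

// Hypothetical helper: converts "text" to a number, returning false (and
// leaving "value" untouched) unless the whole string is a decimal integer.
bool parse_int( const char *text, long &value )
{
    char *end = 0;
    long result = std::strtol( text, &end, 10 );
    if ( end == text || *end != '\0' )
    {
        return false;   // no digits at all, or trailing junk such as "12x"
    }
    value = result;
    return true;
}
```

It would be used after the argument count has been checked, as in if ( argc > 2 && parse_int( argv[1], rate ) ) with "rate" declared as a long.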
There is a third argument available if you wish, containing details of the program's running environment.
If you write
int main( int argc, char *argv[], char *envp[] ) {
as the program heading, the additional third parameter to the main program is another array of strings, this time set to a value such as
    char *envp[] = {
        "USER=ef",
        "HOME=/staff/ef",
        "TERM=vt100",
        "EDITOR=emacs",
        "SHELL=bash",
        0
    };
Each string in the array is of the form
<environment variable>=<value>
You could search this array to find the settings for any variable in which you are interested.
To make life easier, you can include the header <stdlib.h>, which declares
char *getenv( const char * );
and use the library function "getenv" as in
char *term = getenv( "TERM" ); char *edit = getenv( "EDITOR" );
This is straying into the territory of pointers, which properly belongs to the next course.
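One practical point when using "getenv": it returns a null pointer if the variable is not set, so the result should be checked before it is printed or compared. The sketch below wraps that check; the name "env_or_default" and the idea of supplying a fallback value are assumptions made for this example.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: value of the environment variable "name", or
// "fallback" if the variable is not set at all.
std::string env_or_default( const char *name, const char *fallback )
{
    const char *value = std::getenv( name );   // null pointer if not set
    return value != 0 ? std::string( value ) : std::string( fallback );
}
```

For example, env_or_default( "TERM", "vt100" ) gives the current terminal type, or "vt100" if TERM is not set.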
Every large company has its conventions for choosing identifiers. We have not enforced any particular convention. Some companies have conventions with which we violently disagree (such as "identifiers for integer variables must start with 'i' or 'j' or 'k' ...").
Identifier names should certainly be meaningful. They will therefore often consist of several words. Some users capitalise the initial letter of each word (TotalCostPerHour, for example), while others prefer total_cost_per_hour with underscores separating the components.
Older compilers sometimes limited the length of identifiers to eight significant characters (if the identifier was longer, only the first eight characters were significant) but there is no known C++ compiler with such a restriction. That is an advantage of a modern language.
There are occasions where the "#define" facility of the C++ preprocessor can be useful.
If you write
#define MAX 150
near the top of a program, then everywhere that the identifier "MAX" occurs in the text, it is substituted by the string "150" before the program is passed to the compiler.
The string which is substituted can be absolutely any string of characters. Thus if you write
#define NU "The University of Nottingham"
the given string (including the quote symbols) will be substituted at every occurrence. This will result in many occurrences of the actual string. For a specific value such as this, you would normally use a global constant in C++.
If you define
#define TOTAL (n_small + n_medium + n_large)
then every occurrence of "TOTAL" will be substituted by the given expression, which will be compiled and re-evaluated at every run-time encounter in the program.
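The re-evaluation can be seen in a short sketch. The function names and the particular values are invented for the example; the macro is the one from the text. Because the substitution is purely textual, the three variables must be in scope wherever "TOTAL" is used, and the parentheses in the definition matter as soon as "TOTAL" appears inside a larger expression.

```cpp
#define TOTAL (n_small + n_medium + n_large)

// Hypothetical example: TOTAL is re-evaluated each time it is met.
int change_in_total()
{
    int n_small = 1, n_medium = 2, n_large = 3;
    int before = TOTAL;       // expands to (n_small + n_medium + n_large)
    n_small = 10;
    int after = TOTAL;        // same text, evaluated again with new n_small
    return after - before;    // 15 - 6
}

// Hypothetical example: the parentheses keep the sum together.
int doubled_total()
{
    int n_small = 1, n_medium = 2, n_large = 3;
    return 2 * TOTAL;         // 2 * (1 + 2 + 3); without the parentheses
                              // it would be 2 * 1 + 2 + 3
}
```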
Consider
#define EVER ;;
and
for ( EVER ) { ...
What can I say? This is a skill you must develop yourself.
Develop your program in small steps. Don't write 200 lines of code and expect it all to work and be easy to debug. Create the program in stages, testing each stage as you develop it. You will learn with experience how much it is wise to add at a time.
Put additional printing statements into the program until you are sure that it works. This way you can check intermediate results, and convince yourself that results are correct.
Re-use previously developed and tested code wherever possible. This may mean your own code, but more often means the use of library code. It is usually worth the effort of looking up library functions for many operations.
Your programs are intended to be useful, and to be economic to develop and run. In this context
Copyright Eric Foxley Thu Dec 5 10:46:26 GMT 1996
I have appended for interest a summary of the C++ programming standards used by one industrial company, "Ellemtel". The actual description of the standards occupies 82 pages; this is just a one-line summary of each of them. We hope that one day Ceilidh will enforce ALL of them!
They are divided into "Rules" (general application), "Recommendations" (recommended, not mandatory) and "Portability Rules" (only needed for applications which need to be portable, which for any large organisation would mean all applications).
Summary of Rules