Chapter 5 : Arrays and structures

Chapter 5 : Arrays and structures

So far we have declared variables one at a time. We will often need many variables in a program, and "array"s are a technique for declaring many variables of the same type in one statement, and for being able to consider the whole collection as a single object for appropriate operations.

A "structure" is a means for combining several related items which may be of different types as a single object, while still being able to refer to the separate individual components if we wish to.

Contents

5.1. Arrays

5.1.1. Examples of the requirement for arrays

We are analysing the annual rainfall over a period of 90 years; we require 90 float variables to store the information.

We are studying a piece of English text; we require 10000 char variables to store the characters.

We are recording the numbers of students in each department of the university; we need (depending on how many departments there are) 100 int variables.

5.1.2. Declaration of arrays

The above examples would be declared as

The number of elements requested in the declaration must be fixed at compile time.

5.1.3. Calls of array elements

To get at a particular variable (element) in an array, we write the identifier of the array followed by the subscript in square brackets. In C if we declare

then the subscripts run from 0 (zero) to 89 inclusive; this gives us the required number (90) of variables.

To access the 23rd of these variables, we would put the subscript in square brackets after the array identifier, and use

The subscript can be any integer expression.

To store a value in this variable we may use

or perhaps

and to use the value stored there

The real use of arrays is when we refer to each element of the array not by a specific constant subscript, but access all elements in turn or choose a particular element dynamically. To access the variables in turn, for example to read 90 values from the data into the 90 locations, we would use

The 90 values would have to appear in the data stream in the correct order, separated by "white space", i.e. spaces, tabs or newlines.

Having read the 90 values in, we may wish to calculate the total and average rainfall over these 90 years. For this we would use

In the above examples, the constant value 90 keeps appearing. We should really take this out as a constant, so that the actual value appears at only one point in the program. The program now becomes as follows.

We will not now have problems when the number of years duration of the rainfall analysis needs to be changed; all values of the duration will change in step together when we change the value of the global constant. If we had not done this, we might have forgotten to change some of the occurrences of "90" to another value.

Note that the value of the "#define" denoting the number of elements in the array must be a genuine integer constant! When the compiler is digesting your program must know exactly how many array elements are required. The constant must not be one which is determined as the result of some calculation while the program is running.

Note also the typical C loop, starting at zero, and limited by "counter strictly less than number of elements". This gives us the range of values 0, ..., 89 if there are 90 elements. There is NO element with subscript 90.

5.1.4. Other array examples

To count the number of occurrences of the letter 'e' in a piece of text, we will write a program to read the characters of the text into an array of characters, and then search the array for occurrences of the letter 'e'. We will need an array of char elements. We will read characters from the input until we encounter a full stop.

We should declare the array long enough to hold all likely sentences, and check that the data does not overflow it.

If we ran the program, and typed

at the terminal, the code would set

and so on.

To search through the array once it has been read in looking for occurrences of the letter 'e', we use

We again control the loop by looking for the full stop '.'. It would be possible instead to count the total number of characters as they are read in, and use this count to control the upper limit of the loop.

Sorting numbers

We wish to read integers into an array (of int variables), sort them into ascending order by swapping, and then print them out. We will assume that the data is terminated by a zero value. First the declarations:

Then we read the data in:

Then we sort the numbers into ascending order by swapping:

Then we might print the results:

To print the results ten entries per line

The complete program

The above program when put together becomes as follows.

5.1.5. Array elements and indexes

Always be careful to distinguish between the operation

and

  • find the position of the largest element ...

    To find the maximum value in an array you may write

    We start by assuming that the first element is the largest, and compare each of the remaining elements (subscripts from 1 upwards) with it in turn.

    We now have the largest value in the "float" variable "max". The type of the variable "max" will be the same as the type of the array elements.

    To find the position (the "index") of the largest element, write

    We assume the position of the largest element is zero, and compare each other element with it in turn. We now have the position of the largest element in the "int" variable "maxpos". The type of the variable "maxpos" will always be int.

    Note that there is always a unique maximum value for an element of an array; there may not be a unique position of the largest element if the are several equal largest values.

    5.1.6. Searching for words

    To search for the occurrences of a particular word such as "the" in the text, we must search for occurrences of the 3 characters 't', 'h' and 'e' in adjacent positions in the array.

    Note that in this case we start the subscript at 2 rather than zero.

    We could move the subscript from zero upwards using

    In either case, we must be careful not to run off either end of the array.

    If we were actually looking for the word "the" for serious linguistic reasons, we would need to check that there were non-letters at both ends of the string "the" wherever we find it, using perhaps

    There is a standard library function is_alpha to check whether a character given as parameter represents a letter. In addition we would probably accept either a leading upper case 'T' or lower case 't'. using a construction such as

    Instead of looking for a specific word such as "the", we may wish to look for a general word (string). We would then store the word we are looking for in a second character array. To search our stored sentence for a word of length "l_word" stored in a character array "word" (assuming that the value in l_word has been set up to equal the length of the word stored in the character array word ) the second part of the program would have to be turned into a loop to compare characters in the word with characters in the array. The code might be:

    The above example would be a little more easily readable if we declared

    and use the values TRUE or FALSE later in the program.

    String functions

    Any operation like this will be much more easily performed by using the string library functions, perhaps "strncmp" in this case:

    For serious programming, always use library functions whenever they are available.

    All of the string library functions assume that the characters stored in the array are terminated by a null (zero) character. Such arrays could be printed by "printf", which will print characters from a character array until a zero element is encountered.

    5.1.7. Initialised arrays

    If you wish the values of an array to be initialised, this can be done ONLY FOR GLOBAL DECLARATIONS before the line containing "main". You can write

    with the initial values separated by commas, within curly braces. You need not put a value for the length of the array between the square brackets, since the compiler can count how many elements you have declared! This would set up an array of six elements starting at suffix zero with

    If you do put a value between the square brackets, as in

    an array of 10 elements will be declared, with the remaining elements (4 in this case) not initialised. The number given between the square brackets must be greater than or equal to the number of initialising values given.

    To initialise a character array, you could write of course

    An alternative notation has been devised because initialised strings are required so often. You can also write

    This actually initialises a 5-character array, with an additional zero element at the end; this is provided so that programs using the array can keep looping until they find the zero element. The form of a loop using this array of characters (an array with a terminating zero element) is now

    We could, of course, omit the "!= 0".

    Notice that "word" is a straightforward array of character elements. If we execute

    the word becomes "Joan".

    The size of an initialised array

    To operate on the elements of an initialised array you will need to know how many elements it has. It is bad practice to have a separate global constant giving the number of elements, declared separately from the initialised array declaration, since there is always a possibility that the length value might not agree with the actual length.

    One possibility is to use the compile-time operator sizeof (deliver the size in bytes of an object) which was mentioned earlier, and write

    Alternatively you can put a special marker element at the end of the array, and loop through the elements until you encounter that special value. For example

    Adding extra prime numbers into the declaration at a later stage will not affect the correct functioning of the loop, as long as the last element is always a zero.

    This is exactly the way that strings are scanned by library functions; you may well see the lazy version written as

    where the " "!= 0 " " is omitted.

    Efficiency of array initialisation

    To initialise a globally declared array take NO time while the program is running. The appropriate locations have already been initialied to the correct values in your executable file.

    Arrays and variables declared in global, and not otherwise initialised, are all initialised to zero.

    5.1.8. Two-dimensional arrays

    Declaration

    The above arrays are one-dimensional, and can store a single "row" or "column" or "vector" of values. Suppose we wish to store a table (two-dimensional) of values; these might be the rainfall for each of 12 months for each of 90 years, or the IQs of each of the 11 people in each of 22 football teams, or the marks for each of 20 exercises for each of 100 students, or the names (30 characters long) of each of 100 students. We now need two subscripts for each element, and would write

    Use of 2-dimensional arrays

    To read rainfall data in, we might use

    The numbers in the data must be given in the correct order required by the program; if the loops are nested as above, we would require the twelve numbers for the first year, then the twelve numbers for the second year, ...

    If the loops had been nested the other way round (just interchange the two "for" lines) the data would have had to consist of the 90 January figures, then the 90 February figures, ...

    Accessing the elements

    To calculate the annual totals for each of the 90 years we might write

    We could have used "annual_rain[ year ]" instead of "total" for our addition above; it would be marginally slower on the computer, since it would have to look up the array subscript each time.

    To print the average rainfall over the 90 years for each of the 12 months separately, we might write

    To find the first student whose name starts with the letter 'S', we might use

    leaving student set to -1 if no such student was found.

    Initialising 2-dimensional arrays

    Arrays can be initialised only when declared in global. You could write for example

    This would give us a 3 by 5 array of integers.

    There is an alternative "flattened" form

    In the flattened form we have to insert the bounds, since the compiler could not know whether we intended a 3 by 5 array or a 5 by 3 array.

    The particular case of initialised two-dimensional character arrays which will be explained more fully later, but is introduced here since you may find it useful. We are concerned with initialising an array of words. When looking at C programs, we may wish to search for all keywords. We would declare

    This is effectively a 2-dimensional array of characters. Each row of this array is of a different length, the length of that keyword + one for the terminating zero which is always added to a string.

    We could now have code to look for each keyword in turn, the element "keyword[i][j]" is the j-th character of the i-th keyword. We search along each keyword until we encounter the terminating zero.

    This subject is dealt with more fully later under the subject of pointers.

    5.1.9. Higher dimensional arrays

    You can, of course, declare and use 3-dimensional, 4-dimensional and higher dimensioned arrays by extending the above notation.

    In three dimensions

    Beware that it is very easy to eat up huge amounts of storage (the above example declares 1000 variables) with many-dimensioned arrays.

    5.2. More array examples

    5.2.1. Reflect lines

    We read text from standard input until end-of-file is encountered, printing each line as it is read in reflected from left to right. We use the function "getchar()" to read characters which reads spaces and newline characters as such. The "getchar()" function returns a negative value when it reaches end-of-file, so that the loop is controlled by a "while ... > 0" mechanism.

    Observe that we have a single loop to read the characters, which takes a special action when it encounters a newline character.

    Observe that, if the compiled program is called "reflect", then typing the command

    or

    should show the original file.

    5.3. Structures

    The purpose of structures is to group together a number of related items. The items do NOT need to be of the same type.

    5.3.1. Examples of the requirement for structures .N( We are dealing with payroll in a company. For each person employed we need their


      name (string)
      address (string)
      works number (integer)
      tax code (integer?)
      rate of pay / hour (integer)
      hours worked this week (float?)
      total pay so far this year (integer)

    .N( We are analysing the properties of a proposed bridge design. For each component of the structure we need its


      strength (float)
      dimensions (3 floats?)
      weight (float?)
      cost (integer)
      name (string)

    .N( We are working with geographical data. Each item of data consists of


      a data value of type integer
      two position co-ordinates of type float

    .N( We are working with train timetables. For each entry in the timetable we need


      departure time (int)
      arrival time (int)
      special features (Fridays only?, Buffet car?)

    5.3.2. Declarations of structure types

    We will declare first the layout (components) required of a particular type of structure; the declaration of the structure objects themselves will come later elsewhere.

    A bridge component structure type might need

    Note that this merely defines a structure type. It does not define an object of any sort.

    A general purpose structure for handling dates might be

    A structure for integer data values each of which is associated with an (x,y) co-ordinate (the x and y could represent geographical latitude and longitude or Ordnance survey co-ordinates) might be

    A structure for railway timetable entries might be

    The separate components of the structure are called its fields.

    5.3.3. Declaration of structure objects

    The above declarations are of structure types, not of structure variables for storing data values. To actually declare objects of the above structures, we might write

    Here we have declared two single structures called this and that and an array of 200 structures; the whole array is called result .

    Do not confuse the structure type declaration (uses no space, gives an identifier to the type of structure) and the variable declarations (they occupy space in the program's memory when the program is running). Typically the structure type declarations would go into a header file.

    5.3.4. Accessing elements of a structure

    In order to access individual fields of a structure we use the structure variable identifier, followed by a full stop, then the field identifier from the structure type declaration.

    The compiler will not be confused by the fact that "stock" is used both as a structure field identifier, and as an array identifier. The field identifier will always follow a full stop.

    To copy a whole structure use:

    or if you require merely to copy certain fields use:

    5.3.5. Initialising structures

    To initialise structures use a similar format to that for arrays with values between curly braces, as in

    For an array of structures, use either the nested braces as in

    or use the flattened form as in

    The length of (the number of elements in) an initialised array of structures can be calculated (a compile-time constant) by using the sizeof function to divide the total size of the array by the size of one element, as in

    Do not forget always to divide by the size of a single element of the array.

    For a simplified BritRail timetable, use perhaps

    for the structure type, and

    for the initialised declaration. You will also need

    In this case the structure type "t_table" represents a single timetable entry, whereas the array of structures "t_table" represents a whole timetable.

    5.3.6. Bit-fields in structures (keenies only)

    Ordinary mortals do not need to know that structure notation can be used to break a single computer word down into small bit-fields for storing small data items.

    The integers against each field represent the number of bits allocated. All the fields must be of unsigned type.

    5.3.7. The use of arrays of structures

    Whenever you have related items, they should be grouped in a structure. If a program contains two arrays of the same length, it can usually be inferred that the corresponding elements are related.

    You should replace

    by

    and use by

    Program style checkers look for arrays of the same length, and recommend that they be turned into a structure.

    5.4. More suggestions for #defines

    You could try

    These can make code more readable.

    5.5. The use of Typedef

    Typedef is like a "#define" for data types. To define a type `addr' which is equivalent to `int' on this machine use

    and then declare

    On another machine, if could be typedef'd to another type.

    Copyright Eric Foxley 1996


    Notes converted from troff to HTML by an Eric Foxley shell script, email errors to me!