Chapter 7 : Arrays and structures

Chapter 7 : Arrays and structures

So far we have declared variables one at a time. We will often need many variables in a program, and "array"s are a technique for declaring many variables of the same type in one statement, and for being able to consider the whole collection as a single object for appropriate operations.

A "structure" is a means for combining several related items which may be of different types as a single object, while still being able to refer to the separate individual components if we wish to.

Contents

7.1. Arrays

7.1.1. Examples of the requirement for arrays

We are analysing the annual rainfall over a period of 90 years; we require 90 float variables to store the information.

We are studying a piece of English text; we require 10000 char variables to store the characters.

We are recording the numbers of students in each department of the university; we need (depending on how many departments there are) 100 int variables.

7.1.2. Declaration of arrays

The above examples would be declared as

The number of elements requested in the declaration must be fixed at compile time.

7.1.3. Calls of array elements

To get at a particular variable (element) in an array, we write the identifier of the array followed by the subscript in square brackets. In C++ if we declare

then the subscripts run from 0 (zero) to 89 inclusive; this gives us the required number (90) of variables.

To access the 23rd of these variables, we would put the subscript in square brackets after the array identifier, and use

The subscript can be any integer expression.

To store a value in this variable we may use

or perhaps

and to use the value stored there

The real use of arrays is when we refer to each element of the array not by a specific constant subscript, but access all elements in turn or choose a particular element dynamically. To access the variables in turn, for example to read 90 values from the data into the 90 locations, we would use

The 90 values would have to appear in the data stream in the correct order, separated by "white space", i.e. spaces, tabs or newlines.

Having read the 90 values in, we may wish to calculate the total and average rainfall over these 90 years. For this we would use

In the above examples, the constant value 90 keeps appearing. We should really take this out as a constant, so that the actual value appears at only one point in the program. The program now becomes as follows.

We will not now have problems when the number of years duration of the rainfall analysis needs to be changed; all values of the duration will change in step together when we change the value of the global constant. If we had not done this, we might have forgotten to change some of the occurrences of "90" to another value.

Note that the value of the "const" denoting the number of elements in the array must be a genuine integer constant! When the compiler is digesting your program must know exactly how many array elements are required. The constant must not be one which is determined as the result of some calculation while the program is running.

Note also the typical C++ loop, starting at zero, and limited by "counter strictly less than number of elements". This gives us the range of values 0, ..., 89 if there are 90 elements. There is NO element with subscript 90.

7.1.4. Other array examples

To count the number of occurrences of the letter 'e' in a piece of text, we will write a program to read the characters of the text into an array of characters, and then search the array for occurrences of the letter 'e'. We will need an array of char elements. We will read characters from the input until we encounter a full stop.

We should declare the array long enough to hold all likely sentences, and check that the data does not overflow it. Another "safe" way to code the loop might be

If we ran the program, and typed

at the terminal, the code would set

and so on.

Note that "cin >> character" ignores all white space (spaces, tabs and newlines), so that we would have set

even though we typed a space after "The".

To search through the array once it has been read in looking for occurrences of the letter 'e', we use

Or we could use

to control the loop

We again control the loop by looking for the full stop '.'. It would be possible instead to count the total number of characters as they are read in, and use this count to control the upper limit of the loop.

Beware of looking for a pair of consecutive letters with, for example

to look for the string "he". The "i+1" suffix means that you must terminate the loop one position earlier. If you terminate the loop by

you will miss the terminator on an empty sentence.

Sorting numbers

We wish to read integers into an array (of int variables), sort them into ascending order by swapping, and then print them out. We will assume that the data is terminated by a zero value. First the declarations:

Then we read the data in:

Then we sort the numbers into ascending order by swapping:

Then we might print the results:

To print the results ten entries per line

The complete program

The above program when put together becomes as follows.

7.1.5. Array elements and indexes

Always be careful to distinguish between the operation

and

  • find the position of the largest element ...

    To find the maximum value in an array you may write

    We start by assuming that the first element is the largest, and compare each of the remaining elements (subscripts from 1 upwards) with it in turn.

    We now have the largest value in the "float" variable "max". The type of the variable "max" will be the same as the type of the array elements.

    To find the position (the "index") of the largest element, write

    We assume the position of the largest element is zero, and compare each other element with it in turn. We now have the position of the largest element in the "int" variable "maxpos". The type of the variable "maxpos" will always be int.

    Note that there is always a unique maximum value for an element of an array; there may not be a unique position of the largest element if the are several equal largest values. The above code gives us the position of the first of the equal maxima; if we replaced the

    by

    we would get the position of the last of the equal maxima.

    7.1.6. Searching for words

    To search for the occurrences of a particular word such as "the" in the text, we must search for occurrences of the 3 characters 't', 'h' and 'e' in adjacent positions in the array.

    Note that in this case we start the subscript at 2 rather than zero.

    We could move the subscript from zero upwards using

    In either case, we must be careful not to run off either end of the array.

    If we were actually looking for the word "the" for serious linguistic reasons, we would need to check that there were non-letters at both ends of the string "the" wherever we find it, using perhaps

    There is a standard library function is_alpha to check whether a character given as parameter represents a letter. In addition we would probably accept either a leading upper case 'T' or lower case 't'. using a construction such as

    Instead of looking for a specific word such as "the", we may wish to look for a general word (string). We would then store the word we are looking for in a second character array. To search our stored sentence for a word of length "l_word" stored in a character array "word" (assuming that the value in l_word has been set up to equal the length of the word stored in the character array word ) the second part of the program would have to be turned into a loop to compare characters in the word with characters in the array. The code might be:

    The above example would be a little more easily readable if we declared

    and use the values TRUE or FALSE later in the program.

    String functions

    Any operation like this will be much more easily performed by using the string library functions, perhaps "strncmp" to compare strings in this case:

    For serious programming, always use library functions whenever they are available. Type

    for details.

    All of the string library functions assume that the characters stored in the array are terminated by a null (zero) character. Such arrays could be printed by "cout", which will print characters from a character array until a zero element is encountered.

    If you read a string, you should use

    to insert the terminating null.

    7.1.7. Initialised arrays

    If you wish the values of an array to be initialised, you can write ONLY IN GLOBAL

    with the initial values comma-separated, between curly braces. You need not put a value for the length of the array between the square brackets, since the compiler can count how many elements you have declared! This would set up an array of six elements starting at suffix zero with

    If you do put a value between the square brackets, as in

    an array of 10 elements will be declared, with the remaining elements (4 in this case) not initialised. The number given between the square brackets must be greater than or equal to the number of initialising values given.

    To initialise a character array, you could write of course

    An alternative notation has been devised because initialised strings are required so often. You can also write

    This actually initialises a 5-character array, with an additional zero element at the end; this is provided so that programs using the array can keep looping until they find the zero element. The form of a loop using this array of characters (an array with a terminating zero element) is now

    We could, of course, omit the "!= 0".

    Notice that "word" is an ordinary array of character elements. If we execute

    the word becomes "Joan".

    The size of an initialised array

    To operate on the elements of an initialised array you will need to know how many elements it has. It is bad practice to have a separate constant giving the number of elements, declared separately from the initialised array declaration, since there is always a possibility that the length value might not agree with the actual length.

    One possibility is to use the compile-time operator sizeof (deliver the size in bytes of an object) which was mentioned earlier, and write

    and then

    Alternatively you can put a special marker element at the end of the array, and loop through the elements until you encounter that special value. For example

    Adding extra prime numbers into the declaration at a later stage will not affect the correct functioning of the loop, as long as the last element is always a zero.

    This is exactly the way that strings are scanned by library functions; you may well see the lazy version written as

    where the " "!= 0 " " is omitted.

    Efficiency of array initialisation

    To initialise a globally declared array take NO time while the program is running. The appropriate locations have already been initialied to the correct values in your executable file.

    To initalise an array declared inside a function requires

  • 1. a copy of the values for initialisation must be kept somewhere, to be used each time the function is entered; and
  • 2. on each entry to the function, the values must be copied into the newly declared array. This can take a lots of time if the function is called many times, and if the array is large.

    Large initialised arrays are usually seen only in global declarations.

    Arrays and variables declared in global, and not otherwise initialised, are all initialised to zero.

    7.2. Printing large arrays

    For large arrays, you will not wish to

  • (i) print all values on separate lines, or
  • (ii) print one value per line.

    Use a construct such as

    Entries 0 to 9 will appear on one line, entries 10 to 19 on the next ...

    To make the values appear in neat columns, you would need to format the output as in

    7.2.1. Two-dimensional arrays

    Declaration

    The above arrays are one-dimensional, and can store a single "row" or "column" or "vector" of values. Suppose we wish to store a table (two-dimensional) of values; these might be the rainfall for each of 12 months for each of 90 years, or the IQs of each of the 11 people in each of 22 football teams, or the marks for each of 20 exercises for each of 100 students, or the names (30 characters long) of each of 100 students. We now need two subscripts for each element, and would write

    Use of 2-dimensional arrays

    To read rainfall data in, we might use

    The numbers in the data must be given in the correct order required by the program; if the loops are nested as above, we would require the twelve numbers for the first year, then the twelve numbers for the second year, ...

    If the loops had been nested the other way round (just interchange the two "for" lines) the data would have had to consist of the 90 January figures, then the 90 February figures, ...

    Accessing the elements

    To calculate the annual totals for each of the 90 years we might write

    We could have used "annual_rain[ year ]" instead of "total" for our addition above; it would be marginally slower on the computer, since it would have to look up the array subscript each time.

    To print the average rainfall over the 90 years for each of the 12 months separately, we might write

    To find the first student whose name starts with the letter 'S', we might use

    leaving student set to -1 if no such student was found.

    Initialising 2-dimensional arrays

    You could write (only in global) for example

    This would give us a 3 by 5 array of integers.

    There is an alternative "flattened" form

    In the flattened form we have to insert the bounds, since the compiler could not know whether we intended a 3 by 5 array or a 5 by 3 array.

    The particular case of initialised two-dimensional character arrays which will be explained more fully later, but is introduced here since you may find it useful. We are concerned with initialising an array of words.

    Keywords

    When looking at C++ programs, we may wish to search for all keywords. We would declare

    This is effectively a 2-dimensional array of characters. Each row of this array is of a different length, the length of that keyword + one for the terminating zero which is always added to a string.

    We could now have code to look for each keyword in turn, the element "keyword[i][j]" is the j-th character of the i-th keyword. We search along each keyword until we encounter the terminating zero.

    This subject is dealt with more fully later under the subject of pointers.

    7.2.2. Higher dimensional arrays

    You can, of course, declare and use 3-dimensional, 4-dimensional and higher dimensioned arrays by extending the above notation.

    In three dimensions

    Beware that it is very easy to eat up huge amounts of storage (the above example declares 1000 variables) with many-dimensioned arrays. The declaration

    would occupy 100,000,000 floats. For your programs you may assume a megabyte of space, say 250K integers or 100k floats.

    7.3. More array examples

    7.3.1. Reflect lines

    We read text from standard input until end-of-file is encountered, printing each line as it is read in reflected from left to right. We use the function "getchar()" to read characters instead of "cin", since it reads spaces and newline characters. The "getchar()" function returns a negative value when it reaches end-of-file, so that the loop is controlled by a "while ... > 0" mechanism.

    Observe that we have a single loop to read the characters, which takes a special action when it encounters a newline character.

    Observe that if the compiled program is called "reflect", then typing

    or

    should show the original file.

    7.3.2. Arrays as function parameters

    Arrays can be passed as function parameters, subject to a few new ideas.

    When the identifier of an array is used as a parameter, the identifier (which is a single value representing the location of the array in memory) is passed over; this reference to where the array is stored is passed "by value" in terms of our earlier jargon. This means that, inside the function, individual elements of the array can be modified. Thus in the program

    the call of "functn" in the main program will have the effect of swapping over the values in "eric[0]" and "eric[1]". It is the identifier of the array which is passed over, not the contents of the whole array. To do the latter would cause great inefficiency in terms of storage space and processor time if the arrays were large.

    If you wish to ensure (for safety reasons) that the array elements cannot be accidentally overwritten, then the function declaration becomes something like

    The "const" keyword should ensure that any attempt to overwrite array elements from within the function is prevented (reported as a compile-time error) by the compiler. The "g++" compiler appears merely to report a warning, not an error.

    If the function needs to know the size of the array, you must pass it as an integer parameter, as in

    7.4. Structures

    The purpose of structures is to group together a number of related items. The items do NOT need to be of the same type.

    7.4.1. Examples of the requirement for structures .N( We are dealing with payroll in a company. For each person employed we need their


      name (string)
      address (string)
      works number (integer)
      tax code (integer?)
      rate of pay / hour (integer)
      hours worked this week (float?)
      total pay so far this year (integer)

    .N( We are analysing the properties of a proposed bridge design. For each component of the structure we need its


      strength (float)
      dimensions (3 floats?)
      weight (float?)
      cost (integer)
      name (string)

    .N( We are working with geographical data. Each item of data consists of


      a data value of type integer
      two position co-ordinates of type float

    .N( We are working with train timetables. For each entry in the timetable we need


      departure time (int)
      arrival time (int)
      special features (Fridays only?, Buffet car?)

    .N( We are working with a number of countries. For each country we need


      name (char array)
      population (int)
      area (float)

    7.4.2. Declarations of structure types

    We will declare first the layout (components) required of a particular type of structure; the declaration of the structure objects themselves will come later elsewhere.

    A bridge component structure type might need

    A general purpose structure for handling dates might be

    A structure for handling geographical data might be

    A structure for integer data values each of which is associated with an (x,y) co-ordinate (the x and y could represent geographical latitude and longitude or Ordnance survey co-ordinates) might be

    A structure for railway timetable entries might be

    A structure for a country might be

    A structure for a string might be

    where the integer field tells us how long the stored string is.

    The separate components of a structure are called its fields.

    7.4.3. Declaration of structure objects

    The above declarations are of structure types, not of structure variables for storing data values. To actually declare objects of the above structures, we might write

    Here we have declared two single structures called this and that and an array of 200 structures; the whole array is called result .

    Do not confuse the structure type declaration (uses no space, gives an identifier to the type of structure) and the variable declarations (they occupy space in the program's memory when the program is running) . Typically the structure type declarations would go into a header file.

    This is similar in some ways to the distinction between a function declaration and definition. The function declaration merely informs the compiler of the details of how function calls are to be handled, the definition includes the actual code to be executed.

    7.4.4. Accessing elements of a structure

    In order to access individual fields of a structure we use the structure variable identifier, followed by a full stop, then the field identifier from the structure type declaration. Given

    declared earlier, we may write

    The compiler will not be confused by the fact that "stock" is used both as a structure field identifier, and as an array identifier. The field identifier will always follow a full stop.

    To declare an array of countries use

    ("country" is the structure type, "asia" is the object identifier) and to access fields

    To copy a whole structure use:

    or if you require merely to copy certain fields use:

    7.4.5. Initialising structures

    To initialise structures we use a notation similar to that for initialising arrays, with the values between curly braces, as in

    For an array of structures, use either the nested braces as in

    or use the flattened form as in

    Geographic example

    For geographical data use

    The length of (the number of elements in) an initialised array of structures can be calculated (a compile-time constant) by using the sizeof function to divide the total size of the array by the size of one element, as in

    Do not forget always to divide by the size of a single element of the array. You may then use

    to control various loops.

    For a simplified BritRail timetable, use perhaps

    for the structure type, and

    for the initialised declaration. You will also need

    In this case the structure type "t_table" represents a single timetable entry, whereas the array of structures "t_table" represents a whole timetable.

    7.4.6. Bit-fields in structures (keenies only)

    Ordinary mortals do not need to know that structure notation can be used to break a single computer word down into small bit-fields for storing small data items.

    The numbers after each field indicate the number of bits allocated. All fields must be of type unsigned.

    7.4.7. The use of arrays of structures

    Whenever you have related items, they should be grouped in a structure. If a program contains two arrays of the same length, it can usually be inferred that the corresponding elements are related.

    You should replace

    by

    and use by

    Program style checkers look for arrays of the same length, and recommend that they be turned into a structure.

    8. Marking

    Your program layout should now treat square brackets like any others; if the open and close fit on the same line, that's OK. If they don't, use the same rules as for curly braces (but of course no comment necessary after the close).

    Copyright Eric Foxley Tue Dec 10 10:09:20 GMT 1996


    Notes converted from troff to HTML by an Eric Foxley shell script, email errors to me!