Chapter 6: Expandable ADTs

6.1. Introduction

6.2. Intarr - a dynamic array

6.2.1. Definition of the Intarr class

6.2.2. Implementation of the Intarr class

6.2.3. Testing the Intarr class

6.3. The String class

6.3.1. Definition of the String class

6.3.2. Implementation of the String class

6.4. Overloading the assignment operator

6.1. Introduction

This chapter shows how to use pointers to overcome the fixed size limitations inherent in data types such as arrays. The chapter is based around two example C++ classes. The first, Intarr, is a dynamically expandable array class which implements bounds checking on subscripts. The second is a much more complex String class for manipulating text. Its functionality includes comparison of strings, concatenation, substring matching and input and output.

In order to build these classes some new C++ concepts are introduced. These include:

overloading the array subscript operator ( [] );
providing destructors for a class;
The need to provide a copy constructor for a class.
The problem of memberwise assignment and the need to overload the assignment operator ( = );

These techniques turn out to be generally applicable to a range of new C++ classes.

6.2. Intarr - a dynamic array

The goal of the Intarr class is to provide a resizable array of integers which supports bounds checking.

First, it is necessary to remember the general properties of the in-built C++ array data type. An array is an ordered collection of elements all of which are of the same base type. Elements are accessed by their position in the array through a subscript using the [] operator. There are two major drawbacks to C++ arrays.

Once defined, they have a fixed size. This makes them unsuitable for storing highly dynamic kinds of information such as general text (the length of a sentence can vary greatly).

The [] operator doesn't check whether the given subscript falls within the array bounds. This makes arrays dangerous to use.

6.2.1. Definition of the Intarr class

The new Intarr class will provide an array of integers with all of the usual functionality of arrays and with the above drawbacks removed. The interface to Intarr supports the following functions:

        Constructor: set up an array with an initial specified
                     length and all elements initialised to zero.
        Access: return a specified element from the array.
                return the current size of the array
                reset the size of the array (either increase or decrease)
        I/O: print out the elements in the array

This leads to the following class definition.

// C++ class definition for a dynamically resizable array of
// integers with bounds checking
class Intarr {
    private:
        int *a, len;
    public:
        //constructor
        Intarr(int length);
        // destructor
        ~Intarr();
        // return nth element
        int& operator[](int n);
        // return size
        int size();
        // reset the size
        void resize(int length);
        // print
        void print();
};

Notice the inclusion of the method ~Intarr. This is called the destructor for the class and is automatically called when an object of the class is destroyed (i.e. "goes out of scope"). This might occur at the end of the program, at the end of a function (for a local object) or when the user explicitly deletes an object using the delete operator. Any class may have a destructor and the destructor may take any action the user deems necessary or useful. One common use of destructors is to print out a message saying that the object has been destroyed in order to aid the debugging process. For classes which allocate their own memory for data members, a destructor is definitely required. The name of the destructor is always the name of the class with the ~ character in front of it.

This example shows that it is also possible to overload the array subscript operator []. This allows users of the class to write code such as the following:

    Intarr a(6);
    a[1] = 2;
    cout << a[1];

The argument to this operator is the array subscript (an integer number). The result is an element of the array. This result MUST be returned using the reference mechanism! This is because we wish to include elements on both the right and left hand sides of assignment statements. Appearing on the right hand side means that we wish to change the original element (e.g. a[1] = 2). In other words we want the function to deliberately cause a side effect on the Intarr object and so we must have a reference result type.

6.2.2. Implementation of the Intarr class

Our implementation of this class requires two data methods - an array of integers to store its elements and a count of how big the array currently is. The latter is to be used in bounds checking and resizing. The array is declared as an integer pointer (remember that pointers and arrays are inter-changeable). Thus we have:

      int *a, len;

The constructor function needs to allocate enough memory to store the requested number of elements and set the pointer a to point at this memory. It also needs to record the initial length of the array and to initialise all the elements to the value zero. This is achieved by the following code:

Intarr::Intarr(int length) {
        
        // allocate memory for the array
        a = new int[length];
        
        // note the length
        len = length;
        // initialise all elements to 0
        int i;
        for(i = 0; i < len; i++)
                a[i] = 0;
}

Notice that we can use the operator new to allocate memory for an array of objects in one go by telling it how many elements there are between square brackets.

The destructor is responsible for freeing the memory allocated by the class whenever an Intarr object is destroyed. It is necessary to do this because we explicitly allocated the memory ourselves in the first place.

// destructor frees the memory
Intarr::~Intarr() {
        
        // free memory for len elements
        delete [len]a;
}

Notice that in order to free memory for an array of objects, we have to tell delete how many objects there are.

Operator[] first checks that the given subscript is in bounds and, if it is, simply returns the specified element of the array a.

// return reference to nth element
int& Intarr::operator[](int n) {
        // check bounds (remember arrays start at 0)
        if( 0 <= n && n < len)
                // if in bounds return reference
                return a[n];
        else {
                // indicate error and stop the program
                cout << "Error in Intarray operator[]: subscript "
                     << n << " is out of bounds\\n";
                exit(-1);
        }
}

The size() function merely returns the value of the data member len.

// return size of array
int Intarr::size() {
        return len;
}

The resize function is a little trickier. Its first action is to set up a temporary new array with the correct new length. Next it copies across as many elements of the old array as it can into the new one. If the new array is longer, it sets the additional elements to zero. If it is shorter, it truncates the old array. Then it frees the memory for the old array using delete. Finally, is sets the data member a to point at the new array and resets len to the new length.

This technique of setting up new memory and copying values across will also be used in several later classes.

// resize array to 'length' elements
void Intarr::resize(int length) {
        
	// first check that the new length is not the same as the old
	if ( length == len )
		return;
	
        // allocate new memory and new length
        int *newa = new int[length];
        int newlen = length;
        
        // note the shorter of the new and old lengths
        int min;
        if (newlen < len)
                min = newlen;
        else
                min = len;
        // copy across old elements
        // (truncate if new length is shorter)
        for(int i = 0; i < min; i++)
                newa[i] = a[i];
        // set any remaining elements to 0
        for(i = min; i < newlen; i++)
                newa[i] = 0;
        // free old memory
        delete [len]a;
        // set new memory and new length
        a = newa;
        len = newlen;
}

Finally, the print method prints out the array in the form a,b,c ...z.

// print out array
void Intarr::print() {
        int i;
        for (i = 0; i < len; i++)
                cout << a[i] << ",";
        cout << "\\n";
}

6.2.3. Testing the Intarr class

The following simple program demonstrates the syntax for using the Intarr class. Notice how the overloaded [] operator makes the class appear much more like in-built arrays. You might like to think about other possible overloaded operators for this class (e.g. overloading >> and <<).

// test the Intarr class
#include <stream.h>
#include "Intarr.h"
main() {
        Intarr a(3); 
        a[0] = 0;
        a[1] = 1;
        a[2] = 2;
        a.print();
        cout << "size of a is" << a.size() << "\\n";
        a.resize(5);
        a.print();
        a[3] = 3;
        a[4] = 4;
        // this next one would cause an error
        // a[5] = 5;
        a.print();
        a.resize(2);
        a.print();
}

The following briefly summarises the key new points introduced through the Intarr class:

The constructor needs to allocate memory using the new operator.

The destructor needs to free this memory using the delete operator.

Both new and delete can deal with arrays of objects provided they are told how big the arrays are.

The overloaded [] operator must return a reference result so that it can be used on the left hand as well as the right hand side of assignment statements.

The resize method works by setting up new memory, copying across data from old memory and then deleting the old memory.

6.3. The String class

Manipulating text is an important part of many programs. Text might come in the form of words, sentences, names, paragraphs or entire documents. C++ provides the basic char* type for manipulating text. However, this is extremely limited. In effect, we can only:

Point to a string (e.g. char *s = "Hello").

Index individual letters of a string (e.g. if ( s[0] == 'H') ....

Display a string (e.g. cout << s).

There are many additional facilities that might be included in a general purpose string class.

The ability to copy strings to each other.

The ability to concatenate (i.e. join-together) strings to form new strings.

The ability to compare strings based on dictionary ordering (e.g. <, >, == etc).

The ability to read strings using the >> operator and print them using <<.

A string class that supported such functionality would be extremely useful for many applications. Our goal is to design and implement such a class. There are many additional functions that might be useful that we won't consider in our class. These include case conversion, encryption and reversing strings.

6.3.1. Definition of the String class

My analysis of the String class led to the following interface design.

    Constructors:  build an empty string; a blank string of a given length;
                   a string from an existing char*; a string from another
                   string.
    Combination:   concatenate two strings to form a new one (overload
                   operator +).
    Tests:         Test for empty string (!) as well as compare strings
                   ( <, >, <=, >=, ==, !=). Also does one string contain
                   another?
    Access:        Return the length of the string.
                   Access individual letters (overloaded []).
                   
    Input/output:  Overload >> and << to read and write strings

The analysis above leads to the following class definition.

// definition of a character string class
// Steve Benford Nov 1990, Revised Nov 1991
#include <stream.h>
class String {
    private:
        char* str;                      // pointer to character array
        int len;                        // length of the string;
    public:
        // constructors
        String(char *s);                // allocate and initialise
        String(int i);                  // allocate length and set to blanks
        String();                       // allocate to zero length
        String(const String& s);        // copy constructor
        // destructor
        ~String();                      // destructor;
        // combination
        void operator=(String& s);      // assignment between two strings
        String operator+(String& s);    // concatenation
        
        // tests
        int operator!();                // test whether 0
        int operator==(String& s);      // test for equality
        int operator!=(String& s);      // test for inequality
        int operator<(String& s);       // test alphabetically less than
        int operator>(String& s);       // test alphabetically greater than
        int operator<=(String& s);      // test alphabetically less than/equals
        int operator>=(String& s);      // test alphabetically greater/equals
        int contains(String& s);        // does one string contain another?
        // access
        char& operator[](int n);        // return char in position n
        int length();                   // return length of the string
        // I/O
        // overloaded output operator
        friend ostream& operator<<(ostream& os, String &s);
        // overloaded input operator
        // read a sequence of characters ending in a newline into parameter s
        // s will be set to NULL if the line length is greater than 
        // a specific length (BUF_LEN = 200)
        friend istream& operator>>(istream& is, String &s);
};

The class provides four constructors including the copy constructor String(const String& s) which builds one string from another. Note the word "const" in front of the argument declaration. Remember that we pass the String argument by reference for reasons of efficiency, but that this can cause side-effects (i.e. our constructor could change the string object that is its argument). The use of const prohibits these side-effects. In general, you can use const arguments when using references for increased efficiency both where you want to stop possible side-effects.

The String class provides a destructor and also makes extensive use of operator overloading, including overloaded << and >> operators. This also includes overloading the assignment operator, void operator=(String& s), something that we haven't seen before. This will be fully explained below.

6.3.2. Implementation of the String class

Internally, the String class resembles the Intarr class in that there are two data members - a char* pointer to store the string and an integer to record its current length.

The first step of the implementation is to define two generally useful functions strlength and strcopy which calculate the length of a char* and copy one char* to another respectively. In fact, there are standard versions of these functions available in the string library called strcpy and strlen. However, it is worth understanding them for yourselves. It is important to note that strcopy assumes that the right amount memory has already been allocated for the new string before the function is called. If it hasn't, the function will fail and will probably cause the program to crash.

// generally useful functions 
// calculate length of char*
int strlength(char* c) {
        int i = 0;
        // set up a new pointer to the front of the string
        char *p = c;
        // keep going till you reach the end of the string (i.e. null)
        while(*p++ != '\\0')
                i++;
        return i;
}
// copy one char* to another char* (assume space allocated)
void strcopy(char *s, char *t) {
        // while not at end of string
        while(*t != '\\0')  {
                // copy the current characters
                *s = *t;
                // move the pointers one notch along the strings
                s++; t++;
        }
        // put the null character at the end
        *s = '\\0';
}

The first two constructors set up an empty string and a string of n blanks (space characters).

// construct an empty String
String::String() {
        // length is currently zero
        len = 0;
        // set up array with end of string in it
        str = new char[1];
        str[0] = '\\0';
}
// construct a String initialised to n blanks
String::String(int n) {
        len = n;
        // allocate enough space for the string (+ null)
        str = new char[len+1];
        // copy blanks into the memory
        for(int i=0; i<n; i++)
                str[i] = ' ';
        // set end of string marker
        str[n] = '\\0';
}

Constructing a string from a char* involves recording the length of the string, allocating memory for the internal str pointer and then copying the char* pointed at by the argument s to this memory. Note that we must allocate anough memory for the null character ('\\0') as well (i.e. len+1 chars).

// construct a String from a char*
String::String(char *s) {
        len = strlength(s);
        // allocate enough space for the string (+ null)
        str = new char[len+1];
        // copy the string into this memory
        strcopy(str,s);
}

The copy constructor is similar, except that we copy the internal char* given by s.str.

// copy constructor' one string from another
String::String(const String& s) {
        // set the length
        len = s.len;
        // allocate space
        str = new char[len + 1];
        // fill it up
        strcopy(str, s.str);
}

The destructor is responsible for freeing the memory allocated to the string.

// destructor
String::~String() {
        // delete the memory associated with the string
        delete str;
}

Concatenation involves setting up a new temporary string whose length is the combined length of the strings to be joined. The contents of both strings are then copied into this new memory.

// concatenation 
String String::operator+(String& s) {
        // initialise result string of correct length
        String res(len + s.len);
        // copy in first part
        strcopy(res.str, str);
        // copy in second part
        for (int i = 0; i< s.len; i++)
                res.str[len + i] = s.str[i];
        // set end of string marker
        res.str[res.len] = '\\0';
        
        // return result
        return res;
}

The following code implements the various test methods in the string class. Notice that when testing equality we first compare the lengths of the strings before each character in turn.

// test for empty string
int String::operator!() {
        return !len;
}
// test for inequality
int String::operator!=(String& s) {
        // first, are strings the same length?
        if(len != s.len)
                return 1;
        // if they are, test individual characters
        for(int i=0; i < len; i++) 
                if (str[i] != s.str[i])
                        return 1;
        // if we get here they are the same
        return 0;
}
// test for equality
int String::operator==(String& s) {
        // are strings the same length?
        if(len != s.len)
                return 0;
        // test individual characters
        for(int i=0; i < len; i++) 
                if (str[i] != s.str[i])
                        return 0;
        // if we get here they are the same
        return 1;
}
// test whether alphabetically less than
int String::operator<(String& s) {
        // set two pointers to the start of each string
        char *p = str, *q = s.str;
        
        // compare each character in turn while not at end of either string
        while((*p != '\\0') && (*q != '\\0')) {
                if (*p < *q)
                        return 1;
                if (*p > *q)
                        return 0;
                // move pointers along the string
                p++; 
                q++;
        }
        
        // is q longer than p
        if ((*p == '\\0') && (*q != '\\0'))
                return 1;
        else 
                return 0;
} 
// test alphabetically greater than
int String::operator>(String& s) {
        // set two pointers to the start of each string
        char *p = str, *q = s.str;
        
        // compare each character in turn while not at end of either string
        while((*p != '\\0') && (*q != '\\0')) {
                if (*p > *q)
                        return 1;
                if (*p < *q)
                        return 0;
                // move pointers along the string
                p++; 
                q++;
        }
        
        // is p longer than q?
        if ((*p != '\\0') && ( *q == '\\0') )
                return 1;
        else 
                return 0;
}
// test alphabetically less than/equals
int String::operator<=(String& s) {
        // set two pointers to the start of each string
        char *p = str, *q = s.str;
        
        // compare each character in turn while not at end of either string
        while((*p != '\\0') && (*q != '\\0')) {
                if (*p < *q)
                        return 1;
                if (*p > *q)
                        return 0;
                // move pointers along the string
                p++; 
                q++;
        }
        
        // we are at the end of string p?
        if ((*p == '\\0'))
                return 1;
        else 
                return 0;
}
// test alphabetically greater/equals
int String::operator>=(String& s) {
        // set two pointers to the start of each string
        char *p = str, *q = s.str;
        
        // compare each character in turn while not at end of either string
        while((*p != '\\0') && (*q != '\\0')) {
                if (*p > *q) 
                        return 1;
                if (*p < *q) 
                        return 0;
                        // move pointers along the string
                p++; 
                q++;
        }
        
        // are we at the end of string q?
        if ((*q == '\\0')) 
                return 1;
        else 
                return 0;
}
// does one string contain another
int String::contains(String& s) {
        
        int i,j,match;
        // test lengths first to see if contained is longer than container
        if(s.len > len)
                return 0;
        // go down string looking to match first character
        // note: we don't need to go to the end of the array
        for(i = 0; i < len - s.len; i++)
                // do characters match?
                if(str[i] == s.str[0]) {
                        // if so ASSUME a positive match for the whole word
                        match = 1;
                        // now check a character at a time to see
                        // if the assumption was true
                        for(j = 0; j < s.len; j++)
                                // oops, it wasn't
                                if(str[i+j] != s.str[j])
                                        match = 0;
                        // do we still have a positive match?
                        if(match)
                                return 1;
                }
        // if we got here there was no match 
        return 0;
}

Like in the Intarr class, the overloaded [] operator allows access to individual elements of an array object.

// index an element of the String using the array [] operator
char& String::operator[](int n) {
        // check bounds
        if(n > len) {
                cout << "String error: reference out of bounds\\n";
                exit(-1);
        }
        else
                return str[n];
}
                
// return the length of the string
int String::length() {
        return len;
}

Here are the overloaded input and output operators.

// overloaded output operator (friend function)
ostream& operator<<(ostream& os, String &s) {
        if(s.len > 0)
                os << s.str;
        return os;
}
// overloaded input operator (friend function)
// the strategy is to read into an array one character at a time
// (watching for overflow of the array bounds)
// and then to use the array to initialise a string object
istream& operator>>(istream& is, String &s) {
        // temporary array to hold input
        const int BUF_LEN = 200;
        char buffer[BUF_LEN];
        // read a character at a time until end of line or buffer overflow
        int i = 0;
        // keep going till overflow
        while(i < BUF_LEN) {
                // read next character (use get function)
                is.get(buffer[i]);
                // check for end of input (i.e. newline)
                if(buffer[i] == '\\n') {
                        // put in end of string character
                        buffer[i] = '\\0';
                        // initialise the string object s
                        s = String(buffer);
                        // return
                        return is;
                }
                // increment i
                i++;
        }
        // the buffer has overflowed if we get here
        // so set s to be the empty String
	String temp;
        s = temp;
        return is;
}

6.4. Overloading the assignment operator

In this section we discuss overloading the assignment operator ( = ). First we find out the motivation for doing this. Then we look at how it is achieved.

C++ allows us to assign objects of user-defined classes directly to each other. Thus, the following is valid.

        Point a(1,0), b(2,1);
        b = a;

When the assignment takes place, the computer simply copies the values of the data members of b into those of a. In other words,

        b.X = a.X;
        b.Y = a.Y

This simple approach is called memberwise copying and is the default technique for copying objects. It works fine for simple classes with automatically allocated memory (e.g. Point, Counter and Money). However, it fails for classes which allocate their own memory due to the effect of copying pointers. Consider the following.

        String a("Hi"), b("Yo");
        b = a;

This has the effect of:

        b.len = a.len;
        b.str = a.str;

This produces the following effect:

There are now two major problems:

a.str points at the wrong place. What happens if b changes or is deleted?

The memory containing "Yo" is now hanging.

The solution to this problem is that we need to replace the memberwise copying assignment mechanism with something more sophisticated. This is achieved by overloading the assignment operator (operator=). This operator is called when ever two objects are copied - THIS INCLUDES PASSING FUNCTION ARGUMENTS AND RESULTS BY VALUE!

We want our version to do the following:

Free the old memory pointed at by b.str.

Allocate sufficient new memory for b.str.

Copy across the contents of a.str into b.str.

This is achieved by the following code.

// assignment
void String::operator=(String& s) {
        // first delete the old memory
        delete str;
        // copy across new string
        len = s.len;
        str = new char[len + 1];
        strcopy(str, s.str);
}

It is crucial to note that this operator will be called for argument and result passing in functions as well as for direct assignment.

It will be necessary to overload the assignment operator for any class which allocates and frees its own memory.

In summary, the String class has had to consider memory allocation and deletion in the following places:

Constructors

Destructors

Concatenation

Assignment

It is possible to think of many interesting extensions to the String class:

Reformating - case conversion, encryption, reversal, compression

Matching - wildcards, case-ignore

Combination - splitting, selecting parts

Notes converted from troff to HTML by an Eric Foxley shell script, email errors to me!

Chapter 6: Expandable ADTs

Contents