Chapter 8 : File input/output

It is comparatively easy to perform simple serial input and output using files in C++. We will teach only the straightforward aspects.

8.1. Introduction

8.2. The basics of file access

8.3. Files in C++

8.4. The use of files in general

8.1. Introduction

Perhaps we should first remember that input and output can be diverted from and to files in operating systems such as UNIX or MS-DOS without any program changes, provided that we are talking about all of the input or all of the output. Thus we can write

prog32 < data_file

to run the program prog32 taking all of its input from the file data_file. This is how Ceilidh tests your programs for dynamic correctness against various sets of test data.

Conversely, we can write

prog32 > output_file

to store all of the (standard) output in the named file. This is how your program output is saved when Ceilidh tests the program. The dynamic testing system then searches the saved output for the keywords or phrases which the teacher has specified.

The error output (anything using the cerr stream) would still come to the terminal, and would not be diverted to the file.

This simple approach for diverting program input or output will satisfy some simple situations where we wish to use files instead of the terminal for input or output. However, for most realistic applications, we will need to communicate with standard input and output (the user's terminal) as well as one or more files. We will generally need to interact with a user (print a prompt such as "When does the person arrive at Nottingham station?" on the terminal, and read the reply from the terminal) as well as interacting with the files of data (reading from a file containing the actual train timetables, and perhaps writing to a file of seat reservations).

8.2. The basics of file access

The approach to file input/output in most modern shared computer systems involves the program (i.e. any process wishing to access a file) in the following sequence of activities.

(i) Open the file. The program gives the name of the file and the type of access required (e.g. read, write, both, append, seek, ...). The system then checks that the requested type of access is permitted. In UNIX this depends on the ownership of the file; the access permissions defined by its owner (perhaps modified using the chmod command, and expressed as "read", "write" or "execute" for the "owner", their "group" and "others"); the identity of the user running the program; and, in special cases (look up the concept of SUID facilities), the owner of the program. In more secure computer systems, access permission may be a much more complicated concept related to the user's management seniority and areas of responsibility.
(ii) After opening, the program can access the file as required. The program will not be allowed to read the file if it has asked only for "write" access. The program now refers to the file not by name, but by a special code returned by the open. Information is then read from the file, or written to it. Permissions are usually not rechecked after the open: once the file has been successfully opened, access is permitted until the file is closed or the process terminates, even if the file's permissions are changed in the meantime.
(iii) The file is closed when access is no longer required. This is often built into the program termination activity, so that the user does not need to take any closing action. There may be upper limits on the number of files simultaneously open on certain systems, so this may necessitate closing files as soon as access to them is no longer required.
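In C++ each of these three steps corresponds to an operation on a stream object. As a minimal sketch (the helper name `appendLine` is hypothetical), the function below requests "append" access with `std::ios::app`, checks that the open succeeded, writes through the stream rather than by file name, and closes the file explicitly so that the final status can be checked.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: append one line to the named file.
// Returns false if the file could not be opened or the write/close failed.
bool appendLine(const std::string& name, const std::string& line) {
    // (i) Open: std::ios::app asks for "append" access, so every write
    // goes to the end of the file and existing contents are preserved.
    std::ofstream out(name, std::ios::app);
    if (!out.is_open()) {
        return false;
    }
    // (ii) Access: from here on the program uses the stream, not the name.
    out << line << '\n';
    // (iii) Close: done explicitly so the status can be checked; otherwise
    // the stream's destructor would close the file when the function returns.
    out.close();
    return !out.fail();
}
```

Calling `appendLine` twice on the same file leaves both lines in it, whereas opening with a plain `std::ofstream out(name)` would truncate the file each time.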

8.3. Files in C++

Under UNIX and MS-DOS, files are made to appear very similar to any other form of serial input/output device. Thus in the ordinary command shell, output can be directed to the terminal, or to a file, or to another process, with no fundamental change to the process producing the output. The operating system hides from the user what are in reality great differences between the physical processes of sending characters to a terminal and sending them to a file.

A typical C++ program to send output to a file is

#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
// The name of the file
const char * f = "/tmp/data_out_file";
int main() {
    ofstream fout;
// Open the file
    fout.open ( f );
    if ( fout.fail() ) {
	cerr << "Cannot open file "
	    << f << "\n";
	exit( 1 );
    }
    int i;
// Write to the file
    for ( i = 6; i >= 0 ; i-- ) {
	fout << i << "\n";
    }
// Close it
    fout.close();
    if ( fout.fail() ) {
	cerr << "Close file error\n";
	exit( 1 );
    }
    return 0;
} // end main

The declaration of an ofstream requests an "output file stream", to be called fout. The call of fout.open then asks that it be connected to the file "/tmp/data_out_file". If the request is successful, then "fout" will appear to us just like "cout" for performing output, but output will go to the file, not the terminal. The particular identifier chosen ("fout" in this case) is entirely up to the programmer. Before proceeding any further with the program, we must check that the named file was opened successfully; this is done with the

if ( fout.fail() )

part of the code. There is no point in the program continuing if the open was not successful. At this point we would typically print a message to cerr and exit with non-zero status.

Having opened our output file stream successfully, we then write data and strings to the file using exactly the same techniques as we have been using for cout, but using the output stream identifier fout instead.

We do not need to close the stream when we have finished with the file, but can leave the closing to be performed automatically when the program ends. It is, however, generally good practice to close files that are no longer in use. The status of the close should also be checked with fout.fail().
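If the closing is left to the stream itself, it happens when the `ofstream` object goes out of scope, because its destructor closes the file. A sketch, assuming a hypothetical function name `saveCountdown`: placing the stream inside a function gives the file a clearly bounded lifetime, while the explicit `close()` still lets us detect a failed write or close.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes n, n-1, ..., 0 to the named file.
// Returns true only if the open, every write and the close all succeeded.
bool saveCountdown(const std::string& name, int n) {
    std::ofstream fout(name);
    if (!fout) {              // same test as fout.fail() after opening
        return false;
    }
    for (int i = n; i >= 0; i--) {
        fout << i << '\n';
    }
    fout.close();             // flushes any buffered output to the file
    return !fout.fail();      // true if any write or the close failed
}   // without the close() above, fout's destructor would close the file here
```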

Reading from a file

If we had wanted to read from the file, we might have written

#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
const char * f = "data"; // name of file
int main() {
    ifstream fin;
// Open for reading
    fin.open( f );
// Check as before
    if ( fin.fail() ) {
	cerr << "Error opening "
	    << f << "\n";
	exit(1);
    }
    int total = 0, number;
// Read from "fin", stopping at a zero
// (or at end of file if the zero is missing)
    while ( fin >> number && number != 0 ) {
	total += number;
    }
// Close the file
    fin.close();
    cout << "Total: " << total << "\n";
    return 0;
} // end main

We start by declaring an ifstream (input file stream) and using "fin.open( f )" to connect it to the file f. We again use fin.fail() to check that the open was successful. If it was, fin becomes an input stream to be treated just like cin. In this case the file is presumed to contain a series of integer values terminated by a zero. It could have been created using a text editor such as "vi" or "emacs", or as the output of an earlier program using

prog8xxx > outfile

or by a program using it as an output file stream directly.

The choice of identifier "fin" is up to you.

Simultaneously open files

We may need several files to be open at the same time, typically at least one for reading and one for writing. The system will usually impose an upper limit on the number of simultaneously open files, perhaps 20.

In UNIX, a file that has not already been closed is closed automatically when the program terminates. In other systems, information being written to a file may be lost if the file is not closed properly.
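With two streams open at once, a program can read from one file while writing to another. A minimal sketch, assuming a hypothetical `copyFile` function with file names chosen by the caller: it copies a text file line by line, and the input file is closed automatically when `fin` goes out of scope.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copy the text file `from` to `to`, line by line.
// Returns false if either file cannot be opened or the output fails.
bool copyFile(const std::string& from, const std::string& to) {
    std::ifstream fin(from);   // first file, open for reading
    if (!fin) {
        return false;
    }
    std::ofstream fout(to);    // second file, open for writing at the same time
    if (!fout) {
        return false;
    }
    std::string line;
    while (std::getline(fin, line)) {
        fout << line << '\n';
    }
    fout.close();
    return !fout.fail();
}   // fin is closed here by its destructor
```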

8.4. The use of files in general

Updating files is the essence of commercial programming. A file will contain details of

all personnel, pay to date, tax to date, tax codes, etc

all stock in the warehouse, current and minimum levels, etc

all bank accounts, the owner, the balance, the maximum debt, etc

all flights by the airline, booked and free seats, destination, timing, etc

In commerce, each set of related data (one person's record, the data for one type of stock item in the warehouse) is referred to as a "record". Within a C++ program, each record, together with the means of accessing its fields, would be represented by a structure.

Each day or week or month (or instantly on receipt of an interactive transaction) the file will be updated, and a new file produced. For security reasons, a firm will keep a limited number of old copies of the file together with details of all subsequent transactions, so that the latest file can be re-created if it gets corrupted.

The information inside most files will be held in a definite order, e.g. ordered by personnel works numbers, warehouse stock number, bank customer account number, flight departure time, etc.

Updating small files

A typical program to update a file would, if the file is small,

1 read the whole of the latest master file into an array (open for reading, read the entire file, and then close it)

2 update the various entries in the array, interacting with the user through "cin" and "cout" (or using information stored in a data file): for example, to add this week's pay, to increase or decrease the current warehouse stock levels, to change the current credit in the bank accounts, or to reserve a seat on a flight

3 write the updated information stored in the array into a new file (open for writing, write the entire file, and close).

If all has gone smoothly, the new file becomes the master copy, and the previous master file becomes the backup copy.
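The three steps can be sketched in C++ for a hypothetical stock file holding one "stockNumber level" pair per line. `loadAll` performs step 1, `saveAll` performs step 3, and step 2 is simply a change to the vector in memory. The hypothetical `promote` function then makes the new file the master and keeps the previous master as the backup, using `std::rename` from `<cstdio>`.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical record for a small stock file: "stockNumber level" per line.
struct Item {
    int stockNumber;
    int level;
};

// Step 1: read the whole master file into memory (open, read all, close).
bool loadAll(const std::string& name, std::vector<Item>& items) {
    std::ifstream fin(name);
    if (!fin) {
        return false;
    }
    Item it;
    while (fin >> it.stockNumber >> it.level) {
        items.push_back(it);
    }
    return !fin.bad();   // reaching end of file is expected; a read error is not
}

// Step 3: write the whole updated array to a new file (open, write all, close).
bool saveAll(const std::string& name, const std::vector<Item>& items) {
    std::ofstream fout(name);
    if (!fout) {
        return false;
    }
    for (const Item& it : items) {
        fout << it.stockNumber << ' ' << it.level << '\n';
    }
    fout.close();
    return !fout.fail();
}

// Make the new file the master and keep the old master as the backup.
// std::rename's behaviour when the target exists is implementation-defined,
// so any previous backup is removed first.
bool promote(const std::string& master, const std::string& newFile,
             const std::string& backup) {
    std::remove(backup.c_str());   // may fail harmlessly if there is no backup
    if (std::rename(master.c_str(), backup.c_str()) != 0) {
        return false;
    }
    return std::rename(newFile.c_str(), master.c_str()) == 0;
}
```

Only if `saveAll` succeeds should `promote` be called, so that a failed run leaves the old master untouched.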

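The program sequence might be

Declare a structure type for each record
Declare an array of these structures
Open existing master file for reading
Read every record into the array
Close the existing master file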

Then interact with the user using:

while (
  Ask "Any more updates? ",
  Reply isn't no
) {
  Ask "person? ", read person
  Find array subscript for this person
  Ask "details? ", read details
  Amend entry values in array of structures
} // end while more updates

Now finish off with

Open new master file for writing
Write every record from the array
Close the new master file

Updating large files

For larger files, it may not be possible to read the whole file into memory. We now assume that the transactions are held in a file rather than typed at the keyboard, and the program first sorts them into the same order as the entries in the master file. It then reads the existing master file one record at a time, checks whether that record needs updating, and writes it to the new master file. In this case the old and new master files (and the transaction file) are all open at once, and only one record is held in the program at a time. The program outline might be as follows.

Declare a structure type for each record
Declare one structure variable
Open existing master file for reading
    say "fin"
Open new master file for writing
    say "fout"

while (
  Not at end of transaction file
) {
  Read next transaction from transaction file
  Read records from master file "fin",
    copying to new master file "fout"
    until this person's record found
  Check transaction details
  Amend record values
  Write this person's new record to
    the new master file "fout"
}
Copy the remainder of old master
  file "fin" to new master file "fout"
Both files should be closed here
Print any summaries ...

Alternatively, the while loop could be controlled by reading the existing master file, with the user supplying each update interactively, as in

Ask "First person to search for? ", read person
while (
  Not reached end-of-input-file
) {
  Read record from existing file
  if ( not person we're looking for ) {
    Write record to output file
    continue
  }
  Ask "details? ", read details
  Amend record values
  Write this person's record to output file
  Ask "Next person to search for? ", read person
} // end loop to end of file

Interactive transactions

For interactive transactions (such as airline bookings) there must be a way of locking an individual record; it must not be possible for two customers to request a spare seat simultaneously, both find that there is one, and both attempt to occupy the same single remaining seat! We are then into a new level of complexity.
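On UNIX, one mechanism for this is the POSIX record lock provided by `fcntl`, which locks a byte range of a file. The sketch below assumes fixed-length records, so that record k occupies the `recordSize` bytes starting at offset k × `recordSize`; the helper names are hypothetical. A booking program would lock the record, re-read it, check for a free seat, write the change, and only then unlock it. These locks are advisory: they protect a record only against other programs that also use them.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper for fixed-length records: record `k` of
// `recordSize` bytes starts at byte offset k * recordSize.
static bool setRecordLock(int fd, off_t k, off_t recordSize,
                          short type, int cmd) {
    struct flock fl{};
    fl.l_type = type;          // F_WRLCK to lock for writing, F_UNLCK to release
    fl.l_whence = SEEK_SET;    // offsets measured from the start of the file
    fl.l_start = k * recordSize;
    fl.l_len = recordSize;
    return fcntl(fd, cmd, &fl) == 0;
}

// Wait (F_SETLKW) until no other process holds a lock on record k, then lock it.
bool lockRecord(int fd, off_t k, off_t recordSize) {
    return setRecordLock(fd, k, recordSize, F_WRLCK, F_SETLKW);
}

// Release the lock so that other processes may update record k.
bool unlockRecord(int fd, off_t k, off_t recordSize) {
    return setRecordLock(fd, k, recordSize, F_UNLCK, F_SETLK);
}
```

A write lock requires the file descriptor to have been opened for writing, e.g. with `open(name, O_RDWR)`.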

Notes converted from troff to HTML by an Eric Foxley shell script, email errors to me!
