MindView Inc.
[ Viewing Hints ] [ Revision History ] [ Free Newsletter ]
[ Seminars ] [ Seminars on CD ROM ] [ Consulting ]

Thinking in C++, 2nd ed., Volume 2, Revision 2

©2000 by Bruce Eckel

[ Previous Chapter ] [ Short TOC ] [ Table of Contents ] [ Index ] [ Next Chapter ]

4: STL Containers & Iterators

Container classes are the solution to a specific kind of code reuse problem. They are building blocks used to create object-oriented programs – they make the internals of a program much easier to construct.

A container class describes an object that holds other objects. Container classes are so important that they were considered fundamental to early object-oriented languages. In Smalltalk, for example, programmers think of the language as the program translator together with the class library, and a critical part of that library is the container classes. So it became natural that C++ compiler vendors also include a container class library. You’ll note that the vector was so useful that it was introduced in its simplest form very early in this book.

Like many other early C++ libraries, early container class libraries followed Smalltalk’s object-based hierarchy, which worked well for Smalltalk, but turned out to be awkward and difficult to use in C++. Another approach was required.

This chapter attempts to slowly work you into the concepts of the C++ Standard Template Library (STL), which is a powerful library of containers (as well as algorithms, but these are covered in the following chapter). In the past, I have taught that there is a relatively small subset of elements and ideas that you need to understand in order to get much of the usefulness from the STL. Although this can be true it turns out that understanding the STL more deeply is important to gain the full power of the library. This chapter and the next probe into the STL containers and algorithms.

Containers and iterators

If you don’t know how many objects you’re going to need to solve a particular problem, or how long they will last, you also don’t know how to store those objects. How can you know how much space to create? You can’t, since that information isn’t known until run time.

The solution to most problems in object-oriented design seems flippant: you create another type of object. For the storage problem, the new type of object holds other objects, or pointers to objects. Of course, you can do the same thing with an array, but there’s more. This new type of object, which is typically referred to in C++ as a container (also called a collection in some languages), will expand itself whenever necessary to accommodate everything you place inside it. So you don’t need to know how many objects you’re going to hold in a collection. You just create a collection object and let it take care of the details.

Fortunately, a good OOP language comes with a set of containers as part of the package. In C++, it’s the Standard Template Library (STL). In some libraries, a generic container is considered good enough for all needs, and in others (C++ in particular) the library has different types of containers for different needs: a vector for consistent access to all elements, and a linked list for consistent insertion at all elements, for example, so you can choose the particular type that fits your needs. These may include sets, queues, hash tables, trees, stacks, etc.

All containers have some way to put things in and get things out. The way that you place something into a container is fairly obvious. There’s a function called “push” or “add” or a similar name. Fetching things out of a container is not always as apparent; if it’s an array-like entity such as a vector, you might be able to use an indexing operator or function. But in many situations this doesn’t make sense. Also, a single-selection function is restrictive. What if you want to manipulate or compare a group of elements in the container?

The solution is an iterator, which is an object whose job is to select the elements within a container and present them to the user of the iterator. As a class, it also provides a level of abstraction. This abstraction can be used to separate the details of the container from the code that’s accessing that container. The container, via the iterator, is abstracted to be simply a sequence. The iterator allows you to traverse that sequence without worrying about the underlying structure – that is, whether it’s a vector, a linked list, a stack or something else. This gives you the flexibility to easily change the underlying data structure without disturbing the code in your program.

From the design standpoint, all you really want is a sequence that can be manipulated to solve your problem. If a single type of sequence satisfied all of your needs, there’d be no reason to have different kinds. There are two reasons that you need a choice of containers. First, containers provide different types of interfaces and external behavior. A stack has a different interface and behavior than that of a queue, which is different than that of a set or a list. One of these might provide a more flexible solution to your problem than the other. Second, different containers have different efficiencies for certain operations. The best example is a vector and a list. Both are simple sequences that can have identical interfaces and external behaviors. But certain operations can have radically different costs. Randomly accessing elements in a vector is a constant-time operation; it takes the same amount of time regardless of the element you select. However, in a linked list it is expensive to move through the list to randomly select an element, and it takes longer to find an element if it is further down the list. On the other hand, if you want to insert an element in the middle of a sequence, it’s much cheaper in a list than in a vector. These and other operations have different efficiencies depending upon the underlying structure of the sequence. In the design phase, you might start with a list and, when tuning for performance, change to a vector. Because of the abstraction via iterators, you can change from one to the other with minimal impact on your code.

In the end, remember that a container is only a storage cabinet to put objects in. If that cabinet solves all of your needs, it doesn’t really matter how it is implemented (a basic concept with most types of objects). If you’re working in a programming environment that has built-in overhead due to other factors, then the cost difference between a vector and a linked list might not matter. You might need only one type of sequence. You can even imagine the “perfect” container abstraction, which can automatically change its underlying implementation according to the way it is used.

STL reference documentation

You will notice that this chapter does not contain exhaustive documentation describing each of the member functions in each STL container. Although I describe the member functions that I use, I’ve left the full descriptions to others: there are at least two very good on-line sources of STL documentation in HTML format that you can keep resident on your computer and view with a Web browser whenever you need to look something up. The first is the Dinkumware library (which covers the entire Standard C and C++ library) mentioned at the beginning of this book section (page XXX). The second is the freely-downloadable SGI STL and documentation, freely downloadable at http://www.sgi.com/Technology/STL/. These should provide complete references when you’re writing code. In addition, the STL books listed in Appendix XX will provide you with other resources.

The Standard Template Library

The C++ STL[16] is a powerful library intended to satisfy the vast bulk of your needs for containers and algorithms, but in a completely portable fashion. This means that not only are your programs easier to port to other platforms, but that your knowledge itself does not depend on the libraries provided by a particular compiler vendor (and the STL is likely to be more tested and scrutinized than a particular vendor’s library). Thus, it will benefit you greatly to look first to the STL for containers and algorithms, before looking at vendor-specific solutions.

A fundamental principle of software design is that all problems can be simplified by introducing an extra level of indirection. This simplicity is achieved in the STL using iterators to perform operations on a data structure while knowing as little as possible about that structure, thus producing data structure independence. With the STL, this means that any operation that can be performed on an array of objects can also be performed on an STL container of objects and vice versa. The STL containers work just as easily with built-in types as they do with user-defined types. If you learn the library, it will work on everything.

The drawback to this independence is that you’ll have to take a little time at first getting used to the way things are done in the STL. However, the STL uses a consistent pattern, so once you fit your mind around it, it doesn’t change from one STL tool to another.

Consider an example using the STL set class. A set will allow only one of each object value to be inserted into itself. Here is a simple set created to work with ints by providing int as the template argument to set:

//: C04:Intset.cpp
// Simple use of STL set
#include <set>
#include <iostream>
using namespace std;

int main() {
 set<int> intset;
  for(int i = 0; i < 25; i++)
    for(int j = 0; j < 10; j++)
      // Try to insert multiple copies:
      intset.insert(j);
  // Print to output:
 copy(intset.begin(), intset.end(),
   ostream_iterator<int>(cout, "\n"));
} ///:~

The insert( ) member does all the work: it tries putting the new element in and rejects it if it’s already there. Very often the activities involved in using a set are simply insertion and a test to see whether it contains the element. You can also form a union, intersection, or difference of sets, and test to see if one set is a subset of another.

In this example, the values 0 - 9 are inserted into the set 25 times, and the results are printed out to show that only one of each of the values is actually retained in the set.

The copy( ) function is actually the instantiation of an STL template function, of which there are many. These template functions are generally referred to as “the STL Algorithms” and will be the subject of the following chapter. However, several of the algorithms are so useful that they will be introduced in this chapter. Here, copy( ) shows the use of iterators. The set member functions begin( ) and end( ) produce iterators as their return values. These are used by copy( ) as beginning and ending points for its operation, which is simply to move between the boundaries established by the iterators and copy the elements to the third argument, which is also an iterator, but in this case, a special type created for iostreams. This places int objects on cout and separates them with a newline.

Because of its genericity, copy( ) is certainly not restricted to printing on a stream. It can be used in virtually any situation: it needs only three iterators to talk to. All of the algorithms follow the form of copy( ) and simply manipulate iterators (the use of iterators is the “extra level of indirection”).

Now consider taking the form of Intset.cpp and reshaping it to display a list of the words used in a document. The solution becomes remarkably simple.

//: C04:WordSet.cpp
#include "../require.h"
#include <string>
#include <fstream>
#include <iostream>
#include <set>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream source(argv[1]);
  assure(source, argv[1]);
  string word;
  set<string> words;
  while(source >> word)
    words.insert(word);
  copy(words.begin(), words.end(),
    ostream_iterator<string>(cout, "\n"));
  cout << "Number of unique words:" 
    << words.size() << endl;
} ///:~

The only substantive difference here is that string is used instead of int. The words are pulled from a file, but everything else is the same as in Intset.cpp. The operator>> returns a whitespace-separated group of characters each time it is called, until there’s no more input from the file. So it approximately breaks an input stream up into words. Each string is placed in the set using insert( ), and the copy( ) function is used to display the results. Because of the way set is implemented (as a tree), the words are automatically sorted.

Consider how much effort it would be to accomplish the same task in C, or even in C++ without the STL.

The basic concepts

The primary idea in the STL is the container (also known as a collection), which is just what it sounds like: a place to hold things. You need containers because objects are constantly marching in and out of your program and there must be someplace to put them while they’re around. You can’t make named local objects because in a typical program you don’t know how many, or what type, or the lifetime of the objects you’re working with. So you need a container that will expand whenever necessary to fill your needs.

All the containers in the STL hold objects and expand themselves. In addition, they hold your objects in a particular way. The difference between one container and another is the way the objects are held and how the sequence is created. Let’s start by looking at the simplest containers.

A vector is a linear sequence that allows rapid random access to its elements. However, it’s expensive to insert an element in the middle of the sequence, and is also expensive when it allocates additional storage. A deque is also a linear sequence, and it allows random access that’s nearly as fast as vector, but it’s significantly faster when it needs to allocate new storage, and you can easily add new elements at either end (vector only allows the addition of elements at its tail). A list the third type of basic linear sequence, but it’s expensive to move around randomly and cheap to insert an element in the middle. Thus list, deque and vector are very similar in their basic functionality (they all hold linear sequences), but different in the cost of their activities. So for your first shot at a program, you could choose any one, and only experiment with the others if you’re tuning for efficiency.

Many of the problems you set out to solve will only require a simple linear sequence like a vector, deque or list. All three have a member function push_back( ) which you use to insert a new element at the back of the sequence (deque and list also have push_front( )).

But now how do you retrieve those elements? With a vector or deque, it is possible to use the indexing operator[ ], but that doesn’t work with list. Since it would be nicest to learn a single interface, we’ll often use the one defined for all STL containers: the iterator.

An iterator is a class that abstracts the process of moving through a sequence. It allows you to select each element of a sequence without knowing the underlying structure of that sequence. This is a powerful feature, partly because it allows us to learn a single interface that works with all containers, and partly because it allows containers to be used interchangeably.

One more observation and you’re ready for another example. Even though the STL containers hold objects by value (that is, they hold the whole object inside themselves) that’s probably not the way you’ll generally use them if you’re doing object-oriented programming. That’s because in OOP, most of the time you’ll create objects on the heap with new and then upcast the address to the base-class type, later manipulating it as a pointer to the base class. The beauty of this is that you don’t worry about the specific type of object you’re dealing with, which greatly reduces the complexity of your code and increases the maintainability of your program. This process of upcasting is what you try to do in OOP with polymorphism, so you’ll usually be using containers of pointers.

Consider the classic “shape” example where shapes have a set of common operations, and you have different types of shapes. Here’s what it looks like using the STL vector to hold pointers to various types of Shape created on the heap:

//: C04:Stlshape.cpp
// Simple shapes w/ STL
#include <vector>
#include <iostream>
using namespace std;

class Shape {
public:
  virtual void draw() = 0;
  virtual ~Shape() {};
};

class Circle : public Shape {
public:
  void draw() { cout << "Circle::draw\n"; }
  ~Circle() { cout << "~Circle\n"; }
};

class Triangle : public Shape {
public:
  void draw() { cout << "Triangle::draw\n"; }
  ~Triangle() { cout << "~Triangle\n"; }
};

class Square : public Shape {
public:
  void draw() { cout << "Square::draw\n"; }
  ~Square() { cout << "~Square\n"; }
};

typedef std::vector<Shape*> Container;
typedef Container::iterator Iter;

int main() {
  Container shapes;
  shapes.push_back(new Circle);
  shapes.push_back(new Square);
  shapes.push_back(new Triangle);
  for(Iter i = shapes.begin();
      i != shapes.end(); i++)
    (*i)->draw();
  // ... Sometime later:
  for(Iter j = shapes.begin();
      j != shapes.end(); j++)
    delete *j;
} ///:~

The creation of Shape, Circle, Square and Triangle should be fairly familiar. Shape is a pure abstract base class (because of the pure specifier =0) that defines the interface for all types of shapes. The derived classes redefine the virtual function draw( ) to perform the appropriate operation. Now we’d like to create a bunch of different types of Shape object, but where to put them? In an STL container, of course. For convenience, this typedef:

typedef std::vector<Shape*> Container;

creates an alias for a vector of Shape*, and this typedef:

typedef Container::iterator Iter;

uses that alias to create another one, for vector<Shape*>::iterator. Notice that the container type name must be used to produce the appropriate iterator, which is defined as a nested class. Although there are different types of iterators (forward, bidirectional, reverse, etc., which will be explained later) they all have the same basic interface: you can increment them with ++, you can dereference them to produce the object they’re currently selecting, and you can test them to see if they’re at the end of the sequence. That’s what you’ll want to do 90% of the time. And that’s what is done in the above example: after creating a container, it’s filled with different types of Shape*. Notice that the upcast happens as the Circle, Square or Rectangle pointer is added to the shapes container, which doesn’t know about those specific types but instead holds only Shape*. So as soon as the pointer is added to the container it loses its specific identity and becomes an anonymous Shape*. This is exactly what we want: toss them all in and let polymorphism sort it out.

The first for loop creates an iterator and sets it to the beginning of the sequence by calling the begin( ) member function for the container. All containers have begin( ) and end( ) member functions that produce an iterator selecting, respectively, the beginning of the sequence and one past the end of the sequence. To test to see if you’re done, you make sure you’re != to the iterator produced by end( ). Not < or <=. The only test that works is !=. So it’s very common to write a loop like:

for(Iter i = shapes.begin(); i != shapes.end(); i++)

This says: “take me through every element in the sequence.”

What do you do with the iterator to produce the element it’s selecting? You dereference it using (what else) the ‘*’ (which is actually an overloaded operator). What you get back is whatever the container is holding. This container holds Shape*, so that’s what *i produces. If you want to send a message to the Shape, you must select that message with ->, so you write the line:

(*i)->draw();

This calls the draw( ) function for the Shape* the iterator is currently selecting. The parentheses are ugly but necessary to produce the proper order of evaluation. As an alternative, operator-> is defined so that you can say:

i->draw();

As they are destroyed or in other cases where the pointers are removed, the STL containers do not call delete for the pointers they contain. If you create an object on the heap with new and place its pointer in a container, the container can’t tell if that pointer is also placed inside another container. So the STL just doesn’t do anything about it, and puts the responsibility squarely in your lap. The last lines in the program move through and delete every object in the container so proper cleanup occurs.

It’s very interesting to note that you can change the type of container that this program uses with two lines. Instead of including <vector>, you include <list>, and in the first typedef you say:

typedef std::list<Shape*> Container;

instead of using a vector. Everything else goes untouched. This is possible not because of an interface enforced by inheritance (there isn’t any inheritance in the STL, which comes as a surprise when you first see it), but because the interface is enforced by a convention adopted by the designers of the STL, precisely so you could perform this kind of interchange. Now you can easily switch between vector and list and see which one works fastest for your needs.

Containers of strings

In the prior example, at the end of main( ), it was necessary to move through the whole list and delete all the Shape pointers.

for(Iter j = shapes.begin();
      j != shapes.end(); j++)
    delete *j;

This highlights what could be seen as a flaw in the STL: there’s no facility in any of the STL containers to automatically delete the pointers they contain, so you must do it by hand. It’s as if the assumption of the STL designers was that containers of pointers weren’t an interesting problem, although I assert that it is one of the more common things you’ll want to do.

Automatically deleting a pointer turns out to be a rather aggressive thing to do because of the multiple membership problem. If a container holds a pointer to an object, it’s not unlikely that pointer could also be in another container. A pointer to an Aluminum object in a list of Trash pointers could also reside in a list of Aluminum pointers. If that happens, which list is responsible for cleaning up that object – that is, which list “owns” the object?

This question is virtually eliminated if the object rather than a pointer resides in the list. Then it seems clear that when the list is destroyed, the objects it contains must also be destroyed. Here, the STL shines, as you can see when creating a container of string objects. The following example stores each incoming line as a string in a vector<string>:

//: C04:StringVector.cpp
// A vector of strings
#include "../require.h"
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  vector<string> strings;
  string line;
  while(getline(in, line))
    strings.push_back(line);
  // Do something to the strings...
  int i = 1;
  vector<string>::iterator w;
  for(w = strings.begin();
      w != strings.end(); w++) {
    ostringstream ss;
    ss << i++;
    *w = ss.str() + ": " + *w;
  }
  // Now send them out:
  copy(strings.begin(), strings.end(),
    ostream_iterator<string>(cout, "\n"));
  // Since they aren't pointers, string 
  // objects clean themselves up! 
} ///:~

Once the vector<string> called strings is created, each line in the file is read into a string and put in the vector:

  while(getline(in, line))
    strings.push_back(line);

The operation that’s being performed on this file is to add line numbers. A stringstream provides easy conversion from an int to a string of characters representing that int.

Assembling string objects is quite easy, since operator+ is overloaded. Sensibly enough, the iterator w can be dereferenced to produce a string that can be used as both an rvalue and an lvalue:

*w = ss.str() + ": " + *w;

The fact that you can assign back into the container via the iterator may seem a bit surprising at first, but it’s a tribute to the careful design of the STL.

Because the vector<string> contains the objects themselves, a number of interesting things take place. First, no cleanup is necessary. Even if you were to put addresses of the string objects as pointers into other containers, it’s clear that strings is the “master list” and maintains ownership of the objects.

Second, you are effectively using dynamic object creation, and yet you never use new or delete! That’s because, somehow, it’s all taken care of for you by the vector (this is non-trivial. You can try to figure it out by looking at the header files for the STL – all the code is there – but it’s quite an exercise). Thus your coding is significantly cleaned up.

The limitation of holding objects instead of pointers inside containers is quite severe: you can’t upcast from derived types, thus you can’t use polymorphism. The problem with upcasting objects by value is that they get sliced and converted until their type is completely changed into the base type, and there’s no remnant of the derived type left. It’s pretty safe to say that you never want to do this.

Inheriting from STL containers

The power of instantly creating a sequence of elements is amazing, and it makes you realize how much time you’ve spent (or rather, wasted) in the past solving this particular problem. For example, many utility programs involve reading a file into memory, modifying the file and writing it back out to disk. One might as well take the functionality in StringVector.cpp and package it into a class for later reuse.

Now the question is: do you create a member object of type vector, or do you inherit? A general guideline is to always prefer composition (member objects) over inheritance, but with the STL this is often not true, because there are so many existing algorithms that work with the STL types that you may want your new type to be an STL type. So the list of strings should also be a vector, thus inheritance is desired.

//: C04:FileEditor.h
// File editor tool
#ifndef FILEEDITOR_H
#define FILEEDITOR_H
#include <string>
#include <vector>
#include <iostream>

class FileEditor : 
public std::vector<std::string> {
public:
  FileEditor(char* filename);
  void write(std::ostream& out = std::cout);
};
#endif // FILEEDITOR_H ///:~

Note the careful avoidance of a global using namespace std statement here, to prevent the opening of the std namespace to every file that includes this header.

The constructor opens the file and reads it into the FileEditor, and write( ) puts the vector of string onto any ostream. Notice in write( ) that you can have a default argument for a reference.

The implementation is quite simple:

//: C04:FileEditor.cpp {O}
#include "FileEditor.h"
#include "../require.h"
#include <fstream>
using namespace std;

FileEditor::FileEditor(char* filename) {
  ifstream in(filename);
  assure(in, filename);
  string line;
  while(getline(in, line))
    push_back(line);
}

// Could also use copy() here:
void FileEditor::write(ostream& out) {
  for(iterator w = begin();  w != end(); w++)
    out << *w << endl;
} ///:~

The functions from StringVector.cpp are simply repackaged. Often this is the way classes evolve – you start by creating a program to solve a particular application, then discover some commonly-used functionality within the program that can be turned into a class.

The line numbering program can now be rewritten using FileEditor:

//: C04:FEditTest.cpp
//{L} FileEditor
// Test the FileEditor tool
#include "FileEditor.h"
#include "../require.h"
#include <sstream>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  FileEditor file(argv[1]);
  // Do something to the lines...
  int i = 1;
  FileEditor::iterator w = file.begin();
  while(w != file.end()) {
    ostringstream ss;
    ss << i++;
    *w = ss.str() + ": " + *w;
    w++;
  }
  // Now send them to cout:
  file.write();
} ///:~

Now the operation of reading the file is in the constructor:

FileEditor file(argv[1]);

and writing happens in the single line (which defaults to sending the output to cout):

file.write();

The bulk of the program is involved with actually modifying the file in memory.

A plethora of iterators

As mentioned earlier, the iterator is the abstraction that allows a piece of code to be generic, and to work with different types of containers without knowing the underlying structure of those containers. Every container produces iterators. You must always be able to say:

ContainerType::iterator
ContainerType::const_iterator

to produce the types of the iterators produced by that container. Every container has a begin( ) method that produces an iterator indicating the beginning of the elements in the container, and an end( ) method that produces an iterator which is the as the past-the-end value of the container. If the container is const¸ begin( ) and end( ) produce const iterators.

Every iterator can be moved forward to the next element using the operator++ (an iterator may be able to do more than this, as you shall see, but it must at least support forward movement with operator++).

The basic iterator is only guaranteed to be able to perform == and != comparisons. Thus, to move an iterator it forward without running it off the end you say something like:

while(it != pastEnd) {
  // Do something
  it++;
}

Where pastEnd is the past-the-end value produced by the container’s end( ) member function.

An iterator can be used to produce the element that it is currently selecting within a container by dereferencing the iterator. This can take two forms. If it is an iterator and f( ) is a member function of the objects held in the container that the iterator is pointing within, then you can say either:

(*it).f();

or

it->f();

Knowing this, you can create a template that works with any container. Here, the apply( ) function template calls a member function for every object in the container, using a pointer to member that is passed as an argument:

//: C04:Apply.cpp
// Using basic iterators
#include <iostream>
#include <vector>
#include <iterator>
using namespace std;

template<class Cont, class PtrMemFun>
void apply(Cont& c, PtrMemFun f) {
  typename Cont::iterator it = c.begin();
  while(it != c.end()) {
    (it->*f)(); // Compact form
    ((*it).*f)(); // Alternate form
    it++;
  }
}

class Z {
  int i;
public:
  Z(int ii) : i(ii) {}
  void g() { i++; }
  friend ostream& 
  operator<<(ostream& os, const Z& z) {
    return os << z.i;
  }
};

int main() {
  ostream_iterator<Z> out(cout, " ");
  vector<Z> vz;
  for(int i = 0; i < 10; i++)
    vz.push_back(Z(i));
  copy(vz.begin(), vz.end(), out);
  cout << endl;
  apply(vz, &Z::g);
  copy(vz.begin(), vz.end(), out);
} ///:~

Because operator-> is defined for STL iterators, it can be used for pointer-to-member dereferencing (in the following chapter you’ll learn a more elegant way to handle the problem of applying a member function or ordinary function to every object in a container).

Much of the time, this is all you need to know about iterators – that they are produced by begin( ) and end( ), and that you can use them to move through a container and select elements. Many of the problems that you solve, and the STL algorithms (covered in the next chapter) will allow you to just flail away with the basics of iterators. However, things can at times become more subtle, and in those cases you need to know more about iterators. The rest of this section gives you the details.

Iterators in reversible containers

All containers must produce the basic iterator. A container may also be reversible, which means that it can produce iterators that move backwards from the end, as well as the iterators that move forward from the beginning.

A reversible container has the methods rbegin( ) (to produce a reverse_iterator selecting the end) and rend( ) (to produce a reverse_iterator indicating “one past the beginning”). If the container is const then rbegin( ) and rend( ) will produce const_reverse_iterators.

All the basic sequence containers vector, deque and list are reversible containers. The following example uses vector, but will work with deque and list as well:

//: C04:Reversible.cpp
// Using reversible containers
#include "../require.h"
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main() {
  ifstream in("Reversible.cpp");
  assure(in, "Reversible.cpp");
  string line;
  vector<string> lines;
  while(getline(in, line))
    lines.push_back(line);
  vector<string>::reverse_iterator r;
  for(r = lines.rbegin(); r != lines.rend(); r++)
    cout << *r << endl;
} ///:~

You move backward through the container using the same syntax as moving forward through a container with an ordinary iterator.

The associative containers set, multiset, map and multimap are also reversible. Using iterators with associative containers is a bit different, however, and will be delayed until those containers are more fully introduced.

Iterator categories

The iterators are classified into different “categories” which describe what they are capable of doing. The order in which they are generally described moves from the categories with the most restricted behavior to those with the most powerful behavior.

Input: read-only, one pass

The only predefined implementations of input iterators are istream_iterator and istreambuf_iterator, to read from an istream. As you can imagine, an input iterator can only be dereferenced once for each element that’s selected, just as you can only read a particular portion of an input stream once. They can only move forward. There is a special constructor to define the past-the-end value. In summary, you can dereference it for reading (once only for each value), and move it forward.

Output: write-only, one pass

This is the complement of an input iterator, but for writing rather than reading. The only predefined implementations of output iterators are ostream_iterator and ostreambuf_iterator, to write to an ostream, and the less-commonly-used raw_storage_iterator. Again, these can only be dereferenced once for each written value, and they can only move forward. There is no concept of a terminal past-the-end value for an output iterator. Summarizing, you can dereference it for writing (once only for each value) and move it forward.

Forward: multiple read/write

The forward iterator contains all the functionality of both the input iterator and the output iterator, plus you can dereference an iterator location multiple times, so you can read and write to a value multiple times. As the name implies, you can only move forward. There are no predefined iterators that are only forward iterators.

Bidirectional: operator--

The bidirectional iterator has all the functionality of the forward iterator, and in addition it can be moved backwards one location at a time using operator--.

Random-access: like a pointer

Finally, the random-access iterator has all the functionality of the bidirectional iterator plus all the functionality of a pointer (a pointer is a random-access iterator). Basically, anything you can do with a pointer you can do with a random-access iterator, including indexing with operator[ ], adding integral values to a pointer to move it forward or backward by a number of locations, and comparing one iterator to another with <, >=, etc.

Is this really important?

Why do you care about this categorization? When you’re just using containers in a straightforward way (for example, just hand-coding all the operations you want to perform on the objects in the container) it usually doesn’t impact you too much. Things either work or they don’t. The iterator categories become important when:

  1. You use some of the fancier built-in iterator types that will be demonstrated shortly. Or you graduate to creating your own iterators (this will also be demonstrated, later in this chapter).
  2. You use the STL algorithms (the subject of the next chapter). Each of the algorithms have requirements that they place on the iterators that they work with. Knowledge of the iterator categories is even more important when you create your own reusable algorithm templates, because the iterator category that your algorithm requires determines how flexible the algorithm will be. If you only require the most primitive iterator category (input or output) then your algorithm will work with everything (copy( ) is an example of this).

Predefined iterators

The STL has a predefined set of iterator classes that can be quite handy. For example, you’ve already seen reverse_iterator (produced by calling rbegin( ) and rend( ) for all the basic containers).

The insertion iterators are necessary because some of the STL algorithms – copy( ) for example – use the assignment operator= in order to place objects in the destination container. This is a problem when you’re using the algorithm to fill the container rather than to overwrite items that are already in the destination container. That is, when the space isn’t already there. What the insert iterators do is change the implementation of the operator= so that instead of doing an assignment, it calls a “push” or “insert” function for that container, thus causing it to allocate new space. The constructors for both back_insert_iterator and front_insert_iterator take a basic sequence container object (vector, deque or list) as their argument and produce an iterator that calls push_back( ) or push_front( ), respectively, to perform assignment. The shorthand functions back_inserter( ) and front_inserter( ) produce the same objects with a little less typing. Since all the basic sequence containers support push_back( ), you will probably find yourself using back_inserter( ) with some regularity.

The insert_iterator allows you to insert elements in the middle of the sequence, again replacing the meaning of operator=, but this time with insert( ) instead of one of the “push” functions. The insert( ) member function requires an iterator indicating the place to insert before, so the insert_iterator requires this iterator in addition to the container object. The shorthand function inserter( ) produces the same object.

The following example shows the use of the different types of inserters:

//: C04:Inserters.cpp
// Different types of iterator inserters
#include <iostream>
#include <vector>
#include <deque>
#include <list>
#include <iterator>
using namespace std;

int a[] = { 1, 3, 5, 7, 11, 13, 17, 19, 23 };

template<class Cont>
void frontInsertion(Cont& ci) {
  copy(a, a + sizeof(a)/sizeof(int), 
    front_inserter(ci));
  copy(ci.begin(), ci.end(),
    ostream_iterator<int>(cout, " "));
  cout << endl;
}

template<class Cont>
void backInsertion(Cont& ci) {
  copy(a, a + sizeof(a)/sizeof(int), 
    back_inserter(ci));
  copy(ci.begin(), ci.end(),
    ostream_iterator<int>(cout, " "));
  cout << endl;
}

template<class Cont>
void midInsertion(Cont& ci) {
  typename Cont::iterator it = ci.begin();
  it++; it++; it++;
  copy(a, a + sizeof(a)/(sizeof(int) * 2),
    inserter(ci, it));
  copy(ci.begin(), ci.end(),
    ostream_iterator<int>(cout, " "));
  cout << endl;
}

int main() {
  deque<int> di;
  list<int>  li;
  vector<int> vi;
  // Can't use a front_inserter() with vector
  frontInsertion(di);
  frontInsertion(li);
  di.clear();
  li.clear();
  backInsertion(vi);
  backInsertion(di);
  backInsertion(li);
  midInsertion(vi);
  midInsertion(di);
  midInsertion(li);
} ///:~

Since vector does not support push_front( ), it cannot produce a front_insertion_iterator. However, you can see that vector does support the other two types of insertion (even though, as you shall see later, insert( ) is not a very efficient operation for vector).

IO stream iterators

You’ve already seen some use of the ostream_iterator (an output iterator) in conjunction with copy( ) to place the contents of a container on an output stream. There is a corresponding istream_iterator (an input iterator) which allows you to “iterate” a set of objects of a specified type from an input stream. An important difference between ostream_iterator and istream_iterator comes from the fact that an output stream doesn’t have any concept of an “end,” since you can always just keep writing more elements. However, an input stream eventually terminates (for example, when you reach the end of a file) so there needs to be a way to represent that. An istream_iterator has two constructors, one that takes an istream and produces the iterator you actually read from, and the other which is the default constructor and produces an object which is the past-the-end sentinel. In the following program this object is named end:

//: C04:StreamIt.cpp
// Iterators for istreams and ostreams
#include "../require.h"
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;

int main() {
  ifstream in("StreamIt.cpp");
  assure(in, "StreamIt.cpp");
  istream_iterator<string> init(in), end;
  ostream_iterator<string> out(cout, "\n");
  vector<string> vs;
  copy(init, end, back_inserter(vs));
  copy(vs.begin(), vs.end(), out);
  *out++ = vs[0];
  *out++ = "That's all, folks!";
} ///:~

When in runs out of input (in this case when the end of the file is reached) then init becomes equivalent to end and the copy( ) terminates.

Because out is an ostream_iterator<string>, you can simply assign any string object to the dereferenced iterator using operator= and that string will be placed on the output stream, as seen in the two assignments to out. Because out is defined with a newline as its second argument, these assignments also cause a newline to be inserted along with each assignment.

While it is possible to create an istream_iterator<char> and ostream_iterator<char>, these actually parse the input and thus will for example automatically eat whitespace (spaces, tabs and newlines), which is not desirable if you want to manipulate an exact representation of an istream. Instead, you can use the special iterators istreambuf_iterator and ostreambuf_iterator, which are designed strictly to move characters[17]. Although these are templates, the only template arguments they will accept are either char or wchar_t (for wide characters). The following example allows you to compare the behavior of the stream iterators vs. the streambuf iterators:

//: C04:StreambufIterator.cpp
// istreambuf_iterator & ostreambuf_iterator
#include "../require.h"
#include <iostream>
#include <fstream>
#include <iterator>
#include <algorithm>
using namespace std;

int main() {
  ifstream in("StreambufIterator.cpp");
  assure(in, "StreambufIterator.cpp");
  // Exact representation of stream:
  istreambuf_iterator<char> isb(in), end;
  ostreambuf_iterator<char> osb(cout);
  while(isb != end)
    *osb++ = *isb++; // Copy 'in' to cout
  cout << endl;
  ifstream in2("StreambufIterator.cpp");
  // Strips white space:
  istream_iterator<char> is(in2), end2;
  ostream_iterator<char> os(cout);
  while(is != end2)
    *os++ = *is++;
  cout << endl;
} ///:~

The stream iterators use the parsing defined by istream::operator>>, which is probably not
what you want if you are parsing characters directly – it’s fairly rare that you would want all the whitespace stripped out of your character stream. You’ll virtually always want to use a streambuf iterator when using characters and streams, rather than a stream iterator. In addition, istream::operator>> adds significant overhead for each operation, so it is only appropriate for higher-level operations such as parsing floating-point numbers.[18]

Manipulating raw storage

This is a little more esoteric and is generally used in the implementation of other Standard Library functions, but it is nonetheless interesting. The raw_storage_iterator is defined in <algorithm> and is an output iterator. It is provided to enable algorithms to store their results into uninitialized memory. The interface is quite simple: the constructor takes an output iterator that is pointing to the raw memory (thus it is typically a pointer) and the operator= assigns an object into that raw memory. The template parameters are the type of the output iterator pointing to the raw storage, and the type of object that will be stored. Here’s an example which creates Noisy objects (you’ll be introduced to the Noisy class shortly; it’s not necessary to know its details for this example):

//: C04:RawStorageIterator.cpp
// Demonstrate the raw_storage_iterator
#include "Noisy.h"
#include <iostream>
#include <iterator>
#include <algorithm>
using namespace std;

int main() {
  const int quantity = 10;
  // Create raw storage and cast to desired type:
  Noisy* np = 
    (Noisy*)new char[quantity * sizeof(Noisy)];
  raw_storage_iterator<Noisy*, Noisy> rsi(np);
  for(int i = 0; i < quantity; i++)
    *rsi++ = Noisy(); // Place objects in storage
  cout << endl;
  copy(np, np + quantity,
    ostream_iterator<Noisy>(cout, " "));
  cout << endl;
  // Explicit destructor call for cleanup:
  for(int j = 0; j < quantity; j++)
    (&np[j])->~Noisy();
  // Release raw storage:
  delete (char*)np;
} ///:~

To make the raw_storage_iterator template happy, the raw storage must be of the same type as the objects you’re creating. That’s why the pointer from the new array of char is cast to a Noisy*. The assignment operator forces the objects into the raw storage using the copy-constructor. Note that the explicit destructor call must be made for proper cleanup, and this also allows the objects to be deleted one at a time during container manipulation.

Basic sequences:
vector, list & deque

If you take a step back from the STL containers you’ll see that there are really only two types of container: sequences (including vector, list, deque, stack, queue, and priority_queue) and associations (including set, multiset, map and multimap). The sequences keep the objects in whatever sequence that you establish (either by pushing the objects on the end or inserting them in the middle).

Since all the sequence containers have the same basic goal (to maintain your order) they seem relatively interchangeable. However, they differ in the efficiency of their operations, so if you are going to manipulate a sequence in a particular fashion you can choose the appropriate container for those types of manipulations. The “basic” sequence containers are vector, list and deque – these actually have fleshed-out implementations, while stack, queue and priority_queue are built on top of the basic sequences, and represent more specialized uses rather than differences in underlying structure (stack, for example, can be implemented using a deque, vector or list).

So far in this book I have been using vector as a catch-all container. This was acceptable because I’ve only used the simplest and safest operations, primarily push_back( ) and operator[ ]. However, when you start making more sophisticated uses of containers it becomes important to know more about their underlying implementations and behavior, so you can make the right choices (and, as you’ll see, stay out of trouble).

Basic sequence operations

Using a template, the following example shows the operations that all the basic sequences (vector, deque or list) support. As you shall learn in the sections on the specific sequence containers, not all of these operations make sense for each basic sequence, but they are supported.

//: C04:BasicSequenceOperations.cpp
// The operations available for all the 
// basic sequence Containers.
#include <iostream>
#include <vector>
#include <deque>
#include <list>
using namespace std;

template<typename Container>
void print(Container& c, char* s = "") {
  cout << s << ":" << endl;
  if(c.empty()) {
    cout << "(empty)" << endl;
    return;
  }
  typename Container::iterator it;
  for(it = c.begin(); it != c.end(); it++)
    cout << *it << " ";
  cout << endl;
  cout << "size() " << c.size() 
    << " max_size() "<< c.max_size() 
    << " front() " << c.front()
    << " back() " << c.back() << endl;
}
  
template<typename ContainerOfInt>
void basicOps(char* s) {
  cout << "------- " << s << " -------" << endl;
  typedef ContainerOfInt Ci;
  Ci c;
  print(c, "c after default constructor");
  Ci c2(10, 1); // 10 elements, values all 1
  print(c2, "c2 after constructor(10,1)");
  int ia[] = { 1, 3, 5, 7, 9 };
  const int iasz = sizeof(ia)/sizeof(*ia);
  // Initialize with begin & end iterators:
  Ci c3(ia, ia + iasz);
  print(c3, "c3 after constructor(iter,iter)");
  Ci c4(c2); // Copy-constructor
  print(c4, "c4 after copy-constructor(c2)");
  c = c2; // Assignment operator
  print(c, "c after operator=c2");
  c.assign(10, 2); // 10 elements, values all 2
  print(c, "c after assign(10, 2)");
  // Assign with begin & end iterators:
  c.assign(ia, ia + iasz);
  print(c, "c after assign(iter, iter)");
  cout << "c using reverse iterators:" << endl;
  typename Ci::reverse_iterator rit = c.rbegin();
  while(rit != c.rend())
    cout << *rit++ << " ";
  cout << endl;
  c.resize(4);
  print(c, "c after resize(4)");
  c.push_back(47);
  print(c, "c after push_back(47)");
  c.pop_back();
  print(c, "c after pop_back()");
  typename Ci::iterator it = c.begin();
  it++; it++;
  c.insert(it, 74);
  print(c, "c after insert(it, 74)");
  it = c.begin();
  it++;
  c.insert(it, 3, 96);
  print(c, "c after insert(it, 3, 96)");
  it = c.begin();
  it++;
  c.insert(it, c3.begin(), c3.end());
  print(c, "c after insert("
    "it, c3.begin(), c3.end())");
  it = c.begin();
  it++;
  c.erase(it);
  print(c, "c after erase(it)");
  typename Ci::iterator it2 = it = c.begin();
  it++;
  it2++; it2++; it2++; it2++; it2++;
  c.erase(it, it2);
  print(c, "c after erase(it, it2)");
  c.swap(c2);
  print(c, "c after swap(c2)");
  c.clear();
  print(c, "c after clear()");
}

int main() {
  basicOps<vector<int> >("vector");
  basicOps<deque<int> >("deque");
  basicOps<list<int> >("list");
} ///:~

The first function template, print( ), demonstrates the basic information you can get from any sequence container: whether it’s empty, its current size, the size of the largest possible container, the element at the beginning and the element at the end. You can also see that every container has begin( ) and end( ) methods that return iterators.

The basicOps( ) function tests everything else (and in turn calls print( )), including a variety of constructors: default, copy-constructor, quantity and initial value, and beginning and ending iterators. There’s an assignment operator= and two kinds of assign( ) member functions, one which takes a quantity and initial value and the other which take a beginning and ending iterator.

All the basic sequence containers are reversible containers, as shown by the use of the rbegin( ) and rend( ) member functions. A sequence container can be resized, and the entire contents of the container can be removed with clear( ).

Using an iterator to indicate where you want to start inserting into any sequence container, you can insert( ) a single element, a number of elements that all have the same value, and a group of elements from another container using the beginning and ending iterators of that group.

To erase( ) a single element from the middle, use an iterator; to erase( ) a range of elements, use a pair of iterators. Notice that since a list only supports bidirectional iterators, all the iterator motion must be performed with increments and decrements (if the containers were limited to vector and deque, which produce random-access iterators, then operator+ and operator- could have been used to move the iterators in big jumps).

Although both list and deque support push_front( ) and pop_front( ), vector does not, so the only member functions that work with all three are push_back( ) and pop_back( ).

The naming of the member function swap( ) is a little confusing, since there’s also a non-member swap( ) algorithm that switches two elements of a container. The member swap( ), however, swaps everything in one container for another (if the containers hold the same type), effectively swapping the containers themselves. There’s also a non-member version of this function.

The following sections on the sequence containers discuss the particulars of each type of container.

vector

The vector is intentionally made to look like a souped-up array, since it has array-style indexing but also can expand dynamically. vector is so fundamentally useful that it was introduced in a very primitive way early in this book, and used quite regularly in previous examples. This section will give a more in-depth look at vector.

To achieve maximally-fast indexing and iteration, the vector maintains its storage as a single contiguous array of objects. This is a critical point to observe in understanding the behavior of vector. It means that indexing and iteration are lighting-fast, being basically the same as indexing and iterating over an array of objects. But it also means that inserting an object anywhere but at the end (that is, appending) is not really an acceptable operation for a vector. It also means that when a vector runs out of pre-allocated storage, in order to maintain its contiguous array it must allocate a whole new (larger) chunk of storage elsewhere and copy the objects to the new storage. This has a number of unpleasant side effects.

Cost of overflowing allocated storage

A vector starts by grabbing a block of storage, as if it’s taking a guess at how many objects you plan to put in it. As long as you don’t try to put in more objects than can be held in the initial block of storage, everything is very rapid and efficient (note that if you do know how many objects to expect, you can pre-allocate storage using reserve( )). But eventually you will put in one too many objects and, unbeknownst to you, the vector responds by:

  1. Allocating a new, bigger piece of storage
  2. Copying all the objects from the old storage to the new (using the copy-constructor)
  3. Destroying all the old objects (the destructor is called for each one)
  4. Releasing the old memory

For complex objects, this copy-construction and destruction can end up being very expensive if you overfill your vector a lot. To see what happens when you’re filling a vector, here is a class that prints out information about its creations, destructions, assignments and copy-constructions:

//: C04:Noisy.h
// A class to track various object activities
#ifndef NOISY_H
#define NOISY_H
#include <iostream>

class Noisy {
  static long create, assign, copycons, destroy;
  long id;
public:
  Noisy() : id(create++) { 
    std::cout << "d[" << id << "]"; 
  }
  Noisy(const Noisy& rv) : id(rv.id) {
    std::cout << "c[" << id << "]";
    copycons++;
  }
  Noisy& operator=(const Noisy& rv) {
    std::cout << "(" << id << ")=[" <<
      rv.id << "]";
    id = rv.id;
    assign++;
    return *this;
  }
  friend bool 
  operator<(const Noisy& lv, const Noisy& rv) {
    return lv.id < rv.id;
  }
  friend bool 
  operator==(const Noisy& lv, const Noisy& rv) {
    return lv.id == rv.id;
  }
  ~Noisy() {
    std::cout << "~[" << id << "]";
    destroy++;
  }
  friend std::ostream& 
  operator<<(std::ostream& os, const Noisy& n) {
    return os << n.id;
  }
  friend class NoisyReport;
};

struct NoisyGen {
  Noisy operator()() { return Noisy(); }
};

// A singleton. Will automatically report the
// statistics as the program terminates:
class NoisyReport {
  static NoisyReport nr;
  NoisyReport() {} // Private constructor
public:
  ~NoisyReport() {
    std::cout << "\n-------------------\n"
      << "Noisy creations: " << Noisy::create
      << "\nCopy-Constructions: " 
      << Noisy::copycons
      << "\nAssignments: " << Noisy::assign
      << "\nDestructions: " << Noisy::destroy
      << std::endl;
  }
};

// Because of these this file can only be used
// in simple test situations. Move them to a 
// .cpp file for more complex programs:
long Noisy::create = 0, Noisy::assign = 0,
  Noisy::copycons = 0, Noisy::destroy = 0;
NoisyReport NoisyReport::nr;
#endif // NOISY_H ///:~

Each Noisy object has its own identifier, and there are static variables to keep track of all the creations, assignments (using operator=), copy-constructions and destructions. The id is initialized using the create counter inside the default constructor; the copy-constructor and assignment operator take their id values from the rvalue. Of course, with operator= the lvalue is already an initialized object so the old value of id is printed before it is overwritten with the id from the rvalue.

In order to support certain operations like sorting and searching (which are used implicitly by some of the containers), Noisy must have an operator< and operator==. These simply compare the id values. The operator<< for ostream follows the standard form and simply prints the id.

NoisyGen produces a function object (since it has an operator( )) that is used to automatically generate Noisy objects during testing.

NoisyReport is a type of class called a singleton, which is a “design pattern” (these are covered more fully in Chapter XX). Here, the goal is to make sure there is one and only one NoisyReport object, because it is responsible for printing out the results at program termination. It has a private constructor so no one else can make a NoisyReport object, and a single static instance of NoisyReport called nr. The only executable statements are in the destructor, which is called as the program exits and the static destructors are called; this destructor prints out the statistics captured by the static variables in Noisy.

The one snag to this header file is the inclusion of the definitions for the statics at the end. If you include this header in more than one place in your project, you’ll get multiple-definition errors at link time. Of course, you can put the static definitions in a separate cpp file and link it in, but that is less convenient, and since Noisy is just intended for quick-and-dirty experiments the header file should be reasonable for most situations.

Using Noisy.h, the following program will show the behaviors that occur when a vector overflows its currently allocated storage:

//: C04:VectorOverflow.cpp
// Shows the copy-construction and destruction
// That occurs when a vector must reallocate
// (It maintains a linear array of elements)
#include "Noisy.h"
#include "../require.h"
#include <vector>
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  int size = 1000;
  if(argc >= 2) size = atoi(argv[1]);
  vector<Noisy> vn;
  Noisy n;
  for(int i = 0; i < size; i++)
    vn.push_back(n);
  cout << "\n cleaning up \n";
} ///:~

You can either use the default value of 1000, or use your own value by putting it on the command-line.

When you run this program, you’ll see a single default constructor call (for n), then a lot of copy-constructor calls, then some destructor calls, then some more copy-constructor calls, and so on. When the vector runs out of space in the linear array of bytes it has allocated, it must (to maintain all the objects in a linear array, which is an essential part of its job) get a bigger piece of storage and move everything over, copying first and then destroying the old objects. You can imagine that if you store a lot of large and complex objects, this process could rapidly become prohibitive.

There are two solutions to this problem. The nicest one requires that you know beforehand how many objects you’re going to make. In that case you can use reserve( ) to tell the vector how much storage to pre-allocate, thus eliminating all the copies and destructions and making everything very fast (especially random access to the objects with operator[ ]). Note that the use of reserve( ) is different from using the vector constructor with an integral first argument; the latter initializes each element using the default copy-constructor.

However, in the more general case you won’t know how many objects you’ll need. If vector reallocations are slowing things down, you can change sequence containers. You could use a list, but as you’ll see, the deque allows speedy insertions at either end of the sequence, and never needs to copy or destroy objects as it expands its storage. The deque also allows random access with operator[ ], but it’s not quite as fast as vector’s operator[ ]. So in the case where you’re creating all your objects in one part of the program and randomly accessing them in another, you may find yourself filling a deque, then creating a vector from the deque and using the vector for rapid indexing. Of course, you don’t want to program this way habitually, just be aware of these issues (avoid premature optimization).

There is a darker side to vector’s reallocation of memory, however. Because vector keeps its objects in a nice, neat array (allowing, for one thing, maximally-fast random access), the iterators used by vector are generally just pointers. This is a good thing – of all the sequence containers, these pointers allow the fastest selection and manipulation. However, consider what happens when you’re holding onto an iterator (i.e. a pointer) and then you add the one additional object that causes the vector to reallocate storage and move it elsewhere. Your pointer is now pointing off into nowhere:

//: C04:VectorCoreDump.cpp
// How to break a program using a vector
#include <vector>
#include <iostream>
using namespace std;

int main() {
  vector<int> vi(10, 0);
  ostream_iterator<int> out(cout, " ");
  copy(vi.begin(), vi.end(), out);
  vector<int>::iterator i = vi.begin();
  cout << "\n i: " << long(i) << endl;
  *i = 47;
  copy(vi.begin(), vi.end(), out);
  // Force it to move memory (could also just add
  // enough objects):
  vi.resize(vi.capacity() + 1);
  // Now i points to wrong memory:
  cout << "\n i: " << long(i) << endl;
  cout << "vi.begin(): " << long(vi.begin());
  *i = 48;  // Access violation
} ///:~

If your program is breaking mysteriously, look for places where you hold onto an iterator while adding more objects to a vector. You’ll need to get a new iterator after adding elements, or use operator[ ] instead for element selections. If you combine the above observation with the awareness of the potential expense of adding new objects to a vector, you may conclude that the safest way to use one is to fill it up all at once (ideally, knowing first how many objects you’ll need) and then just use it (without adding more objects) elsewhere in the program. This is the way vector has been used in the book up to this point.

You may observe that using vector as the “basic” container in the earlier chapters of this book may not be the best choice in all cases. This is a fundamental issue in containers, and in data structures in general: the “best” choice varies according to the way the container is used. The reason vector has been the “best” choice up until now is that it looks a lot like an array, and was thus familiar and easy for you to adopt. But from now on it’s also worth thinking about other issues when choosing containers.

Inserting and erasing elements

The vector is most efficient if:

  1. You reserve( ) the correct amount of storage at the beginning so the vector never has to reallocate.
  2. You only add and remove elements from the back end.

It is possible to insert and erase elements from the middle of a vector using an iterator, but the following program demonstrates what a bad idea it is:

//: C04:VectorInsertAndErase.cpp
// Erasing an element from a vector
#include "Noisy.h"
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
  vector<Noisy> v;
  v.reserve(11);
  cout << "11 spaces have been reserved" << endl;
  generate_n(back_inserter(v), 10, NoisyGen());
  ostream_iterator<Noisy> out(cout, " ");
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << "Inserting an element:" << endl;
  vector<Noisy>::iterator it = 
    v.begin() + v.size() / 2; // Middle
  v.insert(it, Noisy());
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << "\nErasing an element:" << endl;
  // Cannot use the previous value of it:
  it = v.begin() + v.size() / 2;
  v.erase(it);
  cout << endl;
  copy(v.begin(), v.end(), out);
  cout << endl;
} ///:~

When you run the program you’ll see that the call to reserve( ) really does only allocate storage – no constructors are called. The generate_n( ) call is pretty busy: each call to NoisyGen::operator( ) results in a construction, a copy-construction (into the vector) and a destruction of the temporary. But when an object is inserted into the vector in the middle, it must shove everything down to maintain the linear array and – since there is enough space – it does this with the assignment operator (if the argument of reserve( ) is 10 instead of eleven then it would have to reallocate storage). When an object is erased from the vector, the assignment operator is once again used to move everything up to cover the place that is being erased (notice that this requires that the assignment operator properly cleans up the lvalue). Lastly, the object on the end of the array is deleted.

You can imagine how enormous the overhead can become if objects are inserted and removed from the middle of a vector if the number of elements is large and the objects are complicated. It’s obviously a practice to avoid.

deque

The deque (double-ended-queue, pronounced “deck”) is the basic sequence container optimized for adding and removing elements from either end. It also allows for reasonably fast random access – it has an operator[ ] like vector. However, it does not have vector’s constraint of keeping everything in a single sequential block of memory. Instead, deque uses multiple blocks of sequential storage (keeping track of all the blocks and their order in a mapping structure). For this reason the overhead for a deque to add or remove elements at either end is very low. In addition, it never needs to copy and destroy contained objects during a new storage allocation (like vector does) so it is far more efficient than vector if you are adding an unknown quantity of objects. This means that vector is the best choice only if you have a pretty good idea of how many objects you need. In addition, many of the programs shown earlier in this book that use vector and push_back( ) might be more efficient with a deque. The interface to deque is only slightly different from a vector (deque has a push_front( ) and pop_front( ) while vector does not, for example) so converting code from using vector to using deque is almost trivial. Consider StringVector.cpp, which can be changed to use deque by replacing the word “vector” with “deque” everywhere. The following program adds parallel deque operations to the vector operations in StringVector.cpp, and performs timing comparisons:

//: C04:StringDeque.cpp
// Converted from StringVector.cpp
#include "../require.h"
#include <string>
#include <deque>
#include <vector>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <ctime>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  vector<string> vstrings;
  deque<string> dstrings;
  string line;
  // Time reading into vector:
  clock_t ticks = clock();
  while(getline(in, line))
    vstrings.push_back(line);
  ticks = clock() - ticks;
  cout << "Read into vector: " << ticks << endl;
  // Repeat for deque:
  ifstream in2(argv[1]);
  assure(in2, argv[1]);
  ticks = clock();
  while(getline(in2, line))
    dstrings.push_back(line);
  ticks = clock() - ticks;
  cout << "Read into deque: " << ticks << endl;
  // Now compare indexing:
  ticks = clock();
  for(int i = 0; i < vstrings.size(); i++) {
    ostringstream ss;
    ss << i;
    vstrings[i] = ss.str() + ": " + vstrings[i];
  }
  ticks = clock() - ticks;
  cout << "Indexing vector: " << ticks << endl;
  ticks = clock();
  for(int j = 0; j < dstrings.size(); j++) {
    ostringstream ss;
    ss << j;
    dstrings[j] = ss.str() + ": " + dstrings[j];
  }
  ticks = clock() - ticks;
  cout << "Indexing deqeue: " << ticks << endl;
  // Compare iteration
  ofstream tmp1("tmp1.tmp"), tmp2("tmp2.tmp");
  ticks = clock();
  copy(vstrings.begin(), vstrings.end(),
    ostream_iterator<string>(tmp1, "\n"));
  ticks = clock() - ticks;
  cout << "Iterating vector: " << ticks << endl;
  ticks = clock();
  copy(dstrings.begin(), dstrings.end(),
    ostream_iterator<string>(tmp2, "\n"));
  ticks = clock() - ticks;
  cout << "Iterating deqeue: " << ticks << endl;
} ///:~

Knowing now what you do about the inefficiency of adding things to vector because of storage reallocation, you may expect dramatic differences between the two. However, on a 1.7 Megabyte text file one compiler’s program produced the following (measured in platform/compiler specific clock ticks, not seconds):

Read into vector: 8350
Read into deque: 7690
Indexing vector: 2360
Indexing deqeue: 2480
Iterating vector: 2470
Iterating deqeue: 2410

A different compiler and platform roughly agreed with this. It’s not so dramatic, is it? This points out some important issues:

  1. We (programmers) are typically very bad at guessing where inefficiencies occur in our programs.
  2. Efficiency comes from a combination of effects – here, reading the lines in and converting them to strings may dominate over the cost of the vector vs. deque.
  3. The string class is probably fairly well-designed in terms of efficiency.

Of course, this doesn’t mean you shouldn’t use a deque rather than a vector when you know that an uncertain number of objects will be pushed onto the end of the container. On the contrary, you should – when you’re tuning for performance. But you should also be aware that performance issues are usually not where you think they are, and the only way to know for sure where your bottlenecks are is by testing. Later in this chapter there will be a more “pure” comparison of performance between vector, deque and list.

Converting between sequences

Sometimes you need the behavior or efficiency of one kind of container for one part of your program, and a different container’s behavior or efficiency in another part of the program. For example, you may need the efficiency of a deque when adding objects to the container but the efficiency of a vector when indexing them. Each of the basic sequence containers (vector, deque and list) has a two-iterator constructor (indicating the beginning and ending of the sequence to read from when creating a new object) and an assign( ) member function to read into an existing container, so you can easily move objects from one sequence container to another.

The following example reads objects into a deque and then converts to a vector:

//: C04:DequeConversion.cpp
// Reading into a Deque, converting to a vector
#include "Noisy.h"
#include <deque>
#include <vector>
#include <iostream>
#include <algorithm>
#include <cstdlib>
using namespace std;

int main(int argc, char* argv[]) {
  int size = 25;
  if(argc >= 2) size = atoi(argv[1]);
  deque<Noisy> d;
  generate_n(back_inserter(d), size, NoisyGen());
  cout << "\n Converting to a vector(1)" << endl;
  vector<Noisy> v1(d.begin(), d.end());
  cout << "\n Converting to a vector(2)" << endl;
  vector<Noisy> v2;
  v2.reserve(d.size());
  v2.assign(d.begin(), d.end());
  cout << "\n Cleanup" << endl;
} ///:~

You can try various sizes, but you should see that it makes no difference – the objects are simply copy-constructed into the new vectors. What’s interesting is that v1 does not cause multiple allocations while building the vector, no matter how many elements you use. You might initially think that you must follow the process used for v2 and preallocate the storage to prevent messy reallocations, but the constructor used for v1 determines the memory need ahead of time so this is unnecessary.

Cost of overflowing allocated storage

It’s illuminating to see what happens with a deque when it overflows a block of storage, in contrast with VectorOverflow.cpp:

//: C04:DequeOverflow.cpp
// A deque is much more efficient than a vector
// when pushing back a lot of elements, since it
// doesn't require copying and destroying.
#include "Noisy.h"
#include <deque>
#include <cstdlib>
using namespace std;

int main(int argc, char* argv[]) {
  int size = 1000;
  if(argc >= 2) size = atoi(argv[1]);
  deque<Noisy> dn;
  Noisy n;
  for(int i = 0; i < size; i++)
    dn.push_back(n);
  cout << "\n cleaning up \n";
} ///:~

Here you will never see any destructors before the words “cleaning up” appear. Since the deque allocates all its storage in blocks instead of a contiguous array like vector, it never needs to move existing storage (thus no additional copy-constructions and destructions occur). It simply allocates a new block. For the same reason, the deque can just as efficiently add elements to the beginning of the sequence, since if it runs out of storage it (again) just allocates a new block for the beginning. Insertions in the middle of a deque, however, could be even messier than for vector (but not as costly).

Because a deque never moves its storage, a held iterator never becomes invalid when you add new things to either end of a deque, as it was demonstrated to do with vector (in VectorCoreDump.cpp). However, it’s still possible (albeit harder) to do bad things:

//: C04:DequeCoreDump.cpp
// How to break a program using a deque
#include <queue>
#include <iostream>
using namespace std;

int main() {
  deque<int> di(100, 0);
  // No problem iterating from beginning to end,
  // even though it spans multiple blocks:
  copy(di.begin(), di.end(), 
    ostream_iterator<int>(cout, " "));
  deque<int>::iterator i = // In the middle:
    di.begin() + di.size() / 2;;
  // Walk the iterator forward as you perform 
  // a lot of insertions in the middle:
  for(int j = 0; j < 1000; j++) {
    cout << j << endl;
    di.insert(i++, 1); // Eventually breaks
  }
} ///:~

Of course, there are two things here that you wouldn’t normally do with a deque: first, elements are inserted in the middle, which deque allows but isn’t designed for. Second, calling insert( ) repeatedly with the same iterator would not ordinarily cause an access violation, but the iterator is walked forward after each insertion. I’m guessing it eventually walks off the end of a block, but I’m not sure what actually causes the problem.

If you stick to what deque is best at – insertions and removals from either end, reasonably rapid traversals and fairly fast random-access using operator[ ] – you’ll be in good shape.

Checked random-access

Both vector and deque provide two ways to perform random access of their elements: the operator[ ], which you’ve seen already, and at( ), which checks the boundaries of the container that’s being indexed and throws an exception if you go out of bounds. It does cost more to use at( ):

//: C04:IndexingVsAt.cpp
// Comparing "at()" to operator[]
#include "../require.h"
#include <vector>
#include <deque>
#include <iostream>
#include <ctime>
using namespace std;

int main(int argc, char* argv[]) {
  requireMinArgs(argc, 1);
  long count = 1000;
  int sz = 1000;
  if(argc >= 2) count = atoi(argv[1]);
  if(argc >= 3) sz = atoi(argv[2]);
  vector<int> vi(sz);
  clock_t ticks = clock();
  for(int i1 = 0; i1 < count; i1++)
    for(int j = 0; j < sz; j++)
      vi[j];
  cout << "vector[]" << clock() - ticks << endl;
  ticks = clock();
  for(int i2 = 0; i2 < count; i2++)
    for(int j = 0; j < sz; j++)
      vi.at(j);
  cout << "vector::at()" << clock()-ticks <<endl;
  deque<int> di(sz);
  ticks = clock();
  for(int i3 = 0; i3 < count; i3++)
    for(int j = 0; j < sz; j++)
      di[j];
  cout << "deque[]" << clock() - ticks << endl;
  ticks = clock();
  for(int i4 = 0; i4 < count; i4++)
    for(int j = 0; j < sz; j++)
      di.at(j);
  cout << "deque::at()" << clock()-ticks <<endl;
  // Demonstrate at() when you go out of bounds:
  di.at(vi.size() + 1);
} ///:~

As you’ll learn in the exception-handling chapter, different systems may handle the uncaught exception in different ways, but you’ll know one way or another that something went wrong with the program when using at( ), whereas it’s possible to go blundering ahead using operator[ ].

list

A list is implemented as a doubly-linked list and is thus designed for rapid insertion and removal of elements in the middle of the sequence (whereas for vector and deque this is a much more costly operation). A list is so slow when randomly accessing elements that it does not have an operator[ ]. It’s best used when you’re traversing a sequence, in order, from beginning to end (or end to beginning) rather than choosing elements randomly from the middle. Even then the traversal is significantly slower than either a vector or a deque, but if you aren’t doing a lot of traversals that won’t be your bottleneck.

Another thing to be aware of with a list is the memory overhead of each link, which requires a forward and backward pointer on top of the storage for the actual object. Thus a list is a better choice when you have larger objects that you’ll be inserting and removing from the middle of the list. It’s better not to use a list if you think you might be traversing it a lot, looking for objects, since the amount of time it takes to get from the beginning of the list – which is the only place you can start unless you’ve already got an iterator to somewhere you know is closer to your destination – to the object of interest is proportional to the number of objects between the beginning and that object.

The objects in a list never move after they are created; “moving” a list element means changing the links, but never copying or assigning the actual objects. This means that a held iterator never moves when you add new things to a list as it was demonstrated to do in vector. Here’s an example using the Noisy class:

//: C04:ListStability.cpp
// Things don't move around in lists
#include "Noisy.h"
#include <list>
#include <iostream>
#include <algorithm>
using namespace std;

int main() {
  list<Noisy> l;
  ostream_iterator<Noisy> out(cout, " ");
  generate_n(back_inserter(l), 25, NoisyGen());
  cout << "\n Printing the list:" << endl;
  copy(l.begin(), l.end(), out);
  cout << "\n Reversing the list:" << endl;
  l.reverse();
  copy(l.begin(), l.end(), out);
  cout << "\n Sorting the list:" << endl;
  l.sort();
  copy(l.begin(), l.end(), out);
  cout << "\n Swapping two elements:" << endl;
  list<Noisy>::iterator it1, it2;
  it1 = it2 = l.begin();
  it2++;
  swap(*it1, *it2);
  cout << endl;
  copy(l.begin(), l.end(), out);
  cout << "\n Using generic reverse(): " << endl;
  reverse(l.begin(), l.end());
  cout << endl;
  copy(l.begin(), l.end(), out);
  cout << "\n Cleanup" << endl;
} ///:~

Operations as seemingly radical as reversing and sorting the list require no copying of objects, because instead of moving the objects, the links are simply changed. However, notice that sort( ) and reverse( ) are member functions of list, so they have special knowledge of the internals of list and can perform the pointer movement instead of copying. On the other hand, the swap( ) function is a generic algorithm, and doesn’t know about list in particular and so it uses the copying approach for swapping two elements. There are also generic algorithms for sort( ) and reverse( ), but if you try to use these you’ll discover that the generic reverse( ) performs lots of copying and destruction (so you should never use it with a list) and the generic sort( ) simply doesn’t work because it requires random-access iterators that list doesn’t provide (a definite benefit, since this would certainly be an expensive way to sort compared to list’s own sort( )). The generic sort( ) and reverse( ) should only be used with arrays, vectors and deques.

If you have large and complex objects you may want to choose a list first, especially if construction, destruction, copy-construction and assignment are expensive and if you are doing things like sorting the objects or otherwise reordering them a lot.

Special list operations

The list has some special operations that are built-in to make the best use of the structure of the list. You’ve already seen reverse( ) and sort( ), and here are some of the others in use:

//: C04:ListSpecialFunctions.cpp
#include "Noisy.h"
#include <list>
#include <iostream>
#include <algorithm>
using namespace std;
ostream_iterator<Noisy> out(cout, " ");

void print(list<Noisy>& ln, char* comment = "") {
  cout << "\n" << comment << ":\n";
  copy(ln.begin(), ln.end(), out);
  cout << endl;
}

int main() {
  typedef list<Noisy> LN;
  LN l1, l2, l3, l4;
  generate_n(back_inserter(l1), 6, NoisyGen());
  generate_n(back_inserter(l2), 6, NoisyGen());
  generate_n(back_inserter(l3), 6, NoisyGen());
  generate_n(back_inserter(l4), 6, NoisyGen());
  print(l1, "l1"); print(l2, "l2");
  print(l3, "l3"); print(l4, "l4");
  LN::iterator it1 = l1.begin();
  it1++; it1++; it1++;
  l1.splice(it1, l2);
  print(l1, "l1 after splice(it1, l2)");
  print(l2, "l2 after splice(it1, l2)");
  LN::iterator it2 = l3.begin();
  it2++; it2++; it2++;
  l1.splice(it1, l3, it2);
  print(l1, "l1 after splice(it1, l3, it2)");
  LN::iterator it3 = l4.begin(), it4 = l4.end();
  it3++; it4--;
  l1.splice(it1, l4, it3, it4);
  print(l1, "l1 after splice(it1,l4,it3,it4)");
  Noisy n;
  LN l5(3, n);
  generate_n(back_inserter(l5), 4, NoisyGen());
  l5.push_back(n);
  print(l5, "l5 before remove()");
  l5.remove(l5.front());
  print(l5, "l5 after remove()");
  l1.sort(); l5.sort();
  l5.merge(l1);
  print(l5, "l5 after l5.merge(l1)");
  cout << "\n Cleanup" << endl;
} ///:~

The print( ) function is used to display results. After filling four lists with Noisy objects, one list is spliced into another in three different ways. In the first, the entire list l2 is spliced into l1 at the iterator it1. Notice that after the splice, l2 is empty – splicing means removing the elements from the source list. The second splice inserts elements from l3 starting at it2 into l1 starting at it1. The third splice starts at it1 and uses elements from l4 starting at it3 and ending at it4 (the seemingly-redundant mention of the source list is because the elements must be erased from the source list as part of the transfer to the destination list).

The output from the code that demonstrates remove( ) shows that the list does not have to be sorted in order for all the elements of a particular value to be removed.

Finally, if you merge( ) one list with another, the merge only works sensibly if the lists have been sorted. What you end up with in that case is a sorted list containing all the elements from both lists (the source list is erased – that is, the elements are moved to the destination list).

There’s also a unique( ) member function that removes all duplicates, but only if the list has been sorted first:

//: C04:UniqueList.cpp
// Testing list's unique() function
#include <list>
#include <iostream>
using namespace std;

int a[] = { 1, 3, 1, 4, 1, 5, 1, 6, 1 };
const int asz = sizeof a / sizeof *a;

int main() {
  // For output:
  ostream_iterator<int> out(cout, " ");
  list<int> li(a, a + asz);
  li.unique();
  // Oops! No duplicates removed:
  copy(li.begin(), li.end(), out);
  cout << endl;
  // Must sort it first:
  li.sort();
  copy(li.begin(), li.end(), out);
  cout << endl;
  // Now unique() will have an effect:
  li.unique();
  copy(li.begin(), li.end(), out);
  cout << endl;
} ///:~

The list constructor used here takes the starting and past-the-end iterator from another container, and it copies all the elements from that container into itself (a similar constructor is available for all the containers). Here, the “container” is just an array, and the “iterators” are pointers into that array, but because of the design of the STL it works with arrays just as easily as any other container.

If you run this program, you’ll see that unique( ) will only remove adjacent duplicate elements, and thus sorting is necessary before calling unique( ).

There are four additional list member functions that are not demonstrated here: a remove_if( ) that takes a predicate which is used to decide whether an object should be removed, a unique( ) that takes a binary predicate to perform uniqueness comparisons, a merge( ) that takes an additional argument which performs comparisons, and a sort( ) that takes a comparator (to provide a comparison or override the existing one).

list vs. set

Looking at the previous example you may note that if you want a sorted list with no duplicates, a set can give you that, right? It’s interesting to compare the performance of the two containers:

//: C04:ListVsSet.cpp
// Comparing list and set performance
#include <iostream>
#include <list>
#include <set>
#include <algorithm>
#include <ctime>
#include <cstdlib>
using namespace std;

class Obj {
  int a[20]; // To take up extra space
  int val;
public:
  Obj() : val(rand() % 500) {}
  friend bool 
  operator<(const Obj& a, const Obj& b) {
    return a.val < b.val;
  }
  friend bool 
  operator==(const Obj& a, const Obj& b) {
    return a.val == b.val;
  }
  friend ostream& 
  operator<<(ostream& os, const Obj& a) {
    return os << a.val;
  }
};

template<class Container>
void print(Container& c) {
  typename Container::iterator it;
  for(it = c.begin(); it != c.end(); it++)
    cout << *it << " ";
  cout << endl;
}

struct ObjGen {
  Obj operator()() { return Obj(); }
};

int main() {
  const int sz = 5000;
  srand(time(0));
  list<Obj> lo;
  clock_t ticks = clock();
  generate_n(back_inserter(lo), sz, ObjGen());
  lo.sort();
  lo.unique();
  cout << "list:" << clock() - ticks << endl;
  set<Obj> so;
  ticks = clock();
  generate_n(inserter(so, so.begin()), 
    sz, ObjGen());
  cout << "set:" << clock() - ticks << endl;
  print(lo);
  print(so);
} ///:~

When you run the program, you should discover that set is much faster than list. This is reassuring – after all, it is set’s primary job description!

Swapping all basic sequences

It turns out that all basic sequences have a member function swap( ) that’s designed to switch one sequence with another (however, this swap( ) is only defined for sequences of the same type). The member swap( ) makes use of its knowledge of the internal structure of the particular container in order to be efficient:

//: C04:Swapping.cpp
// All basic sequence containers can be swapped
#include "Noisy.h"
#include <list>
#include <vector>
#include <deque>
#include <iostream>
#include <algorithm>
using namespace std;
ostream_iterator<Noisy> out(cout, " ");

template<class Cont>
void print(Cont& c, char* comment = "") {
  cout << "\n" << comment << ": ";
  copy(c.begin(), c.end(), out);
  cout << endl;
}

template<class Cont>
void testSwap(char* cname) {
  Cont c1, c2;
  generate_n(back_inserter(c1), 10, NoisyGen());
  generate_n(back_inserter(c2), 5, NoisyGen());
  cout << "\n" << cname << ":" << endl;
  print(c1, "c1"); print(c2, "c2");
  cout << "\n Swapping the " << cname 
    << ":" << endl;
  c1.swap(c2);
  print(c1, "c1"); print(c2, "c2");
}  

int main() {
  testSwap<vector<Noisy> >("vector");
  testSwap<deque<Noisy> >("deque");
  testSwap<list<Noisy> >("list");
} ///:~

When you run this, you’ll discover that each type of sequence container is able to swap one sequence for another without any copying or assignments, even if the sequences are of diffe