Container classes are the solution to a
specific kind of code reuse problem. They are building blocks used to create
object-oriented programs – they make the internals of a program much
easier to construct.
A container class describes an object that holds other
objects. Container classes are so important that they were considered
fundamental to early object-oriented languages. In Smalltalk, for example,
programmers think of the language as the program translator together with the
class library, and a critical part of that library is the container classes. So
it became natural that C++ compiler vendors also include a container class
library. You’ll note that the vector was so useful that it was
introduced in its simplest form very early in this book.
Like many other early C++ libraries, early container class
libraries followed Smalltalk’s object-based hierarchy, which worked
well for Smalltalk, but turned out to be awkward and difficult to use in C++.
Another approach was required.
This chapter attempts to slowly work you into the concepts of
the C++ Standard Template Library (STL), which is a powerful library of
containers (as well as algorithms, but these are covered in the following
chapter). In the past, I have taught that there is a relatively small subset of
elements and ideas that you need to understand in order to get much of the
usefulness from the STL. Although this can be true it turns out that
understanding the STL more deeply is important to gain the full power of the
library. This chapter and the next probe into the STL containers and
algorithms.
If you don’t know how many objects you’re going to
need to solve a particular problem, or how long they will last, you also
don’t know how to store those objects. How can you know how much space to
create? You can’t, since that information isn’t known until run
time.
The solution to most problems in object-oriented design seems
flippant: you create another type of object. For the storage problem, the new
type of object holds other objects, or pointers to objects. Of course, you can
do the same thing with an array, but there’s more. This new type of
object, which is typically referred to in C++ as a container (also called
a collection in some languages), will expand itself whenever necessary to
accommodate everything you place inside it. So you don’t need to know how
many objects you’re going to hold in a collection. You just create a
collection object and let it take care of the details.
Fortunately, a good OOP language comes with a set of
containers as part of the package. In C++, it’s the Standard Template
Library (STL). In some libraries, a generic container is considered good enough
for all needs, and in others (C++ in particular) the library has different types
of containers for different needs: a vector for consistent access to all
elements, and a linked list for consistent insertion at all elements, for
example, so you can choose the particular type that fits your needs. These may
include sets, queues, hash tables, trees, stacks, etc.
All containers have some way to put things in and get things
out. The way that you place something into a container is fairly obvious.
There’s a function called “push” or “add” or a
similar name. Fetching things out of a container is not always as apparent; if
it’s an array-like entity such as a vector, you might be able to use an
indexing operator or function. But in many situations this doesn’t make
sense. Also, a single-selection function is restrictive. What if you want to
manipulate or compare a group of elements in the container?
The solution is an iterator, which is an object whose
job is to select the elements within a container and present them to the user of
the iterator. As a class, it also provides a level of abstraction. This
abstraction can be used to separate the details of the container from the code
that’s accessing that container. The container, via the iterator, is
abstracted to be simply a sequence. The iterator allows you to traverse that
sequence without worrying about the underlying structure – that is,
whether it’s a vector, a linked list, a stack or something else. This
gives you the flexibility to easily change the underlying data structure without
disturbing the code in your program.
From the design standpoint, all you really want is a sequence
that can be manipulated to solve your problem. If a single type of sequence
satisfied all of your needs, there’d be no reason to have different kinds.
There are two reasons that you need a choice of containers. First, containers
provide different types of interfaces and external behavior. A stack has a
different interface and behavior than that of a queue, which is different than
that of a set or a list. One of these might provide a more flexible solution to
your problem than the other. Second, different containers have different
efficiencies for certain operations. The best example is a vector and a list.
Both are simple sequences that can have identical interfaces and external
behaviors. But certain operations can have radically different costs. Randomly
accessing elements in a vector is a constant-time operation; it takes the same
amount of time regardless of the element you select. However, in a linked list
it is expensive to move through the list to randomly select an element, and it
takes longer to find an element if it is further down the list. On the other
hand, if you want to insert an element in the middle of a sequence, it’s
much cheaper in a list than in a vector. These and other operations have
different efficiencies depending upon the underlying structure of the sequence.
In the design phase, you might start with a list and, when tuning for
performance, change to a vector. Because of the abstraction via iterators, you
can change from one to the other with minimal impact on your code.
In the end, remember that a container is only a storage
cabinet to put objects in. If that cabinet solves all of your needs, it
doesn’t really matter how it is implemented (a basic concept with
most types of objects). If you’re working in a programming environment
that has built-in overhead due to other factors, then the cost difference
between a vector and a linked list might not matter. You might need only one
type of sequence. You can even imagine the “perfect” container
abstraction, which can automatically change its underlying implementation
according to the way it is used.
You will notice that this chapter does not contain exhaustive
documentation describing each of the member functions in each STL container.
Although I describe the member functions that I use, I’ve left the full
descriptions to others: there are at least two very good on-line sources of STL
documentation in HTML format that you can keep resident on your computer and
view with a Web browser whenever you need to look something up. The first is the
Dinkumware library (which covers the entire Standard C and C++ library)
mentioned at the beginning of this book section (page XXX). The second is the
freely-downloadable SGI STL and documentation, freely downloadable at
http://www.sgi.com/Technology/STL/. These should provide complete references
when you’re writing code. In addition, the STL books listed in Appendix XX
will provide you with other resources.
The C++
STL[16]
is a powerful library intended to satisfy the vast bulk of your needs for
containers and algorithms, but in a completely portable fashion. This means that
not only are your programs easier to port to other platforms, but that your
knowledge itself does not depend on the libraries provided by a particular
compiler vendor (and the STL is likely to be more tested and scrutinized than a
particular vendor’s library). Thus, it will benefit you greatly to look
first to the STL for containers and algorithms, before looking at
vendor-specific solutions.
A fundamental principle of software design is that all
problems can be simplified by introducing an extra level of indirection.
This simplicity is achieved in the STL using iterators to perform
operations on a data structure while knowing as little as possible about that
structure, thus producing data structure independence. With the STL, this means
that any operation that can be performed on an array of objects can also be
performed on an STL container of objects and vice versa. The STL containers work
just as easily with built-in types as they do with user-defined types. If you
learn the library, it will work on everything.
The drawback to this independence is that you’ll have to
take a little time at first getting used to the way things are done in the STL.
However, the STL uses a consistent pattern, so once you fit your mind around it,
it doesn’t change from one STL tool to another.
Consider an example using the STL set
class. A set will allow only one
of each object value to be inserted into itself. Here is a simple set
created to work with ints by providing int as the template
argument to set:
//: C04:Intset.cpp
// Simple use of STL set
#include <set>
#include <iostream>
using namespace std;
int main() {
set<int> intset;
for(int i = 0; i < 25; i++)
for(int j = 0; j < 10; j++)
// Try to insert multiple copies:
intset.insert(j);
// Print to output:
copy(intset.begin(), intset.end(),
ostream_iterator<int>(cout, "\n"));
} ///:~
The insert( ) member does all the work: it tries
putting the new element in and rejects it if it’s already there. Very
often the activities involved in using a set are simply insertion and a test to
see whether it contains the element. You can also form a union, intersection, or
difference of sets, and test to see if one set is a subset of another.
In this example, the values 0 - 9 are inserted into the set 25
times, and the results are printed out to show that only one of each of the
values is actually retained in the set.
The copy( ) function is actually the instantiation
of an STL template function, of which there are many. These template functions
are generally referred to as “the STL Algorithms” and will be the
subject of the following chapter. However, several of the algorithms are so
useful that they will be introduced in this chapter. Here, copy( )
shows the use of iterators. The set member functions
begin( ) and end( ) produce iterators as their return
values. These are used by copy( ) as beginning and ending points for
its operation, which is simply to move between the boundaries established by the
iterators and copy the elements to the third argument, which is also an
iterator, but in this case, a special type created for iostreams. This places
int objects on cout and separates them with a newline.
Because of its genericity, copy( ) is certainly
not restricted to printing on a stream. It can be used in virtually any
situation: it needs only three iterators to talk to. All of the algorithms
follow the form of copy( ) and simply manipulate iterators (the use
of iterators is the “extra level of indirection”).
Now consider taking the form of Intset.cpp and
reshaping it to display a list of the words used in a document. The solution
becomes remarkably simple.
//: C04:WordSet.cpp
#include "../require.h"
#include <string>
#include <fstream>
#include <iostream>
#include <set>
using namespace std;
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
ifstream source(argv[1]);
assure(source, argv[1]);
string word;
set<string> words;
while(source >> word)
words.insert(word);
copy(words.begin(), words.end(),
ostream_iterator<string>(cout, "\n"));
cout << "Number of unique words:"
<< words.size() << endl;
} ///:~
The only substantive difference here is that string is
used instead of int. The words are pulled from a file, but everything
else is the same as in Intset.cpp. The operator>> returns a
whitespace-separated group of characters each time it is called, until
there’s no more input from the file. So it approximately breaks an input
stream up into words. Each string is placed in the set using
insert( ), and the copy( ) function is used to display
the results. Because of the way set is implemented (as a tree), the words
are automatically sorted.
Consider how much effort it would be to accomplish the same
task in C, or even in C++ without the STL.
The primary idea in the STL is the container (also
known as a collection), which is just what it sounds like: a place to
hold things. You need containers because objects are constantly marching in and
out of your program and there must be someplace to put them while they’re
around. You can’t make named local objects because in a typical program
you don’t know how many, or what type, or the lifetime of the objects
you’re working with. So you need a container that will expand whenever
necessary to fill your needs.
All the containers in the STL hold objects and expand
themselves. In addition, they hold your objects in a particular way. The
difference between one container and another is the way the objects are held and
how the sequence is created. Let’s start by looking at the simplest
containers.
A vector is a linear sequence that allows rapid random
access to its elements. However, it’s expensive to insert an element in
the middle of the sequence, and is also expensive when it allocates additional
storage. A deque is also a linear sequence, and it allows random access
that’s nearly as fast as vector, but it’s significantly
faster when it needs to allocate new storage, and you can easily add new
elements at either end (vector only allows the addition of elements at
its tail). A list the third type of basic linear sequence, but it’s
expensive to move around randomly and cheap to insert an element in the middle.
Thus list, deque and vector are very similar in their basic
functionality (they all hold linear sequences), but different in the cost of
their activities. So for your first shot at a program, you could choose any one,
and only experiment with the others if you’re tuning for
efficiency.
Many of the problems you set out to solve will only require a
simple linear sequence like a vector, deque or list. All
three have a member function push_back( ) which you use to insert a
new element at the back of the sequence (deque and list also have
push_front( )).
But now how do you retrieve those elements? With a
vector or deque, it is possible to use the indexing operator[
], but that doesn’t work with list. Since it would be nicest to
learn a single interface, we’ll often use the one defined for all STL
containers: the iterator.
An iterator is a class that abstracts the process of moving
through a sequence. It allows you to select each element of a sequence
without knowing the underlying structure of that sequence. This is a
powerful feature, partly because it allows us to learn a single interface that
works with all containers, and partly because it allows containers to be used
interchangeably.
One more observation and you’re ready for another
example. Even though the STL containers hold objects by value (that is, they
hold the whole object inside themselves) that’s probably not the way
you’ll generally use them if you’re doing object-oriented
programming. That’s because in OOP, most of the time you’ll create
objects on the heap with new and then upcast the address to the
base-class type, later manipulating it as a pointer to the base class. The
beauty of this is that you don’t worry about the specific type of object
you’re dealing with, which greatly reduces the complexity of your code and
increases the maintainability of your program. This process of upcasting is what
you try to do in OOP with polymorphism, so you’ll usually be using
containers of pointers.
Consider the classic “shape” example where shapes
have a set of common operations, and you have different types of shapes.
Here’s what it looks like using the STL vector to hold pointers to
various types of Shape created on the heap:
//: C04:Stlshape.cpp
// Simple shapes w/ STL
#include <vector>
#include <iostream>
using namespace std;
class Shape {
public:
virtual void draw() = 0;
virtual ~Shape() {};
};
class Circle : public Shape {
public:
void draw() { cout << "Circle::draw\n"; }
~Circle() { cout << "~Circle\n"; }
};
class Triangle : public Shape {
public:
void draw() { cout << "Triangle::draw\n"; }
~Triangle() { cout << "~Triangle\n"; }
};
class Square : public Shape {
public:
void draw() { cout << "Square::draw\n"; }
~Square() { cout << "~Square\n"; }
};
typedef std::vector<Shape*> Container;
typedef Container::iterator Iter;
int main() {
Container shapes;
shapes.push_back(new Circle);
shapes.push_back(new Square);
shapes.push_back(new Triangle);
for(Iter i = shapes.begin();
i != shapes.end(); i++)
(*i)->draw();
// ... Sometime later:
for(Iter j = shapes.begin();
j != shapes.end(); j++)
delete *j;
} ///:~
The creation of Shape, Circle, Square and
Triangle should be fairly familiar. Shape is a pure abstract base
class (because of the pure specifier =0) that defines the
interface for all types of shapes. The derived classes redefine the
virtual function draw( ) to perform the appropriate
operation. Now we’d like to create a bunch of different types of
Shape object, but where to put them? In an STL container, of course. For
convenience, this typedef:
typedef std::vector<Shape*> Container;
creates
an alias for a vector of Shape*, and this
typedef:
typedef Container::iterator Iter;
uses that alias to
create another one, for vector<Shape*>::iterator. Notice that the
container type name must be used to produce the appropriate iterator,
which is defined as a nested class. Although there are different types of
iterators (forward, bidirectional, reverse, etc., which will be explained later)
they all have the same basic interface: you can increment them with ++,
you can dereference them to produce the object they’re currently
selecting, and you can test them to see if they’re at the end of the
sequence. That’s what you’ll want to do 90% of the time. And
that’s what is done in the above example: after creating a container,
it’s filled with different types of Shape*. Notice that the upcast
happens as the Circle, Square or Rectangle pointer is added
to the shapes container, which doesn’t know about those specific
types but instead holds only Shape*. So as soon as the pointer is added
to the container it loses its specific identity and becomes an anonymous
Shape*. This is exactly what we want: toss them all in and let
polymorphism sort it out.
The first for loop creates an iterator and sets it to
the beginning of the sequence by calling the begin( ) member
function for the container. All containers have begin( ) and
end( ) member functions that produce an iterator selecting,
respectively, the beginning of the sequence and one past the end of the
sequence. To test to see if you’re done, you make sure you’re
!= to the iterator produced by end( ). Not < or
<=. The only test that works is !=. So it’s very common
to write a loop like:
for(Iter i = shapes.begin(); i != shapes.end(); i++)
This says: “take me through every element in the
sequence.”
What do you do with the iterator to produce the element
it’s selecting? You dereference it using (what else) the
‘*’ (which is actually an overloaded operator). What you get
back is whatever the container is holding. This container holds Shape*,
so that’s what *i produces. If you want to send a message to the
Shape, you must select that message with ->, so you write the
line:
(*i)->draw();
This calls the draw( ) function for the
Shape* the iterator is currently selecting. The parentheses are ugly but
necessary to produce the proper order of evaluation. As an alternative,
operator-> is defined so that you can say:
i->draw();
As they are destroyed or in other cases where the pointers are
removed, the STL containers do not call delete for the pointers
they contain. If you create an object on the heap with new and place its
pointer in a container, the container can’t tell if that pointer is also
placed inside another container. So the STL just doesn’t do anything about
it, and puts the responsibility squarely in your lap. The last lines in the
program move through and delete every object in the container so proper cleanup
occurs.
It’s very interesting to note that you can change the
type of container that this program uses with two lines. Instead of including
<vector>, you include <list>, and in the first
typedef you say:
typedef std::list<Shape*> Container;
instead of using a vector. Everything else goes
untouched. This is possible not because of an interface enforced by inheritance
(there isn’t any inheritance in the STL, which comes as a surprise when
you first see it), but because the interface is enforced by a convention adopted
by the designers of the STL, precisely so you could perform this kind of
interchange. Now you can easily switch between vector and list and
see which one works fastest for your
needs.
In the prior example, at the end of main( ), it
was necessary to move through the whole list and delete all the
Shape pointers.
for(Iter j = shapes.begin();
j != shapes.end(); j++)
delete *j;
This highlights what could be seen as a flaw in the STL:
there’s no facility in any of the STL containers to automatically
delete the pointers they contain, so you must do it by hand. It’s
as if the assumption of the STL designers was that containers of pointers
weren’t an interesting problem, although I assert that it is one of the
more common things you’ll want to do.
Automatically deleting a pointer turns out to be a rather
aggressive thing to do because of the multiple membership problem. If a
container holds a pointer to an object, it’s not unlikely that pointer
could also be in another container. A pointer to an Aluminum object in a
list of Trash pointers could also reside in a list of Aluminum
pointers. If that happens, which list is responsible for cleaning up that object
– that is, which list “owns” the object?
This question is virtually eliminated if the object rather
than a pointer resides in the list. Then it seems clear that when the list is
destroyed, the objects it contains must also be destroyed. Here, the STL shines,
as you can see when creating a container of string objects. The following
example stores each incoming line as a string in a
vector<string>:
//: C04:StringVector.cpp
// A vector of strings
#include "../require.h"
#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
using namespace std;
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
ifstream in(argv[1]);
assure(in, argv[1]);
vector<string> strings;
string line;
while(getline(in, line))
strings.push_back(line);
// Do something to the strings...
int i = 1;
vector<string>::iterator w;
for(w = strings.begin();
w != strings.end(); w++) {
ostringstream ss;
ss << i++;
*w = ss.str() + ": " + *w;
}
// Now send them out:
copy(strings.begin(), strings.end(),
ostream_iterator<string>(cout, "\n"));
// Since they aren't pointers, string
// objects clean themselves up!
} ///:~
Once the vector<string> called strings is
created, each line in the file is read into a string and put in the
vector:
while(getline(in, line))
strings.push_back(line);
The operation that’s being performed on this file is to
add line numbers. A stringstream provides easy conversion from an
int to a string of characters representing that
int.
Assembling string objects is quite easy, since
operator+ is overloaded. Sensibly enough, the iterator w can be
dereferenced to produce a string that can be used as both an rvalue and
an lvalue:
*w = ss.str() + ": " + *w;
The fact that you can assign back into the container via the
iterator may seem a bit surprising at first, but it’s a tribute to the
careful design of the STL.
Because the vector<string> contains the objects
themselves, a number of interesting things take place. First, no cleanup is
necessary. Even if you were to put addresses of the string objects as
pointers into other containers, it’s clear that strings is
the “master list” and maintains ownership of the objects.
Second, you are effectively using dynamic object creation, and
yet you never use new or delete! That’s because, somehow,
it’s all taken care of for you by the vector (this is non-trivial.
You can try to figure it out by looking at the header files for the STL –
all the code is there – but it’s quite an exercise). Thus your
coding is significantly cleaned up.
The limitation of holding objects instead of pointers inside
containers is quite severe: you can’t upcast from derived types, thus you
can’t use polymorphism. The problem with upcasting objects by value is
that they get sliced and converted until their type is completely changed into
the base type, and there’s no remnant of the derived type left. It’s
pretty safe to say that you never want to do
this.
The power of instantly creating a sequence of elements is
amazing, and it makes you realize how much time you’ve spent (or rather,
wasted) in the past solving this particular problem. For example, many utility
programs involve reading a file into memory, modifying the file and writing it
back out to disk. One might as well take the functionality in
StringVector.cpp and package it into a class for later reuse.
Now the question is: do you create a member object of type
vector, or do you inherit? A general guideline is to always prefer
composition (member objects) over inheritance, but with the STL this is often
not true, because there are so many existing algorithms that work with the STL
types that you may want your new type to be an STL type. So the list of
strings should also be a vector, thus inheritance is
desired.
//: C04:FileEditor.h
// File editor tool
#ifndef FILEEDITOR_H
#define FILEEDITOR_H
#include <string>
#include <vector>
#include <iostream>
class FileEditor :
public std::vector<std::string> {
public:
FileEditor(char* filename);
void write(std::ostream& out = std::cout);
};
#endif // FILEEDITOR_H ///:~
Note the careful avoidance of a global using namespace
std statement here, to prevent the opening of the std namespace to
every file that includes this header.
The constructor opens the file and reads it into the
FileEditor, and write( ) puts the vector of
string onto any ostream. Notice in write( ) that you
can have a default argument for a reference.
The implementation is quite simple:
//: C04:FileEditor.cpp {O}
#include "FileEditor.h"
#include "../require.h"
#include <fstream>
using namespace std;
FileEditor::FileEditor(char* filename) {
ifstream in(filename);
assure(in, filename);
string line;
while(getline(in, line))
push_back(line);
}
// Could also use copy() here:
void FileEditor::write(ostream& out) {
for(iterator w = begin(); w != end(); w++)
out << *w << endl;
} ///:~
The functions from StringVector.cpp are simply
repackaged. Often this is the way classes evolve – you start by creating a
program to solve a particular application, then discover some commonly-used
functionality within the program that can be turned into a class.
The line numbering program can now be rewritten using
FileEditor:
//: C04:FEditTest.cpp
//{L} FileEditor
// Test the FileEditor tool
#include "FileEditor.h"
#include "../require.h"
#include <sstream>
using namespace std;
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
FileEditor file(argv[1]);
// Do something to the lines...
int i = 1;
FileEditor::iterator w = file.begin();
while(w != file.end()) {
ostringstream ss;
ss << i++;
*w = ss.str() + ": " + *w;
w++;
}
// Now send them to cout:
file.write();
} ///:~Now the operation of reading the file is in the
constructor:
FileEditor file(argv[1]);
and writing happens in the single line (which defaults to
sending the output to cout):
file.write();
As mentioned earlier, the iterator is the abstraction that
allows a piece of code to be generic, and to work with different types of
containers without knowing the underlying structure of those containers.
Every container produces iterators. You must always be able to
say:
ContainerType::iterator ContainerType::const_iterator
to produce the types of the iterators produced by that
container. Every container has a begin( ) method that produces an
iterator indicating the beginning of the elements in the container, and an
end( ) method that produces an iterator which is the as the
past-the-end value of the container. If the container is
const¸ begin( ) and end( ) produce
const iterators.
Every iterator can be moved forward to the next element using
the operator++ (an iterator may be able to do more than this, as you
shall see, but it must at least support forward movement with
operator++).
The basic iterator is only guaranteed to be able to perform
== and != comparisons. Thus, to move an iterator it forward
without running it off the end you say something like:
while(it != pastEnd) {
// Do something
it++;
}
Where pastEnd is the past-the-end value produced by the
container’s end( ) member function.
An iterator can be used to produce the element that it is
currently selecting within a container by dereferencing the iterator. This can
take two forms. If it is an iterator and f( ) is a member
function of the objects held in the container that the iterator is pointing
within, then you can say either:
(*it).f();
or
it->f();
Knowing this, you can create a template that works with any
container. Here, the apply( ) function template calls a member
function for every object in the container, using a pointer to member that is
passed as an argument:
//: C04:Apply.cpp
// Using basic iterators
#include <iostream>
#include <vector>
#include <iterator>
using namespace std;
template<class Cont, class PtrMemFun>
void apply(Cont& c, PtrMemFun f) {
typename Cont::iterator it = c.begin();
while(it != c.end()) {
(it->*f)(); // Compact form
((*it).*f)(); // Alternate form
it++;
}
}
class Z {
int i;
public:
Z(int ii) : i(ii) {}
void g() { i++; }
friend ostream&
operator<<(ostream& os, const Z& z) {
return os << z.i;
}
};
int main() {
ostream_iterator<Z> out(cout, " ");
vector<Z> vz;
for(int i = 0; i < 10; i++)
vz.push_back(Z(i));
copy(vz.begin(), vz.end(), out);
cout << endl;
apply(vz, &Z::g);
copy(vz.begin(), vz.end(), out);
} ///:~
Because operator-> is defined for STL iterators, it
can be used for pointer-to-member dereferencing (in the following chapter
you’ll learn a more elegant way to handle the problem of applying a member
function or ordinary function to every object in a container).
Much of the time, this is all you need to know about iterators
– that they are produced by begin( ) and end( ),
and that you can use them to move through a container and select elements. Many
of the problems that you solve, and the STL algorithms (covered in the next
chapter) will allow you to just flail away with the basics of iterators.
However, things can at times become more subtle, and in those cases you need to
know more about iterators. The rest of this section gives you the
details.
All containers must produce the basic iterator. A
container may also be reversible, which means that it can produce
iterators that move backwards from the end, as well as the iterators that move
forward from the beginning.
A reversible container has the methods rbegin( )
(to produce a reverse_iterator selecting the end) and rend( )
(to produce a reverse_iterator indicating “one past the
beginning”). If the container is const then rbegin( )
and rend( ) will produce const_reverse_iterators.
All the basic sequence containers vector, deque
and list are reversible containers. The following example uses
vector, but will work with deque and list as
well:
//: C04:Reversible.cpp
// Using reversible containers
#include "../require.h"
#include <vector>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main() {
ifstream in("Reversible.cpp");
assure(in, "Reversible.cpp");
string line;
vector<string> lines;
while(getline(in, line))
lines.push_back(line);
vector<string>::reverse_iterator r;
for(r = lines.rbegin(); r != lines.rend(); r++)
cout << *r << endl;
} ///:~
You move backward through the container using the same syntax
as moving forward through a container with an ordinary iterator.
The associative containers set, multiset,
map and multimap are also reversible. Using iterators with
associative containers is a bit different, however, and will be delayed until
those containers are more fully
introduced.
The iterators are classified into different
“categories” which describe what they are capable of doing. The
order in which they are generally described moves from the categories with the
most restricted behavior to those with the most powerful behavior.
The only predefined implementations of input iterators are
istream_iterator and istreambuf_iterator, to read from an
istream. As you can imagine, an input iterator can only be dereferenced
once for each element that’s selected, just as you can only read a
particular portion of an input stream once. They can only move forward. There is
a special constructor to define the past-the-end value. In summary, you can
dereference it for reading (once only for each value), and move it
forward.
This is the complement of an input iterator, but for writing
rather than reading. The only predefined implementations of output iterators are
ostream_iterator and ostreambuf_iterator, to write to an
ostream, and the less-commonly-used raw_storage_iterator. Again,
these can only be dereferenced once for each written value, and they can only
move forward. There is no concept of a terminal past-the-end value for an output
iterator. Summarizing, you can dereference it for writing (once only for each
value) and move it forward.
The forward iterator contains all the functionality of both
the input iterator and the output iterator, plus you can dereference an iterator
location multiple times, so you can read and write to a value multiple times. As
the name implies, you can only move forward. There are no predefined iterators
that are only forward iterators.
The bidirectional iterator has all the functionality of the
forward iterator, and in addition it can be moved backwards one location at a
time using operator--.
Finally, the random-access iterator has all the functionality
of the bidirectional iterator plus all the functionality of a pointer (a pointer
is a random-access iterator). Basically, anything you can do with a
pointer you can do with a random-access iterator, including indexing with
operator[ ], adding integral values to a pointer to move it forward or
backward by a number of locations, and comparing one iterator to another with
<, >=, etc.
Why do you care about this categorization? When you’re
just using containers in a straightforward way (for example, just hand-coding
all the operations you want to perform on the objects in the container) it
usually doesn’t impact you too much. Things either work or they
don’t. The iterator categories become important when:
The STL has a predefined set of iterator classes that can be
quite handy. For example, you’ve already seen reverse_iterator
(produced by calling rbegin( ) and rend( ) for all the
basic containers).
The insertion iterators are necessary because some of
the STL algorithms – copy( ) for example – use the
assignment operator= in order to place objects in the destination
container. This is a problem when you’re using the algorithm to
fill the container rather than to overwrite items that are already in the
destination container. That is, when the space isn’t already there. What
the insert iterators do is change the implementation of the operator= so
that instead of doing an assignment, it calls a “push” or
“insert” function for that container, thus causing it to allocate
new space. The constructors for both back_insert_iterator and
front_insert_iterator take a basic sequence container object
(vector, deque or list) as their argument and produce an
iterator that calls push_back( ) or push_front( ),
respectively, to perform assignment. The shorthand functions
back_inserter( ) and front_inserter( ) produce the same
objects with a little less typing. Since all the basic sequence containers
support push_back( ), you will probably find yourself using
back_inserter( ) with some regularity.
The insert_iterator allows you to insert elements in
the middle of the sequence, again replacing the meaning of operator=, but
this time with insert( ) instead of one of the “push”
functions. The insert( ) member function requires an iterator
indicating the place to insert before, so the insert_iterator requires
this iterator in addition to the container object. The shorthand function
inserter( ) produces the same object.
The following example shows the use of the different types of
inserters:
//: C04:Inserters.cpp
// Different types of iterator inserters
#include <iostream>
#include <vector>
#include <deque>
#include <list>
#include <iterator>
using namespace std;
int a[] = { 1, 3, 5, 7, 11, 13, 17, 19, 23 };
template<class Cont>
void frontInsertion(Cont& ci) {
copy(a, a + sizeof(a)/sizeof(int),
front_inserter(ci));
copy(ci.begin(), ci.end(),
ostream_iterator<int>(cout, " "));
cout << endl;
}
template<class Cont>
void backInsertion(Cont& ci) {
copy(a, a + sizeof(a)/sizeof(int),
back_inserter(ci));
copy(ci.begin(), ci.end(),
ostream_iterator<int>(cout, " "));
cout << endl;
}
template<class Cont>
void midInsertion(Cont& ci) {
typename Cont::iterator it = ci.begin();
it++; it++; it++;
copy(a, a + sizeof(a)/(sizeof(int) * 2),
inserter(ci, it));
copy(ci.begin(), ci.end(),
ostream_iterator<int>(cout, " "));
cout << endl;
}
int main() {
deque<int> di;
list<int> li;
vector<int> vi;
// Can't use a front_inserter() with vector
frontInsertion(di);
frontInsertion(li);
di.clear();
li.clear();
backInsertion(vi);
backInsertion(di);
backInsertion(li);
midInsertion(vi);
midInsertion(di);
midInsertion(li);
} ///:~
Since vector does not support
push_front( ), it cannot produce a front_insertion_iterator.
However, you can see that vector does support the other two types of
insertion (even though, as you shall see later, insert( ) is not a
very efficient operation for vector).
You’ve already seen some use of the
ostream_iterator (an output iterator) in conjunction with
copy( ) to place the contents of a container on an output stream.
There is a corresponding istream_iterator (an input iterator) which
allows you to “iterate” a set of objects of a specified type from an
input stream. An important difference between ostream_iterator and
istream_iterator comes from the fact that an output stream doesn’t
have any concept of an “end,” since you can always just keep writing
more elements. However, an input stream eventually terminates (for example, when
you reach the end of a file) so there needs to be a way to represent that. An
istream_iterator has two constructors, one that takes an istream
and produces the iterator you actually read from, and the other which is the
default constructor and produces an object which is the past-the-end sentinel.
In the following program this object is named end:
//: C04:StreamIt.cpp
// Iterators for istreams and ostreams
#include "../require.h"
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;
int main() {
ifstream in("StreamIt.cpp");
assure(in, "StreamIt.cpp");
istream_iterator<string> init(in), end;
ostream_iterator<string> out(cout, "\n");
vector<string> vs;
copy(init, end, back_inserter(vs));
copy(vs.begin(), vs.end(), out);
*out++ = vs[0];
*out++ = "That's all, folks!";
} ///:~
When in runs out of input (in this case when the end of
the file is reached) then init becomes equivalent to end and the
copy( ) terminates.
Because out is an
ostream_iterator<string>, you can simply assign any string
object to the dereferenced iterator using operator= and that
string will be placed on the output stream, as seen in the two
assignments to out. Because out is defined with a newline as its
second argument, these assignments also cause a newline to be inserted along
with each assignment.
While it is possible to create an
istream_iterator<char> and ostream_iterator<char>,
these actually parse the input and thus will for example automatically
eat whitespace (spaces, tabs and newlines), which is not desirable if you want
to manipulate an exact representation of an istream. Instead, you can use
the special iterators istreambuf_iterator and ostreambuf_iterator,
which are designed strictly to move
characters[17]. Although these are templates,
the only template arguments they will accept are either char or
wchar_t (for wide characters). The following example allows you to
compare the behavior of the stream iterators vs. the streambuf
iterators:
//: C04:StreambufIterator.cpp
// istreambuf_iterator & ostreambuf_iterator
#include "../require.h"
#include <iostream>
#include <fstream>
#include <iterator>
#include <algorithm>
using namespace std;
int main() {
ifstream in("StreambufIterator.cpp");
assure(in, "StreambufIterator.cpp");
// Exact representation of stream:
istreambuf_iterator<char> isb(in), end;
ostreambuf_iterator<char> osb(cout);
while(isb != end)
*osb++ = *isb++; // Copy 'in' to cout
cout << endl;
ifstream in2("StreambufIterator.cpp");
// Strips white space:
istream_iterator<char> is(in2), end2;
ostream_iterator<char> os(cout);
while(is != end2)
*os++ = *is++;
cout << endl;
} ///:~
The stream iterators use the parsing defined by
istream::operator>>, which is probably not
what you want if you
are parsing characters directly – it’s fairly rare that you would
want all the whitespace stripped out of your character stream. You’ll
virtually always want to use a streambuf iterator when using characters and
streams, rather than a stream iterator. In addition,
istream::operator>> adds significant overhead for each operation,
so it is only appropriate for higher-level operations such as parsing
floating-point numbers.[18]
This is a little more esoteric and is generally used in the
implementation of other Standard Library functions, but it is nonetheless
interesting. The raw_storage_iterator is defined in
<algorithm> and is an output iterator. It is provided to enable
algorithms to store their results into uninitialized memory. The interface is
quite simple: the constructor takes an output iterator that is pointing to the
raw memory (thus it is typically a pointer) and the operator= assigns an
object into that raw memory. The template parameters are the type of the output
iterator pointing to the raw storage, and the type of object that will be
stored. Here’s an example which creates Noisy objects (you’ll
be introduced to the Noisy class shortly; it’s not necessary to
know its details for this example):
//: C04:RawStorageIterator.cpp
// Demonstrate the raw_storage_iterator
#include "Noisy.h"
#include <iostream>
#include <iterator>
#include <algorithm>
using namespace std;
int main() {
const int quantity = 10;
// Create raw storage and cast to desired type:
Noisy* np =
(Noisy*)new char[quantity * sizeof(Noisy)];
raw_storage_iterator<Noisy*, Noisy> rsi(np);
for(int i = 0; i < quantity; i++)
*rsi++ = Noisy(); // Place objects in storage
cout << endl;
copy(np, np + quantity,
ostream_iterator<Noisy>(cout, " "));
cout << endl;
// Explicit destructor call for cleanup:
for(int j = 0; j < quantity; j++)
(&np[j])->~Noisy();
// Release raw storage:
delete (char*)np;
} ///:~
To make the raw_storage_iterator template happy, the
raw storage must be of the same type as the objects you’re creating.
That’s why the pointer from the new array of char is cast to a
Noisy*. The assignment operator forces the objects into the raw storage
using the copy-constructor. Note that the explicit destructor call must be made
for proper cleanup, and this also allows the objects to be deleted one at a time
during container manipulation.
If you take a step back from the STL containers you’ll
see that there are really only two types of container: sequences
(including vector, list, deque, stack, queue,
and priority_queue) and associations (including set,
multiset, map and multimap). The sequences keep the objects
in whatever sequence that you establish (either by pushing the objects on the
end or inserting them in the middle).
Since all the sequence containers have the same basic goal (to
maintain your order) they seem relatively interchangeable. However, they differ
in the efficiency of their operations, so if you are going to manipulate a
sequence in a particular fashion you can choose the appropriate container for
those types of manipulations. The “basic” sequence containers are
vector, list and deque – these actually have
fleshed-out implementations, while stack, queue and
priority_queue are built on top of the basic sequences, and represent
more specialized uses rather than differences in underlying structure
(stack, for example, can be implemented using a deque,
vector or list).
So far in this book I have been using vector as a
catch-all container. This was acceptable because I’ve only used the
simplest and safest operations, primarily push_back( ) and
operator[ ]. However, when you start making more sophisticated uses of
containers it becomes important to know more about their underlying
implementations and behavior, so you can make the right choices (and, as
you’ll see, stay out of trouble).
Using a template, the following example shows the operations
that all the basic sequences (vector, deque or list)
support. As you shall learn in the sections on the specific sequence containers,
not all of these operations make sense for each basic sequence, but they are
supported.
//: C04:BasicSequenceOperations.cpp
// The operations available for all the
// basic sequence Containers.
#include <iostream>
#include <vector>
#include <deque>
#include <list>
using namespace std;
template<typename Container>
void print(Container& c, char* s = "") {
cout << s << ":" << endl;
if(c.empty()) {
cout << "(empty)" << endl;
return;
}
typename Container::iterator it;
for(it = c.begin(); it != c.end(); it++)
cout << *it << " ";
cout << endl;
cout << "size() " << c.size()
<< " max_size() "<< c.max_size()
<< " front() " << c.front()
<< " back() " << c.back() << endl;
}
template<typename ContainerOfInt>
void basicOps(char* s) {
cout << "------- " << s << " -------" << endl;
typedef ContainerOfInt Ci;
Ci c;
print(c, "c after default constructor");
Ci c2(10, 1); // 10 elements, values all 1
print(c2, "c2 after constructor(10,1)");
int ia[] = { 1, 3, 5, 7, 9 };
const int iasz = sizeof(ia)/sizeof(*ia);
// Initialize with begin & end iterators:
Ci c3(ia, ia + iasz);
print(c3, "c3 after constructor(iter,iter)");
Ci c4(c2); // Copy-constructor
print(c4, "c4 after copy-constructor(c2)");
c = c2; // Assignment operator
print(c, "c after operator=c2");
c.assign(10, 2); // 10 elements, values all 2
print(c, "c after assign(10, 2)");
// Assign with begin & end iterators:
c.assign(ia, ia + iasz);
print(c, "c after assign(iter, iter)");
cout << "c using reverse iterators:" << endl;
typename Ci::reverse_iterator rit = c.rbegin();
while(rit != c.rend())
cout << *rit++ << " ";
cout << endl;
c.resize(4);
print(c, "c after resize(4)");
c.push_back(47);
print(c, "c after push_back(47)");
c.pop_back();
print(c, "c after pop_back()");
typename Ci::iterator it = c.begin();
it++; it++;
c.insert(it, 74);
print(c, "c after insert(it, 74)");
it = c.begin();
it++;
c.insert(it, 3, 96);
print(c, "c after insert(it, 3, 96)");
it = c.begin();
it++;
c.insert(it, c3.begin(), c3.end());
print(c, "c after insert("
"it, c3.begin(), c3.end())");
it = c.begin();
it++;
c.erase(it);
print(c, "c after erase(it)");
typename Ci::iterator it2 = it = c.begin();
it++;
it2++; it2++; it2++; it2++; it2++;
c.erase(it, it2);
print(c, "c after erase(it, it2)");
c.swap(c2);
print(c, "c after swap(c2)");
c.clear();
print(c, "c after clear()");
}
int main() {
basicOps<vector<int> >("vector");
basicOps<deque<int> >("deque");
basicOps<list<int> >("list");
} ///:~
The first function template, print( ),
demonstrates the basic information you can get from any sequence container:
whether it’s empty, its current size, the size of the largest possible
container, the element at the beginning and the element at the end. You can also
see that every container has begin( ) and end( ) methods
that return iterators.
The basicOps( ) function tests everything else
(and in turn calls print( )), including a variety of constructors:
default, copy-constructor, quantity and initial value, and beginning and ending
iterators. There’s an assignment operator= and two kinds of
assign( ) member functions, one which takes a quantity and initial
value and the other which take a beginning and ending iterator.
All the basic sequence containers are reversible containers,
as shown by the use of the rbegin( ) and rend( ) member
functions. A sequence container can be resized, and the entire contents of the
container can be removed with clear( ).
Using an iterator to indicate where you want to start
inserting into any sequence container, you can insert( ) a single
element, a number of elements that all have the same value, and a group of
elements from another container using the beginning and ending iterators of that
group.
To erase( ) a single element from the middle, use
an iterator; to erase( ) a range of elements, use a pair of
iterators. Notice that since a list only supports bidirectional
iterators, all the iterator motion must be performed with increments and
decrements (if the containers were limited to vector and deque,
which produce random-access iterators, then operator+ and
operator- could have been used to move the iterators in big
jumps).
Although both list and deque support
push_front( ) and pop_front( ), vector does not,
so the only member functions that work with all three are
push_back( ) and pop_back( ).
The naming of the member function swap( ) is a
little confusing, since there’s also a non-member swap( )
algorithm that switches two elements of a container. The member
swap( ), however, swaps everything in one container for
another (if the containers hold the same type), effectively swapping the
containers themselves. There’s also a non-member version of this
function.
The following sections on the sequence containers discuss the
particulars of each type of container.
The vector is intentionally made to look like a
souped-up array, since it has array-style indexing but also can expand
dynamically. vector is so fundamentally useful that it was introduced in
a very primitive way early in this book, and used quite regularly in previous
examples. This section will give a more in-depth look at
vector.
To achieve maximally-fast indexing and iteration, the
vector maintains its storage as a single contiguous array of objects.
This is a critical point to observe in understanding the behavior of
vector. It means that indexing and iteration are lighting-fast, being
basically the same as indexing and iterating over an array of objects. But it
also means that inserting an object anywhere but at the end (that is, appending)
is not really an acceptable operation for a vector. It also means that
when a vector runs out of pre-allocated storage, in order to maintain its
contiguous array it must allocate a whole new (larger) chunk of storage
elsewhere and copy the objects to the new storage. This has a number of
unpleasant side effects.
A vector starts by grabbing a block of storage, as if
it’s taking a guess at how many objects you plan to put in it. As long as
you don’t try to put in more objects than can be held in the initial block
of storage, everything is very rapid and efficient (note that if you do
know how many objects to expect, you can pre-allocate storage using
reserve( )). But eventually you will put in one too many objects
and, unbeknownst to you, the vector responds by:
For complex objects, this copy-construction and
destruction can end up being very expensive if you overfill your vector a lot.
To see what happens when you’re filling a vector, here is a class
that prints out information about its creations, destructions, assignments and
copy-constructions:
//: C04:Noisy.h
// A class to track various object activities
#ifndef NOISY_H
#define NOISY_H
#include <iostream>
class Noisy {
static long create, assign, copycons, destroy;
long id;
public:
Noisy() : id(create++) {
std::cout << "d[" << id << "]";
}
Noisy(const Noisy& rv) : id(rv.id) {
std::cout << "c[" << id << "]";
copycons++;
}
Noisy& operator=(const Noisy& rv) {
std::cout << "(" << id << ")=[" <<
rv.id << "]";
id = rv.id;
assign++;
return *this;
}
friend bool
operator<(const Noisy& lv, const Noisy& rv) {
return lv.id < rv.id;
}
friend bool
operator==(const Noisy& lv, const Noisy& rv) {
return lv.id == rv.id;
}
~Noisy() {
std::cout << "~[" << id << "]";
destroy++;
}
friend std::ostream&
operator<<(std::ostream& os, const Noisy& n) {
return os << n.id;
}
friend class NoisyReport;
};
struct NoisyGen {
Noisy operator()() { return Noisy(); }
};
// A singleton. Will automatically report the
// statistics as the program terminates:
class NoisyReport {
static NoisyReport nr;
NoisyReport() {} // Private constructor
public:
~NoisyReport() {
std::cout << "\n-------------------\n"
<< "Noisy creations: " << Noisy::create
<< "\nCopy-Constructions: "
<< Noisy::copycons
<< "\nAssignments: " << Noisy::assign
<< "\nDestructions: " << Noisy::destroy
<< std::endl;
}
};
// Because of these this file can only be used
// in simple test situations. Move them to a
// .cpp file for more complex programs:
long Noisy::create = 0, Noisy::assign = 0,
Noisy::copycons = 0, Noisy::destroy = 0;
NoisyReport NoisyReport::nr;
#endif // NOISY_H ///:~
Each Noisy object has its own identifier, and there are
static variables to keep track of all the creations, assignments (using
operator=), copy-constructions and destructions. The id is
initialized using the create counter inside the default constructor; the
copy-constructor and assignment operator take their id values from the
rvalue. Of course, with operator= the lvalue is already an initialized
object so the old value of id is printed before it is overwritten with
the id from the rvalue.
In order to support certain operations like sorting and
searching (which are used implicitly by some of the containers), Noisy
must have an operator< and operator==. These simply compare the
id values. The operator<< for ostream follows the
standard form and simply prints the id.
NoisyGen produces a function object (since it has an
operator( )) that is used to automatically generate Noisy
objects during testing.
NoisyReport is a type of class called a
singleton, which is a “design pattern” (these are covered
more fully in Chapter XX). Here, the goal is to make sure there is one and only
one NoisyReport object, because it is responsible for printing out the
results at program termination. It has a private constructor so no one
else can make a NoisyReport object, and a single static instance of
NoisyReport called nr. The only executable statements are in the
destructor, which is called as the program exits and the static destructors are
called; this destructor prints out the statistics captured by the static
variables in Noisy.
The one snag to this header file is the inclusion of the
definitions for the statics at the end. If you include this header in
more than one place in your project, you’ll get multiple-definition errors
at link time. Of course, you can put the static definitions in a
separate cpp file and link it in, but that is less convenient, and since
Noisy is just intended for quick-and-dirty experiments the header file
should be reasonable for most situations.
Using Noisy.h, the following program will show the
behaviors that occur when a vector overflows its currently allocated
storage:
//: C04:VectorOverflow.cpp
// Shows the copy-construction and destruction
// That occurs when a vector must reallocate
// (It maintains a linear array of elements)
#include "Noisy.h"
#include "../require.h"
#include <vector>
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
int size = 1000;
if(argc >= 2) size = atoi(argv[1]);
vector<Noisy> vn;
Noisy n;
for(int i = 0; i < size; i++)
vn.push_back(n);
cout << "\n cleaning up \n";
} ///:~
You can either use the default value of 1000, or use your own
value by putting it on the command-line.
When you run this program, you’ll see a single default
constructor call (for n), then a lot of copy-constructor calls, then some
destructor calls, then some more copy-constructor calls, and so on. When the
vector runs out of space in the linear array of bytes it has allocated, it must
(to maintain all the objects in a linear array, which is an essential part of
its job) get a bigger piece of storage and move everything over, copying first
and then destroying the old objects. You can imagine that if you store a lot of
large and complex objects, this process could rapidly become
prohibitive.
There are two solutions to this problem. The nicest one
requires that you know beforehand how many objects you’re going to make.
In that case you can use reserve( ) to tell the vector how much
storage to pre-allocate, thus eliminating all the copies and destructions and
making everything very fast (especially random access to the objects with
operator[ ]). Note that the use of reserve( ) is different
from using the vector constructor with an integral first argument; the
latter initializes each element using the default copy-constructor.
However, in the more general case you won’t know how
many objects you’ll need. If vector reallocations are slowing
things down, you can change sequence containers. You could use a list,
but as you’ll see, the deque allows speedy insertions at
either end of the sequence, and never needs to copy or destroy objects as it
expands its storage. The deque also allows random access with
operator[ ], but it’s not quite as fast as vector’s
operator[ ]. So in the case where you’re creating all your
objects in one part of the program and randomly accessing them in another, you
may find yourself filling a deque, then creating a vector from the
deque and using the vector for rapid indexing. Of course, you
don’t want to program this way habitually, just be aware of these issues
(avoid premature optimization).
There is a darker side to vector’s reallocation
of memory, however. Because vector keeps its objects in a nice, neat
array (allowing, for one thing, maximally-fast random access), the iterators
used by vector are generally just pointers. This is a good thing –
of all the sequence containers, these pointers allow the fastest selection and
manipulation. However, consider what happens when you’re holding onto an
iterator (i.e. a pointer) and then you add the one additional object that causes
the vector to reallocate storage and move it elsewhere. Your pointer is
now pointing off into nowhere:
//: C04:VectorCoreDump.cpp
// How to break a program using a vector
#include <vector>
#include <iostream>
using namespace std;
int main() {
vector<int> vi(10, 0);
ostream_iterator<int> out(cout, " ");
copy(vi.begin(), vi.end(), out);
vector<int>::iterator i = vi.begin();
cout << "\n i: " << long(i) << endl;
*i = 47;
copy(vi.begin(), vi.end(), out);
// Force it to move memory (could also just add
// enough objects):
vi.resize(vi.capacity() + 1);
// Now i points to wrong memory:
cout << "\n i: " << long(i) << endl;
cout << "vi.begin(): " << long(vi.begin());
*i = 48; // Access violation
} ///:~
If your program is breaking mysteriously, look for places
where you hold onto an iterator while adding more objects to a vector.
You’ll need to get a new iterator after adding elements, or use
operator[ ] instead for element selections. If you combine the above
observation with the awareness of the potential expense of adding new objects to
a vector, you may conclude that the safest way to use one is to fill it
up all at once (ideally, knowing first how many objects you’ll need) and
then just use it (without adding more objects) elsewhere in the program. This is
the way vector has been used in the book up to this point.
You may observe that using vector as the
“basic” container in the earlier chapters of this book may not be
the best choice in all cases. This is a fundamental issue in containers, and in
data structures in general: the “best” choice varies according to
the way the container is used. The reason vector has been the
“best” choice up until now is that it looks a lot like an array, and
was thus familiar and easy for you to adopt. But from now on it’s also
worth thinking about other issues when choosing
containers.
The vector is most efficient if:
It is possible to insert and
erase elements from the middle of a vector using an iterator, but the
following program demonstrates what a bad idea it is:
//: C04:VectorInsertAndErase.cpp
// Erasing an element from a vector
#include "Noisy.h"
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main() {
vector<Noisy> v;
v.reserve(11);
cout << "11 spaces have been reserved" << endl;
generate_n(back_inserter(v), 10, NoisyGen());
ostream_iterator<Noisy> out(cout, " ");
cout << endl;
copy(v.begin(), v.end(), out);
cout << "Inserting an element:" << endl;
vector<Noisy>::iterator it =
v.begin() + v.size() / 2; // Middle
v.insert(it, Noisy());
cout << endl;
copy(v.begin(), v.end(), out);
cout << "\nErasing an element:" << endl;
// Cannot use the previous value of it:
it = v.begin() + v.size() / 2;
v.erase(it);
cout << endl;
copy(v.begin(), v.end(), out);
cout << endl;
} ///:~
When you run the program you’ll see that the call to
reserve( ) really does only allocate storage – no constructors
are called. The generate_n( ) call is pretty busy: each call to
NoisyGen::operator( ) results in a construction, a copy-construction
(into the vector) and a destruction of the temporary. But when an object
is inserted into the vector in the middle, it must shove everything down
to maintain the linear array and – since there is enough space – it
does this with the assignment operator (if the argument of
reserve( ) is 10 instead of eleven then it would have to reallocate
storage). When an object is erased from the vector, the assignment
operator is once again used to move everything up to cover the place that is
being erased (notice that this requires that the assignment operator properly
cleans up the lvalue). Lastly, the object on the end of the array is
deleted.
You can imagine how enormous the overhead can become if
objects are inserted and removed from the middle of a vector if the
number of elements is large and the objects are complicated. It’s
obviously a practice to avoid.
The deque (double-ended-queue, pronounced
“deck”) is the basic sequence container optimized for adding and
removing elements from either end. It also allows for reasonably fast random
access – it has an operator[ ] like vector. However, it does
not have vector’s constraint of keeping everything in a single
sequential block of memory. Instead, deque uses multiple blocks of
sequential storage (keeping track of all the blocks and their order in a mapping
structure). For this reason the overhead for a deque to add or remove
elements at either end is very low. In addition, it never needs to copy and
destroy contained objects during a new storage allocation (like vector
does) so it is far more efficient than vector if you are adding an
unknown quantity of objects. This means that vector is the best choice
only if you have a pretty good idea of how many objects you need. In addition,
many of the programs shown earlier in this book that use vector and
push_back( ) might be more efficient with a deque. The
interface to deque is only slightly different from a vector (deque
has a push_front( ) and pop_front( ) while vector
does not, for example) so converting code from using vector to using
deque is almost trivial. Consider StringVector.cpp, which can be
changed to use deque by replacing the word “vector” with
“deque” everywhere. The following program adds parallel deque
operations to the vector operations in StringVector.cpp, and
performs timing comparisons:
//: C04:StringDeque.cpp
// Converted from StringVector.cpp
#include "../require.h"
#include <string>
#include <deque>
#include <vector>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <ctime>
using namespace std;
int main(int argc, char* argv[]) {
requireArgs(argc, 1);
ifstream in(argv[1]);
assure(in, argv[1]);
vector<string> vstrings;
deque<string> dstrings;
string line;
// Time reading into vector:
clock_t ticks = clock();
while(getline(in, line))
vstrings.push_back(line);
ticks = clock() - ticks;
cout << "Read into vector: " << ticks << endl;
// Repeat for deque:
ifstream in2(argv[1]);
assure(in2, argv[1]);
ticks = clock();
while(getline(in2, line))
dstrings.push_back(line);
ticks = clock() - ticks;
cout << "Read into deque: " << ticks << endl;
// Now compare indexing:
ticks = clock();
for(int i = 0; i < vstrings.size(); i++) {
ostringstream ss;
ss << i;
vstrings[i] = ss.str() + ": " + vstrings[i];
}
ticks = clock() - ticks;
cout << "Indexing vector: " << ticks << endl;
ticks = clock();
for(int j = 0; j < dstrings.size(); j++) {
ostringstream ss;
ss << j;
dstrings[j] = ss.str() + ": " + dstrings[j];
}
ticks = clock() - ticks;
cout << "Indexing deqeue: " << ticks << endl;
// Compare iteration
ofstream tmp1("tmp1.tmp"), tmp2("tmp2.tmp");
ticks = clock();
copy(vstrings.begin(), vstrings.end(),
ostream_iterator<string>(tmp1, "\n"));
ticks = clock() - ticks;
cout << "Iterating vector: " << ticks << endl;
ticks = clock();
copy(dstrings.begin(), dstrings.end(),
ostream_iterator<string>(tmp2, "\n"));
ticks = clock() - ticks;
cout << "Iterating deqeue: " << ticks << endl;
} ///:~
Knowing now what you do about the inefficiency of adding
things to vector because of storage reallocation, you may expect dramatic
differences between the two. However, on a 1.7 Megabyte text file one
compiler’s program produced the following (measured in platform/compiler
specific clock ticks, not seconds):
Read into vector: 8350 Read into deque: 7690 Indexing vector: 2360 Indexing deqeue: 2480 Iterating vector: 2470 Iterating deqeue: 2410
A different compiler and platform roughly agreed with this.
It’s not so dramatic, is it? This points out some important
issues:
Of course, this doesn’t mean you
shouldn’t use a deque rather than a vector when you know
that an uncertain number of objects will be pushed onto the end of the
container. On the contrary, you should – when you’re tuning for
performance. But you should also be aware that performance issues are usually
not where you think they are, and the only way to know for sure where your
bottlenecks are is by testing. Later in this chapter there will be a more
“pure” comparison of performance between vector, deque
and list.
Sometimes you need the behavior or efficiency of one kind of
container for one part of your program, and a different container’s
behavior or efficiency in another part of the program. For example, you may need
the efficiency of a deque when adding objects to the container but the
efficiency of a vector when indexing them. Each of the basic sequence
containers (vector, deque and list) has a two-iterator
constructor (indicating the beginning and ending of the sequence to read from
when creating a new object) and an assign( ) member function to read
into an existing container, so you can easily move objects from one sequence
container to another.
The following example reads objects into a deque and
then converts to a vector:
//: C04:DequeConversion.cpp
// Reading into a Deque, converting to a vector
#include "Noisy.h"
#include <deque>
#include <vector>
#include <iostream>
#include <algorithm>
#include <cstdlib>
using namespace std;
int main(int argc, char* argv[]) {
int size = 25;
if(argc >= 2) size = atoi(argv[1]);
deque<Noisy> d;
generate_n(back_inserter(d), size, NoisyGen());
cout << "\n Converting to a vector(1)" << endl;
vector<Noisy> v1(d.begin(), d.end());
cout << "\n Converting to a vector(2)" << endl;
vector<Noisy> v2;
v2.reserve(d.size());
v2.assign(d.begin(), d.end());
cout << "\n Cleanup" << endl;
} ///:~
You can try various sizes, but you should see that it makes no
difference – the objects are simply copy-constructed into the new
vectors. What’s interesting is that v1 does not cause
multiple allocations while building the vector, no matter how many
elements you use. You might initially think that you must follow the process
used for v2 and preallocate the storage to prevent messy reallocations,
but the constructor used for v1 determines the memory need ahead of time
so this is unnecessary.
It’s illuminating to see what happens with a
deque when it overflows a block of storage, in contrast with
VectorOverflow.cpp:
//: C04:DequeOverflow.cpp
// A deque is much more efficient than a vector
// when pushing back a lot of elements, since it
// doesn't require copying and destroying.
#include "Noisy.h"
#include <deque>
#include <cstdlib>
using namespace std;
int main(int argc, char* argv[]) {
int size = 1000;
if(argc >= 2) size = atoi(argv[1]);
deque<Noisy> dn;
Noisy n;
for(int i = 0; i < size; i++)
dn.push_back(n);
cout << "\n cleaning up \n";
} ///:~
Here you will never see any destructors before the words
“cleaning up” appear. Since the deque allocates all its
storage in blocks instead of a contiguous array like vector, it never
needs to move existing storage (thus no additional copy-constructions and
destructions occur). It simply allocates a new block. For the same reason, the
deque can just as efficiently add elements to the beginning of the
sequence, since if it runs out of storage it (again) just allocates a new block
for the beginning. Insertions in the middle of a deque, however, could be
even messier than for vector (but not as costly).
Because a deque never moves its storage, a held
iterator never becomes invalid when you add new things to either end of a deque,
as it was demonstrated to do with vector (in VectorCoreDump.cpp).
However, it’s still possible (albeit harder) to do bad things:
//: C04:DequeCoreDump.cpp
// How to break a program using a deque
#include <queue>
#include <iostream>
using namespace std;
int main() {
deque<int> di(100, 0);
// No problem iterating from beginning to end,
// even though it spans multiple blocks:
copy(di.begin(), di.end(),
ostream_iterator<int>(cout, " "));
deque<int>::iterator i = // In the middle:
di.begin() + di.size() / 2;;
// Walk the iterator forward as you perform
// a lot of insertions in the middle:
for(int j = 0; j < 1000; j++) {
cout << j << endl;
di.insert(i++, 1); // Eventually breaks
}
} ///:~
Of course, there are two things here that you wouldn’t
normally do with a deque: first, elements are inserted in the middle,
which deque allows but isn’t designed for. Second, calling
insert( ) repeatedly with the same iterator would not ordinarily
cause an access violation, but the iterator is walked forward after each
insertion. I’m guessing it eventually walks off the end of a block, but
I’m not sure what actually causes the problem.
If you stick to what deque is best at –
insertions and removals from either end, reasonably rapid traversals and fairly
fast random-access using operator[ ] – you’ll be in good
shape.
Both vector and deque provide two ways to
perform random access of their elements: the operator[ ], which
you’ve seen already, and at( ), which checks the boundaries of
the container that’s being indexed and throws an exception if you go out
of bounds. It does cost more to use at( ):
//: C04:IndexingVsAt.cpp
// Comparing "at()" to operator[]
#include "../require.h"
#include <vector>
#include <deque>
#include <iostream>
#include <ctime>
using namespace std;
int main(int argc, char* argv[]) {
requireMinArgs(argc, 1);
long count = 1000;
int sz = 1000;
if(argc >= 2) count = atoi(argv[1]);
if(argc >= 3) sz = atoi(argv[2]);
vector<int> vi(sz);
clock_t ticks = clock();
for(int i1 = 0; i1 < count; i1++)
for(int j = 0; j < sz; j++)
vi[j];
cout << "vector[]" << clock() - ticks << endl;
ticks = clock();
for(int i2 = 0; i2 < count; i2++)
for(int j = 0; j < sz; j++)
vi.at(j);
cout << "vector::at()" << clock()-ticks <<endl;
deque<int> di(sz);
ticks = clock();
for(int i3 = 0; i3 < count; i3++)
for(int j = 0; j < sz; j++)
di[j];
cout << "deque[]" << clock() - ticks << endl;
ticks = clock();
for(int i4 = 0; i4 < count; i4++)
for(int j = 0; j < sz; j++)
di.at(j);
cout << "deque::at()" << clock()-ticks <<endl;
// Demonstrate at() when you go out of bounds:
di.at(vi.size() + 1);
} ///:~
As you’ll learn in the exception-handling chapter,
different systems may handle the uncaught exception in different ways, but
you’ll know one way or another that something went wrong with the program
when using at( ), whereas it’s possible to go blundering ahead
using operator[ ].
A list is implemented as a doubly-linked list and is
thus designed for rapid insertion and removal of elements in the middle of the
sequence (whereas for vector and deque this is a much more costly
operation). A list is so slow when randomly accessing elements that it does not
have an operator[ ]. It’s best used when you’re traversing a
sequence, in order, from beginning to end (or end to beginning) rather than
choosing elements randomly from the middle. Even then the traversal is
significantly slower than either a vector or a deque, but if you
aren’t doing a lot of traversals that won’t be your
bottleneck.
Another thing to be aware of with a list is the memory
overhead of each link, which requires a forward and backward pointer on top of
the storage for the actual object. Thus a list is a better choice when
you have larger objects that you’ll be inserting and removing from the
middle of the list. It’s better not to use a list if you
think you might be traversing it a lot, looking for objects, since the amount of
time it takes to get from the beginning of the list – which is the
only place you can start unless you’ve already got an iterator to
somewhere you know is closer to your destination – to the object of
interest is proportional to the number of objects between the beginning and that
object.
The objects in a list never move after they are
created; “moving” a list element means changing the links, but never
copying or assigning the actual objects. This means that a held iterator never
moves when you add new things to a list as it was demonstrated to do in
vector. Here’s an example using the Noisy class:
//: C04:ListStability.cpp
// Things don't move around in lists
#include "Noisy.h"
#include <list>
#include <iostream>
#include <algorithm>
using namespace std;
int main() {
list<Noisy> l;
ostream_iterator<Noisy> out(cout, " ");
generate_n(back_inserter(l), 25, NoisyGen());
cout << "\n Printing the list:" << endl;
copy(l.begin(), l.end(), out);
cout << "\n Reversing the list:" << endl;
l.reverse();
copy(l.begin(), l.end(), out);
cout << "\n Sorting the list:" << endl;
l.sort();
copy(l.begin(), l.end(), out);
cout << "\n Swapping two elements:" << endl;
list<Noisy>::iterator it1, it2;
it1 = it2 = l.begin();
it2++;
swap(*it1, *it2);
cout << endl;
copy(l.begin(), l.end(), out);
cout << "\n Using generic reverse(): " << endl;
reverse(l.begin(), l.end());
cout << endl;
copy(l.begin(), l.end(), out);
cout << "\n Cleanup" << endl;
} ///:~
Operations as seemingly radical as reversing and sorting the
list require no copying of objects, because instead of moving the objects, the
links are simply changed. However, notice that sort( ) and
reverse( ) are member functions of list, so they have special
knowledge of the internals of list and can perform the pointer movement
instead of copying. On the other hand, the swap( ) function is a
generic algorithm, and doesn’t know about list in particular and so
it uses the copying approach for swapping two elements. There are also generic
algorithms for sort( ) and reverse( ), but if you try to
use these you’ll discover that the generic reverse( ) performs
lots of copying and destruction (so you should never use it with a list)
and the generic sort( ) simply doesn’t work because it
requires random-access iterators that list doesn’t provide (a
definite benefit, since this would certainly be an expensive way to sort
compared to list’s own sort( )). The generic
sort( ) and reverse( ) should only be used with arrays,
vectors and deques.
If you have large and complex objects you may want to choose a
list first, especially if construction, destruction, copy-construction
and assignment are expensive and if you are doing things like sorting the
objects or otherwise reordering them a
lot.
The list has some special operations that are built-in
to make the best use of the structure of the list. You’ve already
seen reverse( ) and sort( ), and here are some of the
others in use:
//: C04:ListSpecialFunctions.cpp
#include "Noisy.h"
#include <list>
#include <iostream>
#include <algorithm>
using namespace std;
ostream_iterator<Noisy> out(cout, " ");
void print(list<Noisy>& ln, char* comment = "") {
cout << "\n" << comment << ":\n";
copy(ln.begin(), ln.end(), out);
cout << endl;
}
int main() {
typedef list<Noisy> LN;
LN l1, l2, l3, l4;
generate_n(back_inserter(l1), 6, NoisyGen());
generate_n(back_inserter(l2), 6, NoisyGen());
generate_n(back_inserter(l3), 6, NoisyGen());
generate_n(back_inserter(l4), 6, NoisyGen());
print(l1, "l1"); print(l2, "l2");
print(l3, "l3"); print(l4, "l4");
LN::iterator it1 = l1.begin();
it1++; it1++; it1++;
l1.splice(it1, l2);
print(l1, "l1 after splice(it1, l2)");
print(l2, "l2 after splice(it1, l2)");
LN::iterator it2 = l3.begin();
it2++; it2++; it2++;
l1.splice(it1, l3, it2);
print(l1, "l1 after splice(it1, l3, it2)");
LN::iterator it3 = l4.begin(), it4 = l4.end();
it3++; it4--;
l1.splice(it1, l4, it3, it4);
print(l1, "l1 after splice(it1,l4,it3,it4)");
Noisy n;
LN l5(3, n);
generate_n(back_inserter(l5), 4, NoisyGen());
l5.push_back(n);
print(l5, "l5 before remove()");
l5.remove(l5.front());
print(l5, "l5 after remove()");
l1.sort(); l5.sort();
l5.merge(l1);
print(l5, "l5 after l5.merge(l1)");
cout << "\n Cleanup" << endl;
} ///:~
The print( ) function is used to display results.
After filling four lists with Noisy objects, one list is spliced
into another in three different ways. In the first, the entire list l2 is
spliced into l1 at the iterator it1. Notice that after the splice,
l2 is empty – splicing means removing the elements from the source
list. The second splice inserts elements from l3 starting at it2
into l1 starting at it1. The third splice starts at it1 and
uses elements from l4 starting at it3 and ending at it4
(the seemingly-redundant mention of the source list is because the elements must
be erased from the source list as part of the transfer to the destination
list).
The output from the code that demonstrates
remove( ) shows that the list does not have to be sorted in order
for all the elements of a particular value to be removed.
Finally, if you merge( ) one list with another,
the merge only works sensibly if the lists have been sorted. What you end up
with in that case is a sorted list containing all the elements from both lists
(the source list is erased – that is, the elements are moved to the
destination list).
There’s also a unique( ) member function
that removes all duplicates, but only if the list has been sorted
first:
//: C04:UniqueList.cpp
// Testing list's unique() function
#include <list>
#include <iostream>
using namespace std;
int a[] = { 1, 3, 1, 4, 1, 5, 1, 6, 1 };
const int asz = sizeof a / sizeof *a;
int main() {
// For output:
ostream_iterator<int> out(cout, " ");
list<int> li(a, a + asz);
li.unique();
// Oops! No duplicates removed:
copy(li.begin(), li.end(), out);
cout << endl;
// Must sort it first:
li.sort();
copy(li.begin(), li.end(), out);
cout << endl;
// Now unique() will have an effect:
li.unique();
copy(li.begin(), li.end(), out);
cout << endl;
} ///:~
The list constructor used here takes the starting and
past-the-end iterator from another container, and it copies all the elements
from that container into itself (a similar constructor is available for all the
containers). Here, the “container” is just an array, and the
“iterators” are pointers into that array, but because of the design
of the STL it works with arrays just as easily as any other container.
If you run this program, you’ll see that
unique( ) will only remove adjacent duplicate elements, and
thus sorting is necessary before calling unique( ).
There are four additional list member functions that
are not demonstrated here: a remove_if( ) that takes a predicate
which is used to decide whether an object should be removed, a
unique( ) that takes a binary predicate to perform uniqueness
comparisons, a merge( ) that takes an additional argument which
performs comparisons, and a sort( ) that takes a comparator (to
provide a comparison or override the existing one).
Looking at the previous example you may note that if you want
a sorted list with no duplicates, a set can give you that, right?
It’s interesting to compare the performance of the two
containers:
//: C04:ListVsSet.cpp
// Comparing list and set performance
#include <iostream>
#include <list>
#include <set>
#include <algorithm>
#include <ctime>
#include <cstdlib>
using namespace std;
class Obj {
int a[20]; // To take up extra space
int val;
public:
Obj() : val(rand() % 500) {}
friend bool
operator<(const Obj& a, const Obj& b) {
return a.val < b.val;
}
friend bool
operator==(const Obj& a, const Obj& b) {
return a.val == b.val;
}
friend ostream&
operator<<(ostream& os, const Obj& a) {
return os << a.val;
}
};
template<class Container>
void print(Container& c) {
typename Container::iterator it;
for(it = c.begin(); it != c.end(); it++)
cout << *it << " ";
cout << endl;
}
struct ObjGen {
Obj operator()() { return Obj(); }
};
int main() {
const int sz = 5000;
srand(time(0));
list<Obj> lo;
clock_t ticks = clock();
generate_n(back_inserter(lo), sz, ObjGen());
lo.sort();
lo.unique();
cout << "list:" << clock() - ticks << endl;
set<Obj> so;
ticks = clock();
generate_n(inserter(so, so.begin()),
sz, ObjGen());
cout << "set:" << clock() - ticks << endl;
print(lo);
print(so);
} ///:~
When you run the program, you should discover that set
is much faster than list. This is reassuring – after all, it
is set’s primary job
description!
It turns out that all basic sequences have a member function
swap( ) that’s designed to switch one sequence with another
(however, this swap( ) is only defined for sequences of the same
type). The member swap( ) makes use of its knowledge of the internal
structure of the particular container in order to be efficient:
//: C04:Swapping.cpp
// All basic sequence containers can be swapped
#include "Noisy.h"
#include <list>
#include <vector>
#include <deque>
#include <iostream>
#include <algorithm>
using namespace std;
ostream_iterator<Noisy> out(cout, " ");
template<class Cont>
void print(Cont& c, char* comment = "") {
cout << "\n" << comment << ": ";
copy(c.begin(), c.end(), out);
cout << endl;
}
template<class Cont>
void testSwap(char* cname) {
Cont c1, c2;
generate_n(back_inserter(c1), 10, NoisyGen());
generate_n(back_inserter(c2), 5, NoisyGen());
cout << "\n" << cname << ":" << endl;
print(c1, "c1"); print(c2, "c2");
cout << "\n Swapping the " << cname
<< ":" << endl;
c1.swap(c2);
print(c1, "c1"); print(c2, "c2");
}
int main() {
testSwap<vector<Noisy> >("vector");
testSwap<deque<Noisy> >("deque");
testSwap<list<Noisy> >("list");
} ///:~
When you run this, you’ll discover that each type of sequence container is able to swap one sequence for another without any copying or assignments, even if the sequences are of diffe