dtl

vec_multiset<Key, Pred, A, Container>


Category: containers	Component type: type

Description

Rationale

The vec_multiset is an STL compatible container that emulates a std::multiset using a sorted std::vector. This container works best when you have a setup phase where you are inserting elements followed by a lookup phase with few insertions or deletions. The rationale for this container is that it can provide much better performance than a std::multiset in terms of both memory and speed.

When should you use vec_multiset?

1. If you insert many elements without any lookups in a "setup" phase followed by a "lookup" phase where you search for elements in the container with few changes to the container. Vec_multiset is optimized for this case in terms of performance and memory usage. Sorted containers are often used in a very regular fashion: population before lookup.

2. Hashed containers are not suitable for your application because you need guaranteed worst case lookup time, or you are unwilling to have the memory overhead associated with a std::hash_set.

When should you NOT use vec_multiset?

1. If exception safety is an issue, you are better off with std::multiset. Vec_multiset is only exception-safe for POD's.

2. If you intermix insertions/erasures and lookups arbitrarily. Std::multiset handles this case better than vec_multiset.

3. If you need pointer or reference validity throughout the lifetime of the container, use std::multiset. Vec_multiset's underlying container can shift the data elements around in memory. However, if you write your code in a more modern fashion to use iterators rather than pointers, you can still use vec_multiset.

4. Iterators for vec_multiset are not guaranteed to be valid until after the first lookup into the container. So, if you need to capture iterators during the setup phase of the container (i.e. iterators returned by insert() before the container has been sorted) you must use std::multiset. If you only need iterators to begin() or end(), you can still use vec_multiset as those operations will trigger a sort on the container.

Performance of vec_multiset vs. std::multiset

Many applications use std::multiset by first inserting elements into the container, then performing lookups in that container. This is the behavior that vec_multiset is optimized to handle. The designers of std::multiset designed their container to handle generic sequences of inserts, erases, and lookups in an arbitrary order. Thus, they used a tree for all operations. The tree design carries with it much memory overhead resulting in poor memory contiguity and lookups that are much slower than can be achieved by a std::vector.

Vec_multiset makes some assumptions which allow it to perform its job in an efficient manner. Vec_multiset assumes that the user will first populate the container and then perform lookups, it doesn't have to keep the elements sorted until lookup time. Vec_multiset can perform inserts in amortized constant time because it just pushes the item onto the end of the underlying container. Furthermore, vec_multiset can store its elements contiguously in memory. Lookups occur in logarithmic time because the STL algorithms invoked by the vec_multiset use binary searches, but without the memory thrashing that std::multiset can suffer from. After the first lookup operation, vec_multiset will insert any additional elements into the proper place in the underlying vector to keep the container sorted, which takes linear time. These additional inserts after the first lookup are slow, but that is OK as we expect very few insertions into the container once we begin lookups into the vec_multiset. Similarly, erasures from a vec_multiset occur in linear time, but again, we don't expect many calls to vec_multiset::erase() once the vector is sorted.

The following performance results (times listed in seconds) compare the performance of vec_multiset<int> with std::multiset<int> in the case of 6,000,000 inserts followed by 6,000,000 calls to the member find() function on each container. To make an equal playing field in the timing data for insertion, the time to sort the underlying vector (which occurs right before the first lookup) is included in the totals. Note that the std::stable_sort() used to implement the sorting in vec_multiset should occur in linear time. The table below presents the test results for different ratios of lookups to inserts. We ran these tests in Debug build in Microsoft Visual C++ v 6.0 SP 5 using STLPort 4.0. We used the optimizations of 'Maximize Speed' and 'Any Suitable' inlining. We also compare against STLPort 4.0's hash_multiset<int> container. The tests performed the same number of lookups as inserts and the actual times recorded in those runs appear in the 1:1 row of the table. We then extrapolated the results in the other rows (2:1, 3:1, 4:1, and 5:1) by multiplying the lookup times by the appropriate factor and adding these totals to the insertion times for our runs. .

ratio of lookups to inserts	multiset inserts	vec_multiset inserts	hash_multiset inserts	multiset lookups	vec_multiset lookups	hash_multiset lookups	multiset total	vec_multiset total	hash_multiset total
1:1	15	6	4	2	2	1	17	8	5
2:1	15	6	4	4	4	2	19	10	6
3:1	15	6	4	6	6	3	21	12	7
4:1	15	6	4	8	8	4	23	14	8
5:1	15	6	4	10	10	5	25	16	9

Vec_multiset still can't beat a hashed container with a good hashing function as the numbers for hash_multiset prove. The numbers above indicate that vec_multiset is a significant improvement over std::multiset for insertions. Vec_multiset's usual case of constant time insertion beats the logarithmic time of std::multiset's corresponding calls. Vec_multiset matches the good logarithmic performance of lookup calls in std::multiset. We were surprised that the std::multiset lookups did not suffer from the performance penalty expected from memory thrashing. The vec_multiset matches the ideal performance that would be achieved by manually using a std::vector and the appropriate algorithm calls on it (std::stable_sort(), std::lower_bound(), and friends).

Implementation of vec_multiset

The container takes advantage of the assumption that first the user populates the container, and then perform lookups in it. When the user is populating the container, the vec_multiset simply tacks newly inserted elements onto the end of the container. The first time the user invokes a lookup method on the container (also including vec_multiset::begin() and vec_multiset::end()), the container sorts itself. The lookup operations simply call the appropriate STL algorithms, passing the predicate function through along with the begin() and end() iterators of the underlying vector which actually holds the elements. The user may still insert or erase elements after the first lookup, but now the container inserts any new items into their proper sorted positions.

Vec_multiset provides iterators that are always valid unless:

1. The container hasn't been sorted yet. This happens as the sort will rearrange the items in the container through a call to std::stable_sort(), which is oblivious to the structure of vec_multiset or its iterators.

2. The element pointed to has been removed from the container. This is true for iterators of the standard STL containers, including std::multiset.

To accomplish this, a vec_multiset::iterator stores the index of the element within the vec_multiset it belongs to (actually in its underlying container), rather than a raw pointer to the element. The iterator thus doesn't care about actual memory addresses of the elements, which is especially important when the vec_multiset's underlying container shifts elements to a new block of memory due to getting full. However, elements in the underlying vector get shifted on insertions and deletions from the container. The index of the element an iterator refers to thus may change during one of these operations. The vec_multiset keeps a log of all insertions and erasures that are performed on it after the container is sorted so that iterators may use this information to properly update their indices (or invalidate themselves if the element referred to was erased).

Memory usage vs. std::multiset

For the following discussion, assume that a pointer takes 4 bytes of memory and we're basing our data here on Microsoft Visual C++ 6.0 SP5's implementation of std::vector and std::multiset. We also ignore the comparison function and allocator members. We specifically compare a vec_multiset whose underlying container is a std::vector.

In general, vec_multiset incurs less memory overhead than std::multiset. Std::multiset contains the red-black tree which holds the data elements as a member. In turn, the tree holds a pointer to its root. The overhead for this pointer is thus 4 bytes.. The per-element overhead for std::multiset is whatever the overhead for a single tree node is. A tree node contains pointers to the left, right, and parent nodes of the element as well as the color flag (red or black). Thus, we have a total of 16 bytes overhead per element in the std::multiset assuming the color flag takes up 4 bytes (enums are 4 bytes in Visual C++).

Thus, the total overhead for std::multiset = 4 bytes for the pointer to the tree root + 16 bytes * number of elements in the container.

Vec_multiset is forgiving in its use of memory. The total overhead is that of the change log (a std::vector<iter_change> recording insertions/erasures after the first lookup into the vec_multiset), the overhead of the underlying std::vector which contains the data stored in the vec_multiset, and the memory taken up by a flag indicating whether the underlying vector's data is sorted or not (a bool, which in Visual C++ is 1 byte).

As both the change log and elements contained in a vec_multiset are stored in an underlying std::vector, first we need to compute the overhead for the std::vector template. A std::vector contains three iterators, which are implemented as pointers, referring to the begin, end, and last (end of storage) elements in the underlying array, taking up 12 bytes total. As the elements are stored in the underlying array, there is no per-element overhead.

The overhead for the change log will thus be 12 bytes plus the number of elements in the change log which should be relatively small as long as the vec_multiset is used as intended (few inserts/erasures after first lookup). Next we need to know the size of an element in the change log. Each entry in the change log contains an enum indicating whether the change was an insertion or an erasure, the index where the insertion/erasure is occurring (of type size_t, 4 bytes in Visual C++), and the number of elements inserted/erased during this operation (of type size_t). Thus, each log entry takes up 4 bytes for the insert/erase flag + 4 bytes for the insert/erase index + 4 bytes for the value indicating number elements inserted = 12 bytes.

Therefore, the overhead due to the change log = 12 bytes for std::vector overhead + 12 bytes * number of insertions and erasures after first lookup

The overhead for the underlying data vector = 12 bytes.

Thus, the total overhead for vec_multiset =12 bytes for the change log vector's overhead itself + 12 bytes * number of insertions and erasures after first lookup * sizeof(iter_change) + 12 bytes for the std::vector which contains the data elements + 1 byte for the sorted flag

Combining like terms:

Total overhead for vec_multiset = 25 bytes + 12 bytes * number of insertions and erasures after first lookup.

Comparing the results shows that in general, vec_multiset incurs far less memory overhead than std::multiset if very few insertions and erasures are done after the first lookup (remember, that's the intended case for vec_multiset!). Especially note that there is absolutely no per-element overhead for a vec_multiset, which is a major plus! Compare that with the overhead of 16 bytes per element in std::multiset.

Iterator Stability

We could have used two different methods to achieve iterator stability

with vec_multiset:



1.  A method David Abrahams proposes for iterator stability

(indirect_container and indirect_iterator .. storing a

container of iterators to the data elements).  This

means that the user has to populate a data container

such as a std::vector<T> and then create some other

container whose elements actually refer to elements in

that first data vector, such as a

std::set<T>::iterator.  This method is a lean

and mean way to get iterator (or pointer) stability. 

However, it comes at at possible cost of a loss of

memory contiguity (depending on the allocator) which

can slow down memory accesses and also extra address

arithmetic the machine must perform. 



2.  Our method references the data elements directly. 

Vec_multiset<T> internally stores an underlying

std::vector<T> which actually contains the data.  The

vec_multiset<T>::iterator stores several simple pieces

of information:

* a pointer to the vec_multiset that it is referring to

* the index of the element in the vec_multiset's

underlying vector

* an int that tells where to look in the referred to

container's change log of inserts/erasures after the

first lookup in the container so the iterator may

update its position and thus maintain iterator

stability.



This method guarantees both iterator stability and

contiguity of memory at very little expense.  The key

here is that there are no address computations

necessary each time elements are compared vs. the

indirect_iterator method which must dereference an

iterator/pointer.



We believed that the direct storing of the vector data

would be much more efficient rather than the

indirect_container approach.



So we performed a benchmark test to compare the two

methods.



For this test, we used a sample of 15,000,000 ints and

used a std::vector<int> as our base container for the

indirect_iterator-like simulation to eliminate

contiguity as a factor.



For the direct storing method (the method we used to

implement vec_multiset), we did the following:



1. Inserted the ints into the vector using

std::vector<int>::push_back().

2. Called std::stable_sort() from begin() to end() on

the vector

3. Did 15,000,000 std::lower_bound() calls on the

begin() to end() range to locate a number.  



We timed each step and then added the results together

to get the overall total for this method.



To simulate the indirect_container method, we had to:



1. Populate a std::vector<int> with the 15,000,000 elements

using std::vector<int>::push_back().

2. For each element vec[i] in the vector vec, we

inserted &vec[i] into a vector<int *> called pvec.

3. Call std::stable_sort(pvec.begin(), pvec.end(), 

IntStarLess()) where IntStarLess::operator()(pInt1,

pInt2) is true iff *p1 < *p2.

4. Perform 15,000,000 std::lower_bound() calls over the

pvec.begin() to pvec.end() range with an IntStarLess

as the predicate.



Again, we timed each step and summed the results

together for the overall time.



We performed the same operations using each method

using the same data set and calls to

std::lower_bound().



The results we got back were astonishing which

confirmed our reasoning:



---- vector<int> ----

vec.push_back(): 26

stable_sort(vec.begin(), vec.end()): 20

lower_bound(): 6

*** Total time for vector<int> impl.: 52 ***



---- vector<int *> ----

vec_ptrs.push_back(): 8

stable_sort(vec_ptrs.begin(), vec_ptrs.end()): 55

lower_bound(): 44

Subtotal time for vector<int *> ops: 107

(but we're not done as we must add in the vector<int *>

push_back time of: 26)

*** Total time for vector<int *> impl.: 133 ***



As you can see, the std::stable_sort() and

std::lower_bound() calls for the direct approach

easily outperformed the corresponding calls for the

indirect_iterator simulation.  Also, there is the

additional overhaead in the indirect method of having

to push_back the objects onto the data vector even

before it starts working with the pointers.



The performance test proved that the direct approach

we used for vec_multiset is much faster and more

efficient than the use of creatures like

indirect_iterator.



Caveats to these results:

1. With larger objects, the indirect_iterator method

probably would perform much faster as the sort only

swaps pointers.  The direct storing method's sort call

actually has to swap the objects themselves, which can

definitely be much more expensive. In the case of our

example above this penalty does not appear since ints

are small.



2. As Dave Abrahams pointed out, if your container

holds something like std::string objects, then holding a

vector of pointers (e.g. std::string *) may not lose you

much anyway since the underlying objects are in fact

holding the data via pointers.

Example


using namespace std;





int main()

{

  vec_multiset<int> vms;

  

  // "setup" phase ... populate our container

  

  vms.insert(23);

  vms.insert(94);

  vms.insert(76);

  vms.insert(10);

  vms.insert(91);

  vms.insert(12);

  vms.insert(76);

  

  // more inserts here

  // ...

   

  // "lookup" phase ... use our container to find different elements



  // print out container elements ... they'll appear in sorted order

  

  cout << "Container elements: " << endl;



  copy(vms.begin(), vms.end(), ostream_iterator<int>(cout, " "));

  cout << endl;



  if (vms.find(12) != vms.end())
	cout << "Found element with value 12" << endl;

  else

    cout << "Could not find element with value 12" << endl;



  pair<vec_multiset<int>::iterator, vec_multiset<int>::iterator> pr =

	vms.equal_range(76);



  cout << "Elements with value 76: " << endl;

  copy(pr.first, pr.second, ostream_iterator<int>(cout, " "));



  cout << endl;

  

  // more lookups here, possibly calls to vms.lower_bound(), vms.count(), etc.

        

  return 0;  

}

Definition

Defined in the vec_multiset.h header file.

Model of

Multiple Sorted Associative Container, Simple Associative Container

Associated types

Those defined by Multiple Sorted Associative Container and Simple Associative Container.

Public Base Classes

None.

Template parameters

Parameter	Description	Default
Key	The type of elements within the vec_multiset.
Pred	The key comparison function, a Strict Weak Ordering whose argument type is key_type; it returns true if its first argument is less than its second argument, and false otherwise. This is also defined as vec_multiset::key_compare and vec_multiset::value_compare.	std::less<Key> >
A	The allocator used by the container.	std::allocator<Key> >
Container	The underlying container used by the vec_multiset to actually hold the elements.	std::vector<Key, A>

Members and Usage

Identical to those declared for std::multiset.

Bibliography

The idea for this container came from concepts discussed in:

Meyers, Scott. Effective STL, Item 23, pp. 100-106. Addison-Wesley, Boston: 2001.

Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appears in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Corwin Joy and Michael Gradman make no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.

This site written using the ORB.