Vlad Lazarenko

... making all this up as I go along

Erasing Vector the Smart Way

In the C++ world, std::vector is a sequence container) that encapsulates dynamic size arrays. One of the main perks of the vector is the fact that elements are stored contiguously. In other words, a pointer to an element of a vector may be passed to any function that expects a pointer to an element of array. Or simply put — std::vector is very close to a plain old-school C-style array, yet it provides developers with a lot of C++ perks like automatic memory management, bounds checking and more. Unfortunately, std::vector is so great that many developers are using it without really thinking too much about the underlying implementation and its downsides.

Erase Trap

A common programming task is to remove one or more elements from the vector. Luckily, std::vector provides a wonderful method called erase() that does exactly that. So what’s the big deal? It’s a trap!

C++ is just like medicine — it is both a cure and a disease. Vector is no exception. Its contiguous memory layout may heal, or it may kill. In this case it is a serial killer. Let’s see why.

The Innocent Piece of Code

Let’s take a look at the real-world example that I ran into a few days ago. The goal was to remove all even numbers from a vector, leaving only odd ones. Here is how it was implemented:

1
2
3
4
5
6
7
std::vector<int>::iterator it = array.begin();
while (it != array.end()) {
    if (*it % 2 == 0)
        it = array.erase(it);
    else
        ++it;
}

Logically thinking, the code is very sound — every element is being checked and removed only if it matches a given criteria. Developer even took extra care not to access invalidated iterator. Sounds like a great job. Nice, simple, and it works. Or does it?

The Problem

No, it does not. There is one big problem with that code that makes it totally useless. It takes about 55 seconds to run on just one million random elements. Why does that happen, you may ask? The developer fell into a common trap! That happened because the elements of the vector are stored contiguously, in a contiguous memory and the whole vector is shifted left every time an element is erased. In other words — one does not simply erase an element that is not at the end of the vector.

The Solution

Here is the right way of doing this:

1
2
3
4
5
6
7
8
array.erase(
  std::remove_if(
      array.begin(), array.end(), [](int v) {
          return v % 2 == 0;
        }
    ),
    array.end()
);

The above code does the job in only 5 milliseconds. That is about eleven thousand times faster. What kind of sorcery is that? Very simple — std::remove_if() does not erase elements from the vector. It does’t remove anything either. Just a good naming joke. So what the hell it does? It re-arranges elements in such a way that elements to be erased are moved towards the end of a vector. Once the job is done, all elements are erased from the vector with a single invocation of erase(). As a result, the whole vector is not shifted tens of thousands of times and runs a lot faster.

The Conclusion

There ain’t no such thing as a free lunch.