C++ Exception Handling and Performance

Introduction

Exceptions provide a way to react to special conditions that change the normal flow of program execution. Exception handling in general can refer to a programming language construct, a computer hardware mechanism, or both.

Many people are concerned about the performance impact of using the exception mechanism in C++. For example, one of my colleagues believes that exceptions should be avoided at any cost to improve the execution speed of an application. So is that true or not? Let’s find out!

Using exceptions

It is important to understand that an exception is not part of the general program execution flow but an unexpected situation. Such a situation should normally not arise at all. However, it could arise, so we need to check for errors. There are two fundamental approaches: returning an error code from a function, and using exceptions. For example, let’s say we have a custom function that implements division:

int divide (int x, int y)
{
  return x / y;
}

We need to make sure that the divisor is not zero, because you cannot divide by zero. There are two ways of doing this. C-style error checking will make our function look like this:

int divide (int x, int y, int & result)
{
  if (y == 0)
    return -1;
  result = x / y;
  return 0;
}

C++ error checking using exceptions will look like this:

int divide (int x, int y)
{
  if (y == 0)
    throw std::logic_error ("Division by zero");
  return x / y;
}

The use cases for these two functions will be different. For the C-style function, we always have to check the return code to make sure that the operation succeeded:

void foo (int x, int y)
{
  int result;
  if (divide (x, y, result) == 0)
  {
    // Division was successful. Do something with “result”.
  }
  else
  {
    // Error occurred!
  }
}

The C++ use case:

void foo (int x, int y)
{
  try
  {
    int result = divide (x, y);
    // Division was successful. Do something with “result”.
  }
  catch (const std::logic_error &)
  {
    // Error occurred!
  }
}

The C++ way with exceptions becomes even handier when the program flow is more complicated. What if we need to invoke the “divide” function twice? In that case we have to check for errors twice using the C style:

void foo (int x, int y)
{
  int result;
  if (divide (x, y, result) == 0)
  {
    // Division was successful. Do something with “result”.
  }
  else
  {
    // Error occurred!
  }
 
  if (divide (y, x, result) == 0)
  {
    // Division was successful. Do something with “result”.
  }
  else
  {
    // Error occurred!
  }
}

But in C++, one try-catch block will be enough:

void foo (int x, int y)
{
  try
  {
    int result = divide (x, y);
    // Division was successful. Do something with “result”.
    result = divide (y, x);
    // Division was successful. Do something with “result”, again.
  }
  catch (const std::logic_error &)
  {
    // Error occurred!
  }
}

Now, imagine that we have to invoke the “divide” function ten or a hundred times. And to make it even more complex, imagine that we have multiple nested functions and every function needs to check for an error. Using exceptions now seems to be the ideal way to go. Well, that is what exceptions were designed for: to make things easier.
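
To make the nested case concrete, here is a minimal sketch. The helper names “ratio_sum”, “scaled” and “run” are made up for illustration and are not part of the original example; only “divide” comes from the text above. Only the outermost function contains a try-catch block, while the intermediate functions contain no error-handling code at all:

```cpp
#include <stdexcept>

// Same function as in the article.
int divide (int x, int y)
{
  if (y == 0)
    throw std::logic_error ("Division by zero");
  return x / y;
}

// Hypothetical intermediate layer: no error checks needed here.
int ratio_sum (int a, int b, int c)
{
  return divide (a, b) + divide (b, c);
}

// Hypothetical second layer: still no error checks.
int scaled (int a, int b, int c)
{
  return ratio_sum (a, b, c) * 100;
}

// Only the outermost caller decides what to do about a failure.
int run (int a, int b, int c)
{
  try
  {
    return scaled (a, b, c);
  }
  catch (const std::logic_error &)
  {
    return -1; // Error occurred somewhere below.
  }
}
```

With error codes, both “ratio_sum” and “scaled” would each need their own checks and an extra output parameter to pass the failure upwards.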

Easy vs. Fast

There are a lot of things that make our lives as developers easier. But sometimes our programs have to be fast. And not only fast, but the fastest in the world, or even the whole universe. And we have to sacrifice ease of development to achieve that. This is the case with high-frequency trading programs, for example. So if a programmer writing code for high-frequency trading thinks that wrapping an invocation of the “divide” function in a “try-catch” block will slow down his application even by a nanosecond in comparison with C-style error checking, he will choose the harder way and check the error code every time he invokes the “divide” function. Indeed, he will spend much more time to achieve his goals, but will that be the right decision?

Under the hood

To answer the question above, we need to dive into the implementation details and figure out how exceptions are implemented. On this point, there is a huge difference between C++ and higher-level languages such as Java, C#, Python and others. In C++, there are two methods for handling exceptions at run time: the “setjmp/longjmp” method (hereinafter the jumping method) and “zero-cost” exception handling.

The jumping method saves the context when entering a frame with an exception handler. Then, when an exception is raised, the context can be restored immediately, without the need for tracking stack frames. This method provides very fast exception propagation, but introduces significant overhead for the use of exception handlers, even if no exception is raised.
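
As a rough illustration of the idea (not of what any particular compiler actually emits), the jumping method can be mimicked by hand with the standard setjmp/longjmp facilities. The names “handler_context” and “safe_divide” are hypothetical. Note that setjmp runs every time “safe_divide” is entered, whether or not an error occurs, which is the overhead described above:

```cpp
#include <csetjmp>

// Hypothetical slot holding the context of the currently active handler.
static std::jmp_buf handler_context;

int divide (int x, int y)
{
  if (y == 0)
    std::longjmp (handler_context, 1); // "throw": jump straight back to the handler
  return x / y;
}

int safe_divide (int x, int y)
{
  // "try": the context is saved on every entry, error or not.
  if (setjmp (handler_context) == 0)
    return divide (x, y);
  // "catch": reached only via longjmp.
  return -1;
}
```

This hand-written version is only safe because no objects with destructors live between the setjmp and longjmp calls; compiler-generated code also has to run destructors while unwinding.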

The zero-cost method generates static tables to describe exception ranges. No dynamic code is required when entering a frame containing an exception handler. When an exception is raised, the tables are used to control a backtrace of the call stack to locate the required exception handler. This method has considerably poorer performance for the propagation of exceptions, but there is no overhead for exception handlers if no exception is raised.
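
The table lookup can be pictured with a toy model. This is only a sketch of the concept: real compilers emit unwind tables in a platform-specific binary format, and the struct, the addresses and the handler identifiers below are all invented for illustration:

```cpp
#include <cstdint>

// Toy model only: one entry per code range that is covered by a try block.
struct HandlerRange
{
  std::uintptr_t begin; // first code address covered by the try block
  std::uintptr_t end;   // one past the last covered address
  int handler_id;       // hypothetical identifier of the catch block
};

// Built ahead of time; nothing here runs when a try block is entered.
constexpr HandlerRange kTable[] = {
  {0x1000, 0x1040, 1},
  {0x2000, 0x20a0, 2},
};

// Consulted only when an exception is actually raised.
int find_handler (std::uintptr_t pc)
{
  for (const HandlerRange & r : kTable)
    if (pc >= r.begin && pc < r.end)
      return r.handler_id;
  return 0; // no handler in this frame: keep unwinding to the caller
}
```

The non-throwing path never touches the table, which is where the “zero-cost” name comes from; the search only happens once something has already gone wrong.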

There is always an upside and a downside, and we have to make a choice. Taking into account that exceptions are not a part of normal execution flow, we need to optimize the most common case, when exceptions are not thrown, and sacrifice the speed of handling them. So many production-quality C++ compilers made that choice in favor of the zero-cost method.

Digging into the assembler

So let’s get back to our “divide” function and compare C-style error checking with zero-cost exception handling. C-style error checking:

int divide (int x, int y, int & result)
{
  if (y == 0)
    return -1;
  result = x / y;
  return 0;
}
 
int foo (int & result)
{
  volatile int x = 4, y = 28;
  int d1, d2;
  if (divide (x, y, d1) == -1)
    return -1;
  if (divide (y, x, d2) == -1)
    return -1;
  result = d1 + d2;
  return 0;
}
 
int main ()
{
  int result;
  foo (result);
  return result;
}

Code using exceptions:

#include <stdexcept>

int divide (int x, int y)
{
  if (y == 0)
    throw std::logic_error ("Division by zero");
  return x / y;
}
 
int foo ()
{
  volatile int x = 4, y = 28;
  return divide (x, y) + divide (y, x);
}
 
int main ()
{
  try
    {
      return foo ();
    }
  catch (const std::exception &)
    {
      return -1;
    }
}

Here is what will actually happen for the C-style example (I removed code that won’t get executed for simplicity):

__Z6divideiiRi:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	%edi, -4(%rbp)
	movl	%esi, -8(%rbp)
	movq	%rdx, -16(%rbp)
	cmpl	$0, -8(%rbp)
	jne	L2
;; Skipped return of -1. We will always jump to L2
L2:
	movl	-4(%rbp), %eax
	movl	%eax, %edx
	sarl	$31, %edx
	idivl	-8(%rbp)
	movl	%eax, %edx
	movq	-16(%rbp), %rax
	movl	%edx, (%rax)
	movl	$0, %eax
	popq	%rbp
	ret
 
__Z3fooRi:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$24, %rsp
	movq	%rdi, -24(%rbp)
	movl	$4, -4(%rbp)
	movl	$28, -8(%rbp)
	movl	-8(%rbp), %ecx
	movl	-4(%rbp), %eax
	leaq	-12(%rbp), %rdx
	movl	%ecx, %esi
	movl	%eax, %edi
	call	__Z6divideiiRi
	cmpl	$-1, %eax
	sete	%al
	testb	%al, %al
	je	L5
;; Skipped return of -1, we will always jump to L5.
L5:
	movl	-4(%rbp), %ecx
	movl	-8(%rbp), %eax
	leaq	-16(%rbp), %rdx
	movl	%ecx, %esi
	movl	%eax, %edi
	call	__Z6divideiiRi
	cmpl	$-1, %eax
	sete	%al
	testb	%al, %al
	je	L7
;; Skipped return of -1, always jumping to L7.
L7:
	movl	-12(%rbp), %edx
	movl	-16(%rbp), %eax
	addl	%eax, %edx
	movq	-24(%rbp), %rax
	movl	%edx, (%rax)
	movl	$0, %eax
	leave
	ret
 
_main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$16, %rsp
	leaq	-4(%rbp), %rax
	movq	%rax, %rdi
	call	__Z3fooRi
	movl	-4(%rbp), %eax
	leave
	ret

That’s a lot of error-checking code that gets executed on every call, even though we assume errors don’t happen very often! If we need to call the “divide” function more often, we will have to add more checks for the “-1” return value, and the assembly listing will get longer and longer because of those checks. Let’s see what the code looks like with exception handling:

__Z6divideii:
	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%r12
	pushq	%rbx
	subq	$32, %rsp
	movl	%edi, -36(%rbp)
	movl	%esi, -40(%rbp)
	cmpl	$0, -40(%rbp)
	jne	L2
;; Skipped the code that allocates and throws exception. We will always jump to L2.
L2:
	movl	-36(%rbp), %eax
	movl	%eax, %edx
	sarl	$31, %edx
	idivl	-40(%rbp)
	addq	$32, %rsp
	popq	%rbx
	popq	%r12
	popq	%rbp
	ret
 
__Z3foov:
	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%rbx
	subq	$24, %rsp
	movl	$4, -20(%rbp)
	movl	$28, -24(%rbp)
	movl	-24(%rbp), %edx
	movl	-20(%rbp), %eax
	movl	%edx, %esi
	movl	%eax, %edi
	call	__Z6divideii
	movl	%eax, %ebx
	movl	-20(%rbp), %edx
	movl	-24(%rbp), %eax
	movl	%edx, %esi
	movl	%eax, %edi
	call	__Z6divideii
	addl	%ebx, %eax
	addq	$24, %rsp
	popq	%rbx
	popq	%rbp
	ret
 
_main:
	pushq	%rbp
	movq	%rsp, %rbp
	pushq	%rbx
	subq	$24, %rsp
	call	__Z3foov
	movl	%eax, %ebx
	movl	%ebx, %eax
	addq	$24, %rsp
	popq	%rbx
	popq	%rbp
	ret
;; Stack unwinding code is invoked from statically generated exception
;; table. That code is stripped out, will never be reached in our example.

That is much better! We managed to avoid two unnecessary checks of the return value. Other than that, the code being executed is essentially the same.

Jumping exceptions

Now, let’s say we have a compiler that uses the “setjmp/longjmp” approach to implement exceptions. Even with that approach, exception handling can be faster than error checking. Consider the following example:

while (doContinue) {
   try {
     doSomeWork ();
   }
   catch (...) { /* do something about it! */ }
}

… that will indeed be slower than this:

while (doContinue) {
   if (doSomeWork () != 0) {
      /* do something about it! */
   }
}

… but how about this:

while (doContinue) {
   try {
      do {
        doSomeWork ();
      } while (doContinue);
      break;
   } catch (...) { /* do something about it! */ }
}

In the example above, we set the recovery point once and avoid checking the return value of the function multiple times. Of course, that is the best-case scenario, assuming that the exceptional situation almost never happens. But that is a fair assumption. Otherwise that situation should be treated as normal execution flow and handled differently, without exceptions, which will be the same for both C-style and C++-style programmers. Please note that the above optimization makes no sense with the zero-cost exception mechanism.

Finding out what exception mechanism is being used

Unfortunately, there is no portable way to ask the compiler which underlying mechanism it uses for exception handling. The approach that works everywhere is to write a simple program using exceptions, compile it to assembly language and analyze the result.
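
With GCC, a predefined macro can serve as a quick first check before reading any assembly: its preprocessor documentation lists “__USING_SJLJ_EXCEPTIONS__” as defined when the compiler uses the setjmp/longjmp mechanism. The sketch below relies on that macro. The function name “exception_mechanism” is made up, and other compilers may not define the macro at all, in which case the answer is only “unknown”:

```cpp
#include <string>

// Assumption: relies on GCC's documented predefined macro
// __USING_SJLJ_EXCEPTIONS__; other compilers may not define it.
std::string exception_mechanism ()
{
#if defined(__USING_SJLJ_EXCEPTIONS__)
  return "setjmp/longjmp";
#elif defined(__GNUC__)
  return "not setjmp/longjmp";
#else
  return "unknown";
#endif
}
```

A result of “not setjmp/longjmp” or “unknown” still leaves the details open, so looking at the generated assembly remains the definitive check.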

Other performance considerations

Using exceptions makes the binary bigger, no matter which mechanism is used to implement exception handling. So if program size is more important than execution speed, exceptions should not be used. And in those cases where you want maximum execution speed with minimal impact on binary size, you have to do additional testing to find the combination of C-style error checking and exception handling code that works best for you.
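
One possible way to combine the two styles is to keep an error-code version as the core and add a thin throwing wrapper on top, so that each call site can pick the style it needs. The sketch below uses the standard <system_error> facilities (C++11); mapping division by zero to “std::errc::invalid_argument” is an arbitrary assumption made for this example:

```cpp
#include <system_error>

// Core version: reports failure through an error code, never throws.
int divide (int x, int y, std::error_code & ec)
{
  if (y == 0)
  {
    ec = std::make_error_code (std::errc::invalid_argument);
    return 0;
  }
  ec.clear ();
  return x / y;
}

// Convenience version: converts the error code into an exception.
int divide (int x, int y)
{
  std::error_code ec;
  int result = divide (x, y, ec);
  if (ec)
    throw std::system_error (ec, "Division by zero");
  return result;
}
```

Note that the throwing wrapper still pays for the error-code check on every call, so this layout trades some speed for flexibility and has to be measured like everything else.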

Summary

Performance can mean many different things, from the speed of execution or the size of the binary to the time a developer spends writing code.

In this article we have discussed the execution speed of an application using exceptions vs. an application using C-style error checking, and the developer time needed to write the same program using these two approaches.

If the binary size of the compiled program is the most important factor, then exceptions should not be used.

If execution speed or ease of development (or both!) is the most important factor, then a programmer who gives up exceptions in favor of C-style error checking will spend much more time writing the code, make the code more complicated, and run into the problem of actually describing the error (especially in a multi-threaded environment, and especially when thread-local storage cannot be used because the error is handled in a different thread, as in asynchronous programming). On top of that, the code will also be slower.

42 comments

  1. Viktor - 16/02/2011

    Thank you Vlad for your article.
    It was very interesting for me because I agreed with your colleague, and your article raised some questions in my mind, which is always good. In my experience, if the probability of exceptions is small you can use try/catch; otherwise you should avoid it.
    So this is my test.
    Your 2 functions:

    int divide_c (int x, int y, int & result)
    {
      if (y == 0)
        return -1;
      result = x / y;
      return 0;
    }
    int divide_c_plus (int x, int y)
    {
      if (y == 0)
        throw std::logic_error ("Division by zero");
      return x / y;
    }

    Loop, C style
    1) Exceptions never occurred

    int result;
    for (int ii=0; ii<100000000; ++ii)
    {
      int x=1, y=2;
      if (divide_c (x, y, result) == 0)
      {
        y=1;
      }
      else y=0;
    }

    Time elapsed 0 h. 0 min. 3.4294960996 sec.
    2) Exceptions always occurred
    Line int x=1, y=2; changed to int x=1, y=0;
    Time elapsed 0 h. 0 min. 3.4294875181 sec.
    Practically the same time.
    Loop, C++ style
    3) Exceptions never occurred

    int result;
    for (int ii=0; ii<100000000; ++ii)
    {
      int x=1, y=2;
      try
      {
        result = divide_c_plus (x, y);
        // Division was successful. Do something with “result”.
        y=1;
      }
      catch (const std::logic_error &)
      {
        y=0;
      }
    }

    Time elapsed 0 h. 0 min. 5.4294250789 sec.
    4) Exceptions always occurred
    Line int x=1, y=2; changed to int x=1, y=0;
    Time elapsed 0 h. 10 min. 24.4294709936 sec.
    The test was performed on the same box, in the same environment, and you can see the difference.
    I think if you perform the test yourself you will get a similar result.
    Please let me know your thoughts.
    Thank you
    Viktor

    • Gianluigi - 07/03/2011

      In response to Viktor: exceptions are what they say… exceptional cases.
      This means exceptions should not be used to return a condition from a function, but should only be used for events that are unlikely to occur: a corrupted file, a broken connection, a divide by zero. As you said, they have to be used when the probability of the event is very low.
      A common misunderstanding is to use an exception like an ‘else’ statement.
      Also, the runtime cost of setting up exception handling is very high in the examples you provide; that cost tends to zero as the program size gets bigger.

      My honest opinion

      • Adelle Hartley - 08/03/2011

        When writing reusable system or library code, “unlikely” depends on the application. If I’m writing a file integrity checker, a corrupt file might be a common occurrence, or if I’m writing an application like wget, a broken connection might be a common occurrence.

        In either of these cases, I would say the overriding performance concern is I/O and consequently, the performance disadvantage of using exceptions would be a minor concern.

        However, it seems to me, that if we use exceptions at the bottom of the stack, we have no choice to avoid exceptions all the way up the stack, because there will always be the possibility of an uncaught exception.

        • Sebastian - 01/04/2011

          I think “unlikely” is also a bad description for when an exception must be used. An exception should be thrown when some state was reached that should not be reached if the program executes correctly and the input data is sensible.

          A file integrity checker is written to check for file integrity; its author expects that some files might be corrupted. A corrupt file is an expected state. Of course the integrity checker detects an error, but this is not an error state: there is no error in the execution of the program. This is a big difference.

          To summarize, if a corrupt file was detected, no exception should be thrown but another mechanism should indicate the fact. Errors and errors can be very different things.

      • John - 17/03/2011

        I don’t think “it will rarely happen” is the only legitimate justification for the overhead of throwing an exception. What if the presence of the error condition itself will cause you to go down a code path where performance is no longer critical? For example, let’s say you have a function whose parameters could be derived from user input, and it normally takes a long time to do heavy calculation – and you’ve spent a lot of effort optimizing that. But you add some input validation to the start – and if it fails, you may (or may not) feel like throwing an exception. If you do, it’s not a big deal, because you just got to skip all that hassle of doing any real work and instead you’re in the code path of “alert the user that they messed up” which can afford the time for a throw & catch. Does that make sense to someone other than me?

        • Vlad Lazarenko - 17/03/2011

          Totally makes sense. One real example could be sending a crash report or something when a child process dies unexpectedly. Or skipping the calculation of further data if some initial snapshot is just wrong, which invalidates the whole thing. I’m just saying that for cases like that it is better not to check for an error code after calling every function, and just relax and do your job, unless… the unexpected (or somewhat unexpected) happens. At the same time, if the speed of error handling matters a lot, you may not want to use exceptions.

          • Becky - 01/09/2011

            Ya learn something new every day. It’s true I guess!

  2. Vlad - 16/02/2011

    Viktor,

    Yeah, throwing exceptions is extremely expensive, so you have to find a golden mean. Say, if an exception takes 10 ticks to get thrown and caught, checking the result of a function takes 3 ticks, and the unexpected situation occurs 3 times out of 10, then using exceptions will waste 30 ticks (no error checking, and no errors 7 times) versus the same 30 ticks of mainline error handling (3 ticks per invocation, 10 times). This seems like an even scenario. But I feel like the mainline code will still be faster due to better caching. Plus, the binary size will be much smaller.

    However, if you think about nested functions (not everything is as simple and linear as our “divide” function example), then error checking grows a lot. For example, if a function calls 20 other functions and checks for an error code after each invocation, then you waste 60 ticks.

    What I like to do is to run an application under gdb with the “catch throw” command and make sure exceptions are not occurring. If they don’t arise under normal circumstances, they are exceptions and using them is good; otherwise we can treat those cases as general cases rather than exceptional situations and handle them accordingly :-)

  3. Gary Powell - 07/03/2011

    Vlad, thank you for your article. The other thing to consider is that if you have multiple error codes, some compilers are not very efficient at figuring out which error code was thrown. In general they do a linear search/comparison on the type, and that can be very expensive as well.

    In general I use exceptions to keep code clean enough that I can do visual inspections to assure myself that it’s “correct.” And I can use a try/catch block in the main loop to guarantee that I will handle all exceptions at some level even if I don’t handle them close to where they get thrown.

    I also suspect that we can omit the catch all (catch …) block except at the top most try/catch level as all my hand written exception classes derive from std::exception. I should probably omit those try/catch blocks as well unless I can do something intelligent. Thus letting the whole block unwind to the top is the best that can be done. And instead just making derived classes of exceptions for things I can do something about.

  4. Brendan Miller - 07/03/2011

    I think prohibiting exceptions is one of the most common, and dumbest, micro-optimizations in C++.

    Aside from that, my most hated micro-optimizations are:
    1. Using out parameters instead of returning by value with RVO
    2. Using pointers all over the place to avoid copying. This is especially bad with reference counting smart pointers.

  5. Julien Koenen - 07/03/2011

    Thanks for your article! I just have some comments:

    1. I think the example is rather bad, because most CPUs I know actually check for integer divisions by zero and raise interrupts.

    2. In cases like this I prefer to assert() the input arguments and clearly document the valid input range. Yes, that shifts the burden to the caller, but the caller has more information about the special circumstances of the call and the input parameters. In this specific case I would assert() the values in foo() (which would be trivial because the values x and y are defined and known in this case).

    This is what I would use:


    int divide (int x, int y)
    {
      assert( y != 0 );
      return x / y;
    }

    int foo()
    {
      volatile int x = 4, y = 28;
      return divide( x, y ) + divide( y, x );
    }

    int main ()
    {
      return foo();
    }

    and the resulting assembly code:


    int main ()
    {
    00405410 sub esp,8
    return foo();
    00405413 mov dword ptr [esp],4
    0040541A mov dword ptr [esp+4],1Ch
    00405422 mov ecx,dword ptr [esp+4]
    00405426 push esi
    00405427 mov esi,dword ptr [esp+4]
    0040542B push edi
    0040542C mov edi,dword ptr [esp+8]
    00405430 mov eax,dword ptr [esp+0Ch]
    00405434 cdq
    00405435 idiv eax,edi
    00405437 mov edi,eax
    00405439 mov eax,esi
    0040543B cdq
    0040543C idiv eax,ecx
    0040543E add eax,edi
    00405440 pop edi
    00405441 pop esi
    }

    Regards
    Julien

    • Julien Koenen - 07/03/2011

      Without the ‘volatile’ the compiler obviously creates the optimal code:


      int main ()
      {
      return foo();
      00405410 mov eax,7
      }
      00405415 ret

      Regards
      Julien

    • anti - 02/04/2011

      +1 for the assert, was about to write almost the same reply ;)

      but I might be slightly biased.

  6. Jim - 07/03/2011

    “We need to make sure that divisor is not zero because you cannot divide by zero. There are two ways of doing this”
    Bad example I feel, what about setting a trap handler for floating point errors?

    • Vlad Lazarenko - 08/03/2011

      You are right, bad example. I don’t wanna dig into signal handlers and SEH exceptions, for simplicity’s sake. Need a better example.

      • Dave Abrahams - 11/03/2011

        I don’t know that a trap handler is very useful in this case. What’s your response going to be? The only thing you can do other than “die, die, die” is to cause the division to return a “special” value, which doesn’t save you from having to check a value for errors if you care about recovery.

  7. Bill - 08/03/2011

    Reliability of the code is often overlooked in simple machine-speed analysis like this. It’s very easy to fail badly on garbage collection when using exceptions. This is especially problematic for applications that must be long-running.

    • Vlad Lazarenko - 08/03/2011

      Do you mean stack unwinding? It is always important to write exception safe code. Dave has a very good article on this – http://www.boost.org/community/exception_safety.html.

    • Wladimir - 09/03/2011

      One word: RAII. If you need reliability in C++, make sure you always use that idiom.

    • Sting - 21/03/2011

      I think that your statement could easily be turned on C programmers in that it is very easy to fail badly when deallocating partially allocated objects when there is an unexpected failure in the middle of several allocations. The C programmer might be lazy and not have written proper deallocation code. In the C++ case, proper ordering automatically gets the deallocation correct.

      • Mickey - 01/09/2011

        This has made my day. I wish all postings were this good.

  8. fnl - 08/03/2011

    The example of why exceptions are “better/easier” when having to call a routine multiple times, given after “In that case we will have to check for error twice using C-style”, is actually a bad one; think what happens when you call the C function and the C++ version with x=1 and y=0: very different outcomes, indeed! At best, it is a good example of why exception handling is a rather tricky business to get right…

  9. Ron - 08/03/2011

    x=2, y=-2

    The unmentioned nice thing about exceptions is that they can’t possibly be in the domain of your output. They are a way to say Mu. As it is, your divide function gives false positive divide by zero errors whenever y == -x.

    There are some functions that completely consume the domain of your output. For example, a function that returns a uniform random unsigned 32-bit integer. You can’t pick an error code from this range.

    Another related example: we generate 64-bit encodings of images where I work, and we also scan our logs for the text “FATAL” to generate alarms. Once in a while, “FATAL” shows up in the encoded bytes and we get a spurious alarm.

    • John - 11/03/2011

      errno can handle this situation also.

      • Vlad Lazarenko - 11/03/2011

        John, errno is thread-specific. I am not sure how __errno_location is implemented, but it might be worth checking. Accessing thread-local storage, for example, is extremely expensive and should be avoided.

        • John - 15/03/2011

          There are plenty of other reasons to hate errno, like the fact that most developers will not remember to check it. And if they do, they expect the number to work with strerror and therefore not be portable.
          I was just pointing out that there are other solutions that don’t constrain the return value by requiring special values. I’m not against exceptions, it just sounded like the poster thought this advantage was somehow specific to exceptions.

          • Vlad Lazarenko - 15/03/2011

            So true! I am just curious – what are other ways of error checking? I got used to return codes (and special values etc), errno and exceptions. Of course you can do exception sort of magic manually, but that seems to be an overkill.

            • John - 17/03/2011

              Well, lumping together errno with setting any other value that’s external to both the caller and callee… the only one I can think of (without getting ridiculous) that you haven’t specified there is passing in a pointer/reference to where you’d like the error code stored. I’ve seen that a lot in a body of code where C and C++ were built on top of fortran (which is what they were writing in when they started that tradition – more natural in a language that always passes by reference).

  10. SnoopDougieDoug - 09/03/2011

    You obviously have never done any C programming, as your logic for the C examples is @ss-backwards. You should return 1 (actually, any value not equal to zero) for success and 0 for failure, so you would code it as:

    int result;
    if (divide (x, y, result))
    {
      // Division was successful. Do something with “result”.
    }
    else
    {
      // Error occurred!
    }

  11. Marat Abrarov - 09/03/2011

    What about some friendship between exceptions and error codes?
    Look at Boost.Asio – it uses both ways.
    1) When You want exceptions – then You get them.
    2) When You want error codes – then You explicitly use boost::system::error_code.
    I think it is the best solution for “modern C++” library (and other code considered as reusable in future).
    First, You write code to use only boost::system::error_code.
    Those who need “exception-free speed” will use it.
    Then You simply wrap boost::system::error_code-based C++ code into the exceptions (boost::system_error) aware code.
    Users of Your code will make their choices themselves and can easily mix these two approaches.

  12. Vlad Lazarenko - 09/03/2011

    Marat, I think your idea is good, but only if you use policy based programming so code using C++ exception won’t wrap code using error codes. Otherwise you could get worse of both worlds – slowness of exception throwing and redundant error checking in mainline code.

  13. witek - 09/03/2011

    I use exceptions, but your post is highly speculative; it does not actually show performance numbers. Only a real benchmark (in this case actually a microbenchmark) will show what the overheads are in the no-error paths.

    • Andy - 09/03/2011

      As witek says, only real benchmarks will tell. So I benchmarked it.

      The results are highly dependent on your compiler and your optimization settings. And the differences aren’t that big in any case. So you really need to test your code, in your environment, before deciding whether this is a useful optimization. But in general, it isn’t.

      Here’s the code:

      #include
      #include
      #include
      #include

      using namespace std;

      struct ScopeTimer {
      const char *name_;
      clock_t start_;
      ScopeTimer(const char *name) : name_(name), start_(clock()) {}
      ~ScopeTimer() {
      clock_t d = clock() - start_;
      cout << name_ << ": " << d < 1) ? atoi(argv[1]) : 100000000;
      driver(reps);
      return 0;
      }

      I compiled this with four compilers, with a variety of optimization settings, on a 2.66GHz i7 MacBook Pro running OS X 10.6.6. In each case, I ran the program 20 times and averaged the middle 14. (I also tried a variant where I reversed the order of the tests. This did affect the 3 outliers, but not the middle 14.)

      g++-mp-4.6 from MacPorts gcc46 @4.6-20110226:
      -O0: 615127, 585693, 582130
      -O1: 437585, 362522, 354900
      -O2: 341678, 359758, 369011
      -O3: 341009, 339335, 339335
      -Ofast: 341009, 339395, 339335

      icc from Intel composerxe-2011.2.142:
      -O0: 863872, 1011756, 1025873
      -O1: 444893, 408040, 363817
      -O2: 343831, 343466, 344633
      -O3: 346255, 348687, 342480
      -fast: 344266, 340502, 340785

      g++ from Xcode 3.2 (i686-apple-darwin10-g++-4.2.1, Apple Inc. build 5664):
      -O0: 682215, 601786, 590281
      -O1: 497900, 479291, 453321
      -O2: 466781, 410853, 407556
      -O3: 370763, 410666, 408840
      -O2 -finline-functions: 370763, 410853, 408840

      g++-4.0 from Xcode 3.2 (i686-apple-darwin10-g++-4.0.1, Apple Inc. build 5494):
      -O0: 794818, 694213, 715387
      -O1: 549842, 559536, 595054
      -O2: 551610, 562176, 578522
      -O3: 373147, 563681, 582614
      -O2 -finline-functions: 379331, 562172, 582614

      If you look at the assembly and/or test a bunch of variations, it becomes pretty clear that the main problem with g++ 4.2 is that the optimizer isn’t good at inlining when exception handling is involved. Just moving the function body to a separate compilation unit results in the C-style code being slightly slower than the exception-based code, rather than faster. But there is a lot of C++ code that involves calls to tiny functions that are available for inlining, especially when templates are involved. So, in some cases, the old maxim of avoiding exceptions for performance may apply.

      However, if you’re using g++ 4.2, switching to g++ 4.6 would give you a 17% improvement with no code changes, vs. the 10% improvement you could get by rewriting all of your code to avoid exceptions. Unless there’s some reason you can’t switch, you’re doing a few orders of magnitude more work for half the short-term benefit. And when you do eventually switch, your code will be slower than if you hadn’t changed it.

      At any rate, it’s hard to imagine too many cases where the performance benefit in either direction will be worth the effort. Just write your code whichever way it’s more readable and maintainable. The one time in your life when the profiler tells you that squeezing a tiny bit out of this one function call is your best optimization advantage, rewrite that little bit as appropriate for your specific target environment, without changing all the rest of your code to match it.

      • Andy - 09/03/2011 Reply

        Let’s try again:
        #include <iostream>
        #include <stdexcept>
        #include <ctime>
        #include <cstdlib>

        using namespace std;

        struct ScopeTimer {
            const char *name_;
            clock_t start_;
            ScopeTimer(const char *name) : name_(name), start_(clock()) {}
            ~ScopeTimer() {
                clock_t d = clock() - start_;
                cout << name_ << ": " << d << "\n";
            }
        };

        bool div_c(int x, int y, int &result) {
            if (y == 0) return false;
            result = x/y;
            return true;
        }

        int div_cpp(int x, int y) {
            if (y == 0) throw logic_error("division by zero");
            return x/y;
        }

        void driver(int reps) {
            {
                int result;
                ScopeTimer t("c ");
                for (int i = 0; i != reps; ++i) {
                    volatile int x=1, y=2;
                    y = div_c(x, y, result) ? result : 0;
                }
            }
            {
                ScopeTimer t("c++ 1");
                for (int i = 0; i != reps; ++i) {
                    volatile int x=1, y=2;
                    try {
                        y = div_cpp(x, y);
                    } catch (const logic_error&) {
                        y = 0;
                    }
                }
            }
            {
                ScopeTimer t("c++ 2");
                try {
                    for (int i = 0; i != reps; ++i) {
                        volatile int x=1, y=2;
                        y = div_cpp(x, y);
                    }
                } catch (const logic_error&) {
                }
            }
        }

        int main(int argc, char *argv[]) {
            int reps = (argc > 1) ? atoi(argv[1]) : 100000000;
            driver(reps);
            return 0;
        }

        • Vlad Lazarenko - 10/03/2011 Reply

          There are two things that mess up the whole party in this example. First, you cannot measure performance using “clock”, it is an approximation. A high-resolution, platform-specific timer should be used instead. The second thing is the throw inside dev_cpp, which prevents compiler from inlining it, while the C version is just inlined. You also have to run this test multiple times to collect proper stats, as the scheduler, other programs, etc. interfere. So I’d try something like this (sorry, my WordPress somehow messes up some parts of code containing “<” or “>”):

          // $ g++ -O3 -mtune=native -o test test.cpp
          // $ ./test 
          // c : 1044693788
          // c++ 1: 1026090801
          // c++ 2: 1025955774
           
          #include <iostream>
          #include <stdexcept>
          #include <cstdlib>
          #include <stdint.h>
           
          using namespace std;
           
          inline uint64_t rdtsc ()
          {
              uint32_t lo, hi;
              __asm__ __volatile__ (
              "xorl %%eax,%%eax \n cpuid"
              ::: "%rax", "%rbx", "%rcx", "%rdx");
              __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
              return (uint64_t)hi << 32 | lo;
          }
           
          struct ScopeTimer {
              const char *name_;
              uint64_t start_;
              ScopeTimer(const char *name) : name_(name), start_(rdtsc ()) {}
              ~ScopeTimer() {
                  uint64_t d = (rdtsc () - start_);
                  cout << name_ << ": " << d << "\n";
              }
          };
           
          inline bool div_c(int x, int y, int &result) {
              if (y == 0) return false;
              result = x/y;
              return true;
          }
           
          void div_by_zero ();
           
          inline int div_cpp(int x, int y) {
              if (y == 0) div_by_zero ();
              return x/y;
          }
           
          void driver(int reps) {
              {
                  int result;
                  ScopeTimer t("c ");
                  for (int i = 0; i != reps; ++i) {
                      volatile int x=1, y=2;
                      y = div_c(x, y, result) ? result : 0;
                  }
              }
              {
                  ScopeTimer t("c++ 1");
                  for (int i = 0; i != reps; ++i) {
                      volatile int x=1, y=2;
                      try {
                          y = div_cpp(x, y);
                      } catch (const logic_error&) {
                          y = 0;
                      }
                  }
              }
              {
                  ScopeTimer t("c++ 2");
                  try {
                      for (int i = 0; i != reps; ++i) {
                          volatile int x=1, y=2;
                          y = div_cpp(x, y);
                      }
                  } catch (const logic_error&) {
                  }
              }
          }
           
          void
          div_by_zero ()
          {
              throw logic_error("division by zero");
          }
           
          int main(int argc, char *argv[]) {
              int reps = (argc > 1) ? atoi(argv[1]) : 100000000;
              driver(reps);
              return 0;
          }
          • Vlad Lazarenko - 10/03/2011 Reply

            BTW, it seems like a non-RT kernel swaps out the process a lot, so this test has to be run with higher priority. Then the numbers are reasonable, like 1044 clocks.

            • Xannon - 01/09/2011 Reply

              It’s really great that people are sharing this information.

          • Andy - 09/02/2012 Reply

            I’m not sure you actually read the post you were replying to. For example, you say “You also have to run this test multiple times to collect proper stats”, but my original post specifically explained running it multiple times. Also, you bring up “throw inside dev_cpp, which prevents compiler from inlining it”, which was described in detail in the original post (along with the fact that newer versions of gcc actually _can_ inline it).

            Meanwhile: “you cannot measure performance using “clock”, it is an approximation”. Yes, all timers are approximations. One problem with clock is its low precision; many tests only run for ~100 ticks, which makes them pretty useless. But the example posted above ran for ~500K ticks, which takes care of that problem. The problem with rdtsc, on the other hand, is that it’s not guaranteed to mean anything: the longer you run, the more chance you’ll get scheduled to a different processor that doesn’t sync its count with the one you started on, or that your processor will SpeedStep up to a higher frequency.

            But despite all of this, both tests came up with the same results. At least with a modern compiler and a decent optimization level, even for trivial code where the overhead swamps the actual work, the difference in overhead between error checking and exceptions is less than 2%.

            So, I think that backs up the conclusion: “Just write your code whichever way it’s more readable and maintainable. The one time in your life when the profiler tells you that squeezing a tiny bit out of this one function call is your best optimization advantage, rewrite that little bit as appropriate for your specific target environment, without changing all the rest of your code to match it.”

  14. Zachary Turner - 09/03/2011 Reply

    Terrible examples here. Not only did you not show performance numbers, but

    a) The examples you used where you talk about calling the divide function multiple times are not even equivalent (one will execute the divide function twice 100% of the time, while the other only executes it once sometimes).

    b) You failed to consider the even more common use case of exceptions being able to propagate many levels up a call stack.

    c) You failed to look at situations where you are checking for many different types of exceptions and handling them differently.

    d) You failed to take into consideration that sometimes there are valid reasons to ignore return codes.

    I’m sure there are plenty of others.

    • Dave Abrahams - 11/03/2011 Reply

      Zachary,

      While I agree with you that this analysis does not present the whole picture, it’s an exaggeration to call these examples “terrible.” Even if they only address a part of the morass of misunderstanding and misconception surrounding exception handling, it’s an important part that hasn’t gotten much attention. There are many people who will argue against the use of exception-handling because they can’t even begin to imagine how the generated code might be an improvement over something done manually. No, it’s not a speed test, but many people won’t believe speed numbers until they can comprehend how they are possible.

  15. Sarcasm - 09/03/2011 Reply

    Much C code returns 0 on success and “non-0” on error; -1 is often used.

  16. error codes - 22/07/2011 Reply

    looks very confusing to me but i’ll try to figure this out. sounds also very interesting.


© Copyright 2010-2011 Vlad Lazarenko. All rights reserved.