Vlad Lazarenko

... making all this up as I go along

Why C++ Member Function Pointers Are 16 Bytes Wide

When we talk about pointers, we generally assume that any pointer can be represented by a void* pointer, which is 8 bytes wide on the x86_64 architecture. For instance, here is an excerpt from a Wikipedia article about x86_64:

Pushes and pops on the stack are always in 8-byte strides, and pointers are 8 bytes wide.

From the CPU’s point of view, a pointer is nothing but a memory address, and all memory addresses on x86_64 are represented by 64 bits, so the assumption about 8 bytes is correct. It is also easy to verify by printing the sizes of pointers of different types:

#include <iostream>

int main() {
    std::cout <<
        "sizeof(int*)      == " << sizeof(int*) << "\n"
        "sizeof(double*)   == " << sizeof(double*) << "\n"
        "sizeof(void(*)()) == " << sizeof(void(*)()) << std::endl;
}

Compile and run the above program, and it will report that all of these pointers are 8 bytes in size:

$ uname -i
x86_64
$ g++ -Wall ./example.cc
$ ./a.out
sizeof(int*)      == 8
sizeof(double*)   == 8
sizeof(void(*)()) == 8

In C++, however, there is one exception: a pointer to a member function. Interestingly enough, on this platform a pointer to a member function is twice the size of any other pointer. This is easy to verify with a simple program that prints “16”:

#include <iostream>

struct Foo {
    void bar() const { }
};

int main() {
    std::cout << sizeof(&Foo::bar) << std::endl;
}

Does this mean that Wikipedia is wrong? No, not at all. From the hardware’s point of view, all pointers are still 8 bytes wide. So what is a pointer to a member function, then? It is a feature of the C++ language, a concept that does not map directly onto the hardware. The compiler has to implement it with extra data and extra code at the call site, which adds a small overhead. The C++ language specification does not concern itself much with implementation details and says nothing about how this type of pointer is represented. Luckily, there is the Itanium C++ ABI specification, which is devoted to standardizing implementation details of C++. It explains, for example, how virtual tables, RTTI and exceptions are implemented, and it also explains member pointers in §2.3:

A pointer to member function is a pair as follows:

ptr:

For a non-virtual function, this field is a simple function pointer. For a virtual function, it is 1 plus the virtual table offset (in bytes) of the function, represented as a ptrdiff_t. The value zero represents a NULL pointer, independent of the adjustment field value below.

adj:

The required adjustment to this, represented as a ptrdiff_t.

So a member function pointer is 16 bytes instead of 8 because, along with a simple function pointer, it must also store information about how to adjust the “this” pointer (which is always passed implicitly to non-static member functions). What the ABI spec does not say is why and when such an adjustment is required. It might not be obvious at first. Let’s take a look at the following class hierarchy:

struct A {
    void foo() const { }
    char pad0[32];
};

struct B {
    void bar() const { }
    char pad2[64];
};

struct C : A, B
{ };

Both A and B have a non-static member function and a data member. Each of those methods can access the data member of its class through the implicitly passed “this” pointer. To access a data member, the compiler applies that member’s offset from the base address of the object, represented as a ptrdiff_t, to the “this” pointer. Things get complicated with multiple inheritance. What happens when class C inherits from both A and B? The compiler places A and B together in memory, with B coming after A. Therefore, methods of class A and methods of class B “see” different values of the “this” pointer. This is easy to verify in practice, for example:

#include <iostream>

struct A {
    void foo() const {
        std::cout << "A's this: " << this << std::endl;
    }
    char pad0[32];
};

struct B {
    void bar() const {
        std::cout << "B's this: " << this << std::endl;
    }
    char pad2[64];
};

struct C : A, B
{ };

int main()
{
    C obj;
    obj.foo();
    obj.bar();
}

Compiling and running this program produces output like the following:

$ g++ -Wall -o test ./test.cc && ./test
A's this: 0x7fff57ddfb48
B's this: 0x7fff57ddfb68

As you can see, the value of “this” passed to B’s method is 32 bytes greater than the one passed to A’s method, which is exactly the size of class A. But what happens when we have the following function, which calls a method of class C through a pointer?

void call_by_ptr(const C &obj, void (C::*mem_func)() const) {
    (obj.*mem_func)();
}

Depending on which method is being called, a different value of “this” must be passed. But the “call_by_ptr” function doesn’t know whether it got a pointer to “foo()” or a pointer to “bar()”. That information is only available at the point where the address of either method is taken. That is why a pointer to a member function also carries information about how to adjust “this” before calling the method. Now, let’s put all of that together into a simple program that demonstrates what is going on “under the hood”:

#include <cstring>   // std::memcpy
#include <iostream>

struct A {
    void foo() const {
        std::cout << "A's this:\t" << this << std::endl;
    }
    char pad0[32];
};

struct B {
    void bar() const {
        std::cout << "B's this:\t" << this << std::endl;
    }
    char pad2[64];
};

struct C : A, B
{ };

void call_by_ptr(const C &obj, void (C::*mem_func)() const)
{
    void *data[2];
    std::memcpy(data, &mem_func, sizeof(mem_func));
    std::cout << "------------------------------\n"
        "Object ptr:\t" << &obj <<
        "\nFunction ptr:\t" << data[0] <<
        "\nPointer adj:\t" << data[1] << std::endl;
    (obj.*mem_func)();
}

int main()
{
    C obj;
    call_by_ptr(obj, &C::foo);
    call_by_ptr(obj, &C::bar);
}

The above program prints the following:

------------------------------
Object ptr: 0x7fff535dfb28
Function ptr:   0x10c620cac
Pointer adj:    0
A's this:    0x7fff535dfb28
------------------------------
Object ptr: 0x7fff535dfb28
Function ptr:   0x10c620cfe
Pointer adj:    0x20
B's this:    0x7fff535dfb48

Hopefully that clears things up a little bit.