Vlad Lazarenko

... making all this up as I go along

How Constant Is a Constant?

#include <string.h>
#include <stdio.h>

int main()
{
    char *data = "Bender is always sober.";
    printf("Before: %s\n", data);
    memcpy(data + 17, "drunk!", 6);
    printf("After: %s\n", data);
    return 0;
}

Do you remember the good old days of DOS, when programmers used to write code like this? The trick worked like a champ back then. Today you will only see this kind of code in questions from students burning the midnight oil, learning C from examples in some really old books, and perhaps in embedded systems running on processors without a memory protection unit. The rest of us code monkeys do not write it anymore: compilers can warn about it, the language standard says it invokes undefined behavior, and at the end of the day it is a pretty straightforward way to get a segmentation fault, that is, to have our program bite on a SIGSEGV signal sent by the operating system’s kernel and take a solid core dump.

Why doesn’t this work now, and how did it work before? It is really simple: string literals like the one in the example are typically placed in a read-only data section of the program (.rodata in ELF executables). The operating system loads the program into memory and write-protects that memory by means of the MMU. DOS did not do this, so the trick worked fine in the DOS days. It still works today if the operating system does not write-protect the memory, or if the CPU has no MMU.
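For comparison, here is a minimal sketch of how to get the DOS-era result without touching the literal at all: initialize a char array from the literal, so that the program edits its own writable copy. The function name make_bender_drunk is made up for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper, named for this illustration only.
inline std::string make_bender_drunk()
{
    // The array is initialized with a copy of the literal, so these bytes
    // are ordinary writable storage owned by the function, not the
    // write-protected literal itself.
    char data[] = "Bender is always sober.";

    // Make sure the six replacement bytes fit before the terminating NUL.
    assert(std::strlen(data) >= 17 + 6);

    std::memcpy(data + 17, "drunk!", 6);
    return std::string(data);
}
```

The only difference from the first listing is the declaration: `char data[]` copies the literal into writable memory, whereas `char *data` merely points at the literal.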

This all sounds nice and dandy, but it raises one good question: are any constant objects actually constant? Ask any HLL programmer whether it is possible to modify a constant string literal on a modern Intel architecture. The answer will likely be «No!» At least that is what my colleagues said, and all of them are brilliant developers with decades of experience. But every time I hear «No, that’s just not possible», I take it as a challenge and cannot rest until I prove that it is. Nothing is impossible, so trust no one. The truth is that it is possible. Check this out (do not try to repeat this at work):

#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int take_me_back_to_DOS_times(const void *ptr, size_t len);

int main()
{
    const char *data = "Bender is always sober.";
    printf("Before: %s\n", data);
    if (take_me_back_to_DOS_times(data, strlen(data)) != 0) {
        perror("Time machine appears to be broken!");
        /* Writing to the literal now would just crash, so bail out. */
        return EXIT_FAILURE;
    }
    memcpy((char *)data + 17, "drunk!", 6);
    printf("After: %s\n", data);

    return 0;
}

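int take_me_back_to_DOS_times(const void *ptr, size_t len)
{
    long pagesize;              /* sysconf() returns long, not int */
    unsigned long long pg_off;  /* offset of ptr within its page */
    void *page;

    pagesize = sysconf(_SC_PAGE_SIZE);
    if (pagesize <= 0)
        return -1;
    /* mprotect() needs a page-aligned address, so round ptr down to the
       start of its page and grow the length by the same amount. */
    pg_off = (unsigned long long)ptr % (unsigned long long)pagesize;
    page = ((char *)ptr - pg_off);
    /* PROT_EXEC is kept in case the literal shares a page with code. */
    if (mprotect(page, len + pg_off, PROT_READ | PROT_WRITE | PROT_EXEC) == -1)
        return -1;
    return 0;
}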

Hopefully the code is self-explanatory; if in doubt, read the manual page for the mprotect system call. I touched on memory protection at the beginning and on how the operating system uses it to make string literals constant. The example above takes the reverse approach and makes the memory writable again (just like a time machine taking the program back to the good old DOS days). The thing is, our commodity computers work with only two kinds of memory: the SRAM of CPU caches, which is fast and expensive, and DRAM, which is slow but very cheap. Neither kind is read-only. Some processors have neither a memory protection unit nor a memory management unit, so on them it is not even possible to write-protect memory to make it “look” constant (and, as we have just witnessed, protected memory can be unprotected again). Therefore, from the hardware point of view, there are no constants.
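Since mprotect works on whole pages and wants a page-aligned address, the time machine above has to round the pointer down to a page boundary first. That arithmetic can be sketched on its own as follows. The names page_range and covering_pages are made up for this illustration, and the helper assumes the page size is a power of two. It only computes the range and does not call mprotect itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: the page-aligned range covering [addr, addr + len).
struct page_range {
    std::uintptr_t start;  // address of the first page, a multiple of pagesize
    std::size_t length;    // bytes from start to the end of the last page touched
};

// Hypothetical helper. Assumes pagesize is a power of two; real code should
// obtain it from sysconf(_SC_PAGE_SIZE) rather than hard-code a value.
inline page_range covering_pages(std::uintptr_t addr, std::size_t len,
                                 std::size_t pagesize)
{
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    const std::uintptr_t mask = pagesize - 1;
    const std::uintptr_t start = addr & ~mask;               // round down
    const std::uintptr_t end = (addr + len + mask) & ~mask;  // round up
    return page_range{start, static_cast<std::size_t>(end - start)};
}
```

For an address of 0x1ff0, a length of 0x20 and 4096-byte pages, this yields a start of 0x1000 and a length of 0x2000, because the bytes straddle two pages. Everything else living in those two pages changes protection along with the string literal.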

So what’s up with all those constant objects in programming languages? Theoretical computer science has a concept of const-correctness, which many languages incorporate. Java, for instance, takes a lot of care not to let programmers modify constant objects. So do C and C++. But the keyword here is “theoretically”.

In theory, there is no difference between theory and practice. But, in practice, there is.

Jan L. A. van de Snepscheut

Unlike higher-level languages such as Java, both C and C++ are close to the hardware, and it doesn’t take a genius to get the memory address of some object and manipulate the memory directly. When you do that, nothing is constant. Not to mention that there are legitimate cases where casting away const is fine and well defined, for example when the object being referred to was not declared const in the first place.
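One commonly cited legitimate case is handing a const pointer to an old function that only reads its argument but was never written const-correctly. Here is a minimal sketch of that situation. Both function names are hypothetical, and the cast is harmless only because the callee never writes through the pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical legacy function: it only reads its argument, but its author
// never wrote "const" in the signature.
inline std::size_t legacy_count_spaces(char *text)
{
    std::size_t n = 0;
    for (; *text != '\0'; ++text)
        if (*text == ' ')
            ++n;
    return n;
}

// Hypothetical const-correct wrapper. Casting away const is well defined
// here because nothing is ever written through the resulting pointer.
inline std::size_t count_spaces(const char *text)
{
    assert(text != nullptr);
    return legacy_count_spaces(const_cast<char *>(text));
}
```

Calling count_spaces("Bender is always sober.") returns 3 and leaves the literal untouched. The moment the legacy function starts writing through its argument, we are back to the segmentation fault from the first example.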

Don’t get me wrong. I am not saying that const-correctness makes no sense or that you should not use it. You definitely should. And if you violate the const-correctness rules, you are taking a lot of risk. Not because it won’t work, but because it may work differently from how you expect. If that happens, you are the only one to blame, because the standard simply says «I told you, that’s undefined behavior!» But when it comes to theoretical computer science and const-correctness, one thing bothers me: it is implemented only halfway. Consider standard C++ strings, for example. Let’s say I have a string that is declared as constant. According to the C++ rules, I cannot modify it after casting away that const-ness. However, std::string typically stores its contents in dynamic memory (short strings may be stored inside the string object itself), and therefore the character buffer itself is originally non-constant. Now, everybody can interpret laws and standards differently (the standard’s description of c_str() does say that the program shall not modify the returned character array), but I find this code pretty legitimate and do not see how it is undefined behavior:

#include <cstring>
#include <string>
#include <iostream>

static void print(const std::string &str)
{
    // OK, "str" is constant, but the pointer to the
    // characters it holds was never declared as constant,
    // so we can cast the const away with "const_cast"
    // and modify the contents:
    char *p = const_cast<char *>(str.c_str());
    std::memcpy(p + 17, "drunk!", 6);
    std::cout << str << std::endl;
}

int main()
{
    const std::string str = "Bender is always sober.";
    print(str);
    return 0;
}

I also don’t like the idea of somebody else modifying my constant object with const_cast when I don’t want them to, but C++ still says it is legal (because the object wasn’t originally declared as “const”):

#include <string>
#include <iostream>

static void some_bad_function_yet_legal(const std::string &str)
{
    // OK, "str" is constant here, but the object was not
    // declared as such in the first place, so we can use
    // "const_cast" to change it:
    std::string &s = const_cast<std::string &>(str);
    s.resize(6);
}

int main()
{
    std::string str = "Bender is always sober.";

    // Call a function that accepts a constant reference,
    // who would think it modifies the string, right?
    some_bad_function_yet_legal(str);

    std::cout << str << std::endl;
    return 0;
}

Sometimes it is legal, sometimes it is declared undefined behavior. Either way you end up with a broken program unless somebody took care to employ the MMU to protect your memory. It gets even worse if it isn’t your code and you have to debug it and chase the error, which can be pretty hard to do (luckily, GDB can break on memory access). As an experiment, I wrote a little custom allocator that can protect and unprotect memory. It makes it possible, for example, to make your string constant in such a way that an attempt to modify its contents makes the process receive SIGSEGV:

#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdexcept>
#include <string>
#include <iostream>

template <typename T>
struct my_allocator {
    typedef std::size_t    size_type;
    typedef std::ptrdiff_t difference_type;
    typedef T*             pointer;
    typedef const T*       const_pointer;
    typedef T&             reference;
    typedef const T&       const_reference;
    typedef T              value_type;

    template <typename U>
    struct rebind { typedef my_allocator<U> other; };

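    pointer allocate(size_type n, const_pointer = nullptr)
    {
        void *ptr;
        long pagesize = sysconf(_SC_PAGE_SIZE);
        if (pagesize <= 0)
            throw std::runtime_error("Cannot obtain a page size");
        // Round the request up to whole pages so that the pages we
        // protect later are not shared with other heap data.
        const std::size_t page = static_cast<std::size_t>(pagesize);
        const std::size_t bytes = (n * sizeof(T) + page - 1) / page * page;
        if (posix_memalign(&ptr, page, bytes) != 0)
            throw std::bad_alloc();
        return static_cast<pointer>(ptr);
    }

    void deallocate(pointer ptr, size_type)
    {
        std::free(ptr);
    }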

    static void protect(const_pointer ptr, size_type len)
    {
        int pagesize = sysconf(_SC_PAGE_SIZE);
        if (pagesize < 0)
            throw std::runtime_error("Cannot obtain a page size");
        std::uintptr_t pg_off = (std::uintptr_t)ptr % (std::uintptr_t)pagesize;
        void *page = ((char *)ptr - pg_off);
        if (mprotect(page, len + pg_off, PROT_READ) == -1)
            throw std::runtime_error("Can't make memory read-only!");
    }

    static void unprotect(const_pointer ptr, size_type len)
    {
        int pagesize = sysconf(_SC_PAGE_SIZE);
        if (pagesize < 0)
            throw std::runtime_error("Cannot obtain a page size");
        std::uintptr_t pg_off = (std::uintptr_t)ptr % (std::uintptr_t)pagesize;
        void *page = ((char *)ptr - pg_off);
        if (mprotect(page, len + pg_off, PROT_READ | PROT_WRITE) == -1)
            throw std::runtime_error("Can't make memory writable!");
    }
};

template <typename T>
bool operator == (const my_allocator<T> &, const my_allocator<T> &) {
    return true;
}

template <typename T>
bool operator != (const my_allocator<T> &, const my_allocator<T> &) {
    return false;
}

typedef std::basic_string< char, std::char_traits<char>,
                           my_allocator<char> > my_string;

static void some_bad_function(const my_string &str)
{
    // OK, "str" is constant, but the pointer to the
    // characters it holds was never declared as constant,
    // so we can cast the const away with "const_cast"
    // and modify the contents:
    char *p = const_cast<char *>(str.c_str());
    std::memcpy(p + 17, "drunk!", 6);
}

int main()
{
    try {
        const my_string str = "Bender is always sober.";

        my_string::allocator_type::protect(str.c_str(), str.size());
        some_bad_function(str); // This call results in SIGSEGV thanks to memory protection!
        std::cout << str << std::endl;
        my_string::allocator_type::unprotect(str.c_str(), str.size());
    } catch(const std::exception &e) {
        std::cerr << "ERROR: " << e.what() << std::endl;
        return EXIT_FAILURE;
    }
}

This, of course, doesn’t make it impossible to unprotect that memory. To achieve that, we’d need more sophisticated access control, possibly employing protection rings and the like. But this is as far as I am willing to go. I wish we had something like this done automatically whenever we declare variables or pass them around as “const”. But that is not going to happen, because messing with memory protection is a very expensive operation that would, if implemented, slow the program down and make it pretty much unusable. Maybe one day we’ll get special hardware that makes it a reality. But for now, let’s keep shooting ourselves in the foot.