Contextual Memory Tracing

The Need

A long, long time ago I wrote about memory debugging. I redefined new as a macro, took advantage of placement new, and replaced the global operator new. However, only the replacement of the global operator new was useful in catching memory leaks in the end, while the other facilities became more or less futile; for memory allocation was becoming more and more implicit with the spread use of STL containers and smart pointers, and direct use of new was discouraged. It is even more so today, with many coding conventions basically banning the use of new and delete. So I would like to revisit this topic.

First, there is still need to trace memory usage, even though memory leakage, in the way of unmatched new, is unlikely today. People still need to know how memory is used, by which parts of the program, in which functions and modules, and so on. The exact point of memory allocation is becoming less relevant, as memory allocation is becoming less direct. It probably occurs in some libraries, instead of in application code. So now tracing memory usage means recording the usage context, instead of the exact code position.

Be Contextual

Usage contexts can be set up in a stack-like data structure, and I have done so several times in the past. What needs to be recorded in the context is something one needs to decide beforehand. If you only want to trace memory usage, you can do as I do below. But you may want to fit the interface with your specific memory manager, adding what needs to be passed to it. Anyway, you decide what should be in. My example code is as follows:

struct context {
    const char* file;
    const char* func;
};

We want to record the context automatically, and RAII can be used for this purpose:

class checkpoint {
public:
    explicit checkpoint(const context& ctx);
    ~checkpoint();

private:
    const context ctx_;
};

#define CTX_MEMORY_CHECKPOINT()       \
    checkpoint memory_checkpoint{     \
        context{__FILE__, __func__}}

thread_local std::deque<context>
    context_stack{
        context{"<UNKNOWN>", "<UNKNOWN>"}};

void save_context(const context& ctx)
{
    context_stack.push_back(ctx);
}

void restore_context(const context& ctx)
{
    assert(!context_stack.empty() &&
           context_stack.back() == ctx);
    context_stack.pop_back();
}

const context& get_current_context()
{
    assert(!context_stack.empty());
    return context_stack.back();
}

checkpoint::checkpoint(const context& ctx) : ctx_(ctx)
{
    save_context(ctx);
}

checkpoint::~checkpoint()
{
    restore_context(ctx_);
}

Please notice that the context_stack needs to be thread_local, which was something not standardized when I last wrote about tracing memory usage. It is very convenient to save the information on a per-stack basis.

Fitting with Real Memory Managers

Before we go define the operator new and operator delete functions (normally called ‘allocation’ and ‘deallocation’ functions, and there are a lot of them), let us define first the generic/sample functions that do the real allocation and deallocation. We just pass on the necessary arguments to the system memory manager here (but you may want to do more to make it work with an existing memory manager) and we still use the C convention that a memory allocation failure is indicated by a null pointer:

void* ctx_alloc(size_t size, size_t alignment,
                const context* /*unused*/)
{
#ifdef _WIN32
    return _aligned_malloc(size, alignment);
#elif defined(__unix) || defined(__unix__)
    void* memptr{};
    int result = posix_memalign(&memptr, alignment, size);
    if (result == 0) {
        return memptr;
    } else {
        return nullptr;
    }
#else
    // No alignment guarantees on other platforms
    (void)alignment;
    return malloc(size);
#endif
}

void ctx_free(void* ptr, const context* /*unused*/)
{
#ifdef _WIN32
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

operator new & operator delete

Now we can go to the allocation and deallocation functions. The declarations with our new context parameter are the following:

void* operator new  (size_t size,
                     const context& ctx);
void* operator new[](size_t size,
                     const context& ctx);
void* operator new  (size_t size,
                     std::align_val_t align_val,
                     const context& ctx);
void* operator new[](size_t size,
                     std::align_val_t align_val,
                     const context& ctx);

void operator delete  (void* ptr,
                       const context&) noexcept;
void operator delete[](void* ptr,
                       const context&) noexcept;
void operator delete  (void* ptr,
                       std::align_val_t align_val,
                       const context&) noexcept;
void operator delete[](void* ptr,
                       std::align_val_t align_val,
                       const context&) noexcept;

But these are not all the function we need to rewrite. We need to replace the non-contextual version too, and they are actually key to the memory tracing functionality. Our saved context can be used, like the following:

void* operator new(size_t size)
{
    return operator new(size, get_current_context());
}

void* operator new[](size_t size)
{
    return operator new[](size, get_current_context());
}

Assuming the existence of an alloc_mem and a free_mem function, we can make the rest of the allocation and deallocation functions basically forwarders:

void* operator new(size_t size,
                   const std::nothrow_t&) noexcept
{
    return alloc_mem(size, get_current_context(),
                     alloc_is_not_array);
}

void* operator new[](size_t size,
                     const std::nothrow_t&) noexcept
{
    return alloc_mem(size, get_current_context(),
                     alloc_is_array);
}

void* operator new(size_t size,
                   std::align_val_t align_val)
{
    return operator new(size, align_val,
                        get_current_context());
}

void* operator new[](size_t size,
                     std::align_val_t align_val)
{
    return operator new[](size, align_val,
                          get_current_context());
}

void* operator new(size_t size,
                   std::align_val_t align_val,
                   const std::nothrow_t&) noexcept
{
    return alloc_mem(size, get_current_context(),
                     alloc_is_not_array,
                     size_t(align_val));
}

void* operator new[](size_t size,
                     std::align_val_t align_val,
                     const std::nothrow_t&) noexcept
{
    return alloc_mem(size, get_current_context(),
                     alloc_is_array,
                     size_t(align_val));
}

void* operator new(size_t size, const context& ctx)
{
    void* ptr = alloc_mem(size, ctx, alloc_is_not_array);
    if (ptr == nullptr) {
        throw std::bad_alloc();
    }
    return ptr;
}

void* operator new[](size_t size, const context& ctx)
{
    void* ptr = alloc_mem(size, ctx, alloc_is_array);
    if (ptr == nullptr) {
        throw std::bad_alloc();
    }
    return ptr;
}

void* operator new(size_t size,
                   std::align_val_t align_val,
                   const context& ctx)
{
    void* ptr = alloc_mem(size, ctx, alloc_is_not_array,
                          size_t(align_val));
    if (ptr == nullptr) {
        throw std::bad_alloc();
    }
    return ptr;
}

void* operator new[](size_t size,
                     std::align_val_t align_val,
                     const context& ctx)
{
    void* ptr = alloc_mem(size, ctx, alloc_is_array,
                          size_t(align_val));
    if (ptr == nullptr) {
        throw std::bad_alloc();
    }
    return ptr;
}

void operator delete(void* ptr) noexcept
{
    free_mem(ptr, alloc_is_not_array);
}

void operator delete[](void* ptr) noexcept
{
    free_mem(ptr, alloc_is_array);
}

void operator delete(void* ptr, size_t) noexcept
{
    free_mem(ptr, alloc_is_not_array);
}

void operator delete[](void* ptr, size_t) noexcept
{
    free_mem(ptr, alloc_is_array);
}

void operator delete(
    void* ptr, std::align_val_t align_val) noexcept
{
    free_mem(ptr, alloc_is_not_array,
                 size_t(align_val));
}

void operator delete[](
    void* ptr, std::align_val_t align_val) noexcept
{
    free_mem(ptr, alloc_is_array,
                 size_t(align_val));
}

void operator delete(void* ptr,
                     const context&) noexcept
{
    operator delete(ptr);
}

void operator delete[](void* ptr,
                       const context&) noexcept
{
    operator delete[](ptr);
}

void operator delete(void* ptr,
                     std::align_val_t align_val,
                     const context&) noexcept
{
    operator delete(ptr, align_val);
}

void operator delete[](void* ptr,
                       std::align_val_t align_val,
                       const context&) noexcept
{
    operator delete[](ptr, align_val);
}

Contexts and Allocation/Deallocation

Now let us focus on the two functions that do the real job:

enum is_array_t : uint32_t {
    alloc_is_not_array,
    alloc_is_array
};

void* alloc_mem(size_t size,
                const context& ctx,
                is_array_t is_array,
                size_t alignment);

void free_mem(void* ptr,
              is_array_t is_array,
              size_t alignment);

Considering this interface, the context information can only be stored immediately before the memory returned to the user. In order to trace leaked memory, we need to link the allocated memory into a linked list, and the control block is as follows:

struct new_ptr_list_t {
    new_ptr_list_t* next;
    new_ptr_list_t* prev;
    size_t          size;
    context         ctx;
    uint32_t        head_size : 31;
    uint32_t        is_array : 1;
    uint32_t        magic;
};

The first four fields should be very clear in meaning. head_size probably requires some explanation. While the struct is fixed in size, alignments can be different across allocations, resulting in different offsets from the struct pointer to the memory pointer the user gets. So this fields records the aligned struct size. is_array records whether the allocation is done by an operator new[]; we use this piece of information to detect the new[]/delete or new/delete[] mismatch, as well as allowing for special offsets required by array allocations. magic is used to mark that the memory is allocated by this implementation so that when freeing the memory we can detect corrupt memory, double freeing, and suchlike.

We also need the list head of control blocks, a mutex to protect its access, a function to align data size, and the magic number constant:

new_ptr_list_t new_ptr_list = {
    &new_ptr_list, &new_ptr_list, 0, {},
    alloc_is_not_array, 0, CTX_MAGIC};

std::mutex new_ptr_lock;

constexpr uint32_t align(size_t s, size_t alignment)
{
    return static_cast<uint32_t>((s + alignment - 1) &
                                 ~(alignment - 1));
}

constexpr uint32_t CTX_MAGIC = 0x4D585443; // "CTXM"

alloc_mem is then quite straightforward:

void* alloc_mem(size_t size, const context& ctx,
                is_array_t is_array,
                size_t alignment =
                    __STDCPP_DEFAULT_NEW_ALIGNMENT__)
{
    assert(alignment >=
           __STDCPP_DEFAULT_NEW_ALIGNMENT__);

    uint32_t aligned_list_item_size =
        align(sizeof(new_ptr_list_t), alignment);
    size_t s = size + aligned_list_item_size;
    auto ptr = static_cast<new_ptr_list_t*>(
        ctx_alloc(s, alignment, ctx));
    if (ptr == nullptr) {
        return nullptr;
    }
    auto usr_ptr = reinterpret_cast<char*>(ptr) +
                   aligned_list_item_size;
    ptr->ctx = ctx;
    ptr->is_array = is_array;
    ptr->size = size;
    ptr->head_size = aligned_list_item_size;
    ptr->magic = CTX_MAGIC;
    {
        std::lock_guard guard{new_ptr_lock};
        ptr->prev = new_ptr_list.prev;
        ptr->next = &new_ptr_list;
        new_ptr_list.prev->next = ptr;
        new_ptr_list.prev = ptr;
    }
    return usr_ptr;
}

I.e. it does the following things:

  1. Allocates memory enough to satisfy the user requirement and the additional metadata (new_ptr_list_t)
  2. Fills in the metadata
  3. Chains the allocated memory blocks into a list
  4. Returns the pointer after the metadata

free_mem does the opposite thing. Apparently, we need a function to convert the user pointer back to the originally allocated pointer, which is not really trivial, considering the potential cases of bad pointer and unmatched use of array and non-array versions of new and delete. It is the convert_user_ptr function:

new_ptr_list_t* convert_user_ptr(void* usr_ptr,
                                 size_t alignment)
{
    auto offset = static_cast<char*>(usr_ptr) -
                  static_cast<char*>(nullptr);
    auto adjusted_ptr = static_cast<char*>(usr_ptr);
    bool is_adjusted = false;

    // Check alignment first
    if (offset % alignment != 0) {
        offset -= sizeof(size_t);
        if (offset % alignment != 0) {
            return nullptr;
        }
        // Likely caused by new[] followed by delete, if
        // we arrive here
        adjusted_ptr = static_cast<char*>(usr_ptr) -
                       sizeof(size_t);
        is_adjusted = true;
    }
    auto ptr = reinterpret_cast<new_ptr_list_t*>(
        adjusted_ptr -
        align(sizeof(new_ptr_list_t), alignment));
    if (ptr->magic == CTX_MAGIC &&
        (!is_adjusted || ptr->is_array)) {
        return ptr;
    }

    if (!is_adjusted && alignment > sizeof(size_t)) {
        // Again, likely caused by new[] followed by
        // delete, as aligned new[] allocates alignment
        // extra space for the array size.
        ptr = reinterpret_cast<new_ptr_list_t*>(
            reinterpret_cast<char*>(ptr) - alignment);
        is_adjusted = true;
    }
    if (ptr->magic == CTX_MAGIC &&
        (!is_adjusted || ptr->is_array)) {
        return ptr;
    }

    return nullptr;
}

With this, free_mem then becomes easy:

void free_mem(void* usr_ptr, is_array_t is_array,
              size_t alignment =
                  __STDCPP_DEFAULT_NEW_ALIGNMENT__)
{
    assert(alignment >=
           __STDCPP_DEFAULT_NEW_ALIGNMENT__);
    if (usr_ptr == nullptr) {
        return;
    }

    auto ptr = convert_user_ptr(usr_ptr, alignment);
    if (ptr == nullptr) {
        fprintf(stderr,
                "delete%s: invalid pointer %p\n",
                is_array ? "[]" : "", usr_ptr);
        abort();
    }
    if (is_array != ptr->is_array) {
        const char* msg = is_array
                              ? "delete[] after new"
                              : "delete after new[]";
        fprintf(stderr,
                "%s: pointer %p (size %zu)\n",
                msg, usr_ptr, ptr->size);
        abort();
    }
    {
        std::lock_guard guard{new_ptr_lock};
        ptr->magic = 0;
        ptr->prev->next = ptr->next;
        ptr->next->prev = ptr->prev;
    }
    ctx_free(ptr, &(ptr->ctx));
}

I.e.:

  1. It invokes convert_user_ptr to convert the user-provided pointer to a new_ptr_list_t*.
  2. It checks whether array-ness matches in the memory allocation and deallocation.
  3. It unlinks the memory block from the linked list.
  4. If anything bad happens, it prints a message and aborts the whole program (as the program already has undefined behaviour).

One More Thing

It is now nearly complete: we have set up the mechanisms to record memory contexts in memory allocation and deallocation functions. However, I have omitted one important detail so far. If you used my code verbatim as above, the program would crash on first memory allocation. When the global allocation and deallocation functions are replaced, care must be taken when we need additional memory inside those functions. If we somehow use the generic C++ memory allocation mechanisms, it will invoke operator new in the end, causing an infinite recursion. It is still OK to use malloc/free, so we need to use a malloc_allocator for the context stack:

template <typename T>
struct malloc_allocator {
    typedef T value_type;

    typedef std::true_type is_always_equal;
    typedef std::true_type
        propagate_on_container_move_assignment;

    malloc_allocator() = default;
    template <typename U>
    malloc_allocator(const malloc_allocator<U>&) {}

    template <typename U>
    struct rebind {
        typedef malloc_allocator<U> other;
    };

    T* allocate(size_t n)
    {
        return static_cast<T*>(malloc(n * sizeof(T)));
    }
    void deallocate(T* p, size_t)
    {
        free(p);
    }
};

thread_local std::deque<context,
                        malloc_allocator<context>>
    context_stack{context{"<UNKNOWN>", "<UNKNOWN>"}};

Everything Put Together

You can find the real code with more details and a working memory leak checker in this repository:

https://github.com/adah1972/nvwa/tree/master/nvwa

You need to add the root directory of Nvwa to your include path, and nvwa/memory_trace.cpp and nvwa/aligned_memory.cpp to your project. In order to add a new memory checkpoint, use the macro NVWA_MEMORY_CHECKPOINT (Nvwa macros are usually prefixed with ‘NVWA_’). A very short test program follows:

#include <nvwa/memory_trace.h>

int main()
{
    char* ptr1 = new char[20];
    NVWA_MEMORY_CHECKPOINT();
    char* ptr2 = new char[42];
}

The output would be like the following:

Leaked object at 0x57697e30 (size 20, context: <UNKNOWN>/<UNKNOWN>)
Leaked object at 0x57697e70 (size 42, context: test.cpp/int main())
*** 2 leaks found

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s