The Need
A long, long time ago I wrote about memory debugging. I redefined new
as a macro, took advantage of placement new, and replaced the global operator new
. However, only the replacement of the global operator new
was useful in catching memory leaks in the end, while the other facilities became more or less futile; for memory allocation was becoming more and more implicit with the spread use of STL containers and smart pointers, and direct use of new
was discouraged. It is even more so today, with many coding conventions basically banning the use of new
and delete
. So I would like to revisit this topic.
First, there is still need to trace memory usage, even though memory leakage, in the way of unmatched new
, is unlikely today. People still need to know how memory is used, by which parts of the program, in which functions and modules, and so on. The exact point of memory allocation is becoming less relevant, as memory allocation is becoming less direct. It probably occurs in some libraries, instead of in application code. So now tracing memory usage means recording the usage context, instead of the exact code position.
Be Contextual
Usage contexts can be set up in a stack-like data structure, and I have done so several times in the past. What needs to be recorded in the context is something one needs to decide beforehand. If you only want to trace memory usage, you can do as I do below. But you may want to fit the interface with your specific memory manager, adding what needs to be passed to it. Anyway, you decide what should be in. My example code is as follows:
struct context {
const char* file;
const char* func;
};
We want to record the context automatically, and RAII can be used for this purpose:
class checkpoint {
public:
explicit checkpoint(const context& ctx);
~checkpoint();
private:
const context ctx_;
};
#define CTX_MEMORY_CHECKPOINT() \
checkpoint memory_checkpoint{ \
context{__FILE__, __func__}}
thread_local std::deque<context>
context_stack{
context{"<UNKNOWN>", "<UNKNOWN>"}};
void save_context(const context& ctx)
{
context_stack.push_back(ctx);
}
void restore_context(const context& ctx)
{
assert(!context_stack.empty() &&
context_stack.back() == ctx);
context_stack.pop_back();
}
const context& get_current_context()
{
assert(!context_stack.empty());
return context_stack.back();
}
checkpoint::checkpoint(const context& ctx) : ctx_(ctx)
{
save_context(ctx);
}
checkpoint::~checkpoint()
{
restore_context(ctx_);
}
Please notice that the context_stack
needs to be thread_local, which was something not standardized when I last wrote about tracing memory usage. It is very convenient to save the information on a per-stack basis.
Fitting with Real Memory Managers
Before we go define the operator new
and operator delete
functions (normally called ‘allocation’ and ‘deallocation’ functions, and there are a lot of them), let us define first the generic/sample functions that do the real allocation and deallocation. We just pass on the necessary arguments to the system memory manager here (but you may want to do more to make it work with an existing memory manager) and we still use the C convention that a memory allocation failure is indicated by a null pointer:
void* ctx_alloc(size_t size, size_t alignment,
const context* /*unused*/)
{
#ifdef _WIN32
return _aligned_malloc(size, alignment);
#elif defined(__unix) || defined(__unix__)
void* memptr{};
int result = posix_memalign(&memptr, alignment, size);
if (result == 0) {
return memptr;
} else {
return nullptr;
}
#else
// No alignment guarantees on other platforms
(void)alignment;
return malloc(size);
#endif
}
void ctx_free(void* ptr, const context* /*unused*/)
{
#ifdef _WIN32
_aligned_free(ptr);
#else
free(ptr);
#endif
}
operator new
& operator delete
Now we can go to the allocation and deallocation functions. The declarations with our new context
parameter are the following:
void* operator new (size_t size,
const context& ctx);
void* operator new[](size_t size,
const context& ctx);
void* operator new (size_t size,
std::align_val_t align_val,
const context& ctx);
void* operator new[](size_t size,
std::align_val_t align_val,
const context& ctx);
void operator delete (void* ptr,
const context&) noexcept;
void operator delete[](void* ptr,
const context&) noexcept;
void operator delete (void* ptr,
std::align_val_t align_val,
const context&) noexcept;
void operator delete[](void* ptr,
std::align_val_t align_val,
const context&) noexcept;
But these are not all the function we need to rewrite. We need to replace the non-contextual version too, and they are actually key to the memory tracing functionality. Our saved context can be used, like the following:
void* operator new(size_t size)
{
return operator new(size, get_current_context());
}
void* operator new[](size_t size)
{
return operator new[](size, get_current_context());
}
Assuming the existence of an alloc_mem
and a free_mem
function, we can make the rest of the allocation and deallocation functions basically forwarders:
void* operator new(size_t size,
const std::nothrow_t&) noexcept
{
return alloc_mem(size, get_current_context(),
alloc_is_not_array);
}
void* operator new[](size_t size,
const std::nothrow_t&) noexcept
{
return alloc_mem(size, get_current_context(),
alloc_is_array);
}
void* operator new(size_t size,
std::align_val_t align_val)
{
return operator new(size, align_val,
get_current_context());
}
void* operator new[](size_t size,
std::align_val_t align_val)
{
return operator new[](size, align_val,
get_current_context());
}
void* operator new(size_t size,
std::align_val_t align_val,
const std::nothrow_t&) noexcept
{
return alloc_mem(size, get_current_context(),
alloc_is_not_array,
size_t(align_val));
}
void* operator new[](size_t size,
std::align_val_t align_val,
const std::nothrow_t&) noexcept
{
return alloc_mem(size, get_current_context(),
alloc_is_array,
size_t(align_val));
}
void* operator new(size_t size, const context& ctx)
{
void* ptr = alloc_mem(size, ctx, alloc_is_not_array);
if (ptr == nullptr) {
throw std::bad_alloc();
}
return ptr;
}
void* operator new[](size_t size, const context& ctx)
{
void* ptr = alloc_mem(size, ctx, alloc_is_array);
if (ptr == nullptr) {
throw std::bad_alloc();
}
return ptr;
}
void* operator new(size_t size,
std::align_val_t align_val,
const context& ctx)
{
void* ptr = alloc_mem(size, ctx, alloc_is_not_array,
size_t(align_val));
if (ptr == nullptr) {
throw std::bad_alloc();
}
return ptr;
}
void* operator new[](size_t size,
std::align_val_t align_val,
const context& ctx)
{
void* ptr = alloc_mem(size, ctx, alloc_is_array,
size_t(align_val));
if (ptr == nullptr) {
throw std::bad_alloc();
}
return ptr;
}
void operator delete(void* ptr) noexcept
{
free_mem(ptr, alloc_is_not_array);
}
void operator delete[](void* ptr) noexcept
{
free_mem(ptr, alloc_is_array);
}
void operator delete(void* ptr, size_t) noexcept
{
free_mem(ptr, alloc_is_not_array);
}
void operator delete[](void* ptr, size_t) noexcept
{
free_mem(ptr, alloc_is_array);
}
void operator delete(
void* ptr, std::align_val_t align_val) noexcept
{
free_mem(ptr, alloc_is_not_array,
size_t(align_val));
}
void operator delete[](
void* ptr, std::align_val_t align_val) noexcept
{
free_mem(ptr, alloc_is_array,
size_t(align_val));
}
void operator delete(void* ptr,
const context&) noexcept
{
operator delete(ptr);
}
void operator delete[](void* ptr,
const context&) noexcept
{
operator delete[](ptr);
}
void operator delete(void* ptr,
std::align_val_t align_val,
const context&) noexcept
{
operator delete(ptr, align_val);
}
void operator delete[](void* ptr,
std::align_val_t align_val,
const context&) noexcept
{
operator delete[](ptr, align_val);
}
Contexts and Allocation/Deallocation
Now let us focus on the two functions that do the real job:
enum is_array_t : uint32_t {
alloc_is_not_array,
alloc_is_array
};
void* alloc_mem(size_t size,
const context& ctx,
is_array_t is_array,
size_t alignment);
void free_mem(void* ptr,
is_array_t is_array,
size_t alignment);
Considering this interface, the context information can only be stored immediately before the memory returned to the user. In order to trace leaked memory, we need to link the allocated memory into a linked list, and the control block is as follows:
struct new_ptr_list_t {
new_ptr_list_t* next;
new_ptr_list_t* prev;
size_t size;
context ctx;
uint32_t head_size : 31;
uint32_t is_array : 1;
uint32_t magic;
};
The first four fields should be very clear in meaning. head_size
probably requires some explanation. While the struct is fixed in size, alignments can be different across allocations, resulting in different offsets from the struct pointer to the memory pointer the user gets. So this fields records the aligned struct size. is_array
records whether the allocation is done by an operator new[]
; we use this piece of information to detect the new[]
/delete
or new
/delete[]
mismatch, as well as allowing for special offsets required by array allocations. magic
is used to mark that the memory is allocated by this implementation so that when freeing the memory we can detect corrupt memory, double freeing, and suchlike.
We also need the list head of control blocks, a mutex to protect its access, a function to align data size, and the magic number constant:
new_ptr_list_t new_ptr_list = {
&new_ptr_list, &new_ptr_list, 0, {},
alloc_is_not_array, 0, CTX_MAGIC};
std::mutex new_ptr_lock;
constexpr uint32_t align(size_t s, size_t alignment)
{
return static_cast<uint32_t>((s + alignment - 1) &
~(alignment - 1));
}
constexpr uint32_t CTX_MAGIC = 0x4D585443; // "CTXM"
alloc_mem
is then quite straightforward:
void* alloc_mem(size_t size, const context& ctx,
is_array_t is_array,
size_t alignment =
__STDCPP_DEFAULT_NEW_ALIGNMENT__)
{
assert(alignment >=
__STDCPP_DEFAULT_NEW_ALIGNMENT__);
uint32_t aligned_list_item_size =
align(sizeof(new_ptr_list_t), alignment);
size_t s = size + aligned_list_item_size;
auto ptr = static_cast<new_ptr_list_t*>(
ctx_alloc(s, alignment, ctx));
if (ptr == nullptr) {
return nullptr;
}
auto usr_ptr = reinterpret_cast<char*>(ptr) +
aligned_list_item_size;
ptr->ctx = ctx;
ptr->is_array = is_array;
ptr->size = size;
ptr->head_size = aligned_list_item_size;
ptr->magic = CTX_MAGIC;
{
std::lock_guard guard{new_ptr_lock};
ptr->prev = new_ptr_list.prev;
ptr->next = &new_ptr_list;
new_ptr_list.prev->next = ptr;
new_ptr_list.prev = ptr;
}
return usr_ptr;
}
I.e. it does the following things:
- Allocates memory enough to satisfy the user requirement and the additional metadata (
new_ptr_list_t
) - Fills in the metadata
- Chains the allocated memory blocks into a list
- Returns the pointer after the metadata
free_mem
does the opposite thing. Apparently, we need a function to convert the user pointer back to the originally allocated pointer, which is not really trivial, considering the potential cases of bad pointer and unmatched use of array and non-array versions of new
and delete
. It is the convert_user_ptr
function:
new_ptr_list_t* convert_user_ptr(void* usr_ptr,
size_t alignment)
{
auto offset = static_cast<char*>(usr_ptr) -
static_cast<char*>(nullptr);
auto adjusted_ptr = static_cast<char*>(usr_ptr);
bool is_adjusted = false;
// Check alignment first
if (offset % alignment != 0) {
offset -= sizeof(size_t);
if (offset % alignment != 0) {
return nullptr;
}
// Likely caused by new[] followed by delete, if
// we arrive here
adjusted_ptr = static_cast<char*>(usr_ptr) -
sizeof(size_t);
is_adjusted = true;
}
auto ptr = reinterpret_cast<new_ptr_list_t*>(
adjusted_ptr -
align(sizeof(new_ptr_list_t), alignment));
if (ptr->magic == CTX_MAGIC &&
(!is_adjusted || ptr->is_array)) {
return ptr;
}
if (!is_adjusted && alignment > sizeof(size_t)) {
// Again, likely caused by new[] followed by
// delete, as aligned new[] allocates alignment
// extra space for the array size.
ptr = reinterpret_cast<new_ptr_list_t*>(
reinterpret_cast<char*>(ptr) - alignment);
is_adjusted = true;
}
if (ptr->magic == CTX_MAGIC &&
(!is_adjusted || ptr->is_array)) {
return ptr;
}
return nullptr;
}
With this, free_mem
then becomes easy:
void free_mem(void* usr_ptr, is_array_t is_array,
size_t alignment =
__STDCPP_DEFAULT_NEW_ALIGNMENT__)
{
assert(alignment >=
__STDCPP_DEFAULT_NEW_ALIGNMENT__);
if (usr_ptr == nullptr) {
return;
}
auto ptr = convert_user_ptr(usr_ptr, alignment);
if (ptr == nullptr) {
fprintf(stderr,
"delete%s: invalid pointer %p\n",
is_array ? "[]" : "", usr_ptr);
abort();
}
if (is_array != ptr->is_array) {
const char* msg = is_array
? "delete[] after new"
: "delete after new[]";
fprintf(stderr,
"%s: pointer %p (size %zu)\n",
msg, usr_ptr, ptr->size);
abort();
}
{
std::lock_guard guard{new_ptr_lock};
ptr->magic = 0;
ptr->prev->next = ptr->next;
ptr->next->prev = ptr->prev;
}
ctx_free(ptr, &(ptr->ctx));
}
I.e.:
- It invokes
convert_user_ptr
to convert the user-provided pointer to anew_ptr_list_t*
. - It checks whether array-ness matches in the memory allocation and deallocation.
- It unlinks the memory block from the linked list.
- If anything bad happens, it prints a message and aborts the whole program (as the program already has undefined behaviour).
One More Thing
It is now nearly complete: we have set up the mechanisms to record memory contexts in memory allocation and deallocation functions. However, I have omitted one important detail so far. If you used my code verbatim as above, the program would crash on first memory allocation. When the global allocation and deallocation functions are replaced, care must be taken when we need additional memory inside those functions. If we somehow use the generic C++ memory allocation mechanisms, it will invoke operator new
in the end, causing an infinite recursion. It is still OK to use malloc
/free
, so we need to use a malloc_allocator
for the context stack:
template <typename T>
struct malloc_allocator {
typedef T value_type;
typedef std::true_type is_always_equal;
typedef std::true_type
propagate_on_container_move_assignment;
malloc_allocator() = default;
template <typename U>
malloc_allocator(const malloc_allocator<U>&) {}
template <typename U>
struct rebind {
typedef malloc_allocator<U> other;
};
T* allocate(size_t n)
{
return static_cast<T*>(malloc(n * sizeof(T)));
}
void deallocate(T* p, size_t)
{
free(p);
}
};
thread_local std::deque<context,
malloc_allocator<context>>
context_stack{context{"<UNKNOWN>", "<UNKNOWN>"}};
Everything Put Together
You can find the real code with more details and a working memory leak checker in this repository:
https://github.com/adah1972/nvwa/tree/master/nvwa
You need to add the root directory of Nvwa to your include path, and nvwa/memory_trace.cpp and nvwa/aligned_memory.cpp to your project. In order to add a new memory checkpoint, use the macro NVWA_MEMORY_CHECKPOINT
(Nvwa macros are usually prefixed with ‘NVWA_
’). A very short test program follows:
#include <nvwa/memory_trace.h>
int main()
{
char* ptr1 = new char[20];
NVWA_MEMORY_CHECKPOINT();
char* ptr2 = new char[42];
}
The output would be like the following:
Leaked object at 0x57697e30 (size 20, context: <UNKNOWN>/<UNKNOWN>) Leaked object at 0x57697e70 (size 42, context: test.cpp/int main()) *** 2 leaks found