Memory Management from the Ground Up 1: Introduction
Welcome to my first post in a series about memory management in C++ (specifically, in games). In this post, I will discuss how memory allocation works, why custom allocators are used in many programs, and what goals a custom allocator should fulfill.
In the course of this series, I hope to cover several designs of allocators, show some benchmarks, and explain the pitfalls of writing your own allocation system from start to finish. There is a lot to cover, and I’m going to be assuming a reasonable level of C++ competency.
Ready to dive in?
1.1 Memory Primer
For the sake of completeness, we will start with a quick primer on how memory allocations actually work. Most people know that the amount of memory available to their computer is loosely determined by the amount of RAM – random access memory – that they have. RAM, however, is only one part of a fairly complicated system that manages and maintains the memory used by applications on a computer.
At the highest layer, the layer seen by your application, the whole of the address space (which we will call the wilderness) is apparently empty, save for objects on the stack, such as local variables, and memory allocated to objects in the executable, such as the actual machine code as well as any compile-time constants. Your application can request additional memory from the operating system, which, if able to provide it, will return a pointer to an area that was previously part of the wilderness. This allocated memory can then be used by the application. If the application attempts to use memory that it does not own, the application will receive an access violation (or, if you’re a crazy Unix person, a segmentation fault).
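To make that layout concrete, here is a minimal sketch (the names are purely illustrative) showing where each kind of storage lives:

```cpp
const int kCompileTimeConstant = 64;  // baked into the executable image
int g_counter = 0;                    // static storage, mapped when the program loads

void example()
{
    int local = 42;                    // stack: lives only as long as this call
    int* fromWilderness = new int(7);  // heap: carved out of the wilderness at runtime
    delete fromWilderness;             // given back when we're done

    int* bogus = reinterpret_cast<int*>(0xDEADBEEF);
    // *bogus = local;  // memory we don't own: access violation (or segfault)
}

int main() { example(); }
```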
This arrangement is an advance we owe to the original Intel 80286 (although it was vastly improved and made workable in the 386), and it is called protected mode. Protected mode is a hardware-supported system that prevents individual programs from being aware of other running processes (at least, without going through specific interprocess APIs). In a protected mode operating system, each process gets its own “version” of the entire address space (all locations describable by a pointer). Needless to say, the ability to stop applications from writing into each other’s memory space is a fundamental requirement of making a multitasking operating system.
So, what happens when we request memory? Our call to new (or malloc, which is generally what new defers to anyway) will, barring any kind of user-side heap management (which does happen – I’ll go into that later), hit the operating system’s memory manager. The OS, being a privileged program, runs in a special mode, called ring 0. Protected mode systems have multiple privilege rings that code can execute in. In most modern operating systems, the kernel runs entirely in ring 0, while applications run entirely in ring 3 (rings 1 and 2, designed for device drivers, are almost unused). Ring 0 code has a different view of the address space to a ring 3 application – it is able to access or “see” the allocations made by all running applications, which makes the OS’ address space significantly more cluttered. In addition to all the running applications, some of the address space is used by hardware – such as the entire block of video card RAM (indeed, these hardware-reserved sections are the reason that 32-bit operating systems, while able to address 4 gigabytes of RAM, can actually use less than that).
The kernel’s view of the address space is the entire map of all available, or addressable, memory on the system. Every 32-bit OS is able to address 4 gigabytes of stuff, whether it be normal memory, video RAM, sound cards (remember A220 I5 D3?), hardware interrupt managers, thread stacks, memory mapped files, you name it. Operating systems tend to carve up this area further, reserving, say, 2 gigabytes of that space for “kernel” memory (basically, memory used by the OS, drivers and hardware addresses), and allowing all the applications to use the remainder. A memory allocation request causes the operating system to look for a block of contiguous unused space in the virtual address space, mark it as belonging to the application, and return a pointer to it. If there is not a large enough contiguous unused block, the allocation fails (which should demonstrate just why a 64-bit OS is essential).
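As a concrete example, this is roughly what asking the OS directly for pages looks like on Windows, via VirtualAlloc (on Unix-like systems, mmap plays the same role):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    // Ask the OS's memory manager for 1 MB of committed, read/write pages.
    void* block = VirtualAlloc(nullptr, 1 << 20,
                               MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (block == nullptr)
    {
        // No contiguous run of free virtual addresses was large enough.
        std::printf("allocation failed\n");
        return 1;
    }

    // ... the application is now free to use the memory ...

    VirtualFree(block, 0, MEM_RELEASE);  // hand the pages back
    return 0;
}
```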
The operating system is able to make additional use of this address space by not requiring that it exactly matches the amount of RAM – it is able to take infrequently used blocks of memory and put them on the hard drive (in the page file), freeing up space in RAM, which is many orders of magnitude faster than the disk. Note that it doesn’t ever remove this memory from the virtual address space – it simply changes what is backing it. When a program requests access to that memory again, the OS can transparently reshuffle things, loading the data from the HDD back into RAM, where the CPU can use it.
The last thing to explain about this whole system is the concept of pages. Memory is managed in discrete chunks – pages – the smallest block of data that the OS’ virtual memory manager cares about. For most operating systems, the page size is 4 KB. Each page is treated separately – pages can be moved in and out of the page file and RAM independently (even if they are part of the same allocation), and there is no requirement that contiguous virtual pages are contiguous in RAM (indeed, they often aren’t). Pages vastly simplify memory management from the OS perspective – everything is the same size, and there is no need for a large free block in physical RAM. Finally, when memory is freed (or the application is terminated), the OS marks the used pages as “free” and the cycle continues.
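You can ask the OS what its page size actually is – a quick sanity check, assuming Windows (POSIX systems expose the same information through sysconf(_SC_PAGESIZE)):

```cpp
#include <windows.h>
#include <cstdio>

int main()
{
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    std::printf("page size: %lu bytes\n", info.dwPageSize);  // typically 4096
    return 0;
}
```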
1.2 The Problem with Malloc
So, now that you understand the underlying systems at the OS level, what about malloc itself? Why, for example, would we even want to replace it?
From my description of the underlying system above, two things should be fairly obvious: one, that allocating memory requires a fair amount of operating system interaction, and two, that some other system must be in use for application memory management, given that the OS only deals in objects the size of a page (or some multiple thereof).
Enter the next part of the equation – malloc’s user-side (i.e., non-OS) memory management. When a memory request arrives, malloc will do one of two things. It will attempt to fulfill the request by providing a pointer to memory your program already owns (“here’s one I prepared earlier”), or, failing that, it will perform the expensive task of requesting memory from the operating system. Depending on the exact implementation of malloc, it may request a significantly larger size than required, allowing it to then manage that larger user-space block itself, or it may simply request the smallest multiple of the page size that covers the allocation – something which has a lower probability of failing, since the virtual addresses of a single block must be contiguous.
Malloc will then carve up the space the OS has given it, annotate it with its own book-keeping data, and return the requested amount back to the application. When a free request comes in, it will update its book-keeping data, and potentially relinquish the memory back to the operating system.
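The exact book-keeping layout differs between malloc implementations, but the general idea is a small header stashed immediately before each block handed out. A hypothetical sketch (BlockHeader and header_of are illustrative names, not any real malloc’s internals):

```cpp
#include <cstddef>

// Hypothetical per-block book-keeping data, placed just before the
// pointer that the allocator returns to the application.
struct BlockHeader
{
    std::size_t  size;  // usable size of the block that follows
    bool         free;  // is the block currently available for reuse?
    BlockHeader* next;  // next block carved from the same OS-provided region
};

// A free(p) call can then locate the book-keeping data behind the user pointer:
BlockHeader* header_of(void* p)
{
    return reinterpret_cast<BlockHeader*>(p) - 1;
}
```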
Fundamentally, by providing a custom memory manager, we are changing the behaviour of this user-side system. We can’t speed up the memory management code that the operating system uses – instead, we can use knowledge about our application to improve the performance of the memory system. For instance, we may know that our application will allocate and release memory in quick succession – we can immediately improve on malloc (which is optimized for the general case) by not relinquishing the memory back to the OS. Or we may know that our application will only ever allocate tiny objects, and write a management system that is optimized for that case.
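To give a taste of how little it takes, here is a minimal sketch of that first idea, assuming fixed-size allocations – freed blocks go onto a list instead of back to the OS, so the allocate/release/allocate pattern becomes a cheap pop and push (RecyclingAllocator is an illustrative name, not a real library class):

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Illustrative only: a fixed-size allocator that never gives memory back,
// recycling freed blocks to serve future requests.
class RecyclingAllocator
{
public:
    explicit RecyclingAllocator(std::size_t blockSize) : m_blockSize(blockSize) {}

    void* allocate()
    {
        if (!m_freeBlocks.empty())
        {
            void* block = m_freeBlocks.back();  // "here's one I prepared earlier"
            m_freeBlocks.pop_back();
            return block;
        }
        return std::malloc(m_blockSize);        // slow path: hit the real allocator
    }

    void release(void* block)
    {
        m_freeBlocks.push_back(block);          // keep it for next time, skip the OS
    }

private:
    std::size_t        m_blockSize;
    std::vector<void*> m_freeBlocks;
};
```

A real version would also need to free its cached blocks on shutdown and worry about alignment, but the core speed win is exactly this: the hot path never leaves user space.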
But that’s not all…
1.3 Widen the Scope
By using a custom allocator, we suddenly are no longer in the dark with the memory behaviour of our application. We can provide many more useful systems beyond performance tweaking, such as:
- Memory use tracking (or size/count histograms) – see the sketch after this list;
- Leak detection;
- Buffer overrun/underrun detection;
- Debugging info such as call stacks, allocation names, etc;
- Higher-level constructs and systems such as hedges and arenas, as well as memory optimized for particular algorithms.
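As a flavour of the first two items, here is a minimal sketch of a tracking wrapper (trackedAlloc, trackedFree and reportLeaks are illustrative names) that counts live allocations and bytes, so anything non-zero at shutdown points at a leak:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

static std::size_t g_liveAllocations = 0;
static std::size_t g_liveBytes       = 0;

void* trackedAlloc(std::size_t size)
{
    // Stash the requested size just before the user's pointer. (A real
    // implementation would pad this header to preserve alignment.)
    std::size_t* raw = static_cast<std::size_t*>(
        std::malloc(size + sizeof(std::size_t)));
    if (raw == nullptr)
        return nullptr;
    *raw = size;
    ++g_liveAllocations;
    g_liveBytes += size;
    return raw + 1;
}

void trackedFree(void* p)
{
    if (p == nullptr)
        return;
    std::size_t* raw = static_cast<std::size_t*>(p) - 1;
    g_liveBytes -= *raw;
    --g_liveAllocations;
    std::free(raw);
}

void reportLeaks()
{
    // Call at shutdown: anything still counted was never freed.
    std::printf("%zu allocations (%zu bytes) still live\n",
                g_liveAllocations, g_liveBytes);
}
```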
Now, there are third-party solutions to all of these. However, I am a believer in reinventing the wheel – mainly because I think it is very useful to learn algorithms for yourself, and also because I get angry when I have to use anyone else’s code – the mark of a true programmer =)
In the next article, I will go through two very simple memory allocation algorithms – specifically, two that are both unfortunately useless, but provide a basis to build upon.