A power tool to understand memory layout

Core Analyzer

What is Core Analyzer

Many program bugs, especially those in C/C++, are memory related. When a program failure is observed, whether erroneous behavior or an outright crash, the key to solving the mystery often boils down to an invalid data object or an inaccessible memory address. It may be as simple as dereferencing a NULL pointer due to an uninitialized variable. But it can be very difficult in cases such as memory corruption or race conditions, which demand further investigation. This is especially true when the bug is hidden deep inside a complex execution context with a large number of data objects involved. Finding a needle in a haystack would be no exaggeration at all, as anyone with the experience can tell you.

Memory allocation and access involve the kernel, the heap manager, the compiler and the application code. Therefore, it is essential to understand how a piece of memory is allocated, owned and accessed by each of them at various stages, from the macro to the micro level. In other words, the same memory is viewed and managed differently by different software layers. For example, the compiler lays out an object according to its structured type and generates code that accesses the data by that layout, which is described by the debug symbols. The memory manager, on the other hand, embeds heap metadata in or around each heap memory block to indicate its size and free/in-use status; it may also add padding bytes, security signatures, etc. An object may be located in the text, data, heap, or stack segment of a process, and these segments are managed and protected by permission bits set by the kernel's virtual memory manager. All this information is important and can be the foundation for building a theory, or for proving or disproving an assumption about the cause of a program failure.
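To make the heap-metadata point concrete, here is a minimal C sketch of a toy allocator that stores a header (size, in-use flag and a signature) immediately before the pointer it hands out. The layout, the `block_header` type, the `TOY_MAGIC` value and the helper names are hypothetical; real allocators such as glibc malloc or the Windows heap use their own, different layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header a toy allocator places right before each user block.
 * Real allocators (glibc malloc, the Windows heap, SmartHeap) differ. */
typedef struct block_header {
    size_t   size;    /* usable bytes that follow the header */
    uint32_t in_use;  /* 1 = allocated, 0 = free */
    uint32_t magic;   /* signature used to spot corrupted headers */
} block_header;

#define TOY_MAGIC 0xC0DEA11Cu

/* Write a header at the start of buf and return the pointer the
 * application would see (just past the header). */
static void *toy_format_block(unsigned char *buf, size_t user_size, int in_use)
{
    block_header hdr = { user_size, (uint32_t)in_use, TOY_MAGIC };
    assert(buf != NULL);
    memcpy(buf, &hdr, sizeof hdr);   /* memcpy avoids alignment assumptions */
    return buf + sizeof hdr;
}

/* Step back from a user pointer to the metadata that no debug symbol describes. */
static block_header toy_header_of(const void *user_ptr)
{
    block_header hdr;
    assert(user_ptr != NULL);
    memcpy(&hdr, (const unsigned char *)user_ptr - sizeof hdr, sizeof hdr);
    return hdr;
}
```

A tool that knows this layout can recover a block's size and status from any pointer into the heap, even though the compiler's type information says nothing about it.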

A nontrivial program has many data objects, ranging from simple primitives such as char, integer and float to complex aggregates like C++ objects with multiple inheritance. By design, data objects are related through direct or indirect references to each other, and one object may be shared or referenced by multiple other objects. The application code usually goes through multiple levels of indirection to reach a memory target, which makes it difficult to find the root cause when something goes wrong. In a typical debugging session, we may have one or more suspected data objects at hand. The challenge is to find out what other objects hold references to the suspects and may potentially access them incorrectly. This is more or less reverse engineering and understandably very difficult. The traditional approach of inspecting all variables with a debugger can be a daunting, if not impossible, task and is prone to errors for a program with thousands or even millions of variables. On top of that, heap data objects, unlike global and local variables, have no debug symbols describing their types or locations, which leaves the debugger powerless. Yet they are often the target of an investigation, since most objects are created dynamically on the heap. Take memory overrun as an example: the key to tracking down this type of bug is to figure out which memory object precedes the victim, who owns it, and how it is read and written.
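For the overrun case, once the heap blocks are known, finding the block that precedes a victim is a lookup in a sorted table. The sketch below assumes a hypothetical `heap_block` list, sorted by start address and non-overlapping, such as a heap scan might produce; it illustrates the idea and is not Core Analyzer's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One heap block as a scanner might record it (hypothetical representation). */
typedef struct heap_block {
    uintptr_t start;  /* first byte of the block */
    size_t    size;   /* block size in bytes */
} heap_block;

/* Binary-search blocks (sorted by start, non-overlapping) for the one
 * containing addr. Returns its index, or -1 if addr is in no block. */
static long find_block(const heap_block *blocks, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;                 /* search window [lo, hi) */
    assert(blocks != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < blocks[mid].start)
            hi = mid;                      /* addr lies to the left */
        else if (addr - blocks[mid].start >= blocks[mid].size)
            lo = mid + 1;                  /* addr lies past this block */
        else
            return (long)mid;
    }
    return -1;
}

/* For an overrun victim, the usual suspect is the block right before it. */
static const heap_block *preceding_block(const heap_block *blocks, size_t n,
                                         uintptr_t victim)
{
    long i = find_block(blocks, n, victim);
    return (i > 0) ? &blocks[i - 1] : NULL;
}
```

The preceding block's owner and its access paths are then the natural next things to examine.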

Given an arbitrary memory object, e.g. a heap address, neither its data type nor its relationship with other objects is obvious. Core Analyzer is designed to help answer these questions. Although some debuggers provide part of this functionality (for example, WinDbg has the extension command !heap to check heap memory), none of them can use the heap information to uncover the complex relationships among numerous data objects. Besides, many programs use a customized memory manager for various reasons, in which case the debugger knows nothing about the heap at all. Core Analyzer has built-in knowledge of the heap data structures of popular runtime memory managers, so it is able to scan a process’s heap, check its consistency and point out corrupted spots, if any. By searching the process’s address space for all direct and indirect references to a suspicious object, the tool helps unveil the object’s type and usage in a thorough and systematic way. Core Analyzer understands the core dump file formats of different platforms, e.g., ELF core files on Linux, Mach-O core files on Mac OS and minidumps on Windows, which lets it categorize a process’s address space into text, data, heap and stack regions.
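A heap consistency check boils down to walking the blocks one by one and validating each header before trusting it to locate the next one. This minimal sketch walks a toy arena whose blocks are laid out back to back, each starting with a hypothetical `toy_hdr` (signature, in-use flag, size); real heap managers keep more elaborate structures such as free lists and bins, so treat this only as an illustration of the principle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy heap: each block is a header followed by `size`
 * payload bytes. Not any real allocator's format. */
typedef struct toy_hdr {
    uint32_t magic;   /* must equal TOY_MAGIC in a healthy header */
    uint32_t in_use;  /* 1 = allocated, 0 = free */
    uint64_t size;    /* payload bytes after the header */
} toy_hdr;

#define TOY_MAGIC 0xC0DEA11Cu

typedef struct heap_stats {
    size_t in_use_blocks, in_use_bytes;
    size_t free_blocks, free_bytes;
} heap_stats;

/* Walk the arena block by block, collecting usage statistics.
 * Returns -1 if every header is sane, otherwise the offset of the
 * first corrupted header (where the walk had to stop). */
static long toy_heap_walk(const unsigned char *arena, size_t len, heap_stats *st)
{
    size_t off = 0;
    assert(arena != NULL && st != NULL);
    memset(st, 0, sizeof *st);
    while (off < len) {
        toy_hdr h;
        if (len - off < sizeof h)
            return (long)off;               /* truncated header */
        memcpy(&h, arena + off, sizeof h);
        if (h.magic != TOY_MAGIC || h.in_use > 1 ||
            h.size > len - off - sizeof h)
            return (long)off;               /* bad signature, flag or size */
        if (h.in_use) { st->in_use_blocks++; st->in_use_bytes += h.size; }
        else          { st->free_blocks++;   st->free_bytes   += h.size; }
        off += sizeof h + (size_t)h.size;   /* size locates the next header */
    }
    return -1;
}
```

Because each header's size field locates the next header, one overwritten header stops the walk right there, so the offset of the first bad header is a good starting point when hunting for the block that overran it.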

The main features of Core Analyzer are listed in the feature table below, grouped into three categories: Heap, Reference, and Others.

Download Core Analyzer

The project is currently hosted on SourceForge, where you can download the binaries and source code, or leave your comments, which are very much appreciated.

Licensing Info

The use and distribution of Core Analyzer is governed by the GNU Lesser General Public License (LGPL) as published by the Free Software Foundation.

Currently, Core Analyzer understands heap data managed by the system allocators on Linux, Windows and Mac OS, as well as SmartHeap by MicroQuill. However, it is designed to be extensible. It is likely that your product uses a customized memory manager. By plugging in a few functions required by the infrastructure, you can port the tool to your environment and take advantage of its powerful features. I have been using this tool extensively and find it indispensable for debugging any serious issue.
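One common way to make such a tool extensible is a table of function pointers that each memory-manager plug-in fills in. The sketch below illustrates the idea with a hypothetical `heap_plugin` interface and a trivial demo plug-in backed by a fixed block table; the names and signatures are assumptions for illustration, not Core Analyzer's actual porting API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plug-in interface: illustrative only, not the tool's real API. */
typedef struct heap_block_info {
    uintptr_t addr;   /* start of the user-visible block */
    size_t    size;   /* usable size in bytes */
    int       in_use; /* nonzero if allocated */
} heap_block_info;

typedef struct heap_plugin {
    const char *name;
    /* Fill *out with the block containing addr; return 0 if there is none. */
    int (*get_block_info)(uintptr_t addr, heap_block_info *out);
    /* Visit every block; stop early if visit() returns 0. */
    void (*walk)(int (*visit)(const heap_block_info *, void *), void *ctx);
} heap_plugin;

/* Generic analysis written only against the interface. */
static int count_in_use(const heap_block_info *b, void *ctx)
{
    if (b->in_use)
        ++*(size_t *)ctx;
    return 1;  /* keep walking */
}

static size_t plugin_in_use_blocks(const heap_plugin *p)
{
    size_t n = 0;
    assert(p != NULL && p->walk != NULL);
    p->walk(count_in_use, &n);
    return n;
}

/* A trivial plug-in: a fixed table stands in for a custom allocator's heap. */
static const heap_block_info demo_blocks[] = {
    { 0x1000, 64, 1 }, { 0x1050, 32, 0 }, { 0x1080, 128, 1 },
};
#define DEMO_N (sizeof demo_blocks / sizeof demo_blocks[0])

static int demo_get_block_info(uintptr_t addr, heap_block_info *out)
{
    for (size_t i = 0; i < DEMO_N; i++)
        if (addr >= demo_blocks[i].addr &&
            addr - demo_blocks[i].addr < demo_blocks[i].size) {
            *out = demo_blocks[i];
            return 1;
        }
    return 0;
}

static void demo_walk(int (*visit)(const heap_block_info *, void *), void *ctx)
{
    for (size_t i = 0; i < DEMO_N; i++)
        if (!visit(&demo_blocks[i], ctx))
            break;
}

static const heap_plugin demo_plugin = { "demo", demo_get_block_info, demo_walk };
```

Analyses such as `plugin_in_use_blocks` only call through the table, so supporting another allocator in this design means writing a new pair of functions rather than touching the analysis code.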

Heap

· Scan the heap and report memory corruption and memory usage statistics

· Display the layout of memory blocks surrounding a given address

· Display the status of the memory block containing a given address

· Show the heap memory blocks with the biggest sizes (potential memory hogs)
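Reporting the biggest blocks does not require sorting the whole heap: a bounded insertion into a short array of the k largest blocks seen so far is enough when k is small. This is a minimal sketch with a hypothetical `blk` record; how Core Analyzer itself ranks blocks is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct blk { uintptr_t addr; size_t size; } blk;  /* hypothetical record */

/* Keep the k largest blocks seen so far in top[0..*count), sorted by
 * descending size. One pass over the heap costs O(n*k), which is fine
 * for small k such as 10 or 20. */
static void top_k_insert(blk *top, size_t k, size_t *count, blk b)
{
    size_t i;
    assert(count != NULL && *count <= k);
    if (*count == k && (k == 0 || b.size <= top[k - 1].size))
        return;                        /* not bigger than the current k-th */
    if (*count < k)
        (*count)++;
    /* Shift smaller entries down; when full, the old last entry falls off. */
    for (i = *count - 1; i > 0 && top[i - 1].size < b.size; i--)
        top[i] = top[i - 1];
    top[i] = b;
}
```

Calling `top_k_insert` once per block during a heap walk leaves the candidates for a memory hog at the front of `top`.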

Reference

· Find the memory object’s size, type and symbol associated with a given address

· Search for and report all references to a given object at any level of indirection
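A reference search with several levels of indirection can be modeled as a breadth-first search: first find the pointer-sized words that point into the target block, then treat the blocks holding those words as new targets. The sketch below runs on a simulated memory image (`mem`, `base`) with a hypothetical `region` block table and a linear block lookup. A real tool would scan the mapped segments of a core file and also report references from globals and stacks, which this sketch skips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64

typedef struct region { uintptr_t start; size_t size; } region;  /* hypothetical */

/* Index of the block containing addr, or -1. Linear scan keeps the sketch short. */
static int block_of(const region *blocks, int nblocks, uintptr_t addr)
{
    for (int i = 0; i < nblocks; i++)
        if (addr >= blocks[i].start && addr - blocks[i].start < blocks[i].size)
            return i;
    return -1;
}

/* Breadth-first reference search over a simulated memory image.
 * mem[i] is the pointer-sized word at address base + i * sizeof(uintptr_t).
 * Level 1 = blocks holding a word that points into blocks[target];
 * level L = blocks pointing into a level L-1 block.
 * level_of[b] receives the level of block b (0 = target, -1 = unreached).
 * Returns the number of referencing blocks found. */
static int find_refs(const uintptr_t *mem, size_t nwords, uintptr_t base,
                     const region *blocks, int nblocks, int target,
                     int max_level, int *level_of)
{
    int queue[MAX_BLOCKS], head = 0, tail = 0, found = 0;
    assert(nblocks <= MAX_BLOCKS && target >= 0 && target < nblocks);
    for (int b = 0; b < nblocks; b++)
        level_of[b] = -1;
    level_of[target] = 0;
    queue[tail++] = target;                /* each block is queued at most once */
    while (head < tail) {
        int cur = queue[head++];
        if (level_of[cur] == max_level)
            continue;                      /* depth limit reached */
        for (size_t i = 0; i < nwords; i++) {
            if (block_of(blocks, nblocks, mem[i]) != cur)
                continue;                  /* word does not point into cur */
            uintptr_t where = base + i * sizeof(uintptr_t);
            int owner = block_of(blocks, nblocks, where);
            /* Only heap-to-heap chains are followed in this sketch. */
            if (owner >= 0 && level_of[owner] < 0) {
                level_of[owner] = level_of[cur] + 1;
                queue[tail++] = owner;
                found++;
            }
        }
    }
    return found;
}
```

Limiting `max_level` keeps the search tractable; each extra level can pull in many more blocks in a large heap.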

Others

· Find all object instances of a given C++ class

· Display objects shared by selected or all threads

· Display disassembled instructions annotated with data object context

· Display the data pattern within a range of memory

· Display a detailed process map, including all segments and their attributes
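Classifying an address against the process map is a range lookup over the segments read from the core file. The hypothetical `segment` type and helpers below sketch the idea; the segment names and permission strings are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One process-map entry, as a core-file reader might build it (hypothetical). */
typedef struct segment {
    uintptr_t   start, end;   /* covers [start, end) */
    const char *kind;         /* "text", "data", "heap", "stack", ... */
    const char *perms;        /* e.g. "r-x", "rw-" */
} segment;

/* Return the segment containing addr, or NULL if the address is unmapped
 * (an access to it in the live process would have faulted). */
static const segment *classify(const segment *map, size_t n, uintptr_t addr)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (addr >= map[i].start && addr < map[i].end)
            return &map[i];
    return NULL;
}

/* True if addr lies in a segment whose permissions include write. */
static int is_writable(const segment *map, size_t n, uintptr_t addr)
{
    const segment *s = classify(map, n, addr);
    return s != NULL && strchr(s->perms, 'w') != NULL;
}
```

A pointer that `classify` maps to NULL, or a write through a pointer into a segment that `is_writable` rejects, is a strong hint about why a process crashed.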