emeryberger/Hoard

by Emery Berger

The Hoard memory allocator is a fast, scalable, and memory-efficient memory allocator that works on a range of platforms, including Linux, Mac OS X, and Windows.

Hoard is a drop-in replacement for malloc that can dramatically improve application performance, especially for multithreaded programs running on multiprocessors and multicore CPUs. No source code changes are necessary: just link it in or set one environment variable (see Building Hoard, below).

Users

Companies using Hoard in their products and servers include AOL, British Telecom, Blue Vector, Business Objects (formerly Crystal Decisions), Cisco, Credit Suisse, Entrust, InfoVista, Kamakura, Novell, Oktal SE, OpenText, OpenWave Systems (for their Typhoon and Twister servers), Pervasive Software, Plath GmbH, Quest Software, Reuters, Royal Bank of Canada, SAP, Sonus Networks, Tata Communications, and Verite Group.

Open source projects using Hoard include the Asterisk Open Source Telephony Project, Bayonne GNU telephony server, the Cilk parallel programming language, the GNU Common C++ system, the OpenFOAM computational fluid dynamics toolkit, and the SafeSquid web proxy.

Hoard is now a standard compiler option for the Standard Performance Evaluation Corporation's CPU2006 benchmark suite for the Intel and Open64 compilers.

Licensing

Hoard has now been released under the widely-used and permissive Apache license, version 2.0.

Why Hoard?

There are a number of problems with existing memory allocators that make Hoard a better choice.

Contention

Multithreaded programs often do not scale because the heap is a bottleneck. When multiple threads simultaneously allocate or deallocate memory from the allocator, the allocator will serialize them. Programs making intensive use of the allocator actually slow down as the number of processors increases. Your program may be allocation-intensive without you realizing it, for instance, if your program makes many calls to the C++ Standard Template Library (STL). Hoard eliminates this bottleneck.
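
To see whether the allocator is a bottleneck for your own workload, a small stress test can help. The following is a minimal sketch, not part of Hoard: `hammer_allocator` is a hypothetical helper in which every thread performs malloc/free pairs in a tight loop, so nearly all of the run time is spent inside the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Hoard): each of `num_threads` threads
// performs `iterations` malloc/free pairs of `size` bytes.
// Returns the total number of allocations that succeeded.
std::size_t hammer_allocator(unsigned num_threads, std::size_t iterations,
                             std::size_t size) {
    assert(size > 0);
    std::vector<std::size_t> counts(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counts, t, iterations, size] {
            std::size_t ok = 0;
            for (std::size_t i = 0; i < iterations; ++i) {
                void* p = std::malloc(size);
                if (p != nullptr) {
                    // Touch the block so the compiler cannot elide the pair.
                    *static_cast<volatile char*>(p) = 1;
                    ++ok;
                }
                std::free(p);
            }
            counts[t] = ok;  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return total;
}
```

Calling this from a small `main` with increasing thread counts, and timing each run with and without Hoard loaded, gives a rough picture of how the allocator scales on your machine. Results depend heavily on the system allocator and hardware, so treat the numbers as indicative only.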

False Sharing

System-provided memory allocators can cause insidious problems for multithreaded code. They can lead to a phenomenon known as "false sharing": threads on different CPUs can end up with memory in the same cache line (the fixed-size chunk of memory that CPU caches transfer as a unit). Each write by one thread then invalidates the line in the other CPU's cache, even though the threads never touch the same data. Accessing these falsely-shared cache lines is hundreds of times slower than accessing unshared cache lines. Hoard is designed to prevent false sharing.
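
The layout condition behind false sharing can be checked directly. The sketch below assumes a 64-byte cache line (common on x86-64, but not universal); `same_cache_line` and `small_blocks_share_line` are hypothetical helpers, not Hoard APIs. The second function allocates two small blocks back to back and reports whether the current allocator placed them on the same line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: 64-byte cache lines. Adjust for your hardware.
constexpr std::uintptr_t kAssumedCacheLine = 64;

// True if both addresses fall within the same (assumed) cache line.
bool same_cache_line(const void* p, const void* q) {
    auto a = reinterpret_cast<std::uintptr_t>(p);
    auto b = reinterpret_cast<std::uintptr_t>(q);
    return a / kAssumedCacheLine == b / kAssumedCacheLine;
}

// Allocates two blocks of `size` bytes and reports whether their first
// bytes share a cache line. The answer depends on the allocator in use.
bool small_blocks_share_line(std::size_t size) {
    assert(size > 0);
    void* p = std::malloc(size);
    void* q = std::malloc(size);
    bool shared = p != nullptr && q != nullptr && same_cache_line(p, q);
    std::free(p);
    std::free(q);
    return shared;
}
```

If two such blocks were handed to different threads that each write to their own block, every write would bounce the shared line between CPUs. This single-threaded check only shows the placement; it does not measure the slowdown.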

Blowup

Multithreaded programs can also lead the allocator to blow up memory consumption. This effect can multiply the amount of memory needed to run your application by the number of CPUs on your machine: four CPUs could mean that you need four times as much memory. Hoard is guaranteed (provably!) to bound memory consumption.
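
The arithmetic behind this multiplication can be shown with a toy model. The sketch below is an assumption-laden simulation, not Hoard's algorithm or any real allocator: each thread has a private heap that never shares freed blocks with other threads, and the threads take turns allocating and then freeing the same number of blocks. All names (`BlowupResult`, `simulate_private_heaps`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BlowupResult {
    std::size_t peak_live_blocks;  // most blocks the program used at once
    std::size_t blocks_from_os;    // blocks the toy allocator had to obtain
};

// Toy model: thread t allocates `blocks_per_phase` blocks, frees them all
// into its own private heap, and then the next thread does the same.
BlowupResult simulate_private_heaps(std::size_t num_threads,
                                    std::size_t blocks_per_phase) {
    assert(num_threads > 0);
    std::vector<std::size_t> free_in_heap(num_threads, 0);
    BlowupResult r{0, 0};
    std::size_t live = 0;
    for (std::size_t t = 0; t < num_threads; ++t) {
        for (std::size_t i = 0; i < blocks_per_phase; ++i) {
            if (free_in_heap[t] > 0) {
                --free_in_heap[t];    // reuse a block cached by this thread
            } else {
                ++r.blocks_from_os;   // other heaps' free blocks are unreachable
            }
            ++live;
            if (live > r.peak_live_blocks) r.peak_live_blocks = live;
        }
        free_in_heap[t] += blocks_per_phase;  // freed blocks stay in heap t
        live -= blocks_per_phase;
    }
    return r;
}
```

With four threads and 1000 blocks per phase, the program never has more than 1000 blocks live, yet the model obtains 4000 blocks in total: a fourfold blowup matching the number of threads. An allocator that lets unused memory move between heaps avoids this growth.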

Installation

Homebrew (Mac OS X)

You can use Homebrew to install the current version of Hoard as follows:

    brew tap emeryberger/hoard
    brew install --HEAD emeryberger/hoard/libhoard

This not only installs the Hoard library, but also creates a hoard command that you can use to run any program with Hoard from the command line:

    hoard myprogram-goes-here

Building Hoard from source (Mac OS X, Linux, and Windows WSL2)

On Linux, you may need to first install the appropriate version of libstdc++-dev (e.g., libstdc++-12-dev):

    sudo apt install libstdc++-dev

Now, to build Hoard from source, do the following:

    git clone https://github.com/emeryberger/Hoard
    cd Hoard
    mkdir build && cd build
    cmake ..
    make

You can then use Hoard by linking it with your executable, or by setting the LD_PRELOAD environment variable, as in

    export LD_PRELOAD=/path/to/libhoard.so

or, on Mac OS X:

    export DYLD_INSERT_LIBRARIES=/path/to/libhoard.dylib
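
A quick sanity check from inside a program is to look at the environment it was started with. The sketch below uses hypothetical helpers (`env_or_empty`, `mentions_hoard`) and assumes the library file name contains `libhoard`, as in the paths above. It only confirms that the variable was set for the process, not that the dynamic loader actually honored it.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Returns the value of environment variable `name`, or "" if it is unset.
std::string env_or_empty(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

// True if a preload setting names a library whose path contains "libhoard".
bool mentions_hoard(const std::string& value) {
    return value.find("libhoard") != std::string::npos;
}
```

On Linux, `mentions_hoard(env_or_empty("LD_PRELOAD"))` reports whether the variable names Hoard; on Mac OS X, pass `"DYLD_INSERT_LIBRARIES"` instead.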

Building Hoard (Windows)

Hoard uses Microsoft Detours for function interposition on Windows. Detours is automatically downloaded and built by CMake.

    git clone https://github.com/emeryberger/Hoard
    cd Hoard
    mkdir build && cd build
    cmake ..
    cmake --build . --config Release

This produces build\Release\hoard.dll along with the withdll.exe and setdll.exe tools. The build supports x86, x64, ARM, and ARM64 architectures.

Using Hoard on Windows

Important: Programs must be compiled with /MD (dynamic C runtime) for Hoard to intercept allocations. Programs compiled with /MT (static C runtime) have allocation functions embedded directly in the executable, which Hoard cannot intercept.

With unmodified executables (recommended):

Use withdll.exe (built automatically) to inject Hoard into any program at runtime, similar to LD_PRELOAD on Linux:

    build\Release\withdll.exe /d:build\Release\hoard.dll yourapp.exe [args...]

Permanent modification:

Use setdll.exe (built automatically) to modify an executable's import table:

    # Add Hoard to executable (creates backup as .exe~)
    build\Release\setdll.exe /d:build\Release\hoard.dll yourapp.exe

    # Remove Hoard from executable
    build\Release\setdll.exe /r:hoard.dll yourapp.exe

Linking at build time:

You can also link Hoard directly into your application:

    cl /Ox /MD yourapp.cpp /link hoard.lib

Benchmarks

The directory benchmarks/ contains a number of benchmarks used to evaluate and tune Hoard.

All benchmarks were run on a 192-core, 2-node NUMA system (AMD EPYC). Graphs are normalized to Hoard (1.0 = Hoard, shown as a green line). Values above the line are worse than Hoard.

Summary

Key findings:

  • Hoard achieves 1.3-1.5x higher throughput than mimalloc, jemalloc, and glibc on server workloads (Larson)
  • Hoard is 2-5x faster on realloc-heavy workloads (Phong)
  • Hoard uses less memory than mimalloc and jemalloc at high thread counts
  • On NUMA systems, Hoard is up to 1.6x faster due to NUMA-aware memory management

[Figures: Execution Time Summary; Memory Usage Summary]

Larson (server workload simulation)

Simulates a multithreaded server handling many short-lived allocations with object passing between threads.

Take-home: Hoard achieves 1.3-1.5x higher throughput than all other allocators across all thread counts. This benchmark is representative of real server workloads.

[Figures: Larson - Throughput; Larson - Memory]

threadtest (malloc/free throughput)

Measures raw allocation/deallocation throughput with minimal work between operations.

Take-home: Hoard is fastest at low to medium thread counts (8-32 threads) and matches mimalloc at 256 threads. Hoard uses significantly less memory than jemalloc at high thread counts.

[Figures: threadtest - Time; threadtest - Memory]

Phong (realloc-heavy workload)

Tests realloc performance with repeated grow/shrink patterns.

Take-home: Hoard is 2-5x faster than all other allocators at low to medium thread counts (4-64) due to its optimized in-place realloc implementation.
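
For readers unfamiliar with this kind of workload, the following is a minimal single-threaded sketch of a grow/shrink realloc pattern. It is not the Phong benchmark source; `grow_shrink` is a hypothetical function that doubles a buffer up to `max_size`, halves it back down to `min_size`, and repeats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical workload (not the actual Phong benchmark): repeatedly doubles
// a buffer from `min_size` up to at most `max_size`, then halves it back.
// Returns the number of successful realloc calls.
std::size_t grow_shrink(std::size_t min_size, std::size_t max_size,
                        std::size_t rounds) {
    assert(min_size > 0 && min_size <= max_size);
    char* buf = static_cast<char*>(std::malloc(min_size));
    if (buf == nullptr) return 0;
    buf[0] = 'x';  // marker byte that realloc must preserve
    std::size_t size = min_size;
    std::size_t calls = 0;
    // Resizes the buffer; on failure the old block is still valid.
    auto resize = [&](std::size_t new_size) {
        void* p = std::realloc(buf, new_size);
        if (p == nullptr) return false;
        buf = static_cast<char*>(p);
        size = new_size;
        ++calls;
        return true;
    };
    bool ok = true;
    for (std::size_t r = 0; ok && r < rounds; ++r) {
        while (ok && size * 2 <= max_size) ok = resize(size * 2);  // grow
        while (ok && size / 2 >= min_size) ok = resize(size / 2);  // shrink
    }
    assert(buf[0] == 'x');  // contents survive every resize
    std::free(buf);
    return calls;
}
```

An allocator that can often resize a block in place avoids copying the contents on each call, which is where a workload like this spends most of its time. The sketch assumes sizes small enough that `size * 2` does not overflow.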

[Figures: Phong - Time; Phong - Memory]

linux-scalability

Pure malloc/free pairs with no work between operations. Tests raw allocator scalability.

Take-home: jemalloc excels here; this workload is adversarial for Hoard's superblock design. However, jemalloc uses significantly more memory.

[Figures: linux-scalability - Time; linux-scalability - Memory]

NUMA Performance

On NUMA systems, memory locality matters. Hoard's NUMA-aware sharding keeps allocations on the same NUMA node as the allocating thread, reducing cross-node memory traffic.

Take-home: At 128 threads on a 2-node NUMA system, Hoard is 1.4x faster than mimalloc, 1.4x faster than jemalloc, and 1.6x faster than glibc. The advantage grows with thread count.

[Figures: NUMA Throughput; NUMA Speedup]

Technical Information

Hoard has changed quite a bit over the years, but for technical details of the first version of Hoard, read Hoard: A Scalable Memory Allocator for Multithreaded Applications, by Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. The Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX). Cambridge, MA, November 2000.
