How should a C++ library allow custom allocators?

Date: 2022-09-02 11:12:28

In C, it's simple for a library to allow the user to customize memory allocation by using global function pointers to a function that should behave similarly to malloc() and to a function that should behave similarly to free(). SQLite, for example, uses this approach.
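The same C-style approach carries over to C++ unchanged. Here is a minimal sketch under assumptions: the namespace `mylib` and the names `set_allocator`, `lib_malloc` and `lib_free` are hypothetical, not SQLite's API. The defaults are small wrappers rather than `&std::malloc`, because taking the address of a standard library function is not guaranteed to be portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical library namespace; all names below are illustrative only.
namespace mylib {

using MallocFn = void* (*)(std::size_t size);
using FreeFn = void (*)(void* ptr);

// Defaults forward to the C runtime.
inline void* default_malloc(std::size_t size) { return std::malloc(size); }
inline void default_free(void* ptr) { std::free(ptr); }

inline MallocFn g_malloc = &default_malloc;
inline FreeFn g_free = &default_free;

// Call before the library allocates anything: memory obtained from one
// pair of functions must be released by the matching free function.
inline void set_allocator(MallocFn m, FreeFn f) {
    assert(m != nullptr && f != nullptr);
    g_malloc = m;
    g_free = f;
}

// Library-internal code calls these instead of malloc()/free().
inline void* lib_malloc(std::size_t size) { return g_malloc(size); }
inline void lib_free(void* ptr) { g_free(ptr); }

}  // namespace mylib
```

A client can install captureless lambdas, since they convert to plain function pointers.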


C++ complicates things a bit because allocation and initialization are usually fused. Essentially, we want the behavior of overriding operator new and operator delete for just one library, but there's no way to actually do that (I'm fairly certain, but not 100%).
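The closest built-in mechanism is scoped to a class rather than a library. A class, or a common base, can declare its own operator new/delete, and new-expressions for that class prefer these over the global ones. Below is a sketch, assuming a hypothetical base `mylib::LibObject` and counter `g_live_bytes`, with `std::malloc` standing in for the client's allocator. It only covers types that opt in. It does not catch allocations made by their members.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

namespace mylib {

// Hypothetical counter standing in for a client-supplied allocator.
inline std::size_t g_live_bytes = 0;

// Classes deriving from LibObject get class-specific operator new/delete,
// which `new Derived(...)` and `delete p` use instead of the global ones.
// Limits: new[]/delete[] still use the global operators, and allocations
// made by members (e.g. a std::vector) go through std::allocator as usual.
struct LibObject {
    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        g_live_bytes += size;
        return p;
    }
    static void operator delete(void* p, std::size_t size) noexcept {
        if (!p) return;
        assert(g_live_bytes >= size);
        g_live_bytes -= size;
        std::free(p);
    }

  protected:
    // Non-public, non-virtual destructor: deleting through a LibObject*
    // won't compile, so `size` always matches the object being freed.
    ~LibObject() = default;
};

}  // namespace mylib
```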

How should this be done in C++?

Here's a first stab at something that replicates some of the semantics of new expressions with a function Lib::make<T>.
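The following sketch is not the code from the links. It shows one way such a `Lib::make<T>`, paired with a `Lib::destroy<T>`, might reproduce the key parts of `new T(args...)`: allocation through a hook, construction in place, and releasing the memory if the constructor throws. `Lib::allocate`, `Lib::deallocate` and `Lib::g_allocs` are hypothetical stand-ins for client-replaceable hooks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

namespace Lib {

// Hypothetical hooks standing in for client-replaceable allocation
// functions; g_allocs counts live blocks so leaks are easy to spot.
inline std::size_t g_allocs = 0;
inline void* allocate(std::size_t size) {
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    ++g_allocs;
    return p;
}
inline void deallocate(void* p) noexcept {
    if (p) { assert(g_allocs > 0); --g_allocs; std::free(p); }
}

// Mimics `new T(args...)`: allocate, construct in place, and give the
// memory back if the constructor throws, as a new-expression would.
template <class T, class... Args>
T* make(Args&&... args) {
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types are not handled in this sketch");
    void* mem = allocate(sizeof(T));
    try {
        return ::new (mem) T(std::forward<Args>(args)...);
    } catch (...) {
        deallocate(mem);
        throw;
    }
}

// Mimics `delete p`. p must be exactly the pointer make<T> returned
// (not a base-class pointer to a derived object).
template <class T>
void destroy(T* p) noexcept {
    if (!p) return;
    p->~T();
    deallocate(p);
}

}  // namespace Lib
```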

I don't know if this is so useful, but just for fun, here's a more complicated version that also tries to replicate the semantics of new[] expressions.
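The array case has to remember the element count so the matching destroy call can run every destructor. The following is again a hedged sketch, not the linked code. It stores the count in a small header in front of the elements, similar in spirit to the "array cookie" many compilers use for `new[]`. It also unwinds already-built elements if a constructor throws. The hooks and the names `make_array`/`destroy_array` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <new>

namespace Lib {

// Hypothetical hooks (stand-ins for client-replaceable allocation functions).
inline std::size_t g_allocs = 0;
inline void* allocate(std::size_t size) {
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    ++g_allocs;
    return p;
}
inline void deallocate(void* p) noexcept {
    if (p) { assert(g_allocs > 0); --g_allocs; std::free(p); }
}

// Header holding the element count, rounded up so the elements that
// follow it stay aligned for any fundamental type.
constexpr std::size_t kHeader =
    (sizeof(std::size_t) + alignof(std::max_align_t) - 1) /
    alignof(std::max_align_t) * alignof(std::max_align_t);

// Mimics `new T[n]()`: value-initializes each element; if a constructor
// throws, destroys the ones already built (in reverse) and frees the block.
template <class T>
T* make_array(std::size_t n) {
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types are not handled in this sketch");
    if (n > (std::numeric_limits<std::size_t>::max() - kHeader) / sizeof(T))
        throw std::bad_array_new_length();
    char* raw = static_cast<char*>(allocate(kHeader + n * sizeof(T)));
    ::new (static_cast<void*>(raw)) std::size_t(n);
    T* first = reinterpret_cast<T*>(raw + kHeader);
    std::size_t i = 0;
    try {
        for (; i < n; ++i) ::new (static_cast<void*>(first + i)) T();
    } catch (...) {
        while (i > 0) first[--i].~T();
        deallocate(raw);
        throw;
    }
    return first;
}

// Mimics `delete[] p`: reads the stored count, destroys the elements in
// reverse order, then frees the whole block.
template <class T>
void destroy_array(T* first) noexcept {
    if (!first) return;
    char* raw = reinterpret_cast<char*>(first) - kHeader;
    std::size_t n = *std::launder(reinterpret_cast<std::size_t*>(raw));
    while (n > 0) first[--n].~T();
    deallocate(raw);
}

}  // namespace Lib
```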

This is a goal oriented question so I'm not necessarily looking for code review. If there's some better way to do this just say so and ignore the links.


(By "allocator" I only mean something that allocates memory. I'm not referring to the STL allocator concept or even necessarily allocating memory for containers.)


Why this might be desirable:


Here's a blog post from a Mozilla dev arguing that libraries should do this. He gives a few examples of C libraries that let the library user customize allocation for the library. I checked out the source code for one of the examples, SQLite, and saw that this feature is also used internally for testing via fault injection. I'm not writing anything that needs to be as bulletproof as SQLite, but it still seems like a sensible idea. If nothing else, it lets client code figure out, "Which library is hogging my memory, and when?"
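To illustrate why an allocation hook enables fault injection, here is a minimal sketch in that spirit. It is much simpler than SQLite's actual test harness, and every name is hypothetical. The test installs a hook that fails one chosen allocation, then checks that the library routine reports the failure cleanly instead of crashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical library with replaceable allocation hooks.
namespace lib {
using MallocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);
inline void* default_malloc(std::size_t n) { return std::malloc(n); }
inline void default_free(void* p) { std::free(p); }
inline MallocFn g_malloc = &default_malloc;
inline FreeFn g_free = &default_free;

// Library routine whose out-of-memory path we want to exercise:
// it must return nullptr cleanly when allocation fails.
inline char* duplicate(const char* src, std::size_t n) {
    assert(src != nullptr);
    char* copy = static_cast<char*>(g_malloc(n));
    if (!copy) return nullptr;
    std::memcpy(copy, src, n);
    return copy;
}
}  // namespace lib

// Test-side hook: fails exactly the Nth allocation, otherwise forwards
// to std::malloc (so lib::g_free can still release the memory).
namespace fault {
inline long g_calls = 0;
inline long g_fail_at = -1;  // -1 means never fail
inline void* failing_malloc(std::size_t n) {
    if (++g_calls == g_fail_at) return nullptr;
    return std::malloc(n);
}
}  // namespace fault
```

A test harness can loop `g_fail_at` over 1, 2, 3, … to hit every allocation site in turn.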

Simple answer: don't use C++. Sorry, just kidding.

But if you want to take this kind of absolute control over memory management in C++, across library/module boundaries, and in a completely generalized way, you can be in for some terrible grief. I'd suggest that most people look for reasons not to do it rather than ways to do it.

I've gone through many iterations of this same basic idea over the years (actually decades), from naively overloading operator new/new[]/delete/delete[] at a global level, to linker-based solutions, to platform-specific solutions. I've actually reached the point you want to reach now: I have a system that lets me see the amount of memory allocated per plugin. But I didn't get there through the kind of generalized approach you want (and that I originally wanted as well).

C++ complicates things a bit because allocation and initialization are usually fused.

I would offer a slight twist on this statement: C++ complicates things because initialization and allocation are usually fused. All I did was swap the order, but the most complicating part is not that allocation wants to initialize. It's that initialization often wants to allocate.

Take this basic example:


```cpp
struct Foo {
    std::vector<Bar> stuff;
};
```

In this case, we can easily allocate Foo through a custom memory allocator:


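```cpp
#include <new>  // placement new

void* mem = custom_malloc(sizeof(Foo));  // raw memory from the custom allocator
Foo* foo = new (mem) Foo;                // construct Foo in that memory
```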

... and of course we can wrap this all we like to conform to RAII, achieve exception-safety, etc.
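One possible RAII wrapper is sketched below. It uses a `std::unique_ptr` whose deleter runs the destructor and then frees, and it frees the raw memory if the constructor throws. `custom_malloc`/`custom_free` here are hypothetical stand-ins backed by `std::malloc`, and `g_live` is a hypothetical counter. `CustomDelete`, `custom_ptr` and `make_custom` are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical custom allocator pair, counting live blocks.
inline std::size_t g_live = 0;
inline void* custom_malloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p) ++g_live;
    return p;
}
inline void custom_free(void* p) {
    if (p) { assert(g_live > 0); --g_live; std::free(p); }
}

// Deleter that undoes placement new: run the destructor, then free.
template <class T>
struct CustomDelete {
    void operator()(T* p) const noexcept {
        p->~T();
        custom_free(p);
    }
};

template <class T>
using custom_ptr = std::unique_ptr<T, CustomDelete<T>>;

template <class T, class... Args>
custom_ptr<T> make_custom(Args&&... args) {
    void* mem = custom_malloc(sizeof(T));
    if (!mem) throw std::bad_alloc();
    try {
        return custom_ptr<T>(::new (mem) T(std::forward<Args>(args)...));
    } catch (...) {
        custom_free(mem);  // constructor threw: release the raw memory
        throw;
    }
}
```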


Except now the problem cascades. The stuff member, being a std::vector, will want to use std::allocator, and now we have a second problem to solve. We could instantiate std::vector with our own allocator type. If runtime information needs to reach the allocator, we can give Foo constructors that pass that information, along with the allocator, to the vector's constructor.
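The following minimal sketch shows that idea. It assumes a hypothetical stateful `LibAllocator<T>` that carries a runtime `lib_id` and tallies bytes in a hypothetical `g_bytes` table (ids 0 to 3 only). Foo takes the allocator in its constructor and forwards it to the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical per-library byte counters, indexed by library id (0..3).
inline std::size_t g_bytes[4] = {};

// Minimal stateful allocator carrying runtime info (the library id).
template <class T>
struct LibAllocator {
    using value_type = T;
    int lib_id;

    explicit LibAllocator(int id) noexcept : lib_id(id) {}
    template <class U>
    LibAllocator(const LibAllocator<U>& other) noexcept : lib_id(other.lib_id) {}

    T* allocate(std::size_t n) {
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        g_bytes[lib_id] += n * sizeof(T);
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        assert(g_bytes[lib_id] >= n * sizeof(T));
        g_bytes[lib_id] -= n * sizeof(T);
        std::free(p);
    }
};

template <class T, class U>
bool operator==(const LibAllocator<T>& a, const LibAllocator<U>& b) noexcept {
    return a.lib_id == b.lib_id;
}
template <class T, class U>
bool operator!=(const LibAllocator<T>& a, const LibAllocator<U>& b) noexcept {
    return !(a == b);
}

struct Bar { int value = 0; };

// Foo forwards the allocator (and its runtime lib_id) to its vector member.
struct Foo {
    std::vector<Bar, LibAllocator<Bar>> stuff;
    explicit Foo(const LibAllocator<Bar>& alloc) : stuff(alloc) {}
};
```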

But what about Bar? Its constructor may also want to allocate memory for a variety of disparate objects, and so the problem cascades and cascades and cascades.
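One standard-library mitigation for this cascade is C++17's `std::pmr`. This is a different technique from the one the rest of this answer settles on, and it only helps with allocator-aware types. A pmr container hands its `memory_resource` down to the pmr elements it constructs, so nested allocations land in the same resource. Below is a sketch with a hypothetical `CountingResource`, using `std::pmr::string` as a stand-in for a Bar that itself allocates.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// A memory_resource that counts live bytes and forwards to an upstream one.
class CountingResource : public std::pmr::memory_resource {
  public:
    explicit CountingResource(
        std::pmr::memory_resource* upstream = std::pmr::new_delete_resource())
        : upstream_(upstream) {}
    std::size_t live_bytes() const { return live_; }

  private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        void* p = upstream_->allocate(bytes, align);
        live_ += bytes;
        return p;
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t align) override {
        assert(live_ >= bytes);
        upstream_->deallocate(p, bytes, align);
        live_ -= bytes;
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t live_ = 0;
};

// The outer vector passes its resource to every string it constructs,
// so the strings' own heap buffers come from the same resource.
using Names = std::pmr::vector<std::pmr::string>;
```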


Given the difficulty of this problem, the alternative generalized solutions I've tried, and the grief they caused when porting, I've settled on a completely de-generalized, somewhat pragmatic approach.


The solution I settled on is to effectively reinvent the entire C and C++ standard library. Disgusting, I know, but I had a bit more of an excuse to do it in my case. The product I'm working on is effectively an engine and software development kit, designed to allow people to write plugins for it using any compiler, C runtime, C++ standard library implementation, and build settings they desire. To allow things like vectors or sets or maps to be passed through these central APIs in an ABI-compatible way required rolling our own standard-compliant containers in addition to a lot of C standard functions.

The entire implementation of this devkit then revolves around these allocation functions:


```cpp
EP_API void* ep_malloc(int lib_id, int size);
EP_API void ep_free(int lib_id, void* mem);
```

... and the entirety of the SDK revolves around these two, including memory pools and "sub-allocators".
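Purely as an illustration, and not the author's implementation, the two declared functions could provide per-module accounting as sketched below. Assumptions: `EP_API` is an export macro (defined empty here), there is a fixed table of 64 library ids, and `ep_live_bytes` is a hypothetical query. Because `ep_free` receives only the pointer, each block carries a small header recording its size.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

#define EP_API  // assumption: normally an export/visibility macro

namespace {
constexpr int kMaxLibs = 64;  // assumption: fixed table of library ids
std::atomic<std::size_t> g_live_bytes[kMaxLibs] = {};

// Size header in front of each block, rounded up to max_align_t so the
// memory handed to the caller stays suitably aligned.
constexpr std::size_t kHeader =
    (sizeof(std::size_t) + alignof(std::max_align_t) - 1) /
    alignof(std::max_align_t) * alignof(std::max_align_t);
}  // namespace

EP_API void* ep_malloc(int lib_id, int size) {
    assert(lib_id >= 0 && lib_id < kMaxLibs);
    if (size < 0) return nullptr;
    const std::size_t n = static_cast<std::size_t>(size);
    char* raw = static_cast<char*>(std::malloc(kHeader + n));
    if (!raw) return nullptr;
    ::new (static_cast<void*>(raw)) std::size_t(n);  // remember the size
    g_live_bytes[lib_id] += n;
    return raw + kHeader;
}

EP_API void ep_free(int lib_id, void* mem) {
    assert(lib_id >= 0 && lib_id < kMaxLibs);
    if (!mem) return;
    char* raw = static_cast<char*>(mem) - kHeader;
    g_live_bytes[lib_id] -= *std::launder(reinterpret_cast<std::size_t*>(raw));
    std::free(raw);
}

// Hypothetical query answering "which library is using my memory?"
EP_API std::size_t ep_live_bytes(int lib_id) {
    assert(lib_id >= 0 && lib_id < kMaxLibs);
    return g_live_bytes[lib_id].load();
}
```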


For third-party libraries outside of our control, we're just SOL. Some of those libraries have equally ambitious plans for their own memory management, and trying to override that would just lead to all kinds of clashes and open up all kinds of cans of worms. There are also very low-level drivers, when using things like OpenGL, that want to allocate a lot of system memory, and we can't do anything about it.


Yet I've found this solution works well enough to answer the basic question "Who/what is hogging all this memory?" very quickly. That question is often much harder to answer than a similar one about clock cycles (for which we can just fire up any profiler). It only applies to code under our control that uses this SDK, but with it we get a very thorough per-module memory breakdown. We can also set artificial caps on memory use to make sure out-of-memory errors are actually handled correctly, without having to exhaust all the contiguous pages available in the system.
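An artificial per-module cap like that can sit on top of the same kind of accounting. The following minimal sketch assumes a hypothetical `ModuleBudget` class and hypothetical `budget_malloc`/`budget_free` helpers. To keep it short, the caller passes the size back on free. An allocation that would exceed the cap simply returns `nullptr`, which exercises the caller's out-of-memory path without touching real system limits.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical per-module budget: reservations beyond `cap` are refused.
class ModuleBudget {
  public:
    explicit ModuleBudget(std::size_t cap) : cap_(cap) {}

    // Atomically reserve `size` bytes; false if that would exceed the cap.
    bool try_reserve(std::size_t size) {
        std::size_t used = used_.load();
        do {
            if (size > cap_ - used) return false;  // used never exceeds cap_
        } while (!used_.compare_exchange_weak(used, used + size));
        return true;
    }
    void release(std::size_t size) {
        assert(size <= used_.load());
        used_ -= size;
    }
    std::size_t used() const { return used_.load(); }

  private:
    const std::size_t cap_;
    std::atomic<std::size_t> used_{0};
};

inline void* budget_malloc(ModuleBudget& b, std::size_t size) {
    if (!b.try_reserve(size)) return nullptr;  // simulated out-of-memory
    void* p = std::malloc(size);
    if (!p) b.release(size);                   // real failure: undo reservation
    return p;
}

inline void budget_free(ModuleBudget& b, void* p, std::size_t size) {
    if (!p) return;
    std::free(p);
    b.release(size);
}
```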


So in my case this problem was solved via policy: by building a uniform coding standard and a central library conforming to it that's used throughout the codebase (and by third parties writing plugins for our system). It's probably not the answer you are looking for, but this ended up being the most practical solution we've found yet.




