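The articles http://t.****img.cn/9sS23 and http://t.****img.cn/0wa6h analyzed the basic implementation of RCU. When reading kernel code, however, we often come across the function kfree_rcu(). So how exactly is kfree tied to RCU?
This analysis is based on Linux kernel 4.19.195.
Let's go straight to the code.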
/**
* kfree_rcu() - kfree an object after a grace period.
* @ptr: pointer to kfree
* @rcu_head: the name of the struct rcu_head within the type of @ptr.
*
* Many rcu callbacks functions just call kfree() on the base structure.
* These functions are trivial, but their size adds up, and furthermore
* when they are used in a kernel module, that module must invoke the
* high-latency rcu_barrier() function at module-unload time.
*
* The kfree_rcu() function handles this issue. Rather than encoding a
* function address in the embedded rcu_head structure, kfree_rcu() instead
* encodes the offset of the rcu_head structure within the base structure.
* Because the functions are not allowed in the low-order 4096 bytes of
* kernel virtual memory, offsets up to 4095 bytes can be accommodated.
* If the offset is larger than 4095 bytes, a compile-time error will
* be generated in __kfree_rcu(). If this error is triggered, you can
* either fall back to use of call_rcu() or rearrange the structure to
* position the rcu_head structure into the first 4096 bytes.
*
* Note that the allowable offset might decrease in the future, for example,
* to allow something like kmem_cache_free_rcu().
*
* The BUILD_BUG_ON check must not involve any function calls, hence the
* checks are done in macros here.
*/
#define kfree_rcu(ptr, rcu_head) \
	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
The comment is very clear: kfree_rcu() releases the object with kfree() once a grace period (GP) has elapsed.
/*
* Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
*/
#define __kfree_rcu(head, offset) \
	do { \
		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
	} while (0)
/*
* Queue an RCU callback for lazy invocation after a grace period.
* This will likely be later named something like "call_rcu_lazy()",
* but this change will require some way of tagging the lazy RCU
* callbacks in the list of pending callbacks. Until then, this
* function may only be called from __kfree_rcu().
*/
void kfree_call_rcu(struct rcu_head *head,
		    rcu_callback_t func)
{
	__call_rcu(head, func, rcu_state_p, -1, 1);
}
EXPORT_SYMBOL_GPL(kfree_call_rcu);
In the end, it is still implemented through __call_rcu().
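What is odd is that the second argument of __kfree_rcu() is offsetof(typeof(*(ptr)), rcu_head), and this value is cast to rcu_callback_t and passed on to __call_rcu(). The value is clearly an offset; invoking it as a callback function would certainly crash the system. So how does the kernel recognize and handle this value, and how does it then use kfree() to release the object? No kfree is passed in anywhere here.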
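To make the encoding concrete, here is a minimal user-space sketch of storing an offset in the slot that normally holds a function pointer. The types are simplified stand-ins rather than the kernel definitions, and struct foo, encode_offset(), decode_offset(), and KFREE_RCU_OFFSET_LIMIT are hypothetical names introduced only for illustration. Casting an integer to a function pointer is implementation-defined in ISO C; the sketch assumes a mainstream compiler such as gcc or clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space stand-ins, NOT the kernel definitions. */
struct rcu_head;
typedef void (*rcu_callback_t)(struct rcu_head *head);

struct rcu_head {
	struct rcu_head *next;
	rcu_callback_t func;
};

/* Hypothetical object that embeds an rcu_head at a non-zero offset. */
struct foo {
	int data;
	struct rcu_head rcu;
};

/* Mirrors the 4096 bound used by __is_kfree_rcu_offset(). */
#define KFREE_RCU_OFFSET_LIMIT 4096UL

/* Store a small offset where a function pointer normally lives. */
static void encode_offset(struct rcu_head *head, size_t offset)
{
	/* Runtime stand-in for the kernel's compile-time BUILD_BUG_ON(). */
	assert(offset < KFREE_RCU_OFFSET_LIMIT);
	head->func = (rcu_callback_t)(uintptr_t)offset;
}

/* Read the stored value back as a plain integer. */
static size_t decode_offset(const struct rcu_head *head)
{
	return (size_t)(uintptr_t)head->func;
}
```

Calling encode_offset(&f.rcu, offsetof(struct foo, rcu)) on a struct foo f leaves a small integer in f.rcu.func; subtracting decode_offset(&f.rcu) from the address of f.rcu gives back the address of f itself.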
The RCU callback invocation path eventually reaches __rcu_reclaim(). Let's look at how that function is implemented.
/*
* Does the specified offset indicate that the corresponding rcu_head
* structure can be handled by kfree_rcu()?
*/
#define __is_kfree_rcu_offset(offset) ((offset) < 4096)
/*
* Reclaim the specified callback, either by invoking it (non-lazy case)
* or freeing it directly (lazy case). Return true if lazy, false otherwise.
*/
static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
{
	unsigned long offset = (unsigned long)head->func;
	rcu_lock_acquire(&rcu_callback_map);
	if (__is_kfree_rcu_offset(offset)) {
		RCU_TRACE(trace_rcu_invoke_kfree_callback(rn, head, offset);)
		kfree((void *)head - offset);
		rcu_lock_release(&rcu_callback_map);
		return true;
	} else {
		RCU_TRACE(trace_rcu_invoke_callback(rn, head);)
		head->func(head);
		rcu_lock_release(&rcu_callback_map);
		return false;
	}
}
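So the handling happens in __rcu_reclaim(). If __is_kfree_rcu_offset(offset) returns true, that is, the value stored in head->func is smaller than 4096, the value is treated as an offset and kfree() is called on head - offset, which is the start address of the enclosing object. Otherwise the value is a real function pointer and is invoked as a callback, which performs the actual reclamation.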
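The dispatch can be imitated in user space. The following is only a sketch under stated assumptions: free() stands in for kfree(), locking and tracing are omitted, and struct foo, reclaim(), count_callback(), and callbacks_invoked are hypothetical names. It also assumes that no real function lives below address 4096, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified user-space stand-ins, NOT the kernel definitions. */
struct rcu_head;
typedef void (*rcu_callback_t)(struct rcu_head *head);

struct rcu_head {
	struct rcu_head *next;
	rcu_callback_t func;
};

/* Hypothetical object with the rcu_head placed after some payload. */
struct foo {
	long payload[4];
	struct rcu_head rcu;
};

static int callbacks_invoked;

/* An ordinary callback, used to exercise the non-lazy path. */
static void count_callback(struct rcu_head *head)
{
	(void)head;
	callbacks_invoked++;
}

/*
 * Imitates the branch in __rcu_reclaim(): returns 1 when the stored value
 * was an offset and the enclosing object was freed, 0 when it was a real
 * callback that got invoked.
 */
static int reclaim(struct rcu_head *head)
{
	uintptr_t offset;

	assert(head != NULL);
	offset = (uintptr_t)head->func;
	if (offset < 4096) {
		/* Step back from the rcu_head to the start of the object. */
		free((char *)head - offset);
		return 1;
	}
	head->func(head);
	return 0;
}
```

An object whose rcu.func holds offsetof(struct foo, rcu) is freed directly by reclaim(), while an object whose rcu.func points at count_callback has that callback invoked instead.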
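This raises a question: why does the kernel go to the trouble of an if/else to implement kfree_rcu(), instead of having kfree_rcu() simply pass kfree as the callback function? Wouldn't that be more elegant, and also avoid the cost of a branch?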
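In fact, the kernel cannot avoid this if/else branch. The RCU callback list links struct rcu_head objects, and a callback receives only the address of its rcu_head. The address that kfree() must free, however, is that of the enclosing object, which generally differs from the rcu_head address, because many kernel structures implement their RCU logic by embedding a struct rcu_head as a member. With the supposedly "elegant" approach, kfree() would be handed the wrong address. The kernel therefore has to implement kfree_rcu() in this less elegant way, recording the offset so that the start address of the object can be recovered.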
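For contrast, here is a user-space sketch of the hand-written alternative that the kernel comment calls trivial: a per-type callback that recovers the enclosing object from the rcu_head before freeing it. free() stands in for kfree(), container_of is redefined locally in simplified form, and struct foo, foo_free_rcu(), and foo_frees are hypothetical names. It illustrates why the callback needs to know the member offset, which a bare kfree passed as the callback would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified user-space stand-ins, NOT the kernel definitions. */
struct rcu_head;
typedef void (*rcu_callback_t)(struct rcu_head *head);

struct rcu_head {
	struct rcu_head *next;
	rcu_callback_t func;
};

/* Simplified local version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical object; the rcu_head is NOT the first member. */
struct foo {
	int data;
	struct rcu_head rcu;
};

static int foo_frees;

/* The kind of per-type callback that kfree_rcu() makes unnecessary. */
static void foo_free_rcu(struct rcu_head *head)
{
	struct foo *f = container_of(head, struct foo, rcu);

	/* head points into the middle of the allocation, not at its start. */
	assert((void *)f != (void *)head);
	free(f);
	foo_frees++;
}
```

Every structure type would need its own copy of such a function, which is the code-size cost the kernel comment mentions; kfree_rcu() replaces all of them with one offset check in the reclaim path.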