A Brief Analysis of the kfree_rcu Implementation

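The articles http://t.****img.cn/9sS23 and http://t.****img.cn/0wa6h analyzed the basic implementation principles of RCU. When reading kernel code, however, we frequently come across calls to kfree_rcu(). So how exactly does kfree tie in with RCU?
This analysis is based on Linux kernel 4.19.195.
Let's go straight to the code.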

/**
 * kfree_rcu() - kfree an object after a grace period.
 * @ptr:	pointer to kfree
 * @rcu_head:	the name of the struct rcu_head within the type of @ptr.
 *
 * Many rcu callbacks functions just call kfree() on the base structure.
 * These functions are trivial, but their size adds up, and furthermore
 * when they are used in a kernel module, that module must invoke the
 * high-latency rcu_barrier() function at module-unload time.
 *
 * The kfree_rcu() function handles this issue.  Rather than encoding a
 * function address in the embedded rcu_head structure, kfree_rcu() instead
 * encodes the offset of the rcu_head structure within the base structure.
 * Because the functions are not allowed in the low-order 4096 bytes of
 * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
 * If the offset is larger than 4095 bytes, a compile-time error will
 * be generated in __kfree_rcu().  If this error is triggered, you can
 * either fall back to use of call_rcu() or rearrange the structure to
 * position the rcu_head structure into the first 4096 bytes.
 *
 * Note that the allowable offset might decrease in the future, for example,
 * to allow something like kmem_cache_free_rcu().
 *
 * The BUILD_BUG_ON check must not involve any function calls, hence the
 * checks are done in macros here.
 */
#define kfree_rcu(ptr, rcu_head)					\
	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))

The comment states the purpose very clearly: kfree_rcu() frees the given object with kfree() once a grace period (GP) has elapsed.
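To make the two arguments computed by the macro concrete, here is a minimal userspace sketch. It is not kernel code: struct foo, its fields, and the two helper functions are hypothetical names chosen for illustration, and struct rcu_head is a simplified stand-in for the kernel type. Only offsetof() from <stddef.h> is used as-is.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Hypothetical object: the rcu_head is embedded after some payload. */
struct foo {
	long payload[4];
	struct rcu_head rcu;
};

/* First argument that kfree_rcu(p, rcu) passes on: the address &p->rcu. */
static inline struct rcu_head *foo_rcu_head(struct foo *p)
{
	return &p->rcu;
}

/* Second argument: the byte offset of the rcu member inside struct foo. */
static inline size_t foo_rcu_offset(void)
{
	return offsetof(struct foo, rcu);
}
```

For any struct foo *p, subtracting foo_rcu_offset() bytes from foo_rcu_head(p) gives back p itself. The offset is therefore enough information to get from the rcu_head to the start of the enclosing object.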

/*
 * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
 */
#define __kfree_rcu(head, offset) \
	do { \
		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
	} while (0)

/*
 * Queue an RCU callback for lazy invocation after a grace period.
 * This will likely be later named something like "call_rcu_lazy()",
 * but this change will require some way of tagging the lazy RCU
 * callbacks in the list of pending callbacks. Until then, this
 * function may only be called from __kfree_rcu().
 */
void kfree_call_rcu(struct rcu_head *head,
		    rcu_callback_t func)
{
	__call_rcu(head, func, rcu_state_p, -1, 1);
}
EXPORT_SYMBOL_GPL(kfree_call_rcu);

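In the end, the work is still done by __call_rcu().
What is odd is the second argument of __kfree_rcu: it is offsetof(typeof(*(ptr)), rcu_head), and this value is cast to rcu_callback_t and passed on to __call_rcu. The value is clearly an offset, not a function address, so invoking it as a callback would certainly crash the system. How, then, does the kernel recognize and handle this value, and how does kfree end up freeing the object? kfree is not passed in anywhere here.
The path that invokes RCU callbacks eventually reaches the function __rcu_reclaim, so let's look at its implementation.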

/*
 * Does the specified offset indicate that the corresponding rcu_head
 * structure can be handled by kfree_rcu()?
 */
#define __is_kfree_rcu_offset(offset) ((offset) < 4096)

/*
 * Reclaim the specified callback, either by invoking it (non-lazy case)
 * or freeing it directly (lazy case).  Return true if lazy, false otherwise.
 */
static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
{
	unsigned long offset = (unsigned long)head->func;

	rcu_lock_acquire(&rcu_callback_map);
	if (__is_kfree_rcu_offset(offset)) {
		RCU_TRACE(trace_rcu_invoke_kfree_callback(rn, head, offset);)
		kfree((void *)head - offset);
		rcu_lock_release(&rcu_callback_map);
		return true;
	} else {
		RCU_TRACE(trace_rcu_invoke_callback(rn, head);)
		head->func(head);
		rcu_lock_release(&rcu_callback_map);
		return false;
	}
}

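So the special case is handled in __rcu_reclaim. If __is_kfree_rcu_offset(offset) returns true, the value stored in head->func is an offset, and kfree() is called on (void *)head - offset, which is the start of the enclosing object. Otherwise head->func is a real function pointer and is invoked to perform the reclamation.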
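The dispatch can be modelled in userspace as a rough sketch. Everything here is a stand-in: model_kfree_rcu(), model_rcu_reclaim(), count_callback(), last_freed, callbacks_run, and struct foo are hypothetical names, free() takes the place of kfree(), and grace periods, tracing, and lockdep annotations are left out. The sketch also assumes that real function addresses are never below 4096, which is the same assumption the kernel comment states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};
typedef void (*rcu_callback_t)(struct rcu_head *head);

#define is_kfree_rcu_offset(offset) ((offset) < 4096)

/* Hypothetical object; the rcu_head is deliberately not the first member. */
struct foo {
	long payload[4];
	struct rcu_head rcu;
};

/* Bookkeeping so a caller can observe which branch ran. */
static uintptr_t last_freed;	/* address most recently passed to free() */
static int callbacks_run;	/* number of real callbacks invoked */

/* An ordinary callback, used to exercise the non-offset branch. */
static void count_callback(struct rcu_head *head)
{
	(void)head;
	callbacks_run++;
}

/* Model of __kfree_rcu(): store the offset where a function pointer goes. */
static inline void model_kfree_rcu(struct rcu_head *head, size_t offset)
{
	head->func = (rcu_callback_t)(uintptr_t)offset;
}

/* Model of __rcu_reclaim(): returns true if the object was freed directly. */
static inline bool model_rcu_reclaim(struct rcu_head *head)
{
	uintptr_t offset = (uintptr_t)head->func;

	if (is_kfree_rcu_offset(offset)) {
		/* Step back from the rcu_head to the start of the object. */
		void *base = (char *)head - offset;

		last_freed = (uintptr_t)base;
		free(base);
		return true;
	}
	head->func(head);
	return false;
}
```

For a struct foo *p obtained from malloc(), calling model_kfree_rcu(&p->rcu, offsetof(struct foo, rcu)) and then model_rcu_reclaim(&p->rcu) frees p itself rather than &p->rcu. A head whose func is set to count_callback takes the other branch and is simply invoked.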
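This raises a question: why does the kernel add a dedicated if/else to implement kfree_rcu, rather than having kfree_rcu pass kfree itself as the callback function? Wouldn't that be more elegant, and also avoid the cost of a branch?
In fact, the kernel cannot do without this branch. The RCU callback list links objects of type struct rcu_head, and a callback receives only a pointer to the rcu_head. The address kfree needs is that of the enclosing object, which cannot simply be equated with the address of the rcu_head, because many kernel structures gain RCU support by embedding a struct rcu_head as a member. With the "elegant" approach, kfree would be handed the rcu_head address and could not obtain the correct address of the object to free. The only branch-free alternative is a separate callback per type that recovers the enclosing object before calling kfree, which is exactly the kind of trivial function the header comment says kfree_rcu() is meant to eliminate. The kernel therefore has to implement kfree_rcu in this less elegant way.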
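The address mismatch is easy to check in userspace. In the sketch below, struct foo and both helper functions are hypothetical, struct rcu_head is a simplified stand-in, and container_of is a reduced version of the kernel macro without its type checking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Reduced container_of: member pointer back to the enclosing object. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical object with the rcu_head embedded after some payload. */
struct foo {
	long payload[4];
	struct rcu_head rcu;
};

/* What a hand-written per-type callback must compute before freeing. */
static inline struct foo *foo_from_rcu_head(struct rcu_head *head)
{
	return container_of(head, struct foo, rcu);
}

/* True only if freeing the rcu_head address itself would be correct. */
static inline bool head_is_object_start(const struct foo *p)
{
	return (const void *)&p->rcu == (const void *)p;
}
```

For this layout head_is_object_start() is false, so a callback that freed its rcu_head argument directly would free the wrong address. foo_from_rcu_head() performs the adjustment that a per-type callback would need, and that kfree_rcu achieves generically by storing the offset.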
