Prefetching data into cache on x86-64

Time: 2022-07-23 02:56:15

In my application, at one point I need to perform calculations on a large contiguous block of memory (hundreds of MB). My idea was to keep prefetching the part of the block my program will touch next, so that the data is already in the cache when I perform calculations on that part.

Can someone give me a simple example of how to achieve this with gcc? I read about _mm_prefetch somewhere, but I don't know how to use it properly. Also note that I have a multicore system, but each core will be working on a different region of memory in parallel.

2 answers

#1


17  

gcc uses builtin functions as an interface to low-level instructions. The one for your case is __builtin_prefetch. But you should only see a measurable difference when the access pattern is not easy for the hardware to predict automatically.
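
As a rough illustration of that point, here is a minimal sketch of __builtin_prefetch on an access pattern the hardware is unlikely to predict: a sum over elements selected through an index array. The function name `sum_indirect` and the constant `PREFETCH_DISTANCE` are made up for this example, and the value 16 is only a guess that would need tuning on the actual machine.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch.
   The right value depends on the machine and must be measured. */
#define PREFETCH_DISTANCE 16

/* Sum data[idx[i]] for i in [0, n). The loads from data[] follow idx[],
   so their addresses are hard for a hardware prefetcher to guess. */
double sum_indirect(const double *data, const size_t *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, rw (0 = read), locality hint (0..3). */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the timing can change. For a plain sequential walk over a contiguous block, as in the question, the hardware prefetcher will probably already cover this.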

#2


15  

Modern CPUs have pretty good automatic prefetch and you may well find that you do more harm than good if you try to initiate software prefetching. There is most likely a lot more "low hanging fruit" that you can focus on for optimisation if you find that you actually have a performance problem. Prefetch tends to be one of the last things that you might try, when you're desperate for a few more percent throughput.

