Finding cache misses and hit ratio in an I/O trace file

I have an I/O trace file with the following fields: ('asu', 'block_address', 'size', 'opcode', 'time_stamp'). The data looks like this (over 5 million rows):

0,20941264,8192,W,0.551706
0,20939840,8192,W,0.554041
0,20939808,8192,W,0.556202
1,3436288,15872,W,1.250720
1,3435888,512,W,1.609859
1,3435889,512,W,1.634761
0,7695360,4096,R,2.346628
1,10274472,4096,R,2.436645
2,30862016,4096,W,2.448003
2,30845544,4096,W,2.449733
1,10356592,4096,W,2.449733

I am trying to add a cache layer in my project and want to calculate the misses and hits. I am using @functools.lru_cache(maxsize = None) to find cache hits and misses for the block_address. Following the tutorial I tried calculating the miss/hits. blk_trace is the trace array for block_address.

@functools.lru_cache(maxsize = None)
def blk_iter():
    blk_len = len(blk_trace)
    for i in range(0,blk_len):
        print(blk_trace[i])

Looking at the cache info with blk_iter.cache_info(), I get CacheInfo(hits=0, misses=1, maxsize=None, currsize=1), which is not right. I am fairly new to Python and caching concepts, and I don't know what I am doing wrong. How do I find the misses and hits for the block addresses?

1 Solution

#1

The cache is for the function blk_iter -- you only called blk_iter once, therefore your cache size is one, and it has one miss.

Consider the following function decorated with lru_cache:

from functools import lru_cache

@lru_cache(maxsize=None)
def myfunc(x):
    # The body only runs on a cache miss
    print('Cache miss: ', x)
    return x + 1

When called with a certain value for x the function will run and the result will be stored in the cache. If called again with the same parameter, the function will not run at all and the cached value will be returned.

>>> for i in range(3):
...     print(myfunc(i))
...
Cache miss:  0
1
Cache miss:  1
2
Cache miss:  2
3
>>> myfunc(0) # this will be a cache hit
1
>>> myfunc(3) # this will be another miss
Cache miss:  3
4
>>> myfunc.cache_info()
CacheInfo(hits=1, misses=4, maxsize=None, currsize=4)

In your example, even if the cache were set up correctly, you would get all misses and no hits anyway: for i in range(0,blk_len): produces a new value of i on each iteration, so a function called with i as its argument would never hit the cache.
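
To count hits and misses per block address, one option is to put the cache on a function that takes the address itself as its argument and call it once per trace entry. The following is only a minimal sketch: access_block is a hypothetical name, the short blk_trace list is made-up sample data (addresses from the question, with repeats added so some hits occur), and it ignores the size and opcode fields.

```python
from functools import lru_cache

# Hypothetical sample: a few block addresses with repeats added for illustration
blk_trace = [20941264, 20939840, 20941264, 3436288, 20939840, 20941264]

@lru_cache(maxsize=None)
def access_block(block_address):
    # The body runs only on a miss; the return value does not matter here
    return None

# One call per trace entry, keyed on the address rather than on the index
for addr in blk_trace:
    access_block(addr)

info = access_block.cache_info()
hit_ratio = info.hits / (info.hits + info.misses)
print(info)
print(f"hit ratio: {hit_ratio:.2f}")
```

With maxsize=None the cache never evicts, so only the first access to each address is a miss. To approximate a cache of limited capacity, maxsize could be set to the number of blocks the cache is assumed to hold.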
