UPDATE: Hey guys, thanks for the replies. Last night and tonight I tried a few different approaches and came up with one similar to the one laid out below by Jeff (I had even already done what he suggested in his update, and put together my own simple linked-list implementation for additional gains). Here is the code; at this point it doesn't look particularly clean anymore, but I have been over this numerous times, changing anything I could to beef up performance.
public class NewLRU2<K, V> where V : class
{
    int m_iMaxItems;
    Dictionary<K, LRUNode<K, V>> m_oMainDict;

    private LRUNode<K, V> m_oHead;
    private LRUNode<K, V> m_oTail;
    private LRUNode<K, V> m_oCurrent;

    public NewLRU2(int iSize)
    {
        m_iMaxItems = iSize;
        m_oMainDict = new Dictionary<K, LRUNode<K, V>>();
        m_oHead = null;
        m_oTail = null;
    }

    public V this[K key]
    {
        get
        {
            m_oCurrent = m_oMainDict[key];

            if (m_oCurrent == m_oHead)
            {
                //do nothing
            }
            else if (m_oCurrent == m_oTail)
            {
                m_oTail = m_oCurrent.Next;
                m_oTail.Prev = null;

                m_oHead.Next = m_oCurrent;
                m_oCurrent.Prev = m_oHead;
                m_oCurrent.Next = null;
                m_oHead = m_oCurrent;
            }
            else
            {
                m_oCurrent.Prev.Next = m_oCurrent.Next;
                m_oCurrent.Next.Prev = m_oCurrent.Prev;

                m_oHead.Next = m_oCurrent;
                m_oCurrent.Prev = m_oHead;
                m_oCurrent.Next = null;
                m_oHead = m_oCurrent;
            }

            return m_oCurrent.Value;
        }
    }

    public void Add(K key, V value)
    {
        if (m_oMainDict.Count >= m_iMaxItems)
        {
            //remove old
            m_oMainDict.Remove(m_oTail.Key);

            //reuse old
            LRUNode<K, V> oNewNode = m_oTail;
            oNewNode.Key = key;
            oNewNode.Value = value;

            m_oTail = m_oTail.Next;
            m_oTail.Prev = null;

            //add new
            m_oHead.Next = oNewNode;
            oNewNode.Prev = m_oHead;
            oNewNode.Next = null;
            m_oHead = oNewNode;
            m_oMainDict.Add(key, oNewNode);
        }
        else
        {
            LRUNode<K, V> oNewNode = new LRUNode<K, V>(key, value);
            if (m_oHead == null)
            {
                m_oHead = oNewNode;
                m_oTail = oNewNode;
            }
            else
            {
                m_oHead.Next = oNewNode;
                oNewNode.Prev = m_oHead;
                m_oHead = oNewNode;
            }
            m_oMainDict.Add(key, oNewNode);
        }
    }

    public bool Contains(K key)
    {
        return m_oMainDict.ContainsKey(key);
    }
}

internal class LRUNode<K, V>
{
    public LRUNode(K key, V val)
    {
        Key = key;
        Value = val;
    }

    public K Key;
    public V Value;
    public LRUNode<K, V> Next;
    public LRUNode<K, V> Prev;
}
There are a few parts that look/feel wonky -- like reusing the old node when doing an add -- but I was able to get an appreciable boost in performance out of them. I was also slightly surprised at the difference it made to switch from actual properties on the node to just public fields, but I guess that's how it goes with this stuff. At this point the code above is almost entirely performance-limited by the dictionary operations, so I'm not sure I'd get a lot more out of mashing it around. I'll continue to think on it and look into some of the responses.
Explanation From Original Post: Hello all. So I've written a simple lightweight LRU implementation for use in a compression library (I'm using it to find matching byte-strings in the input based on a hash, LZW-style), and I'm looking for ways to make it faster.
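For context, the usage pattern looks roughly like this -- the hash and the copy step below are simplified stand-ins, not my actual code:

static uint HashBytes(byte[] buf, int pos)
{
    // Toy 3-byte hash, standing in for the real one.
    return (uint)(buf[pos] | (buf[pos + 1] << 8) | (buf[pos + 2] << 16));
}

static void ScanForMatches(byte[] input, NewLRU2<uint, byte[]> cache)
{
    for (int pos = 0; pos + 3 <= input.Length; pos++)
    {
        uint hash = HashBytes(input, pos);
        if (cache.Contains(hash))
        {
            byte[] candidate = cache[hash]; // the lookup also bumps the entry to the head
            // ...verify candidate against the input at pos, emit a match if it holds...
        }
        else
        {
            byte[] bytes = new byte[3];
            Array.Copy(input, pos, bytes, 0, 3);
            cache.Add(hash, bytes); // remember this byte-string for later lookups
        }
    }
}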
3 Solutions
#1
UPDATE #2
This reduces the need for list traversal on a linked-list Remove. It introduces an LruCacheNode that holds both the key and the value; the key is only used when you trim the cache. You could get better performance if you wrote your own linked-list implementation where each node essentially is an LruCacheNode along with Next and Back references. This is sort of what a LinkedHashMap does (see these two questions).
public class LruCache<K, V>
{
    private readonly int m_iMaxItems;
    private readonly Dictionary<K, LinkedListNode<LruCacheNode<K, V>>> m_oMainDict;
    private readonly LinkedList<LruCacheNode<K, V>> m_oMainList;

    public LruCache(int iSize)
    {
        m_iMaxItems = iSize;
        m_oMainDict = new Dictionary<K, LinkedListNode<LruCacheNode<K, V>>>();
        m_oMainList = new LinkedList<LruCacheNode<K, V>>();
    }

    public V this[K key]
    {
        get
        {
            return BumpToFront(key).Value;
        }
        set
        {
            BumpToFront(key).Value = value;
        }
    }

    public void Add(K key, V value)
    {
        LinkedListNode<LruCacheNode<K, V>> newNode = m_oMainList.AddFirst(new LruCacheNode<K, V>(key, value));
        m_oMainDict.Add(key, newNode);

        if (m_oMainList.Count > m_iMaxItems)
        {
            m_oMainDict.Remove(m_oMainList.Last.Value.Key);
            m_oMainList.RemoveLast();
        }
    }

    private LruCacheNode<K, V> BumpToFront(K key)
    {
        LinkedListNode<LruCacheNode<K, V>> node = m_oMainDict[key];
        if (m_oMainList.First != node)
        {
            m_oMainList.Remove(node);
            m_oMainList.AddFirst(node);
        }
        return node.Value;
    }

    public bool Contains(K key)
    {
        return m_oMainDict.ContainsKey(key);
    }
}

internal sealed class LruCacheNode<K, V>
{
    private readonly K m_Key;
    private V m_Value;

    public LruCacheNode(K key, V value)
    {
        m_Key = key;
        m_Value = value;
    }

    public K Key
    {
        get { return m_Key; }
    }

    public V Value
    {
        get { return m_Value; }
        set { m_Value = value; }
    }
}
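For example, a quick usage sketch against the class above (sizes and keys are arbitrary):

LruCache<int, string> cache = new LruCache<int, string>(3);
cache.Add(1, "a");
cache.Add(2, "b");
cache.Add(3, "c");

string s = cache[1];               // bumps key 1 to the front
cache.Add(4, "d");                 // count exceeds 3, so key 2 (least recently used) is evicted

bool hasTwo = cache.Contains(2);   // false
bool hasOne = cache.Contains(1);   // true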
You'll have to profile things to see if this is an improvement in your environment.
Minor Update: I updated BumpToFront to check whether the node is already at the front, per a comment from Tim Stewart.
#2
Isn't the point of an LRU cache to allow you to trim the cache and throw out the least-recently-used stuff? :-) I don't see any code to trim the cache. Since you most likely want high performance for the retrieve use case, and the trim use case is less important, why not offload the list maintenance to the trim process?
IOW, just throw the entries into the cache, but timestamp them on retrieval. Don't reorder the entries; just mark them whenever they're used. The mark could be a true DateTime timestamp, or a simple counter in the class, where the highest number is the most recently used. Then in the trim process, just walk the whole cache and remove the entries with the oldest stamps.
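A minimal sketch of that idea, using a long counter rather than DateTime (this class and its names are mine, just to show the shape, not tuned code):

public class StampedCache<K, V>
{
    private sealed class Entry
    {
        public V Value;
        public long Stamp;
    }

    private readonly Dictionary<K, Entry> m_dict = new Dictionary<K, Entry>();
    private long m_stamp; // monotonically increasing "use" counter

    public V this[K key]
    {
        get
        {
            Entry e = m_dict[key];
            e.Stamp = ++m_stamp; // O(1) mark on retrieval -- no reordering
            return e.Value;
        }
    }

    public void Add(K key, V value)
    {
        m_dict.Add(key, new Entry { Value = value, Stamp = ++m_stamp });
    }

    public bool Contains(K key)
    {
        return m_dict.ContainsKey(key);
    }

    // All the LRU bookkeeping is deferred to here: sort by stamp,
    // oldest first, and drop the excess.
    public void Trim(int targetSize)
    {
        if (m_dict.Count <= targetSize)
            return;

        List<KeyValuePair<K, Entry>> entries = new List<KeyValuePair<K, Entry>>(m_dict);
        entries.Sort((a, b) => a.Value.Stamp.CompareTo(b.Value.Stamp));

        int toRemove = m_dict.Count - targetSize;
        for (int i = 0; i < toRemove; i++)
            m_dict.Remove(entries[i].Key);
    }
}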
#3
With hardware caches, instead of having, say, 128 elements and maintaining the order of items 1-128, you might have it as 32 x 4: 32 rows of 4 elements each. The first 5 bits of an address would determine which of the 32 rows that address maps to; then you would search only those 4 items, and if the address is not found, replace the oldest of the 4.
This is much faster, and is IIRC within 10% of the hit rate of a 1 x 128 cache.
To translate: instead of one linked list, you would have multiple ones, so traversing them would be much faster. You would need a way of determining which list a particular item maps to.
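As a rough sketch of how the 32 x 4 idea might look in code (the class and all names here are illustrative, with a counter standing in for the hardware age bits):

public class SetAssocCache<K, V>
{
    private const int Rows = 32;
    private const int Ways = 4;

    private readonly K[,] m_keys = new K[Rows, Ways];
    private readonly V[,] m_values = new V[Rows, Ways];
    private readonly long[,] m_stamps = new long[Rows, Ways]; // 0 means the slot is empty
    private long m_tick;

    private int RowOf(K key)
    {
        // Low 5 bits of the hash pick the row, like the low address bits in hardware.
        return key.GetHashCode() & (Rows - 1);
    }

    public bool TryGet(K key, out V value)
    {
        int row = RowOf(key);
        for (int way = 0; way < Ways; way++) // search only the 4 slots in this row
        {
            if (m_stamps[row, way] != 0 && m_keys[row, way].Equals(key))
            {
                m_stamps[row, way] = ++m_tick; // mark as most recently used
                value = m_values[row, way];
                return true;
            }
        }
        value = default(V);
        return false;
    }

    public void Add(K key, V value)
    {
        int row = RowOf(key);
        int victim = 0;
        for (int way = 1; way < Ways; way++) // replace the oldest of the 4 (empty slots first)
        {
            if (m_stamps[row, way] < m_stamps[row, victim])
                victim = way;
        }
        m_keys[row, victim] = key;
        m_values[row, victim] = value;
        m_stamps[row, victim] = ++m_tick;
    }
}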
The point being: as your list grows in size, you get diminishing returns from trying to maintain the exact order of every element with perfect accuracy. You might even be better off with an unordered list, randomly replacing an element on a cache miss. It depends on the size of your list, and on the penalty for a miss versus the cost of maintaining the list.