高效缓存神器：简析最近最少使用（MRU）缓存模板及实践

陆业聪

发布于 2024-07-23 18:39:57

130

发布于 2024-07-23 18:39:57

在许多应用中，我们需要快速访问最近使用的数据项。最近最少使用（Most Recently Used，MRU）缓存就是一种实现这一需求的数据结构。本文将详细介绍Base库中实现 MRU 缓存的模板，并结合源码进行解析。

MRU 缓存的核心设计

MRU 缓存的实现基于两个主要的数据结构：一个链表（PayloadList）和一个映射（KeyIndex）。链表用于存储缓存的项目，其中每个节点包含一个键值对（value_type），键用于标识项目，值是项目的有效载荷。链表的顺序按照项目的使用频率排序，最近使用的项目在链表的前面，最少使用的项目在链表的后面。

typedef std::pair<KeyType, PayloadType> value_type;
typedef std::list<value_type> PayloadList;
typedef typename MapType<KeyType, typename PayloadList::iterator, HashOrCompareType>::Type KeyIndex;

映射则用于提供快速的键到链表节点的查找。映射的键是项目的键，值是指向链表节点的迭代器。这种设计使得我们可以在常数时间内找到任何给定键的项目，并且可以在常数时间内将任何项目移动到链表的前面。

设计细节

插入操作：当插入新的项目时，如果缓存已满（即达到了指定的最大大小），则会自动删除最少使用的项目。这是通过 ShrinkToSize 函数实现的，该函数通过反向迭代链表并删除节点来收缩链表的大小。

template <typename Payload>
iterator Put(const KeyType& key, Payload&& payload) {
  typename KeyIndex::iterator index_iter = index_.find(key);
  if (index_iter != index_.end()) {
    Erase(index_iter->second);
  } else if (max_size_ != NO_AUTO_EVICT) {
    ShrinkToSize(max_size_ - 1);
  }
  ordering_.emplace_front(key, std::forward<Payload>(payload));
  index_.emplace(key, ordering_.begin());
  return ordering_.begin();
}

获取操作：当获取一个项目时，该项目会被移动到链表的前面，表示它是最近使用的。这是通过 std::list::splice 函数实现的，该函数可以在常数时间内将链表中的任何节点移动到任何位置。

iterator Get(const KeyType& key) {
  typename KeyIndex::iterator index_iter = index_.find(key);
  if (index_iter == index_.end())
    return end();
  typename PayloadList::iterator iter = index_iter->second;
  ordering_.splice(ordering_.begin(), ordering_, iter);
  return ordering_.begin();
}

删除操作：当删除一个项目时，需要同时从链表和映射中删除。这是通过 Erase 函数实现的，该函数首先从映射中删除键，然后从链表中删除节点。

iterator Erase(iterator pos) {
  index_.erase(pos->first);
  return ordering_.erase(pos);
}

遍历操作：MRU 缓存提供了正向和反向的迭代器，以便于遍历缓存中的项目。正向迭代从最近的项目开始，向后进行。反向迭代从最少使用的项目开始，向前进行。

iterator begin() { return ordering_.begin(); }
const_iterator begin() const { return ordering_.begin(); }
iterator end() { return ordering_.end(); }
const_iterator end() const { return ordering_.end(); }

reverse_iterator rbegin() { return ordering_.rbegin(); }
const_reverse_iterator rbegin() const { return ordering_.rbegin(); }
reverse_iterator rend() { return ordering_.rend(); }
const_reverse_iterator rend() const { return ordering_.rend(); }

bool empty() const { return ordering_.empty(); }

哈希表的使用

除了基本的 MRU 缓存实现外，这个模板还提供了一个基于哈希表的变体（HashingMRUCache）。在这个变体中，映射是一个 std::unordered_map，而不是 std::map。这意味着键的查找速度更快，但是键必须是可哈希的。

template <class KeyType, class ValueType, class HashType>
struct MRUCacheHashMap {
  typedef std::unordered_map<KeyType, ValueType, HashType> Type;
};

template <class KeyType, class PayloadType, class HashType = std::hash<KeyType>>
class HashingMRUCache
    : public MRUCacheBase<KeyType, PayloadType, HashType, MRUCacheHashMap> {
 private:
  using ParentType =
      MRUCacheBase<KeyType, PayloadType, HashType, MRUCacheHashMap>;

 public:
  explicit HashingMRUCache(typename ParentType::size_type max_size)
      : ParentType(max_size) {}
  virtual ~HashingMRUCache() {}

 private:
  DISALLOW_COPY_AND_ASSIGN(HashingMRUCache);
};

使用示例

要使用这个 MRU 缓存模板，首先需要包含相应的头文件，然后根据需要创建一个 MRUCache 或 HashingMRUCache 实例。以下是一个简单的使用示例：

#include "base/containers/mru_cache.h"
#include <iostream>
#include <string>

int main() {
  // 创建一个最大容量为 3 的 MRU 缓存
  base::MRUCache<std::string, int> cache(3);

  // 向缓存中插入数据
  cache.Put("one", 1);
  cache.Put("two", 2);
  cache.Put("three", 3);

  // 获取并打印缓存中的数据
  std::cout << "one: " << cache.Get("one")->second << std::endl;
  std::cout << "two: " << cache.Get("two")->second << std::endl;
  std::cout << "three: " << cache.Get("three")->second << std::endl;

  // 插入一个新的数据项，导致最旧的数据项（one）被移除
  cache.Put("four", 4);

  // 尝试获取已移除的数据项
  if (cache.Get("one") == cache.end()) {
    std::cout << "one has been evicted from the cache" << std::endl;
  }

  return 0;
}

这个示例创建了一个最大容量为 3 的 MRU 缓存，然后向其中插入了一些数据。当插入第四个数据项时，最旧的数据项（one）被自动移除，以保持缓存大小在指定范围内。之后，尝试获取已移除的数据项将返回缓存的 end() 迭代器。

总结

本文详细介绍了一个实现了最近最少使用（MRU）缓存的模板，它具有易读性和高效性。通过简洁的设计，该模板提供了插入、获取、删除和清空缓存的方法，并支持自动驱逐最近最少使用的项目，以保持缓存大小在指定范围内。此外，还提供了一个基于哈希表的变体，以提供更快的查找速度。

这个 MRU 缓存模板可以作为一个通用的缓存解决方案，可以应用于各种场景，如文件缓存、网络请求缓存等。虽然所有操作都是 O(1) 时间复杂度，但是这段代码是为了易读性而非最优性而编写的，如果将来的性能分析确定这是瓶颈，那么 O(1) 中的 1 有减小的空间。希望本文能抛砖引玉，帮助读者理解和使用Base库中的优秀设计。

源码和注释

最后附上完整的源码和代码注释：

// 同一时间每个键只能关联一个有效载荷项目。
//
// 键对象将被存储两次，因此它应支持高效的复制。
//
// 注意：虽然所有操作都是 O(1)，但这段代码是为了易读性而非最优性而编写的。
// 如果将来的性能分析确定这是瓶颈，那么 O(1) 中的 1 有减小的空间。:)

#ifndef BASE_CONTAINERS_MRU_CACHE_H_
#define BASE_CONTAINERS_MRU_CACHE_H_

#include <stddef.h>

#include <algorithm>
#include <functional>
#include <list>
#include <map>
#include <unordered_map>
#include <utility>

#include "base/logging.h"
#include "base/macros.h"

namespace base {

// MRUCacheBase ----------------------------------------------------------------

// 此模板用于标准化 MRUCacheBase 可以使用的映射类型容器。
// 由于模板模板参数和默认模板参数之间的相互作用，需要这种间接级别。
template <class KeyType, class ValueType, class CompareType>
struct MRUCacheStandardMap {
  typedef std::map<KeyType, ValueType, CompareType> Type;
};

// 下面定义的 MRU 缓存特化的基类。
template <class KeyType,
          class PayloadType,
          class HashOrCompareType,
          template <typename, typename, typename> class MapType =
              MRUCacheStandardMap>
class MRUCacheBase {
 public:
  // 列表的有效载荷。这将保留键的副本，以便我们可以有效地删除列表中的元素。
  typedef std::pair<KeyType, PayloadType> value_type;

 private:
  typedef std::list<value_type> PayloadList;
  typedef typename MapType<KeyType,
                           typename PayloadList::iterator,
                           HashOrCompareType>::Type KeyIndex;

 public:
  typedef typename PayloadList::size_type size_type;

  typedef typename PayloadList::iterator iterator;
  typedef typename PayloadList::const_iterator const_iterator;
  typedef typename PayloadList::reverse_iterator reverse_iterator;
  typedef typename PayloadList::const_reverse_iterator const_reverse_iterator;

  enum { NO_AUTO_EVICT = 0 };

  // max_size 是插入新项目时缓存将剪裁其成员的大小。
  // 如果调用者想要自己管理（例如，可能在驱逐时需要执行特殊操作），
  // 可以传递 NO_AUTO_EVICT 以不限制缓存大小。
  explicit MRUCacheBase(size_type max_size) : max_size_(max_size) {}

  virtual ~MRUCacheBase() {}

  size_type max_size() const { return max_size_; }

  // 插入具有给定键的有效载荷项目。如果现有项目具有相同的键，
  // 则在插入之前将其删除。将返回指示插入项目的迭代器（这将始终位于列表的前面）。
  //
  // 有效载荷将被转发。
  template <typename Payload>
  iterator Put(const KeyType& key, Payload&& payload) {
    // 删除具有该键的任何现有有效载荷。
    typename KeyIndex::iterator index_iter = index_.find(key);
    if (index_iter != index_.end()) {
      // 删除对它的引用。下面的代码将替换索引引用。
      Erase(index_iter->second);
    } else if (max_size_ != NO_AUTO_EVICT) {
      // 正在插入新项目，这可能使其大于最大大小：如果需要，将最旧的项目踢出。
      ShrinkToSize(max_size_ - 1);
    }

    // 在链表的开头放置新的项目
    ordering_.emplace_front(key, std::forward<Payload>(payload));
    // 在索引中添加新的项目
    index_.emplace(key, ordering_.begin());
    // 返回指向新插入项目的迭代器
    return ordering_.begin();
  }

  // 获取给定键的内容，如果未找到，则返回 end()。此方法具有将请求的项目移动到最近列表前面的副作用。
  iterator Get(const KeyType& key) {
    typename KeyIndex::iterator index_iter = index_.find(key);
    if (index_iter == index_.end())
      return end();
    typename PayloadList::iterator iter = index_iter->second;

    // 将触摸的项目移动到最近排序的前面。
    ordering_.splice(ordering_.begin(), ordering_, iter);
    return ordering_.begin();
  }

  // 获取与给定键关联的有效载荷，并通过结果返回它，而不影响排序（与 Get 不同）。
  iterator Peek(const KeyType& key) {
    typename KeyIndex::const_iterator index_iter = index_.find(key);
    if (index_iter == index_.end())
      return end();
    return index_iter->second;
  }

  const_iterator Peek(const KeyType& key) const {
    typename KeyIndex::const_iterator index_iter = index_.find(key);
    if (index_iter == index_.end())
      return end();
    return index_iter->second;
  }

  // 通过 |other| 的内容交换 |this| 的内容。
  void Swap(MRUCacheBase& other) {
    ordering_.swap(other.ordering_);
    index_.swap(other.index_);
    std::swap(max_size_, other.max_size_);
  }

  // 删除给定迭代器引用的项目。将返回指向其后项目的迭代器。迭代器必须有效。
  iterator Erase(iterator pos) {
    index_.erase(pos->first);
    return ordering_.erase(pos);
  }

  // MRUCache 条目通常按反向顺序处理，因此我们添加了这个便利功能（STL 容器通常未定义）。
  reverse_iterator Erase(reverse_iterator pos) {
    // 我们必须实际给出要删除的增量迭代器，因为 base() 返回的正向迭代器实际上是迭代的项目之后的一个。
    return reverse_iterator(Erase((++pos).base()));
  }

  // 收缩缓存，使其只保留 |new_size| 个项目。如果 |new_size| 大于或等于当前项目数，此操作将不起作用。
  void ShrinkToSize(size_type new_size) {
    for (size_type i = size(); i > new_size; i--)
      Erase(rbegin());
  }

  // 从缓存中删除所有内容。
  void Clear() {
    index_.clear();
    ordering_.clear();
  }

  // 返回缓存中的元素数量。
  size_type size() const {
    // 我们不使用 ordering_.size() 作为返回值，因为（作为链表）它可能是 O(n)。
    DCHECK(index_.size() == ordering_.size());
    return index_.size();
  }

  // 允许遍历列表。正向迭代从最近的项目开始，向后进行。
  //
  // 请注意，由于这些迭代器实际上是列表的迭代器，您可以在插入或删除事物时保留它们
  // （只要不删除您指向的那个），它们仍然有效。
  iterator begin() { return ordering_.begin(); }
  const_iterator begin() const { return ordering_.begin(); }
  iterator end() { return ordering_.end(); }
  const_iterator end() const { return ordering_.end(); }

  reverse_iterator rbegin() { return ordering_.rbegin(); }
  const_reverse_iterator rbegin() const { return ordering_.rbegin(); }
  reverse_iterator rend() { return ordering_.rend(); }
  const_reverse_iterator rend() const { return ordering_.rend(); }

  bool empty() const { return ordering_.empty(); }

 private:
  PayloadList ordering_;  // 有效载荷列表
  KeyIndex index_;  // 键索引

  size_type max_size_;  // 缓存的最大大小

  DISALLOW_COPY_AND_ASSIGN(MRUCacheBase);
};

// MRUCache --------------------------------------------------------------------

// 不会释放其数据的容器。在列表中存储值类型（而不是指针）时使用。
template <class KeyType,
          class PayloadType,
          class CompareType = std::less<KeyType>>
class MRUCache : public MRUCacheBase<KeyType, PayloadType, CompareType> {
 private:
  using ParentType = MRUCacheBase<KeyType, PayloadType, CompareType>;

 public:
  // 参见 MRUCacheBase，注意使用 NO_AUTO_EVICT 的可能性。
  explicit MRUCache(typename ParentType::size_type max_size)
      : ParentType(max_size) {}
  virtual ~MRUCache() {}

 private:
  DISALLOW_COPY_AND_ASSIGN(MRUCache);
};

// HashingMRUCache ------------------------------------------------------------

template <class KeyType, class ValueType, class HashType>
struct MRUCacheHashMap {
  typedef std::unordered_map<KeyType, ValueType, HashType> Type;
};

// 此类类似于 MRUCache，只是它使用 std::unordered_map 作为映射类型，而不是 std::map。
// 请注意，您的 KeyType 必须是可哈希的，以便使用此缓存，或者您需要提供哈希类。
template <class KeyType, class PayloadType, class HashType = std::hash<KeyType>>
class HashingMRUCache
    : public MRUCacheBase<KeyType, PayloadType, HashType, MRUCacheHashMap> {
 private:
  using ParentType =
      MRUCacheBase<KeyType, PayloadType, HashType, MRUCacheHashMap>;

 public:
  // 参见 MRUCacheBase，注意使用 NO_AUTO_EVICT 的可能性。
  explicit HashingMRUCache(typename ParentType::size_type max_size)
      : ParentType(max_size) {}
  virtual ~HashingMRUCache() {}

 private:
  DISALLOW_COPY_AND_ASSIGN(HashingMRUCache);
};

}  // namespace base

#endif  // BASE_CONTAINERS_MRU_CACHE_H_

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2024-06-07，如有侵权请联系 cloudcommunity@tencent.com 删除

函数