leveldb研究系列六——Block的组件构建和读取

上一节我们已经详细表述了Ssttbale的组成，从逻辑结构和物理结构出发做了了解。那么我们在这里怎么创建一个Ssttable呢，时序肯定是这样的：对于一个<key,value> 我们先构建他的逻辑结构，先是构建Block数据部分，然后再是填写meta Block和索引部分，等到这些逻辑部分填充完毕我们在将他们写入磁盘Block(根据type启用压缩）

关键的便是Block数据的填充，即record的写入。

本篇我们将要讲述Block的构建和读取（查询）对于Block的构建是通过class BlockBuilder完成。类的声明如下

class BlockBuilder {
 public:
  explicit BlockBuilder(const Options* options);

  // Reset the contents as if the BlockBuilder was just constructed.
  void Reset();                                      //重设内容，通常在Finish之后调用已构建新的block

  // REQUIRES: Finish() has not been callled since the last call to Reset().
  // REQUIRES: key is larger than any previously added key
  void Add(const Slice& key, const Slice& value);         //添加数据<key，value>    

  // Finish building the block and return a slice that refers to the
  // block contents.  The returned slice will remain valid for the
  // lifetime of this builder or until Reset() is called.
  Slice Finish();                                   // 结束构建block，并返回指向block内容的指针 

  // Returns an estimate of the current (uncompressed) size of the block
  // we are building.
  size_t CurrentSizeEstimate() const;           // 返回正在构建block的未压缩大小估计值   

  // Return true iff no entries have been added since the last Reset()
  bool empty() const {
    return buffer_.empty();
  }

 private:
  const Options*        options_;
  std::string           buffer_;      // Destination buffer                         //目标Block内容
  std::vector<uint32_t> restarts_;    // Restart points                            //重启点
  int                   counter_;     // Number of entries emitted since restart    //重启点后面的入口数目
  bool                  finished_;    // Has Finish() been called?                  //结束标志
  std::string           last_key_;                                                   // 记录最后添加的key  

  // No copying allowed
  BlockBuilder(const BlockBuilder&);
  void operator=(const BlockBuilder&);

接口注释很齐全，那么我们在看看具体实现

下面的代码我们给出详细出注释

BlockBuilder::BlockBuilder(const Options* options)
    : options_(options),
      restarts_(),
      counter_(0),
      finished_(false) {
  assert(options->block_restart_interval >= 1);
  restarts_.push_back(0);       // First restart point is at offset 0  //重启点开始设为0
}

下面是重置reset方法，一般用在finish 之后，用于已经构建好的Block.

void BlockBuilder::Reset() {
  buffer_.clear();
  restarts_.clear();
  restarts_.push_back(0);       // First restart point is at offset 0
  counter_ = 0;
  finished_ = false;
  last_key_.clear();
}

接下来是Block大小的估算代码也很简单

size_t BlockBuilder::CurrentSizeEstimate() const {
  return (buffer_.size() +                        // Raw data buffer
          restarts_.size() * sizeof(uint32_t) +   // Restart array
          sizeof(uint32_t));                      // Restart array length
}

Add方法是重点采用前缀压缩，给出详细注释如下

void BlockBuilder::Add(const Slice& key, const Slice& value) {
  Slice last_key_piece(last_key_);                          
  assert(!finished_);
  assert(counter_ <= options_->block_restart_interval);
  assert(buffer_.empty() // No values yet?
         || options_->comparator->Compare(key, last_key_piece) > 0);
  size_t shared = 0;                                        //计算前缀长度
  if (counter_ < options_->block_restart_interval) {
    // See how much sharing to do with previous string
    const size_t min_length = std::min(last_key_piece.size(), key.size());   //取两者较小值
    while ((shared < min_length) && (last_key_piece[shared] == key[shared])) {
      shared++;                                                             //公共长度自增
    }
  } else {
    // Restart compression
    restarts_.push_back(buffer_.size());                                       //出错处理
    counter_ = 0;
  }
  const size_t non_shared = key.size() - shared;                            //非共享字符长度

  // Add "<shared><non_shared><value_size>" to buffer_
  PutVarint32(&buffer_, shared);      //写入shared字段	
  PutVarint32(&buffer_, non_shared);        //非共享字段
  PutVarint32(&buffer_, value.size());

  // Add string delta to buffer_ followed by value
  buffer_.append(key.data() + shared, non_shared);      //写入非共享资源	
  buffer_.append(value.data(), value.size());           //写入value值

  // Update state
  last_key_.resize(shared);
  last_key_.append(key.data() + shared, non_shared);
  assert(Slice(last_key_) == key);                       
  counter_++;                                                 //entries计数
}

以上我们已经全部解析出 Block的构建过程，下面我们重点谈谈Block的读取过程。读取是class Block完成的

class Block {
 public:
  // Initialize the block with the specified contents.
  explicit Block(const BlockContents& contents);

  ~Block();

  size_t size() const { return size_; }
  Iterator* NewIterator(const Comparator* comparator);     //用来遍历<key,value>

 private:
  uint32_t NumRestarts() const;

  const char* data_;                 //block数据指针
  size_t size_;                      //block数据大小
  uint32_t restart_offset_;     // Offset in data_ of restart array          //重启点数组偏移
  bool owned_;                  // Block owns data_[]

  // No copying allowed
  Block(const Block&);
  void operator=(const Block&);

  class Iter;

Block采用iter来遍历和读取<key,value>

遍历的过程是这样的：

1.找到相关重启点（key之前的一个重启点）

2.在两个重启点指间依次查找<key,value>

查找之前一个重启点的过程是二分法实现的，那么为什么重启点直接的record不能二分查找呢，原因在于record长度不一致

二分查找重启点代码如下

virtual void Seek(const Slice& target) {
    // Binary search in restart array to find the last restart point
    // with a key < target
    uint32_t left = 0;
    uint32_t right = num_restarts_ - 1;
    while (left < right) {
      uint32_t mid = (left + right + 1) / 2;
      uint32_t region_offset = GetRestartPoint(mid);
      uint32_t shared, non_shared, value_length;
      const char* key_ptr = DecodeEntry(data_ + region_offset,
                                        data_ + restarts_,
                                        &shared, &non_shared, &value_length);
      if (key_ptr == NULL || (shared != 0)) {
        CorruptionError();
        return;
      }
      Slice mid_key(key_ptr, non_shared);
      if (Compare(mid_key, target) < 0) {
        // Key at "mid" is smaller than "target".  Therefore all
        // blocks before "mid" are uninteresting.
        left = mid;
      } else {
        // Key at "mid" is >= "target".  Therefore all blocks at or
        // after "mid" are uninteresting.
        right = mid - 1;
      }
    }

得到重启点后便沿着重启点线性向下查找

   while (true) {
      if (!ParseNextKey()) return;
      if (Compare(key_, target)>= 0) return;
    }

到此为止我们便能够查找到目标<key,value>

以上就是Block的构建和读（查询过程）SStable的构建和查找基本上也是在这之上的，下篇说说bloom filter策略。

秒客网

leveldb研究系列六——Block的组件构建和读取

相关文章