将Btree保存到磁盘文件并读取它

时间:2021-05-31 21:43:18

I want to save a Btree(not sure a binary one) in a disk file. and then read it to the memory. some Level-order traversal may be a good way for a binary Btree. but if it is not a binary one. I build up the Btree from the leafnode to the rootnode in memory. I believe that I have to define some structures in the disk file and output the tree nodes. using some extra tag to identify a node in the file? how to traversal may be the key problem here. I coudn't figure out a good way to save the nodes and the pointers. and then read it. rconstruct the tree in memory. any good ideas?. thanks a lot.

我想在磁盘文件中保存一个Btree(不确定二进制文件)。然后将其读入内存。某些Level-order遍历可能是二进制Btree的好方法。但如果它不是二元的那个。我将叶子节点中的Btree构建到内存中的根节点。我相信我必须在磁盘文件中定义一些结构并输出树节点。使用一些额外的标签来识别文件中的节点?如何遍历可能是这里的关键问题。我不知道保存节点和指针的好方法。然后阅读它。在记忆中构建树。有什么好主意吗?非常感谢。

4 个解决方案

#1


If you really want to do something similar, you can just assign at each node an id and save the nodes in that format:

如果你真的想做类似的事情,你可以在每个节点分配一个id并以这种格式保存节点:

[node-id value left-node-id right-node-id]

[node-id value left-node-id right-node-id]

and then visit the tree with a breadth-first search.

然后使用广度优先搜索来访问树。

When you want to reconstruct the tree, create a map id->node and then read backward the file: so, when you read a record, create the node, register it to the map and assign the left and right node fetching the nodes from the map.

当您想要重建树时,创建一个map id-> node然后向后读取文件:因此,当您读取记录时,创建节点,将其注册到地图并指定左右节点从中获取节点地图。

#2


The usual technique for B-Trees is to ensure that the size of a node is equal to the block size of the disk, and mmap the disk file. You don't specify what programming language you're working in, so it might be as simple as a cast in C, or something more complicated such as creating flyweight objects to wrap up a java.nio.IntBuffer. Either way, much of the advantage of the B-tree is that you don't have to load it all at once, but can jump around it fairly efficiently.

B-Trees的常用技术是确保节点的大小等于磁盘的块大小,并对磁盘文件进行mmap。您没有指定您正在使用的编程语言,因此它可能像C中的强制转换一样简单,或者更复杂的东西,例如创建flyweight对象来包装java.nio.IntBuffer。无论哪种方式,B树的许多优点是你不必一次加载它,但可以相当有效地跳转它。

#3


For each node define some data structure which will keep for you the same information the node has and add to that structure additional field which will mark for you the offset in the file for next son. And make the top field of that structure it's actual size, since you don't know what kind of tree you are looking now. Now by jumping over file you will be able to reconstruct your tree. I'm sure my solution is not final, but I hope it could be the good star point for you.

对于每个节点定义一些数据结构,它将为您保留节点具有的相同信息,并添加到该结构的附加字段,该字段将标记下一个子文件中的偏移量。并且使该结构的顶部区域成为实际大小,因为您现在不知道您正在寻找什么样的树。现在通过跳过文件,您将能够重建您的树。我确信我的解决方案不是最终的,但我希望它对你来说可能是一个很好的明星点。

#4


You might want to check out Protocol Buffers. They're compact, binary, extensible, easy to read and write, and available in C++, Java and Python (as well as third party implementations in other languages).

您可能想要查看协议缓冲区。它们是紧凑的,二进制的,可扩展的,易于读写,并且可用于C ++,Java和Python(以及其他语言的第三方实现)。

You can define a protocol buffer message for a BTree node, with file offsets for child nodes, and simply serialize it to disk in the obvious manner.

您可以为BTree节点定义协议缓冲区消息,为子节点定义文件偏移量,并以明显的方式将其序列化为磁盘。

#1


If you really want to do something similar, you can just assign at each node an id and save the nodes in that format:

如果你真的想做类似的事情,你可以在每个节点分配一个id并以这种格式保存节点:

[node-id value left-node-id right-node-id]

[node-id value left-node-id right-node-id]

and then visit the tree with a breadth-first search.

然后使用广度优先搜索来访问树。

When you want to reconstruct the tree, create a map id->node and then read backward the file: so, when you read a record, create the node, register it to the map and assign the left and right node fetching the nodes from the map.

当您想要重建树时,创建一个map id-> node然后向后读取文件:因此,当您读取记录时,创建节点,将其注册到地图并指定左右节点从中获取节点地图。

#2


The usual technique for B-Trees is to ensure that the size of a node is equal to the block size of the disk, and mmap the disk file. You don't specify what programming language you're working in, so it might be as simple as a cast in C, or something more complicated such as creating flyweight objects to wrap up a java.nio.IntBuffer. Either way, much of the advantage of the B-tree is that you don't have to load it all at once, but can jump around it fairly efficiently.

B-Trees的常用技术是确保节点的大小等于磁盘的块大小,并对磁盘文件进行mmap。您没有指定您正在使用的编程语言,因此它可能像C中的强制转换一样简单,或者更复杂的东西,例如创建flyweight对象来包装java.nio.IntBuffer。无论哪种方式,B树的许多优点是你不必一次加载它,但可以相当有效地跳转它。

#3


For each node define some data structure which will keep for you the same information the node has and add to that structure additional field which will mark for you the offset in the file for next son. And make the top field of that structure it's actual size, since you don't know what kind of tree you are looking now. Now by jumping over file you will be able to reconstruct your tree. I'm sure my solution is not final, but I hope it could be the good star point for you.

对于每个节点定义一些数据结构,它将为您保留节点具有的相同信息,并添加到该结构的附加字段,该字段将标记下一个子文件中的偏移量。并且使该结构的顶部区域成为实际大小,因为您现在不知道您正在寻找什么样的树。现在通过跳过文件,您将能够重建您的树。我确信我的解决方案不是最终的,但我希望它对你来说可能是一个很好的明星点。

#4


You might want to check out Protocol Buffers. They're compact, binary, extensible, easy to read and write, and available in C++, Java and Python (as well as third party implementations in other languages).

您可能想要查看协议缓冲区。它们是紧凑的,二进制的,可扩展的,易于读写,并且可用于C ++,Java和Python(以及其他语言的第三方实现)。

You can define a protocol buffer message for a BTree node, with file offsets for child nodes, and simply serialize it to disk in the obvious manner.

您可以为BTree节点定义协议缓冲区消息,为子节点定义文件偏移量,并以明显的方式将其序列化为磁盘。