Graph database: analysis of neo4j's underlying storage structure (7)

3.7 Relationship storage

The following files hold Relationship data in a neo4j graph db:

neostore.relationshipgroupstore.db

neostore.relationshipgroupstore.db.id

neostore.relationshipstore.db

neostore.relationshipstore.db.id

neostore.relationshiptypestore.db

neostore.relationshiptypestore.db.id

neostore.relationshiptypestore.db.names

neostore.relationshiptypestore.db.names.id

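In neo4j, Relationships are stored by four kinds of Store working together: RelationshipStore, RelationshipGroupStore, RelationshipTypeTokenStore and StringPropertyStore. RelationshipStore is the main storage structure for Relationships. Relationships are grouped only once the number of relationships on a Node reaches a certain threshold, and RelationshipGroupStore holds that group data. RelationshipTypeTokenStore and StringPropertyStore together store the relationship types.

The string name of a relationship type is kept in a DynamicStore such as StringPropertyStore. If the name is longer than one block, it is split across several blocks, and the block_id of its first block in StringPropertyStore is saved in the name_id field of the corresponding record in the RelationshipTypeTokenStore file.

For the storage format of StringPropertyStore, see <3.3.2 DynamicStore types>. The sections below describe the file formats of RelationshipTypeTokenStore, RelationshipStore and RelationshipGroupStore in turn.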

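3.7.1 Main file storage format of RelationshipTypeTokenStore

The storage file for the class RelationshipTypeTokenStore is neostore.relationshiptypestore.db. Its format, as shown in the figure above, consists of an array of records of RECORD_SIZE = 5 bytes and a descriptor string "RelationshipTypeStore v0.A.2" (the file type descriptor TYPE_DESCRIPTOR plus neo4j's ALL_STORES_VERSION). A record is accessed by using its token_id as the array index.

A record consists of a 1-byte in_use field and a 4-byte name_id field.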
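The record layout above can be illustrated with a minimal sketch. The class name TokenRecordSketch and its read method are hypothetical and are not part of neo4j. The sketch assumes the record array starts at offset 0, that integers are big-endian (the java.nio.ByteBuffer default), and that the lowest bit of in_use set to 1 marks a record as in use.

```java
import java.nio.ByteBuffer;

// Illustrative only: TokenRecordSketch is a hypothetical name, not a neo4j class.
final class TokenRecordSketch {
    static final int RECORD_SIZE = 5; // in_use (1 byte) + name_id (4 bytes)

    final boolean inUse;
    final int nameId;

    TokenRecordSketch(boolean inUse, int nameId) {
        this.inUse = inUse;
        this.nameId = nameId;
    }

    // Reads the record at array index tokenId from a buffer holding the record array.
    // Assumption: records start at offset 0 and use big-endian integers.
    static TokenRecordSketch read(ByteBuffer store, int tokenId) {
        int offset = tokenId * RECORD_SIZE;
        boolean inUse = (store.get(offset) & 0x1) == 1; // assumed in-use convention
        int nameId = store.getInt(offset + 1);          // first block id of the type name
        return new TokenRecordSketch(inUse, nameId);
    }

    public static void main(String[] args) {
        ByteBuffer store = ByteBuffer.allocate(2 * RECORD_SIZE);
        store.put((byte) 1).putInt(7); // token 0: in use, name stored from block 7
        store.put((byte) 0).putInt(0); // token 1: not in use
        TokenRecordSketch r = read(store, 0);
        System.out.println(r.inUse + " " + r.nameId);
    }
}
```

The name_id obtained this way is only a pointer: the type name itself still has to be read from the dynamic string store, starting at that block.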

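3.7.2 File storage format of RelationshipStore

The storage file for the class RelationshipStore is neostore.relationshipstore.db. Its file format is sketched below: the whole file consists of a fixed-length array of records of RECORD_SIZE = 34 bytes and a descriptor string "RelationshipStore v0.A.2" (the file type descriptor TYPE_DESCRIPTOR plus neo4j's ALL_STORES_VERSION). A record is accessed by using its relationship_id as the array index.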

 
// record header size
// directed|in_use(byte)+first_node(int)+second_node(int)+rel_type(int)+
// first_prev_rel_id(int)+first_next_rel_id+second_prev_rel_id(int)+
// second_next_rel_id+next_prop_id(int)+first-in-chain-markers(1)
public static final int RECORD_SIZE = 34;
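Because every record has the same size, the position of a record in the file follows directly from its id. The sketch below illustrates this; the class and method names are hypothetical, and it assumes the record array starts at byte 0 of the file with no header in front of it.

```java
// Illustrative only: RecordOffsetSketch is a hypothetical name, not a neo4j class.
final class RecordOffsetSketch {
    static final int RECORD_SIZE = 34;

    // Byte offset of the record with the given id, assuming records start at byte 0.
    static long offsetOf(long relationshipId) {
        return relationshipId * RECORD_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(offsetOf(0)); // 0
        System.out.println(offsetOf(3)); // 102
    }
}
```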

The meaning of each field in a relationship record is as follows:

in_use (1 byte): byte 1, divided into 3 parts.

// [pppp,nnnx]
// [    ,   x] in use flag
// [    ,nnn ] first node high order bits
// [pppp,    ] next prop high order bits
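Bit 1 indicates whether the record is in use;
bits 2-4 hold the high 3 bits of the node_id of first_node;
bits 5-8 hold the high 4 bits of the property_id of next_prop_id.

first_node (4 bytes): bytes 2-5 hold the low 32 bits of the node_id of the Relationship's from_node. Together with bits 2-4 of the in_use byte as the high 3 bits, they form a complete 35-bit node_id.
second_node (4 bytes): bytes 6-9 hold the low 32 bits of the node_id of the Relationship's to_node. Together with bits 29-31 of rel_type as the high 3 bits, they form a complete 35-bit node_id.
rel_type (4 bytes): bytes 10-13, divided into 6 parts.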

// [ xxx,    ][    ,    ][    ,    ][    ,    ] second node high order bits,     0x70000000
// [    ,xxx ][    ,    ][    ,    ][    ,    ] first prev rel high order bits,  0xE000000
// [    ,   x][xx  ,    ][    ,    ][    ,    ] first next rel high order bits,  0x1C00000
// [    ,    ][  xx,x   ][    ,    ][    ,    ] second prev rel high order bits, 0x380000
// [    ,    ][    , xxx][    ,    ][    ,    ] second next rel high order bits, 0x70000
// [    ,    ][    ,    ][xxxx,xxxx][xxxx,xxxx] type

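Bits 29-31 are the high 3 bits of the node_id of second_node;
bits 26-28 are the high 3 bits of the relationship_id of first_prev_rel_id;
bits 23-25 are the high 3 bits of the relationship_id of first_next_rel_id;
bits 20-22 are the high 3 bits of the relationship_id of second_prev_rel_id;
bits 17-19 are the high 3 bits of the relationship_id of second_next_rel_id;
bits 1-16 hold the RelationshipType.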

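first_prev_rel_id (4 bytes): bytes 14-17 hold the low 32 bits of the relationship_id of the Relationship that precedes this one in from_node's relationship chain. Together with bits 26-28 of rel_type as the high 3 bits, they form a complete 35-bit relationship_id.
first_next_rel_id (4 bytes): bytes 18-21 hold the low 32 bits of the relationship_id of the Relationship that follows this one in from_node's relationship chain. Together with bits 23-25 of rel_type as the high 3 bits, they form a complete 35-bit relationship_id.
second_prev_rel_id (4 bytes): bytes 22-25 hold the low 32 bits of the relationship_id of the Relationship that precedes this one in to_node's relationship chain. Together with bits 20-22 of rel_type as the high 3 bits, they form a complete 35-bit relationship_id.
second_next_rel_id (4 bytes): bytes 26-29 hold the low 32 bits of the relationship_id of the Relationship that follows this one in to_node's relationship chain. Together with bits 17-19 of rel_type as the high 3 bits, they form a complete 35-bit relationship_id.
next_prop_id (4 bytes): bytes 30-33 hold the low 32 bits of the property_id of this Relationship's first Property. Together with bits 5-8 of in_use as the high 4 bits, they form a complete 36-bit property_id.
first-in-chain-markers (1 byte): only bits 1 and 2 are currently used. In getRecord they are read into the firstInFirstChain and firstInSecondChain flags; beyond that, the author has not yet worked out their purpose.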
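To make the rel_type bit layout concrete, here is a small sketch that pulls each part out of the 32-bit word using the masks listed above, and then joins a 3-bit high part with a 32-bit low part into a 35-bit id. All class and method names are hypothetical. Note that the text counts bits from 1, while the shift amounts in the code count from 0.

```java
// Illustrative only: RelTypeBitsSketch is a hypothetical name, not a neo4j class.
final class RelTypeBitsSketch {
    // Each accessor masks one field of rel_type and shifts it down to bit 0.
    static int type(int relType)            { return relType & 0xFFFF; }
    static long secondNodeHigh(int relType) { return (relType & 0x70000000L) >>> 28; }
    static long firstPrevHigh(int relType)  { return (relType & 0x0E000000L) >>> 25; }
    static long firstNextHigh(int relType)  { return (relType & 0x01C00000L) >>> 22; }
    static long secondPrevHigh(int relType) { return (relType & 0x00380000L) >>> 19; }
    static long secondNextHigh(int relType) { return (relType & 0x00070000L) >>> 16; }

    // Joins the high 3 bits with the low 32 bits into one 35-bit id.
    static long fullId(long low32, long high3) {
        return (high3 << 32) | (low32 & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        // A made-up rel_type word: high parts 5, 2, 7, 1, 4 and type 42.
        int relType = (5 << 28) | (2 << 25) | (7 << 22) | (1 << 19) | (4 << 16) | 42;
        System.out.println(type(relType));                      // 42
        System.out.println(secondNodeHigh(relType));            // 5
        System.out.println(fullId(1, secondNodeHigh(relType))); // 21474836481
    }
}
```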

3.7.2.1 RelationshipStore.java

The class that corresponds to the neostore.relationshipstore.db file is RelationshipStore, which reads RelationshipRecords from and writes them to that file. The getRecord member function of RelationshipStore, shown below, helps in understanding the storage format of a Relationship record.

 
private RelationshipRecord getRecord( long id, PersistenceWindow window, RecordLoad load )
{
    Buffer buffer = window.getOffsettedBuffer( id );

    // [    ,   x] in use flag
    // [    ,xxx ] first node high order bits
    // [xxxx,    ] next prop high order bits
    long inUseByte = buffer.get();
    boolean inUse = (inUseByte & 0x1) == Record.IN_USE.intValue();
    if ( !inUse )
    {
        switch ( load )
        {
        case NORMAL:
            throw new InvalidRecordException( "RelationshipRecord[" + id + "] not in use" );
        case CHECK:
            return null;
        }
    }

    long firstNode = buffer.getUnsignedInt();
    long firstNodeMod = (inUseByte & 0xEL) << 31;
    long secondNode = buffer.getUnsignedInt();

    // [ xxx,    ][    ,    ][    ,    ][    ,    ] second node high order bits,     0x70000000
    // [    ,xxx ][    ,    ][    ,    ][    ,    ] first prev rel high order bits,  0xE000000
    // [    ,   x][xx  ,    ][    ,    ][    ,    ] first next rel high order bits,  0x1C00000
    // [    ,    ][  xx,x   ][    ,    ][    ,    ] second prev rel high order bits, 0x380000
    // [    ,    ][    , xxx][    ,    ][    ,    ] second next rel high order bits, 0x70000
    // [    ,    ][    ,    ][xxxx,xxxx][xxxx,xxxx] type
    long typeInt = buffer.getInt();
    long secondNodeMod = (typeInt & 0x70000000L) << 4;
    int type = (int)(typeInt & 0xFFFF);

    RelationshipRecord record = new RelationshipRecord( id,
        longFromIntAndMod( firstNode, firstNodeMod ),
        longFromIntAndMod( secondNode, secondNodeMod ), type );
    record.setInUse( inUse );

    long firstPrevRel = buffer.getUnsignedInt();
    long firstPrevRelMod = (typeInt & 0xE000000L) << 7;
    record.setFirstPrevRel( longFromIntAndMod( firstPrevRel, firstPrevRelMod ) );

    long firstNextRel = buffer.getUnsignedInt();
    long firstNextRelMod = (typeInt & 0x1C00000L) << 10;
    record.setFirstNextRel( longFromIntAndMod( firstNextRel, firstNextRelMod ) );

    long secondPrevRel = buffer.getUnsignedInt();
    long secondPrevRelMod = (typeInt & 0x380000L) << 13;
    record.setSecondPrevRel( longFromIntAndMod( secondPrevRel, secondPrevRelMod ) );

    long secondNextRel = buffer.getUnsignedInt();
    long secondNextRelMod = (typeInt & 0x70000L) << 16;
    record.setSecondNextRel( longFromIntAndMod( secondNextRel, secondNextRelMod ) );

    long nextProp = buffer.getUnsignedInt();
    long nextPropMod = (inUseByte & 0xF0L) << 28;

    byte extraByte = buffer.get();
    record.setFirstInFirstChain( (extraByte & 0x1) != 0 );
    record.setFirstInSecondChain( (extraByte & 0x2) != 0 );

    record.setNextProp( longFromIntAndMod( nextProp, nextPropMod ) );
    return record;
}
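getRecord calls longFromIntAndMod, whose body is not shown in this section. The sketch below is an assumption about how such a helper could behave, not a copy of the neo4j implementation: it ORs the low 32 bits with the already shifted high bits, and treats an all-ones low word with no high bits as -1, meaning "no record". The class name HighBitsSketch is hypothetical. The main method also checks, on one made-up in_use byte, that the shifts used in getRecord place the high bits directly above bit 31.

```java
// Illustrative only: HighBitsSketch is a hypothetical name, and this
// longFromIntAndMod is an assumed re-implementation for demonstration.
final class HighBitsSketch {
    static long longFromIntAndMod(long base, long modifier) {
        // Assumption: 0xFFFFFFFF with no high bits is the "no record" marker.
        return (modifier == 0 && base == 0xFFFFFFFFL) ? -1 : (base | modifier);
    }

    public static void main(String[] args) {
        long inUseByte = 0x35; // binary 0011 0101, a made-up header byte

        // Bits 1-3 (counting from 0) move to bits 32-34.
        long firstNodeMod = (inUseByte & 0xEL) << 31;
        // Bits 4-7 (counting from 0) move to bits 32-35.
        long nextPropMod = (inUseByte & 0xF0L) << 28;

        System.out.println(longFromIntAndMod(10, firstNodeMod)); // 8589934602
        System.out.println(longFromIntAndMod(5, nextPropMod));   // 12884901893
        System.out.println(longFromIntAndMod(0xFFFFFFFFL, 0));   // -1
    }
}
```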

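3.7.3 Storage format of RelationshipGroupStore

When the number of Relationships on a Node exceeds a threshold, neo4j groups the Relationships in order to improve performance. The class that implements this in neo4j is RelationshipGroupStore.

Its file storage format is as follows:

The whole file consists of a fixed-length array of records of RECORD_SIZE = 20 bytes and a string "RelationshipGroupStore v0.A.2" (the file type descriptor TYPE_DESCRIPTOR plus neo4j's ALL_STORES_VERSION). A record is accessed by using its id as the array index. The first 4 bytes of the record at index 0 hold the Relationship grouping threshold.

The format of a RelationshipGroupStore record is as follows:

inUse (1 byte): byte 1, divided into 4 parts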

// [    ,   x] in use
// [    ,xxx ] high next id bits
// [ xxx,    ] high firstOut bits
long inUseByte = buffer.get();

Bit 1: indicates whether the record is in use;
bits 2-4: the high 3 bits of next;
bits 5-7: the high 3 bits of firstOut;
bit 8: unused.

highByte (1 byte): byte 2, divided into 4 parts