New Features in Cassandra-0.7.0-beta1

Cassandra-0.7.0-beta1 was released a little while ago. Today I checked out the code and skimmed through it. These are the main changes:

1 Keyspaces and ColumnFamilies in the data model can be changed dynamically:

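In earlier versions, changing a Keyspace or ColumnFamily took three steps. You had to stop Cassandra, edit the configuration file, and then restart Cassandra before the change took effect.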

In the new version, we only need to define the new Keyspace or ColumnFamily and then send that definition to Cassandra through the Thrift interface.

The relevant structs and interface definitions are in the cassandra.thrift file:

/* Related struct definitions. */

/* describes a column in a column family. */
struct ColumnDef {
    1: required binary name,
    2: required string validation_class,
    3: optional IndexType index_type,
    4: optional string index_name
}

/* describes a column family. */
struct CfDef {
    1: required string keyspace,
    2: required string name,
    3: optional string column_type="Standard",
    4: optional string clock_type="Timestamp",
    5: optional string comparator_type="BytesType",
    6: optional string subcomparator_type="",
    7: optional string reconciler="",
    8: optional string comment="",
    9: optional double row_cache_size=0,
    10: optional bool preload_row_cache=0,
    11: optional double key_cache_size=200000,
    12: optional double read_repair_chance=1.0
    13: optional list<ColumnDef> column_metadata
    14: optional i32 gc_grace_seconds
}

/* describes a keyspace. */
struct KsDef {
    1: required string name,
    2: required string strategy_class,
    3: optional map<string,string> strategy_options,
    4: required i32 replication_factor,
    5: required list<CfDef> cf_defs,
}

/* Related interface definitions. */

/** adds a column family. returns the new schema id. */
string system_add_column_family(1:required CfDef cf_def)
throws (1:InvalidRequestException ire),

/** drops a column family. returns the new schema id. */
string system_drop_column_family(1:required string column_family)
throws (1:InvalidRequestException ire),

/** renames a column family. returns the new schema id. */
string system_rename_column_family(1:required string old_name, 2:required string new_name)
throws (1:InvalidRequestException ire),

/** adds a keyspace and any column families that are part of it. returns the new schema id. */
string system_add_keyspace(1:required KsDef ks_def)
throws (1:InvalidRequestException ire),

/** drops a keyspace and any column families that are part of it. returns the new schema id. */
string system_drop_keyspace(1:required string keyspace)
throws (1:InvalidRequestException ire),

/** renames a keyspace. returns the new schema id. */
string system_rename_keyspace(1:required string old_name, 2:required string new_name)
throws (1:InvalidRequestException ire),
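The following minimal Python sketch shows what these schema calls do from a client's point of view. It is a toy model. `ToySchema` is a hypothetical in-memory stand-in for a live node. The dataclasses copy only a few fields from the Thrift structs above. Using a time-based UUID as the schema id is also an assumption. A real client would instead send the Thrift structs to a running node.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


class InvalidRequestException(Exception):
    """Stand-in for the Thrift exception of the same name."""


@dataclass
class CfDef:
    # A small subset of the Thrift CfDef fields listed above.
    keyspace: str
    name: str
    column_type: str = "Standard"
    comparator_type: str = "BytesType"


@dataclass
class KsDef:
    # A small subset of the Thrift KsDef fields listed above.
    name: str
    strategy_class: str
    replication_factor: int
    cf_defs: List[CfDef] = field(default_factory=list)


class ToySchema:
    """Hypothetical in-memory model of a live schema; not Cassandra's implementation."""

    def __init__(self) -> None:
        self.keyspaces: Dict[str, Dict[str, CfDef]] = {}
        # Assumed id format: a time-based UUID string.
        self.version = str(uuid.uuid1())

    def _new_version(self) -> str:
        # Every successful change yields a fresh schema id, returned to the caller.
        self.version = str(uuid.uuid1())
        return self.version

    def system_add_keyspace(self, ks_def: KsDef) -> str:
        if ks_def.name in self.keyspaces:
            raise InvalidRequestException(f"keyspace {ks_def.name} already exists")
        # The keyspace and any column families it carries are added together.
        self.keyspaces[ks_def.name] = {cf.name: cf for cf in ks_def.cf_defs}
        return self._new_version()

    def system_add_column_family(self, cf_def: CfDef) -> str:
        cfs = self.keyspaces.get(cf_def.keyspace)
        if cfs is None:
            raise InvalidRequestException(f"keyspace {cf_def.keyspace} does not exist")
        if cf_def.name in cfs:
            raise InvalidRequestException(f"column family {cf_def.name} already exists")
        cfs[cf_def.name] = cf_def
        return self._new_version()

    def system_drop_keyspace(self, keyspace: str) -> str:
        # Dropping a keyspace also drops all of its column families.
        if self.keyspaces.pop(keyspace, None) is None:
            raise InvalidRequestException(f"keyspace {keyspace} does not exist")
        return self._new_version()
```

Here is an example using a hypothetical keyspace named "Blog". Calling `system_add_keyspace(KsDef("Blog", "SimpleStrategy", 1))` and then `system_add_column_family(CfDef("Blog", "Posts"))` adds "Posts" with no restart, and each call returns a different schema id. The strategy class string is only an illustrative value.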

2 Secondary indexes, which allow queries on Column values:

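Like almost every K/V system, Cassandra could only look data up by key. To find the rows where a column has a particular value, you had two options. You could fetch all the data and iterate over it. Or you could use some other scheme that speeds up the query and avoids a full table scan. Examples include the approach in my earlier article "Reversing Cassandra Indexes" (《反转Cassandra索引》) and a project called Lucandra.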

To use secondary indexes in the new version, specify in the ColumnFamily definition which Column to index and which index type to use (currently only IndexType.KEYS is supported).

When a ColumnFamily with indexes is created, Cassandra also creates a separate IndexedColumnFamily for each indexed Column in that ColumnFamily.

When data is written, it goes into the ColumnFamily that holds the data. The IndexedColumnFamily also stores all the data related to that index.

When you query by index, Cassandra finds the matching data directly through the IndexedColumnFamily instead of scanning the whole ColumnFamily.
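The write and read paths described above fit in a few lines of Python. This sketch assumes that the index maps each value of the indexed column to the set of row keys holding that value. `ToyIndexedCF` and `query_eq` are hypothetical names. The code illustrates the idea only and is not Cassandra's storage engine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class ToyIndexedCF:
    """Hypothetical model of a ColumnFamily with a KEYS-style index on one column.

    data:  row key -> {column name -> value}   (the ordinary ColumnFamily)
    index: indexed value -> set of row keys    (the extra index ColumnFamily)
    """

    def __init__(self, indexed_column: str) -> None:
        self.indexed_column = indexed_column
        self.data: Dict[str, Dict[str, str]] = defaultdict(dict)
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def insert(self, row_key: str, column: str, value: str) -> None:
        row = self.data[row_key]
        if column == self.indexed_column and column in row:
            # Remove the stale index entry before the old value is overwritten.
            old = row[column]
            self.index[old].discard(row_key)
            if not self.index[old]:
                del self.index[old]
        row[column] = value
        if column == self.indexed_column:
            # The same write also lands in the index, keyed by the column's value.
            self.index[value].add(row_key)

    def query_eq(self, value: str, columns: Iterable[str]) -> Dict[str, Dict[str, str]]:
        # Equality lookup: read matching row keys from the index instead of
        # scanning every row, then pick the requested columns from each row.
        wanted = list(columns)
        return {
            key: {c: self.data[key][c] for c in wanted if c in self.data[key]}
            for key in sorted(self.index.get(value, ()))
        }
```

The overwrite branch handles the one subtle case. If the old value's index entry were not removed, a query on the old value would still return the row.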

The relevant structs and interface definitions are in the cassandra.thrift file:

/* Related struct definitions. */

enum IndexType {
    KEYS,
}

/* describes a column in a column family. */
struct ColumnDef {
    1: required binary name,
    2: required string validation_class,
    3: optional IndexType index_type,
    4: optional string index_name
}

/* Related interface definitions. */

/** Returns the subset of columns specified in SlicePredicate for the rows matching the IndexClause */
list<KeySlice> get_indexed_slices(1:required ColumnParent column_parent,
				2:required IndexClause index_clause,
				3:required SlicePredicate column_predicate,
				4:required ConsistencyLevel consistency_level=ONE)
throws (1:InvalidRequestException ire, 2:UnavailableException ue, 3:TimedOutException te),

3 Configuration file format change

The new version of Cassandra uses YAML for its configuration, which makes it easier to read.

For example, compare how the cluster name option is written in the two formats:

Old version (storage-conf.xml):

  <!--
   ~ The name of this cluster. This is mainly used to prevent machines in
   ~ one logical cluster from joining another.
  -->
  <ClusterName>Test Cluster</ClusterName>

New version (cassandra.yaml):

# name of the cluster
cluster_name: 'Test Cluster'
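Moving a setting from the old file to the new one is mostly a matter of renaming it. As an illustration only, the Python sketch below uses the standard library to read `ClusterName` from a storage-conf.xml fragment and returns the equivalent cassandra.yaml lines. It handles just this one option. It assumes `ClusterName` is either the root element or a direct child of the root, as in the snippet above. `cluster_name_to_yaml` is a hypothetical helper, not a tool shipped with Cassandra.

```python
import xml.etree.ElementTree as ET


def cluster_name_to_yaml(storage_conf_xml: str) -> str:
    """Translate the old <ClusterName> element into the new cluster_name line.

    Handles only this single option; it is not a general config converter.
    """
    root = ET.fromstring(storage_conf_xml)  # XML comments are skipped by default
    # Accept a bare <ClusterName> or one nested directly under the root element.
    elem = root if root.tag == "ClusterName" else root.find("ClusterName")
    if elem is None or not (elem.text or "").strip():
        raise ValueError("no ClusterName element found")
    name = elem.text.strip()
    # In a YAML single-quoted scalar, an embedded quote is escaped by doubling it.
    quoted = "'" + name.replace("'", "''") + "'"
    return "# name of the cluster\ncluster_name: " + quoted
```

Applied to `<Storage><ClusterName>Test Cluster</ClusterName></Storage>`, the function returns the two YAML lines shown above.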

Beyond these, there are many other changes:

0.7-beta1
* sstable versioning (CASSANDRA-389)
* switched to slf4j logging (CASSANDRA-625)
* add (optional) expiration time for column (CASSANDRA-699)
* access levels for authentication/authorization (CASSANDRA-900)
* add ReadRepairChance to CF definition (CASSANDRA-930)
* fix heisenbug in system tests, especially common on OS X (CASSANDRA-944)
* convert to byte[] keys internally and all public APIs (CASSANDRA-767)
* ability to alter schema definitions on a live cluster (CASSANDRA-44)
* renamed configuration file to cassandra.xml, and log4j.properties to
   log4j-server.properties, which must now be loaded from
   the classpath (which is how our scripts in bin/ have always done it)
   (CASSANDRA-971)
* change get_count to require a SlicePredicate. create multi_get_count
   (CASSANDRA-744)
* re-organized endpointsnitch implementations and added SimpleSnitch
   (CASSANDRA-994)
* Added preload_row_cache option (CASSANDRA-946)
* add CRC to commitlog header (CASSANDRA-999)
* removed deprecated batch_insert and get_range_slice methods (CASSANDRA-1065)
* add truncate thrift method (CASSANDRA-531)
* http mini-interface using mx4j (CASSANDRA-1068)
* optimize away copy of sliced row on memtable read path (CASSANDRA-1046)
* replace constant-size 2GB mmaped segments and special casing for index 
   entries spanning segment boundaries, with SegmentedFile that computes 
   segments that always contain entire entries/rows (CASSANDRA-1117)
* avoid reading large rows into memory during compaction (CASSANDRA-16)
* added hadoop OutputFormat (CASSANDRA-1101)
* efficient Streaming (no more anticompaction) (CASSANDRA-579)
* split commitlog header into separate file and add size checksum to
   mutations (CASSANDRA-1179)
* avoid allocating a new byte[] for each mutation on replay (CASSANDRA-1219)
* revise HH schema to be per-endpoint (CASSANDRA-1142)
* add joining/leaving status to nodetool ring (CASSANDRA-1115)
* allow multiple repair sessions per node (CASSANDRA-1190)
* optimize away MessagingService for local range queries (CASSANDRA-1261)
* make framed transport the default so malformed requests can't OOM the 
   server (CASSANDRA-475)
* significantly faster reads from row cache (CASSANDRA-1267)
* take advantage of row cache during range queries (CASSANDRA-1302)
* make GCGraceSeconds a per-ColumnFamily value (CASSANDRA-1276)
* keep persistent row size and column count statistics (CASSANDRA-1155)
* add IntegerType (CASSANDRA-1282)
* page within a single row during hinted handoff (CASSANDRA-1327)
* push DatacenterShardStrategy configuration into keyspace definition,
   eliminating datacenter.properties. (CASSANDRA-1066)
* optimize forward slices starting with '' and single-index-block name 
   queries by skipping the column index (CASSANDRA-1338)
* streaming refactor (CASSANDRA-1189)
* faster comparison for UUID types (CASSANDRA-1043)
* secondary index support (CASSANDRA-749 and subtasks)

More articles on Cassandra: http://www.cnblogs.com/gpcuster/tag/Cassandra/
