Google Cloud Bigtable vs. Google Cloud Datastore

Date: 2022-04-30 15:52:03

What is the difference between Google Cloud Bigtable and Google Cloud Datastore / App Engine datastore, and what are the main practical advantages and disadvantages? AFAIK, Cloud Datastore is built on top of Bigtable.

7 answers

#1


Based on experience with Datastore and reading the Bigtable docs, the main differences are:

  • Bigtable seems to be designed for HBase compatibility, whereas Datastore is geared more towards Python/Java/Go web app developers (originally App Engine).

  • Bigtable is 'a bit more IaaS' than Datastore in that it's not 'just there' but requires a cluster to be configured.

  • Bigtable supports only one index: the 'row key' (the counterpart of the entity key in Datastore).
    • This means queries are on the key, unlike Datastore's indexed properties.

  • Bigtable supports atomicity only on a single row; there are no multi-row transactions.

  • Mutations and deletions that span more than one row appear not to be atomic in Bigtable, whereas Datastore provides eventual or strong consistency, depending on the read/query method.

  • The billing model is very different:
    • Datastore charges for read/write operations, storage and bandwidth.
    • Bigtable charges for 'nodes', storage and bandwidth.
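The single-index point can be illustrated with a small in-memory sketch. This is not the Bigtable or Datastore client API; `RowKeyTable` and `IndexedStore` are hypothetical toy classes that only mimic the access patterns described above: a store sorted by row key that can scan key ranges cheaply, versus a store that maintains a secondary index per property.

```python
import bisect
from collections import defaultdict


class RowKeyTable:
    """Toy Bigtable-like table (hypothetical): rows are sorted by row key only."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> dict of column values

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def scan_prefix(self, prefix):
        """Return the keys that start with prefix, found via the key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result

    def filter_by_value(self, column, value):
        """There is no index on values, so every row has to be inspected."""
        return [k for k in self._keys if self._rows[k].get(column) == value]


class IndexedStore:
    """Toy Datastore-like store (hypothetical): every property is indexed.

    Property values must be hashable in this sketch.
    """

    def __init__(self):
        self._entities = {}
        self._index = defaultdict(set)  # (property, value) -> set of keys

    def put(self, key, entity):
        # Drop index entries of the previous version before re-indexing.
        for prop, value in self._entities.get(key, {}).items():
            self._index[(prop, value)].discard(key)
        self._entities[key] = entity
        for prop, value in entity.items():
            self._index[(prop, value)].add(key)

    def query(self, prop, value):
        """Look matching keys up in the index instead of scanning all rows."""
        return sorted(self._index[(prop, value)])


table = RowKeyTable()
store = IndexedStore()
sample = [
    ("user#1", {"city": "Oslo"}),
    ("user#2", {"city": "Rome"}),
    ("order#9", {"city": "Oslo"}),
]
for key, row in sample:
    table.put(key, row)
    store.put(key, row)

user_keys = table.scan_prefix("user#")              # cheap: uses key order
oslo_by_scan = table.filter_by_value("city", "Oslo")  # full scan
oslo_by_index = store.query("city", "Oslo")         # index lookup
```

Both lookups for `"Oslo"` return the same keys; the difference is that the row-key table has to read every row to answer a question that is not expressed in terms of the key.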

#2


Bigtable is optimized for high volumes of data and analytics

  • Cloud Bigtable doesn't replicate data across zones or regions (data within a single cluster is replicated and durable), which means Bigtable is faster and more efficient, and costs are much lower, though it is less durable and less available in the default configuration.

  • It uses the HBase API, so there's no risk of lock-in and no new paradigm to learn.

  • It is integrated with the open-source Big Data tools, meaning you can analyze the data stored in Bigtable with most of the analytics tools customers use (Hadoop, Spark, etc.).

  • Bigtable is indexed by a single row key.

  • Bigtable is in a single zone.

Cloud Bigtable is designed for larger companies and enterprises, which often have larger data needs and complex backend workloads.

Datastore is optimized to serve high-value transactional data to applications

  • Cloud Datastore has extremely high availability, with replication and data synchronization.

  • Datastore, because of its versatility and high availability, is more expensive.

  • Datastore is slower at writing data because of synchronous replication.

  • Datastore has much better functionality around transactions and queries (since secondary indexes exist).
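Whether Datastore ends up "more expensive" depends on volume, because of the billing difference described in answer #1: per-operation charges grow with traffic, while per-node charges stay flat for a fixed cluster. The sketch below only shows that arithmetic. All prices are made-up placeholders expressed in integer micro-units, not real Google Cloud prices, and storage and bandwidth are left out because both products charge for them.

```python
# Hypothetical placeholder prices in integer micro-units (NOT real pricing).
HYPOTHETICAL_PRICE_PER_READ = 1
HYPOTHETICAL_PRICE_PER_WRITE = 2
HYPOTHETICAL_PRICE_PER_NODE_HOUR = 500_000


def per_operation_cost(reads, writes):
    """Datastore-style bill: proportional to the number of operations."""
    return (reads * HYPOTHETICAL_PRICE_PER_READ
            + writes * HYPOTHETICAL_PRICE_PER_WRITE)


def per_node_cost(nodes, hours):
    """Bigtable-style bill: proportional to provisioned nodes and time."""
    return nodes * hours * HYPOTHETICAL_PRICE_PER_NODE_HOUR


def cheaper_model(reads, writes, nodes, hours):
    """Return which billing model costs less for the given workload."""
    if per_operation_cost(reads, writes) <= per_node_cost(nodes, hours):
        return "per-operation"
    return "per-node"


# A 3-node cluster running for a 720-hour month costs the same whether idle or busy.
monthly_node_cost = per_node_cost(nodes=3, hours=720)

low_traffic = cheaper_model(reads=1_000_000, writes=100_000, nodes=3, hours=720)
high_traffic = cheaper_model(reads=2_000_000_000, writes=500_000_000, nodes=3, hours=720)
```

With these placeholder numbers the low-traffic workload is cheaper under per-operation billing and the high-traffic one under per-node billing; real break-even points have to be computed from the current price lists.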

#3


Bigtable and Datastore are extremely different. Yes, Datastore is built on top of Bigtable, but that does not make the two alike. That is kind of like saying a car is built on top of wheels, and so a car is not much different from wheels.

Bigtable and Datastore provide very different data models and very different semantics for how the data is changed.

The main difference is that Datastore provides SQL-database-like ACID transactions on subsets of the data known as entity groups (though the query language, GQL, is much more restrictive than SQL). Bigtable is strictly NoSQL and comes with much weaker guarantees.
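The practical meaning of "ACID transactions on an entity group" versus "atomic per row only" can be sketched with plain dictionaries. This is a deliberately naive illustration, not either product's API: the function names and the `balance` field are hypothetical, and the sketch models only all-or-nothing behaviour, not isolation between concurrent clients.

```python
import copy


class InsufficientFunds(Exception):
    pass


def transfer_row_by_row(rows, src, dst, amount):
    """Two independent single-row writes.

    Each write is atomic on its own, but a failure between them
    leaves the data half-updated.
    """
    rows[dst]["balance"] += amount       # first row write succeeds
    if rows[src]["balance"] < amount:
        raise InsufficientFunds(src)     # second row write never happens
    rows[src]["balance"] -= amount


def transfer_in_transaction(group, src, dst, amount):
    """All-or-nothing update of several entities in one group."""
    working = copy.deepcopy(group)       # private copy of the group
    working[dst]["balance"] += amount
    if working[src]["balance"] < amount:
        raise InsufficientFunds(src)     # abort: nothing was committed
    working[src]["balance"] -= amount
    group.clear()
    group.update(working)                # commit every change together


rows = {"a": {"balance": 10}, "b": {"balance": 0}}
try:
    transfer_row_by_row(rows, "a", "b", 50)
except InsufficientFunds:
    pass  # rows["b"] was already credited; the data is now inconsistent

group = {"a": {"balance": 10}, "b": {"balance": 0}}
try:
    transfer_in_transaction(group, "a", "b", 50)
except InsufficientFunds:
    pass  # group is unchanged

transfer_in_transaction(group, "a", "b", 5)  # succeeds and commits both sides
```

Without multi-row transactions, the application itself has to order its writes or repair partial updates like the one left behind in `rows`.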

#4


If you read the papers, Bigtable is this and Datastore is Megastore. Datastore is Bigtable plus replication, transactions, and indexes (and is much more expensive).

#5


A relatively minor point to consider: as of November 2016, the Bigtable Python client library is still in Alpha, which means future changes might not be backward compatible. Also, the Bigtable Python library is not compatible with App Engine's standard environment; you have to use the flexible environment.

#6


I am going to try to summarize all the answers above, plus what is given in the Coursera course Google Cloud Platform Big Data and Machine Learning Fundamentals:

+---------------------+-------------------------------------------------------+---------------------------------------+
| Category            | Bigtable                                              | Datastore                             |
+---------------------+-------------------------------------------------------+---------------------------------------+
| Technology          | HBase-compatible (exposes the HBase API)              | Built on Bigtable                     |
| Access metaphor     | Key/value (column families), like HBase               | Persistent hashmap                    |
| Read                | Scan rows                                             | Filter objects on property            |
| Write               | Put row                                               | Put object                            |
| Update granularity  | Row (write a new row rather than update one in place) | Attribute                             |
| Capacity            | Petabytes                                             | Terabytes                             |
| Index               | Row key only (design the key carefully)               | Any property of the object            |
| Usage and use cases | High-throughput, scalable, flattened data             | Structured data for Google App Engine |
+---------------------+-------------------------------------------------------+---------------------------------------+
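Since the row key is the only index, "design the key carefully" usually means encoding the fields you want to scan by into the key itself. One common pattern for time series is a composite key with a reversed timestamp, so that the newest rows for an entity sort first. The sketch below shows only the key construction; the `#` separator, the 13-digit millisecond bound, and the function names are assumptions chosen for illustration.

```python
# Assumed upper bound for millisecond timestamps (13 decimal digits).
MAX_TIMESTAMP_MS = 10**13 - 1


def make_row_key(device_id, timestamp_ms):
    """Build 'device#reversed_timestamp' so newer rows sort before older ones."""
    reversed_ts = MAX_TIMESTAMP_MS - timestamp_ms
    # Zero-pad so that lexicographic order matches numeric order.
    return f"{device_id}#{reversed_ts:013d}"


def parse_row_key(row_key):
    """Recover (device_id, timestamp_ms) from a key built by make_row_key."""
    device_id, reversed_ts = row_key.rsplit("#", 1)
    return device_id, MAX_TIMESTAMP_MS - int(reversed_ts)


# Keys sort lexicographically, as they would in a row-key-ordered table.
keys = sorted(make_row_key("sensor-a", ts) for ts in (1000, 3000, 2000))
timestamps_in_scan_order = [parse_row_key(k)[1] for k in keys]
```

A prefix scan on `sensor-a#` would then return that device's readings newest first, without any secondary index.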

Check this image too: [image: Google Cloud Bigtable vs. Google Cloud Datastore]

#7


I just found this useful analogy buried in the lengthy page about eventual consistency in the Datastore documentation (emphasis mine):

One practice is to combine Cloud Datastore and BigQuery to fulfill different business requirements. Use Cloud Datastore for online transactional processing (OLTP) required for core application logic and use BigQuery for online analytical processing (OLAP) for backend operations. It may be necessary to implement a continuous data export flow from Cloud Datastore to BigQuery to move the data necessary for those queries.
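One step of such an export flow is flattening entities into rows that an analytics system can load. The sketch below is an assumption-laden illustration rather than the documented export mechanism: it uses only the standard library, represents entities as plain dictionaries, and emits newline-delimited JSON, a format BigQuery can load. The `entity_to_row` and `to_ndjson` helpers and the sample `Order` entities are hypothetical.

```python
import json
from datetime import datetime, timezone


def entity_to_row(kind, key, properties):
    """Flatten one entity into a JSON-serializable row.

    Assumes property names do not collide with 'kind' or 'key'.
    """
    row = {"kind": kind, "key": key}
    for name, value in properties.items():
        if isinstance(value, datetime):
            value = value.isoformat()  # JSON has no native datetime type
        row[name] = value
    return row


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# Hypothetical sample entities standing in for a Datastore query result.
entities = [
    ("Order", "o1", {"total": 12, "created": datetime(2022, 4, 30, tzinfo=timezone.utc)}),
    ("Order", "o2", {"total": 7, "created": datetime(2022, 5, 1, tzinfo=timezone.utc)}),
]
export_payload = to_ndjson(entity_to_row(*entity) for entity in entities)
```

In a real pipeline the payload would be written to a file or bucket and loaded on a schedule; how often that runs determines how stale the OLAP side is allowed to be.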
