关于非关系数据库的问题(NoSQL)

时间:2022-11-04 08:47:49

Although I've not yet used any of the new NoSQL databases I've tried to keep myself informed by reading Wikipedia articles, blogs and the peeking into some of the NoSQL DBs documentation.

虽然我还没有使用任何新的NoSQL数据库,但我试图通过阅读*的文章,博客和窥视一些NoSQL DBs文档来了解自己。

I've just (re)read the August 2009 edition of php|architect, specifically the article about the Non-Relation Databases and a few questions popped up in my head, I understand that the article is pretty light on the subject but it was enough to get me confused...

我刚刚(重新)阅读了2009年8月版的php | architect,特别是关于非关系数据库的文章以及我头脑中出现的一些问题,我理解这篇文章对这个主题非常清楚,但它是足以让我困惑......

CouchDB

My main question regarding CouchDB is why so much hype?. From what I understood CouchDB provides a Web Service that lets you create databases and documents inside the database, the documents can have several JSON-encoded attributes and also have a special _id and _rev attribute for tracking revisions of the document.

关于CouchDB的主要问题是为什么这么多炒作?根据我的理解,CouchDB提供了一个允许您在数据库中创建数据库和文档的Web服务,这些文档可以具有多个JSON编码的属性,并且还具有用于跟踪文档修订的特殊_id和_rev属性。

I really don't get all the fuss about this, some years ago for a pet project I coded a similar (?) system for storing documents and the structure was something like this:

我真的不会对此大惊小怪,几年前我为一个宠物项目编写了一个用于存储文档的类似(?)系统,结构是这样的:

documents/
  document-name/
    (revision) timestamp/
      (contents) md5-hash.txt
        PHP Serialized Data

I'm sure I'm missing something very fundamental, otherwise (from the viewpoint of a PHP developer) this would have the same benefits as CouchDB and be faster - no need to encode and decode JSON.

我确信我缺少一些非常基础的东西,否则(从PHP开发人员的角度来看)这将与CouchDB具有相同的优点并且更快 - 不需要编码和解码JSON。


Amazon SimpleDB

Now this one really gets my head spinning... The author (Russell Smith) gives the following example:

现在这个真的让我头晕目眩......作者(Russell Smith)给出了以下例子:

$sdb->putAttributes('phparch', 'may', array('title' => array('value' => 'May 2009'), 'have' => array('value' => false)));
$sdb->putAttributes('phparch', 'june', array('title' => array('value' => 'June 2009'), 'have' => array('value' => true)));
$sdb->putAttributes('phparch', 'july', array('title' => array('value' => 'July 2009'), 'have' => array('value' => true)));

He then says that Amazon now supports a SQL-like interface and then executes the following query:

然后他说Amazon现在支持类似SQL的接口,然后执行以下查询:

$sdb->select('phparch', 'SELECT * FROM phparch WHERE have = "1"');

He doesn't give any analogous example of how to do that query in CouchDB (he leaves some hints on Views and Map/Reduce however) but I suppose it is also possible, so my question is: how does Amazon (and CouchDB) do it?

他没有给出如何在CouchDB中进行查询的任何类似示例(但是他在Views和Map / Reduce上留下了一些提示),但我想这也是可能的,所以我的问题是:Amazon(和CouchDB)如何做它?

My first guess would be that they open all documents (in possible in a distributed environment) and then apply a reduce operation to filter the documents whose attributes don't match the search criteria, but wouldn't this be overly expensive (CPU and Disk I/O) even in parallel computing?

我的第一个猜测是他们打开所有文档(可能在分布式环境中),然后应用reduce操作来过滤属性与搜索条件不匹配的文档,但这不会过于昂贵(CPU和磁盘) I / O)即使在并行计算中?


I know I'm ignoring some important stuff like distribution, consistency and so on but I'm just trying to grasp the very basic inner workings of NoSQL storages.

我知道我忽略了一些重要的东西,如分布,一致性等等,但我只是想要掌握NoSQL存储的基本内部工作原理。

PS: Also, can anyone explain me why both CouchDB and Amazon SimpleDB are built with Erlang?

PS:另外,任何人都可以解释为什么CouchDB和Amazon SimpleDB都是用Erlang构建的吗?

1 个解决方案

#1


3  

the fuss around nosql is down to indexing, availability, and scalability. indexing is what allows the document-oriented stores to NOT open all documents if you want to get the ones where have = 1. availablity and scalability allow these systems to easily scale out and be robust in the face of unreliable hardware.

围绕nosql的问题归结为索引,可用性和可伸缩性。索引是允许面向文档的存储不打开所有文档的原因,如果你想获得那些具有= 1的可用性和可扩展性,这些系统可以轻松地扩展并在面对不可靠的硬件时保持健壮。

erlang is designed for multi-processor systems and so is an ideal fit for distributed systems too.

erlang专为多处理器系统而设计,因此也非常适合分布式系统。

#1


3  

the fuss around nosql is down to indexing, availability, and scalability. indexing is what allows the document-oriented stores to NOT open all documents if you want to get the ones where have = 1. availablity and scalability allow these systems to easily scale out and be robust in the face of unreliable hardware.

围绕nosql的问题归结为索引,可用性和可伸缩性。索引是允许面向文档的存储不打开所有文档的原因,如果你想获得那些具有= 1的可用性和可扩展性,这些系统可以轻松地扩展并在面对不可靠的硬件时保持健壮。

erlang is designed for multi-processor systems and so is an ideal fit for distributed systems too.

erlang专为多处理器系统而设计,因此也非常适合分布式系统。