Is CouchDB suitable for documents with file attachments across multiple servers?

Time: 2020-11-25 16:51:52

I would love to hear your thoughts on CouchDB and whether it would handle my use case.

Here is what I plan to do: I will have a database storing documents of about 20 KB each, with an attachment of 1-10 MB per document.

  1. Will Couch handle a database of 10 TB or more per server with my schema? (A 4U case can hold 24 2 TB drives; is that too much per Couch node? There will be very few reads, so I don't need speed.)

  2. Will Couch be able to replicate all documents together with their attachments?

  3. What about splitting all the data across multiple servers (for example, 4 nodes)? Will it handle that many attachments?

What problems do you see here?

If you need more info, please ask :)

1 solution

#1


3  

I don't think you will hit a hard limit with a 10 TB file; that is, I don't think Couch has some built-in "can't use files bigger than X" rule with X below 10 TB.

However.

The biggest issue is file compaction. To reclaim space, Couch has to compact the file, which effectively means copying it. So, at some point at least, 10 TB needs to be 20 TB, because compaction duplicates the live data into the new copy.

If you are mostly appending to the file, that is, simply adding new data rather than updating or overwriting old data, this will be less of a problem, since compaction won't gain you much. If your data is basically static, I would build the file, compact it one final time, and be done with it.
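To put rough numbers on the disk headroom, here is a minimal back-of-the-envelope sketch. It assumes the old file stays on disk until the compacted copy, which holds only the live data, is fully written. The function name and the `live_fraction` parameter are made up for illustration and are not part of any Couch API.

```python
def peak_disk_during_compaction(file_size_tb: float, live_fraction: float) -> float:
    """Estimate peak disk usage (TB) while a database file is being compacted.

    Assumption: the old file is kept until the new copy, containing only
    the live data, has been written completely.
    """
    if not 0.0 <= live_fraction <= 1.0:
        raise ValueError("live_fraction must be between 0 and 1")
    live_data_tb = file_size_tb * live_fraction
    return file_size_tb + live_data_tb


if __name__ == "__main__":
    # Append-only data: nothing to reclaim, so the copy is as big as the original.
    print(peak_disk_during_compaction(10.0, 1.0))  # 20.0
    # Heavily updated data: only half of the file is still live.
    print(peak_disk_during_compaction(10.0, 0.5))  # 15.0
```

For the mostly-append case described above, `live_fraction` is close to 1, so you pay for nearly double the space and reclaim very little.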

There are "3rd party" sharding solutions for Couch; Lounge is a popular one.
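To illustrate the general idea of sharding by document ID, here is a minimal sketch. It is not how Lounge itself is implemented; it only shows plain hash-modulo placement, and the node names are hypothetical.

```python
import zlib

# Hypothetical node names, one per Couch server.
NODES = ["couch-1", "couch-2", "couch-3", "couch-4"]


def node_for(doc_id: str, nodes=NODES) -> str:
    """Pick a node by hashing the document ID (simple modulo sharding)."""
    return nodes[zlib.crc32(doc_id.encode("utf-8")) % len(nodes)]


if __name__ == "__main__":
    for doc_id in ("invoice-0001", "invoice-0002", "scan-2020-11-25"):
        print(doc_id, "->", node_for(doc_id))
```

Note that with plain modulo placement, changing the number of nodes moves most documents to a different node, which is painful when each document carries a multi-megabyte attachment.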

When I approach a Couch solution, the primary thing to consider is what your query criteria are. Couch is all about the views, really. What kind of views are you looking at? If you're simply storing data by some simple key (file name, date, or whatever), you may well be better off just using a file system with an appropriate directory structure, frankly.

So I'd like to hear more about the views you plan to use, since you don't intend to do a lot of reading.

Addenda:

You still haven't mentioned what kind of queries you're looking for. The queries are, effectively, THE design component, especially for a Couch DB, since it gets more and more difficult to add new queries on large datasets.
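For a sense of what "a query" means here, this is a minimal sketch of a design document holding one view, built as a Python dict. The names `files` and `by_created` and the `created` field are assumptions chosen for illustration; the map function itself is JavaScript, stored as a string. Each view like this has to be indexed over every existing document, which is why adding one later to a large database is expensive.

```python
import json


def build_design_doc() -> dict:
    """Build a design document with one view keyed by a `created` field."""
    # The map function is JavaScript source kept as a plain string.
    map_fn = (
        "function (doc) {"
        "  if (doc.created) { emit(doc.created, null); }"
        "}"
    )
    return {
        "_id": "_design/files",  # hypothetical design document name
        "views": {"by_created": {"map": map_fn}},  # hypothetical view name
    }


if __name__ == "__main__":
    print(json.dumps(build_design_doc(), indent=2))
```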

When you said attachments, I assumed you meant attachments to the Couch DB payload (since it can handle attachments).

So, all that said, you could easily create a metadata document capturing whatever information you want to capture and, as part of that document, add a path name to the actual file stored on the file system. This will reduce the overall size of the Couch file dramatically, which makes maintenance faster and more efficient. You lose some of the "self-contained" benefit of having it all in a single document, of course.
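The metadata-plus-path approach could look roughly like the sketch below. The field names (`name`, `size`, `sha256`, `path`) and the two-level directory layout are assumptions for illustration; the returned dict is what you would then save into Couch as the document body, with the large file itself living only on the file system.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_file(data: bytes, name: str, root: Path) -> dict:
    """Write `data` under `root` and return a metadata document pointing at it."""
    digest = hashlib.sha256(data).hexdigest()
    # Spread files over subdirectories so no single directory grows too large.
    target = root / digest[:2] / digest[2:4] / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return {
        "name": name,
        "size": len(data),
        "sha256": digest,
        "path": str(target),  # where the actual file lives on disk
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = store_file(b"example payload", "report.bin", Path(tmp))
        print(json.dumps(doc, indent=2))
```

Keeping a checksum in the metadata gives you a way to detect when the file on disk and the document in Couch have drifted apart, which is the main risk once the two are no longer stored together.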
