Drawbacks of storing binary data in Riak?

Date: 2021-06-11 17:00:21

What are the problems, if any, of storing binary data in Riak?


Does it affect the maintainability and performance of the cluster?


What would the performance differences be between using Riak for this and using a distributed file system?


5 Answers

#1 (12 votes)

Adding to @Oscar-Godson's excellent answer, you're likely to experience problems with values much larger than 50MBs. Bitcask is best suited for values that are up to a few KBs. If you're storing large values, you may want to consider alternative storage backends, such as innostore.

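Switching backends was done in Riak's `app.config`. The snippet below is only a sketch from memory of the Basho docs of that era; the exact backend module name should be verified against your Riak release:

```erlang
%% app.config (excerpt) -- hypothetical sketch; check the backend
%% module name against your Riak version's documentation.
{riak_kv, [
    %% Default is riak_kv_bitcask_backend, which favors values of a
    %% few KB; innostore was the usual suggestion for larger values.
    {storage_backend, riak_kv_innostore_backend}
]}.
```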

I don't have experience with storing binary values, but we have a medium-sized cluster in production (5 nodes, on the order of 100M values, 10's of TBs) and we're seeing frequent errors related to inserting and retrieving values that are 100's of KBs in size. Performance in this case is inconsistent - sometimes it works, sometimes it doesn't - so if you're going to test, test at scale.


We're also seeing problems with large values when running map-reduce queries - they simply time out. However, that may be less relevant to binary values (as @Matt-Ranney mentioned).


Also see @Stephen-C's answer here


#2 (6 votes)

The only problem I can think of is storing binary data larger than 50MB, which they advise against. The whole point of Riak is just that:


Another reason one might pick Riak is for flexibility in modeling your data. Riak will store any data you tell it to in a content-agnostic way — it does not enforce tables, columns, or referential integrity. This means you can store binary files right alongside more programmer-transparent formats like JSON or XML.


Source: Schema Design in Riak - Introduction


#3 (4 votes)

With Riak, the recommended maximum is 2MB per object. Above that, it's recommended either to use Riak CS, which has been tested with objects up to 5TB (stored in Riak as 1MB objects), or to break your large object into 2MB chunks yourself and link them by a key and suffix.

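The chunk-and-link approach can be sketched in plain Python. The splitting and key-naming logic below is self-contained; the actual store/fetch calls against a Riak client are only indicated in comments, since client APIs vary by library and version:

```python
def chunk_object(key, data, chunk_size=2 * 1024 * 1024):
    """Split a large value into <=2MB chunks keyed by base key + suffix.

    Returns a small manifest (to store under the base key) and a list of
    (chunk_key, chunk_bytes) pairs. Persisting each pair with a Riak
    client (e.g. bucket.new(chunk_key, ...).store()) is left as a
    comment because client APIs differ between versions.
    """
    chunks = []
    for offset in range(0, len(data), chunk_size):
        # Suffix is the zero-padded chunk index, e.g. "video42:000001".
        chunk_key = f"{key}:{offset // chunk_size:06d}"
        chunks.append((chunk_key, data[offset:offset + chunk_size]))
    manifest = {"key": key,
                "chunks": [k for k, _ in chunks],
                "size": len(data)}
    return manifest, chunks


def reassemble(manifest, fetch):
    """Rebuild the original value by fetching each chunk key in order.

    `fetch` is any callable mapping chunk_key -> bytes (e.g. a wrapper
    around your client's get-by-key call).
    """
    return b"".join(fetch(k) for k in manifest["chunks"])


# Example with an in-memory dict standing in for the cluster:
store = {}
blob = b"x" * (5 * 1024 * 1024)              # 5MB payload
manifest, chunks = chunk_object("video42", blob)
store.update(chunks)                          # "store" each chunk
assert len(manifest["chunks"]) == 3           # 2MB + 2MB + 1MB
assert reassemble(manifest, store.__getitem__) == blob
```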

#4 (3 votes)

I personally haven't noticed any issues storing data such as images and documents (both DOC and PDF) in Riak. I don't have performance numbers, but I might be able to gather some should I remember.


Something of note: with Riak you can use Luwak, which provides an API for storing large files. This has been pretty useful.


#5 (1 vote)

One problem may be that it is difficult, if not impossible, to use JavaScript map/reduce across your binary data. You'll probably need Erlang for that.

