How to use aggregate functions in Amazon DynamoDB

I am new to DynamoDB. I have a table in DynamoDB with more than 100k items in it, and the table is refreshed frequently. On this table I want to do something I would do in the relational database world: how can I get the max value from the table?

2 solutions

#1

DynamoDB is a NoSQL database and is therefore quite limited in how you can query data. The DynamoDB API offers no aggregation operations, so you cannot ask it directly for something like the maximum value in a table. You will have to look to other tools and approaches to solve this problem.

There are a number of possible solutions you can consider:

Perform A Table Scan

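With more than 100k items in your table, this is likely a very bad idea. A table scan reads every single item, and your application-side logic then identifies the maximum value. Because the table is refreshed frequently, you would have to repeat this full read (and pay the read capacity for it) every time you need an up-to-date answer. This really isn't a workable solution.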
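
For completeness, a scan-based maximum might look like the minimal Python sketch below. It assumes `table` is a boto3 DynamoDB `Table` resource (anything with a compatible `scan(**kwargs)` method works); the function name `max_via_scan` and the attribute you pass in are hypothetical. It has to follow `LastEvaluatedKey` page by page until the end of the table, which is exactly why its cost grows with the table size.

```python
from typing import Any, Optional


def max_via_scan(table: Any, attr: str) -> Optional[Any]:
    """Return the largest value of `attr` across the whole table, or None.

    Assumes `table` behaves like a boto3 DynamoDB Table resource. Every page
    of the table is read, so read capacity grows with the table size.
    """
    kwargs = {
        # Only fetch the one attribute we care about to keep pages small.
        "ProjectionExpression": "#a",
        "ExpressionAttributeNames": {"#a": attr},
    }
    best = None
    while True:
        page = table.scan(**kwargs)
        for item in page.get("Items", []):
            value = item.get(attr)  # items without the attribute come back empty
            if value is not None and (best is None or value > best):
                best = value
        # A scan page is capped at 1 MB; continue until no key is returned.
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return best
        kwargs["ExclusiveStartKey"] = last_key
```

Usage, assuming boto3 is installed and a hypothetical table `my-table` with a numeric `score` attribute: `max_via_scan(boto3.resource("dynamodb").Table("my-table"), "score")`. boto3 returns DynamoDB numbers as `Decimal`.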

Materialized Index in DynamoDB

Depending on your use case, you can use DynamoDB streams and a Lambda function to maintain an index in a separate DynamoDB table. If your table is insert-only (no updates and no deletions), you could store the maximum in a separate table, and as new records are inserted, compare each one against the stored maximum and update it when necessary.

This approach is workable under some constrained circumstances, but is not a generalized solution.
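
A minimal Python sketch of the insert-only case, assuming the stream uses a view type that includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`), a numeric attribute named `score`, and a separate summary table `max-summary` whose single item has the key `{"pk": "max-score"}`. All of these names are hypothetical. The conditional update lets DynamoDB itself reject any write that would lower the stored maximum, so concurrent Lambda invocations cannot overwrite a larger value with a smaller one.

```python
from decimal import Decimal
from typing import Any, Dict, Optional

ATTR = "score"                     # hypothetical numeric attribute to track
SUMMARY_KEY = {"pk": "max-score"}  # hypothetical key of the single summary item


def max_from_inserts(event: Dict[str, Any], attr: str = ATTR) -> Optional[Decimal]:
    """Largest value of `attr` among INSERT records in a DynamoDB Streams event."""
    best = None
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # insert-only assumption: MODIFY/REMOVE are ignored
        image = record.get("dynamodb", {}).get("NewImage", {})
        # Stream images use DynamoDB JSON, e.g. {"score": {"N": "42"}}.
        raw = image.get(attr, {}).get("N")
        if raw is None:
            continue
        value = Decimal(raw)
        if best is None or value > best:
            best = value
    return best


def update_max(summary_table: Any, candidate: Decimal) -> bool:
    """Raise the stored maximum if `candidate` is larger; False if it was not."""
    # Imported here so the pure parsing logic above runs without boto3 installed.
    from botocore.exceptions import ClientError

    try:
        summary_table.update_item(
            Key=SUMMARY_KEY,
            UpdateExpression="SET max_value = :v",
            ConditionExpression="attribute_not_exists(max_value) OR max_value < :v",
            ExpressionAttributeValues={":v": candidate},
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # stored maximum is already >= candidate
        raise


def handler(event, context):
    """Lambda entry point wired to the source table's stream."""
    import boto3

    candidate = max_from_inserts(event)
    if candidate is not None:
        table = boto3.resource("dynamodb").Table("max-summary")  # hypothetical name
        update_max(table, candidate)
```

Reading the current maximum is then a single `GetItem` on the summary table. Once updates or deletions are allowed, a deleted maximum cannot be recovered from the stored value alone, which is why this only works in constrained cases.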

Perform Analytics using Amazon Redshift

DynamoDB is not meant for analytical operations such as maximum, while Redshift is a very powerful big data platform that can perform these types of calculations with ease. As with the materialized index, you can use DynamoDB streams to send data into Redshift as records are inserted, maintaining a near-real-time copy of the table for analytical purposes.

If you are looking for more of an offline or analytical solution, this is a good choice.

Perform Analytics using Elasticsearch

While DynamoDB is a powerful NoSQL solution with strong guarantees on data durability, Elasticsearch provides a very flexible query language that supports aggregations such as maximum, and these aggregations can be sliced and diced on any attribute value in real time. As with the solutions above, you can use DynamoDB streams to send record inserts, updates, and deletions into the Elasticsearch index in real time.

If you want to stick with DynamoDB but need some additional querying capability, this is a really good option, especially when using Amazon Elasticsearch Service (Amazon ES), which fully manages an Elasticsearch cluster for you. It is important to remember that Elasticsearch doesn't replace your DynamoDB table; it is just an easily searchable index of the same data.
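
As a sketch of what such a query looks like, the Python below builds an Elasticsearch search body using a `max` aggregation, optionally narrowed by `term` filters (the "slice and dice" part), and reads the result back. The aggregation name `max_value`, the field names, and both helper functions are hypothetical. Sending the body is left to whichever client you use (for example, `search(index=..., body=...)` in older versions of the official `elasticsearch` Python client).

```python
from typing import Any, Dict, Optional


def max_agg_body(field: str, where: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Search body that returns only the max of `field`, optionally filtered.

    `where` maps field names to exact values; `term` filters match exact
    values, so text fields typically need a keyword mapping.
    """
    body: Dict[str, Any] = {
        "size": 0,  # we only want the aggregation, not the matching documents
        "aggs": {"max_value": {"max": {"field": field}}},
    }
    if where:
        body["query"] = {
            "bool": {"filter": [{"term": {k: v}} for k, v in where.items()]}
        }
    return body


def read_max(response: Dict[str, Any]) -> Optional[float]:
    """Extract the aggregated value; Elasticsearch returns null when no document has the field."""
    return response["aggregations"]["max_value"]["value"]
```

For example, `max_agg_body("score", {"region": "eu"})` asks for the highest `score` among documents whose `region` is exactly `"eu"`, without any change to the DynamoDB table.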

Just use a SQL Database

The obvious solution: if you have SQL requirements, move from a NoSQL-based system to a SQL-based system. AWS's RDS offering provides a managed solution. While DynamoDB provides a lot of benefits, if your use case is pulling you toward a SQL solution, the easiest thing to do may be to not fight it and simply switch.

This is not to say that a SQL-based or a NoSQL-based solution is better. Each has pros and cons that vary with the specific use case, but switching is definitely an option to consider.
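
To illustrate what the question is asking for, this is all it takes in a relational database. The sketch uses Python's built-in `sqlite3` purely as a local stand-in; on RDS you would run the same `SELECT MAX(...)` through the driver for your engine (MySQL, PostgreSQL, etc.). The `items` table and `score` column are hypothetical.

```python
import sqlite3
from typing import Optional


def max_score(conn: sqlite3.Connection) -> Optional[float]:
    """Return MAX(score) over the items table; SQL yields NULL (None) when it is empty."""
    (value,) = conn.execute("SELECT MAX(score) FROM items").fetchone()
    return value


# In-memory database as a local stand-in for an RDS instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score REAL)")
# A B-tree index on the column lets the engine find the maximum without a full scan.
conn.execute("CREATE INDEX idx_items_score ON items (score)")
conn.executemany("INSERT INTO items (score) VALUES (?)", [(3.0,), (7.5,), (5.0,)])
print(max_score(conn))  # 7.5
```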

#2

DynamoDB data can actually be queried with a MAX aggregate function if you use Hive on Amazon EMR: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMRforDynamoDB.Querying.html
