Can someone illustrate how I can store and easily query hierarchical data in the Google App Engine datastore?
3 solutions
#1
The best option depends on your requirements. Here are a few solutions (I'm assuming you're using Python, since you didn't specify):
- If you need to do transactional updates on an entire tree, and you're not going to have more than about 1 QPS of sustained updates to any one tree, you can use the built-in support for hierarchical storage (entity groups). When creating an entity, you can pass the "parent" argument to specify a parent entity or key, and when querying, you can use the .ancestor() method (or 'ANCESTOR IS' in GQL) to retrieve all descendants of a given entity.
- If you don't need transactional updates, you can replicate the functionality of entity groups without the contention issues (but also without the transaction safety): add a db.ListProperty(db.Key) called 'ancestors' to your model, and populate it with the keys of all ancestors of the object you're inserting. Then you can easily retrieve everything descended from a given ancestor with MyModel.all().filter('ancestors =', parent_key).
- If you don't need transactions, and you only care about retrieving the direct children of an entity (not all descendants), use the approach outlined above, but instead of a ListProperty just use a ReferenceProperty pointing to the parent entity. This is known as an adjacency list.
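To make the difference between the second and third options concrete, here is a minimal plain-Python sketch. It does not use the datastore API at all: the `Node` class, the `STORE` dict, and the helper functions are hypothetical stand-ins, and a list comprehension plays the role of the `filter('ancestors =', ...)` query (an equality filter on a list property matches when any element of the list equals the value).

```python
# Plain-Python stand-in for the two non-transactional layouts above.
# Nothing here touches the real datastore; Node, STORE and the helpers
# are hypothetical names used only for illustration.

class Node(object):
    def __init__(self, key, parent=None):
        self.key = key
        # Adjacency list: remember only the direct parent's key.
        self.parent_key = parent.key if parent else None
        # Ancestor list: the parent's ancestors plus the parent itself.
        self.ancestors = (parent.ancestors + [parent.key]) if parent else []


STORE = {}  # key -> Node, standing in for the stored entities


def put(key, parent=None):
    """Create a node under `parent` and save it in STORE."""
    node = Node(key, parent)
    STORE[key] = node
    return node


def descendants(ancestor_key):
    # Mirrors filter('ancestors =', key): a match on any list element.
    return sorted(n.key for n in STORE.values() if ancestor_key in n.ancestors)


def children(parent_key):
    # Mirrors an equality filter on the single parent reference.
    return sorted(n.key for n in STORE.values() if n.parent_key == parent_key)


root = put("root")
docs = put("docs", parent=root)
put("guide", parent=docs)
put("faq", parent=docs)
put("images", parent=root)

print(descendants("root"))  # ['docs', 'faq', 'guide', 'images']
print(children("root"))     # ['docs', 'images']
print(descendants("docs"))  # ['faq', 'guide']
```

The trade-off shows up when a subtree moves: with the ancestor list, every descendant's list has to be rewritten, whereas with the adjacency list only one parent reference changes, at the price of needing one query per level to walk down the tree.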
There are other approaches available, but those three should cover the most common cases.
#2
Well, you should try to keep your data as linear as possible. If you need to query a tree structure quickly, you either have to store the whole tree pickled (or JSON-encoded, if you prefer) in the database, if that is possible for your data, or generate tree indices that can be used to quickly query a piece of the tree. I'm not sure how Google App Engine would perform when updating those indices, however.
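As a rough illustration of the first of those two options, the whole tree can be serialized into one string and kept in a single record, so that reading it back takes one fetch. The sketch below uses only the standard `json` module; the nested-dict layout and the `find_subtree` helper are assumptions made for the example, not something App Engine prescribes.

```python
import json

# Hypothetical tree layout: each node is {"name": ..., "children": [...]}.
tree = {
    "name": "root",
    "children": [
        {"name": "docs", "children": [
            {"name": "guide", "children": []},
            {"name": "faq", "children": []},
        ]},
        {"name": "images", "children": []},
    ],
}

# Serialize once; the resulting string is what would be stored in a
# single text field of one record.
blob = json.dumps(tree)


def find_subtree(node, name):
    """Depth-first search for the node called `name`; None if absent."""
    if node["name"] == name:
        return node
    for child in node["children"]:
        found = find_subtree(child, name)
        if found is not None:
            return found
    return None


# One read plus one decode gives back the entire hierarchy.
restored = json.loads(blob)
docs = find_subtree(restored, "docs")
print([child["name"] for child in docs["children"]])  # ['guide', 'faq']
```

The cost of this layout is that every change rewrites the whole blob, the datastore cannot query individual nodes, and the tree has to fit within the size limit of a single entity.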
When it comes to Google App Engine, your main concern should be to reduce the number of queries you need to make, and to make sure your queries return as few rows as possible. Operations are expensive, but storage is not, so redundancy should not be seen as a bad thing.
Here are some thoughts on the subject I found by googling (written for MySQL, but you can get the general idea from it): Managing Hierarchical Data in MySQL
Ah, and here's a discussion for Google App Engine: Modeling Hierarchical Data
#3
One way is to use the Model's parent attribute. You can then make use of the query.ancestor() and model.parent() methods.
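One way to picture why this works: an entity created with a parent gets a key that embeds the parent's key, so an ancestor query amounts to a prefix match on key paths. The sketch below models that idea in plain Python only. It is not the App Engine API; `make_key`, `parent_of`, and `ancestor_query` are hypothetical helpers, and returning only strict descendants is a simplification chosen for the example.

```python
# Plain-Python model of parent-based keys; not the App Engine API.
# A key is modelled as a tuple of (kind, name) pairs running from the
# root entity down to the entity itself. All names are hypothetical.

def make_key(kind, name, parent=None):
    """Build a key path by extending the parent's path, if there is one."""
    return (parent or ()) + ((kind, name),)


def parent_of(key):
    """Stand-in for model.parent(): drop the last path element."""
    return key[:-1] or None


def ancestor_query(keys, ancestor):
    """Stand-in for query.ancestor(): keep keys strictly below `ancestor`."""
    n = len(ancestor)
    return [k for k in keys if len(k) > n and k[:n] == ancestor]


root = make_key("Folder", "root")
docs = make_key("Folder", "docs", parent=root)
guide = make_key("Page", "guide", parent=docs)
images = make_key("Folder", "images", parent=root)
all_keys = [root, docs, guide, images]

print(ancestor_query(all_keys, root))  # the docs, guide and images keys
print(parent_of(guide) == docs)        # True
```

Because the parent is baked into the key, moving an entity to a different parent means creating it again under a new key rather than updating a field.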
I guess the best way to represent this data depends on what kind of operations you want to perform on it.