Data modeling advice for a blog tagging system on Google App Engine

I'm wondering whether anyone can offer some conceptual advice on an efficient way to build a data model for the simple system described below. I'm fairly new to thinking in a non-relational manner and want to avoid any obvious pitfalls. My understanding is that a basic principle is "storage is cheap, so don't worry about data duplication" the way you would in a normalized RDBMS.

What I'd like to model is:

A blog article that can be given 0-n tags. Many blog articles can share the same tag. When retrieving data, I'd like to be able to fetch all articles matching a tag. In many ways this is very similar to the approach taken here at *.

My normal mindset would be to create a many-to-many relationship between tags and blog articles. However, I suspect that in the context of GAE this would be expensive, although I have seen examples of it being done.

Perhaps I could use a ListProperty on each article entity to hold its tags, plus a second data model to track tags as they're added and deleted? That way there is no need for any relationships, and the ListProperty still allows queries that return an entity when any list element matches.

Any suggestions on the most efficient way to approach this on GAE?

4 solutions

#1

Thanks to both of you for your suggestions. I've implemented (first iteration) as follows. Not sure if it's the best approach, but it's working.

Class A = Articles. Has a StringListProperty that can be queried on its list elements.

Class B = Tags. One entity per tag, also keeps a running count of the total number of articles using each tag.

Data modifications to A are accompanied by maintenance work on B. My thinking is that pre-computing the counts is a good approach in a read-heavy environment.
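
To make the bookkeeping concrete, here is a minimal in-memory sketch of the two kinds described above. It deliberately uses plain Python dicts rather than the datastore API, and the names `articles`, `tag_counts`, `save_article` and `articles_with_tag` are made up for illustration. On GAE the two updates would be separate entity writes, so keeping them consistent would need extra care.

```python
from collections import defaultdict

# In-memory stand-ins for the two entity kinds (hypothetical names);
# on GAE these would be datastore models, which this sketch does not use.
articles = {}                   # article_id -> list of tags ("Class A")
tag_counts = defaultdict(int)   # tag -> number of articles using it ("Class B")


def save_article(article_id, tags):
    """Store an article's tags and keep the per-tag counts in step."""
    old = set(articles.get(article_id, []))
    new = set(tags)
    for tag in new - old:       # tags added by this write
        tag_counts[tag] += 1
    for tag in old - new:       # tags removed by this write
        tag_counts[tag] -= 1
        if tag_counts[tag] == 0:
            del tag_counts[tag]
    articles[article_id] = sorted(new)


def articles_with_tag(tag):
    """Mimic a list-property query: match if any list element equals tag."""
    return sorted(aid for aid, tags in articles.items() if tag in tags)


save_article("a1", ["python", "gae"])
save_article("a2", ["gae"])
save_article("a1", ["python"])  # retag a1: "gae" is dropped
```

Reading a tag's article count is then a single lookup in `tag_counts` rather than a scan over every article, which is the point of pre-computing in a read-heavy setup.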

#2

Pre-computing the counts is ~~not only~~ practical~~, but also necessary because the count() function returns a maximum of 1000~~. If write contention might be an issue, make sure to check out the sharded counter example.

http://code.google.com/appengine/articles/sharding_counters.html
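
The idea behind a sharded counter can be sketched in a few lines. This is an in-memory illustration only, not the code from the linked article: the class name `ShardedCounter` and the shard count of 20 are assumptions, and each list slot merely stands in for a separate datastore entity.

```python
import random

NUM_SHARDS = 20  # assumed shard count; tune to the expected write rate


class ShardedCounter:
    """In-memory illustration of a sharded counter (hypothetical class)."""

    def __init__(self, num_shards=NUM_SHARDS):
        # Each slot stands in for a separate datastore entity.
        self.shards = [0] * num_shards

    def increment(self, delta=1):
        # Writes go to a random shard, so concurrent writers rarely
        # contend for the same one.
        index = random.randrange(len(self.shards))
        self.shards[index] += delta

    def total(self):
        # Reads pay the cost instead: sum every shard.
        return sum(self.shards)


counter = ShardedCounter()
for _ in range(100):
    counter.increment()
```

The trade-off is that writes are spread out while each read has to touch every shard, so this tends to suit counters that are written far more often than a single entity could absorb.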

#3

Many-to-many sounds reasonable. Perhaps you should try it first to see if it is actually expensive.

The good thing about GAE is that it will tell you when you are using too many cycles. Profiling for free!

#4

One possible way is with Expando, where you'd add a tag like:

setattr(entity, 'tag_'+tag_name, True)

Then you could query all the entities with a tag like:

def get_all_with_tag(model_class, tag):
    return model_class.all().filter('tag_%s =' % tag, True)

Of course you have to clean up your tags to be proper Python identifiers. I haven't tried this, so I'm not sure if it's really a good solution.
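
One way the clean-up step might look is sketched below. The helper name `tag_property_name` and the choice to lower-case tags and replace unsupported characters with underscores are assumptions for illustration, not something prescribed by GAE.

```python
import re


def tag_property_name(tag):
    """Hypothetical helper: build a 'tag_...' attribute name from a tag."""
    # Lower-case, then replace anything that is not an ASCII letter, digit
    # or underscore; the 'tag_' prefix guarantees a valid first character.
    cleaned = re.sub(r"[^0-9a-z_]", "_", tag.strip().lower())
    return "tag_" + cleaned
```

Note that this mapping is lossy: distinct tags such as "c++" and "c--" end up with the same property name, so a real implementation would need to decide whether that is acceptable.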
