I have an NDB model. Once the data in the model becomes stale, I want to exclude the stale entities from searches and updates. I could have deleted them, as explained in this SO post, if not for the need to analyze the old data later. I see two choices:
- add a boolean status field and simply mark entities as deleted
- move entities to a different model
My understanding of the trade-off between these two options:
- mark-deleted is faster
- mark-deleted is more error-prone: the extra property would require modifying all the queries to exclude entities that are marked deleted. That would increase complexity and the probability of bugs.
Question: Can the move-entities option be made fast enough to be comparable to mark-deleted? Is there any sample code showing how to move entities between models efficiently?
Update 2014-05-14: I decided, for the time being, to use mark-deleted. I figure there is an additional benefit of fewer RPCs.
Related:
- How to delete all entities for NDB Model in Google App Engine for python?
1 solution
#1
0
You can use a combination of the solutions you propose, although I think that is over-engineering.
1) First, write a task-queue job that updates all of your entities with the new field is_deleted, with a default value of False. This prevents existing entities from returning an error when you ask them whether they are deleted.
2) Write your queries at the model level, so that you don't have to alter them every time you change your model; you only pass the extra parameter you want to filter on when you make the relevant query. You can get an idea from the model of the bootstrap project gae-init. You can query them with is_deleted = False.
3) BigTable's performance will not be affected whether you are querying 10 entities or 10 M entities, but if you want to move the deleted ones into a new entity model, you can create a cron job that, at the end of the day or so, copies them somewhere else and removes the originals. Don't forget that this will use your quota, and you might end up literally paying for the cleanup.
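A minimal sketch of the copy-then-delete step from point 3, written so that it runs without the App Engine SDK. The names `move_in_batches`, `chunks`, `to_archive` and the batch size of 500 are assumptions made for illustration. In an NDB app you would presumably pass `ndb.put_multi` and `ndb.delete_multi` as the two callables, so that each batch costs a couple of RPCs rather than one per entity.

```python
def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def move_in_batches(entities, to_archive, put_multi, delete_multi,
                    batch_size=500):
    """Copy each entity to its archive form, then delete the original.

    `to_archive` is a hypothetical converter that builds the archive
    entity from a live one; `put_multi` takes a list of entities and
    `delete_multi` takes a list of keys. Returns the number moved.
    """
    moved = 0
    for batch in chunks(list(entities), batch_size):
        put_multi([to_archive(e) for e in batch])  # write the copies first
        delete_multi([e.key for e in batch])       # then remove the originals
        moved += len(batch)
    return moved
```

The copies are written before the originals are deleted, so a failure in between leaves an entity in both places rather than in neither. The move is not atomic, so a rerun has to tolerate entities that were already copied.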
Also keep in mind that if anything depends on the entities you move, you will have to update those references too. So in my opinion it's better to leave them flagged and index your flag.
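To illustrate the flag approach together with point 2, here is a rough sketch of a model-level query helper that applies the is_deleted filter in one place. It uses a tiny in-memory stand-in class so that it runs anywhere; `Item`, `query_items`, `owner` and `include_deleted` are hypothetical names, and this is not the NDB API. With NDB, the helper body would instead build something like `cls.query(cls.is_deleted == False, ...)`.

```python
class Item(object):
    """In-memory stand-in for a model class (hypothetical; NOT the NDB API)."""

    _store = []  # plays the role of the datastore in this sketch

    def __init__(self, owner, is_deleted=False):
        self.owner = owner
        self.is_deleted = is_deleted
        Item._store.append(self)

    @classmethod
    def query_items(cls, include_deleted=False, **equals):
        """Return items matching `equals`, hiding flagged ones by default."""
        results = []
        for item in cls._store:
            if item.is_deleted and not include_deleted:
                continue  # the one place where the flag is checked
            if all(getattr(item, name) == value
                   for name, value in equals.items()):
                results.append(item)
        return results


Item("alice")
Item("alice", is_deleted=True)
Item("bob")

# Callers never mention the flag unless they want the stale data back.
active = Item.query_items(owner="alice")
everything = Item.query_items(owner="alice", include_deleted=True)
```

Because every caller goes through `query_items`, adding the flag does not mean touching each query in the application, which is the main concern raised against mark-deleted in the question.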