I have a model with three fields:
class MyModel(models.Model):
    a = models.ForeignKey(A, on_delete=models.CASCADE)
    b = models.ForeignKey(B, on_delete=models.CASCADE)
    c = models.ForeignKey(C, on_delete=models.CASCADE)
I want to enforce a unique constraint across these fields, and found Django's unique_together, which seems to be the solution. However, I already have an existing database, and it contains many duplicates. I know that since unique_together works at the database level, I need to de-duplicate the rows first, and then try a migration.
Is there a good way to go about removing the duplicates (where a duplicate has the same (a, b, c)) so that I can run the migration to get the unique_together constraint?
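For reference, once the duplicates are removed, the constraint asked about would be declared in the model's Meta. This is a sketch only, reusing the field names from the model above:

```python
class MyModel(models.Model):
    a = models.ForeignKey(A, on_delete=models.CASCADE)
    b = models.ForeignKey(B, on_delete=models.CASCADE)
    c = models.ForeignKey(C, on_delete=models.CASCADE)

    class Meta:
        # Rejects any second row with the same (a, b, c) combination
        # at the database level.
        unique_together = [('a', 'b', 'c')]
```

Newer Django versions express the same constraint with models.UniqueConstraint(fields=['a', 'b', 'c'], name='...') in Meta.constraints, which is now the recommended form.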
1 solution
#1 (22 votes)
If you are happy to choose one of the duplicates arbitrarily, I think the following might do the trick. Perhaps not the most efficient but simple enough and I guess you only need to run this once. Please verify this all works yourself on some test data in case I've done something silly, since you are about to delete a bunch of data.
First we find groups of objects which form duplicates. For each group, (arbitrarily) pick a "master" that we are going to keep; our chosen method is to pick the one with the lowest pk:
from django.db.models import Count, Min

master_pks = MyModel.objects.values('a', 'b', 'c'
    ).annotate(Min('pk'), count=Count('pk')
    ).filter(count__gt=1
    ).values_list('pk__min', flat=True)
We then loop over each master, and delete all its duplicates:
masters = MyModel.objects.in_bulk(list(master_pks))
for master in masters.values():
    MyModel.objects.filter(a=master.a, b=master.b, c=master.c
        ).exclude(pk=master.pk).delete()
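The grouping logic above can be sketched without Django on plain tuples, which is a handy way to sanity-check it before deleting anything. The data here is hypothetical; each row stands in for a MyModel instance as (pk, a, b, c):

```python
from collections import defaultdict

# Hypothetical rows standing in for MyModel instances: (pk, a, b, c).
rows = [
    (1, 'x', 'y', 'z'),
    (2, 'x', 'y', 'z'),   # duplicate of pk=1
    (3, 'p', 'q', 'r'),
    (4, 'x', 'y', 'z'),   # another duplicate of pk=1
    (5, 'p', 'q', 'r'),   # duplicate of pk=3
]

# Group pks by their (a, b, c) value, mirroring
# .values('a', 'b', 'c').annotate(Min('pk'), count=Count('pk')).
groups = defaultdict(list)
for pk, a, b, c in rows:
    groups[(a, b, c)].append(pk)

# For each group with more than one row, keep the lowest pk (the
# "master") and mark the rest for deletion, mirroring
# .exclude(pk=master.pk).delete().
to_delete = set()
for pks in groups.values():
    if len(pks) > 1:
        master = min(pks)
        to_delete.update(pk for pk in pks if pk != master)

survivors = [row for row in rows if row[0] not in to_delete]
# survivors keeps exactly one row per distinct (a, b, c)
```

After this, survivors holds one row per distinct (a, b, c), which is exactly the state the unique_together migration needs.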