Django: select only rows with duplicate field values

Suppose we have a model in Django defined as follows:

from django.db import models

class Literal(models.Model):
    name = models.CharField(...)
    ...

The name field is not unique, so it can hold duplicate values. I need to accomplish the following task: select all rows from the model whose name value also appears in at least one other row.

I know how to do it in plain SQL (this may not be the best solution):

SELECT * FROM literal WHERE name IN (
    SELECT name FROM literal GROUP BY name HAVING COUNT(name) > 1
);
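
To try the SQL above without a Django project, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (an integer id plus a name column), the sample rows, and the helper name rows_with_duplicate_names are assumptions made for illustration only; a real Django table would normally be named after the app, for example app_literal.

```python
import sqlite3


def rows_with_duplicate_names(conn):
    """Return (id, name) rows whose name occurs more than once."""
    query = """
        SELECT id, name FROM literal
        WHERE name IN (
            SELECT name FROM literal GROUP BY name HAVING COUNT(name) > 1
        )
        ORDER BY id
    """
    return conn.execute(query).fetchall()


# Hypothetical in-memory table standing in for the Literal model's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literal (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO literal (name) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",)],
)

dupes = rows_with_duplicate_names(conn)
print(dupes)
```

The row named "c" is left out because its name occurs only once; every row named "a" or "b" is returned.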

So, is it possible to select this using the Django ORM? Or is there a better SQL solution?

5 solutions

#1

135

Try:

from django.db.models import Count

# Parentheses are needed so the chained calls can span several lines.
(Literal.objects.values('name')
                .annotate(Count('id'))
                .order_by()
                .filter(id__count__gt=1))

This is as close as you can get with Django. The problem is that this returns a ValuesQuerySet (in Django 1.9 and later, a QuerySet that yields dictionaries) containing only the name and the count. However, you can then use it to construct a regular QuerySet by feeding it back into another query:

dupes = (Literal.objects.values('name')
                        .annotate(Count('id'))
                        .order_by()
                        .filter(id__count__gt=1))
Literal.objects.filter(name__in=[item['name'] for item in dupes])

#2

This was rejected as an edit, so here it is as a better answer:

from django.db.models import Count

dups = (
    Literal.objects.values('name')
    .annotate(count=Count('id'))
    .values('name')
    .order_by()
    .filter(count__gt=1)
)

This returns a ValuesQuerySet with all of the duplicate names. You can then use it to construct a regular QuerySet by feeding it back into another query. The Django ORM is smart enough to combine the two into a single query:

Literal.objects.filter(name__in=dups)

The extra call to .values('name') after the annotate call looks a little strange, but without it the subquery fails. The extra values() call tricks the ORM into selecting only the name column for the subquery.

#3

Try using aggregation:

from django.db.models import Count

Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)

#4

If you use PostgreSQL, you can do something like this:

from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Func, Value

duplicate_ids = (Literal.objects.values('name')
                 .annotate(ids=ArrayAgg('id'))
                 .annotate(c=Func('ids', Value(1), function='array_length'))
                 .filter(c__gt=1)
                 .annotate(ids=Func('ids', function='unnest'))
                 .values_list('ids', flat=True))

This produces the following rather simple SQL query:

SELECT unnest(ARRAY_AGG("app_literal"."id")) AS "ids"
FROM "app_literal"
GROUP BY "app_literal"."name"
HAVING array_length(ARRAY_AGG("app_literal"."id"), 1) > 1
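
To make it clearer what this query computes, here is a plain-Python sketch of the same logic: collect the ids for each name (the ARRAY_AGG step), keep only the groups with more than one id (the HAVING step), and flatten them into one list (the unnest step). The function name duplicate_ids and the sample rows are assumptions for illustration; this is not how Django or PostgreSQL run the query.

```python
from collections import defaultdict


def duplicate_ids(rows):
    """Return ids of rows whose name is shared with at least one other row.

    `rows` is an iterable of (id, name) pairs.
    """
    ids_by_name = defaultdict(list)
    for row_id, name in rows:
        ids_by_name[name].append(row_id)  # like ARRAY_AGG(id) ... GROUP BY name

    # Keep groups with more than one id, then flatten them (like unnest).
    return [
        row_id
        for ids in ids_by_name.values()
        if len(ids) > 1
        for row_id in ids
    ]


# Hypothetical sample data: (id, name) pairs.
rows = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b")]
result = duplicate_ids(rows)
print(sorted(result))
```

The SQL version gives no guarantee about the order of the returned ids, so the sketch sorts them before printing.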

#5

If you want only a list of names rather than objects, you can use the following query:

from django.db.models import Count

repeated_names = (
    Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1)
    .values_list('name', flat=True)  # flat takes a boolean, not the string 'true'
)
