How to aggregate (min/max etc.) over Django JSONField data?

Time: 2021-08-26 20:01:46

I'm using Django 1.9 with its built-in JSONField and Postgres 9.4. In my model's attrs JSON field I store objects with some values, including numbers, and I need to aggregate over them to find min/max values. Something like this:

Model.objects.aggregate(min=Min('attrs__my_key'))

It would also be useful to extract specific keys:

Model.objects.values_list('attrs__my_key', flat=True)

The above queries fail with FieldError: "Cannot resolve keyword 'my_key' into field. Join on 'attrs' not permitted."

Is this possible somehow?

Notes: 1) I know how to write a plain Postgres query to do the job, so I'm looking specifically for an ORM solution in order to keep the ability to filter, etc. 2) I suppose this can be done with the (relatively) new query expressions/lookups API, but I haven't studied it yet.

4 solutions

#1


22  

From Django 1.11 (which isn't out yet, so this might change) you can use django.contrib.postgres.fields.jsonb.KeyTextTransform instead of RawSQL.

In Django 1.10 you have to copy/paste KeyTransform into your own KeyTextTransform and replace the -> operator with ->> and #> with #>>, so that it returns text instead of JSON objects.

from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import Min

Model.objects.annotate(
    val=KeyTextTransform('json_field_key', 'blah__json_field')
).aggregate(min=Min('val'))
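
The difference between -> and ->> described above can be illustrated without a database. The following is only a rough Python model of the two PostgreSQL operators, not Django or PostgreSQL code; the function names json_arrow and json_arrow_text are made up for this sketch.

```python
import json


def json_arrow(doc, key):
    """Rough model of `doc -> key`: the result is still JSON."""
    if key not in doc:
        return None  # stands in for SQL NULL
    return json.dumps(doc[key])


def json_arrow_text(doc, key):
    """Rough model of `doc ->> key`: the result is plain text."""
    value = doc.get(key)
    if value is None:
        return None  # a missing key and a JSON null both give SQL NULL
    # Strings lose their JSON quotes; other values keep their JSON spelling.
    return value if isinstance(value, str) else json.dumps(value)


attrs = {"width": "10", "height": 7}
as_json = json_arrow(attrs, "width")       # the quotes are part of the value
as_text = json_arrow_text(attrs, "width")  # bare text, ready to be cast
```

The text form is what the casts in the other answers operate on.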

You can even include KeyTextTransforms in SearchVectors for full-text search:

from django.contrib.postgres.search import SearchVector

Model.objects.annotate(
    search=SearchVector(
        KeyTextTransform('jsonb_text_field_key', 'json_field')
    )
).filter(search='stuff I am searching for')

Remember that you can also index jsonb fields, so consider that based on your specific workload.

#2


15  

For those who are interested, I've found a solution (or at least a workaround).

from django.db.models import Min
from django.db.models.expressions import RawSQL

Model.objects.annotate(
    val=RawSQL("((attrs->>%s)::numeric)", (json_field_key,))
).aggregate(min=Min('val'))

Note that the attrs->>%s expression will become something like attrs->>'width' after processing (note the single quotes). So if you hardcode the key name, remember to include the quotes or you will get an error.
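
To keep the key as a bound parameter rather than pasting it into the SQL string, the fragment and its parameters can be built together. A minimal sketch, assuming a hypothetical helper named numeric_key_sql; it only builds the two arguments that would be passed to RawSQL and does not touch Django or the database.

```python
import re

# Conservative pattern for a plain, unquoted column name (an assumption of this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def numeric_key_sql(column, key):
    """Build (sql, params) for casting one JSON key of `column` to numeric.

    Hypothetical helper: the column name goes into the SQL text, so it is
    checked against the identifier pattern; the key stays a parameter and
    is quoted by the database driver.
    """
    if not _IDENTIFIER.fullmatch(column):
        raise ValueError("unexpected column name: %r" % (column,))
    return "(({}->>%s)::numeric)".format(column), (key,)


sql, params = numeric_key_sql("attrs", "width")
```

The result would be used as RawSQL(sql, params), as in the query above.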

/// A little bit off-topic ///

One more tricky issue, not related to Django itself, still needs to be handled somehow. Since attrs is a JSON field with no restrictions on its keys and values, you can (depending on your application logic) end up with non-numeric values in, for example, the width key. In that case the above query raises a DataError from Postgres. NULL values are ignored, so those are fine. If you can simply catch the error, no problem, you're lucky. In my case I needed to ignore the bad values, and the only way here was to write a custom Postgres function that suppresses casting errors.

create or replace function safe_cast_to_numeric(text) returns numeric as $$
begin
    return cast($1 as numeric);
exception
    when invalid_text_representation then
        return null;
end;
$$ language plpgsql immutable;

And then use it to cast text to numbers:

Model.objects.annotate(
    val=RawSQL("safe_cast_to_numeric(attrs->>%s)", (json_field_key,))
).aggregate(min=Min('val'))

Thus we get quite a solid solution for something as dynamic as JSON.
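
The behaviour that safe_cast_to_numeric adds can be modeled in plain Python to check the expected results before relying on the SQL version. This is only a sketch: it uses decimal.Decimal as a stand-in for Postgres numeric (the two do not accept exactly the same input syntax), and min_ignoring_nulls is a made-up name.

```python
from decimal import Decimal, InvalidOperation


def safe_cast_to_numeric(text):
    """Return a Decimal, or None when the text is NULL or not a number."""
    if text is None:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def min_ignoring_nulls(texts):
    """Mimic MIN(): NULLs are skipped; no usable rows gives None."""
    numbers = [n for n in map(safe_cast_to_numeric, texts) if n is not None]
    return min(numbers) if numbers else None


# Made-up sample of what attrs->>'width' might return across rows.
widths = ["10", "abc", None, "2.5"]
smallest = min_ignoring_nulls(widths)
```

With this sample, the non-numeric "abc" and the NULL are skipped and smallest is Decimal("2.5").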

#3


1  

I know this is a bit late (several months), but I came across this post while trying to do the same thing. I managed to do it by:

1) using KeyTextTransform to convert the jsonb value to text

2) using Cast to convert it to an integer, so that Sum works:

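from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import IntegerField, Sum
from django.db.models.functions import Cast

q = myModel.objects.filter(type=9) \
    .annotate(numeric_val=Cast(KeyTextTransform(sum_field, 'data'), IntegerField())) \
    .aggregate(Sum('numeric_val'))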

print(q)

where 'data' is the jsonb field and 'numeric_val' is the name of the value I create by annotating.
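
Text values compare character by character, so aggregating them without a cast can give surprising results. A small plain-Python illustration with made-up sample values (not Django code):

```python
# Values as they come back from a text extraction such as ->>
values = ["10", "9", "100"]

text_min = min(values)                     # compares character by character
numeric_min = min(int(v) for v in values)  # compares as numbers
total = sum(int(v) for v in values)        # summing needs numbers
```

Here text_min is "10" while numeric_min is 9, which is the gap the Cast step closes.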

Hope this helps somebody!

#4


0  

It seems there is no native way to do it.

I worked around it like this:

my_queryset = Product.objects.all()  # Or .filter()...
# Skip rows that lack the key, so max() never compares a number with ''
max_val = max(o.my_json_field[my_attrib] for o in my_queryset if my_attrib in o.my_json_field)

This is far from ideal, since it is done at the Python level (and not at the SQL level).
