How big can a string in a Django field be if the database is PostgreSQL and db_index=True?

Time: 2022-09-12 19:21:50

I'm working on a Django website that has a PostgreSQL database, and one of my models has a 'description' field that I'd like to give an index. Is there a maximum string size that can be added to this column?

Django's documentation on PostgreSQL indices makes it seem like there is no limit, since you can create indices for TextFields that don't define a max_length. However, I found this post about btree column size errors, which makes me think that 2713 / 4 - 4 = 674.25 is the most UTF-8 characters that would always fit. Can anyone point me to documentation for this or share their experiences with trying to put indices on Django TextFields?

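The arithmetic in the question can be sanity-checked directly: a UTF-8 code point occupies 1 to 4 bytes, so a fixed byte budget only guarantees a character count in the 4-bytes-per-character worst case. A quick sketch (the 2713-byte figure comes from the linked post, not from PostgreSQL documentation):

```python
# UTF-8 encodes each code point in 1 to 4 bytes.
samples = {"x": 1, "é": 2, "漢": 3, "🚀": 4}
for ch, expected in samples.items():
    assert len(ch.encode("utf-8")) == expected

# The asker's worst-case estimate: a 2713-byte btree value budget,
# divided by 4 bytes per character, minus some presumed overhead.
estimate = 2713 / 4 - 4
print(estimate)  # 674.25
```
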
1 solution

#1


There is indeed a limit, but it's not tiny.

ERROR: index row requires 9400 bytes, maximum size is 8191

To trigger this:

CREATE TABLE bigtext(x text);

CREATE INDEX bigtext_x ON bigtext(x);

INSERT INTO bigtext(x) SELECT repeat('x', 819200);

Given the error you'd expect this to fail:

INSERT INTO bigtext(x) SELECT repeat('x', 8192);

but because of compression, it won't; you can tack an extra zero on and it'll still fit.

Smaller, less repetitive, and therefore less compressible texts will fit less before overrunning a page and failing. In theory, if you had totally random garbage, only 8191 bytes should fit, but in practice a UTF-8 database will fit a bit more, because UTF-8 doesn't permit total randomness; probably in the vicinity of 8191 totally random UTF-8 characters, though.

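The compressibility effect is easy to reproduce. PostgreSQL compresses large datums with its own pglz (or LZ4) codec, not zlib, but zlib illustrates the same principle: a repeated run collapses to almost nothing, while random bytes don't shrink at all.

```python
import os
import zlib

# Repetitive text compresses to a tiny fraction of its size;
# incompressible random bytes stay at (or slightly above) full size.
repetitive = b"x" * 8192
random_ish = os.urandom(8192)

print(len(zlib.compress(repetitive)))  # a few dozen bytes
print(len(zlib.compress(random_ish)))  # roughly 8192 or slightly more
```

This is why `repeat('x', 8192)` sails under the 8191-byte index limit while shorter but less repetitive text can fail.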
For this reason you can't have a simple CHECK constraint; the limit isn't expressible as a number of characters.

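A character-count check can't bound the byte size, because the same number of characters can occupy very different numbers of bytes. A sketch using the 674-character estimate from the question as an arbitrary length:

```python
# Same character count, very different UTF-8 byte counts.
ascii_text = "x" * 674   # 1 byte per character
cjk_text = "漢" * 674    # 3 bytes per character

assert len(ascii_text) == len(cjk_text) == 674
print(len(ascii_text.encode("utf-8")))  # 674 bytes
print(len(cjk_text.encode("utf-8")))    # 2022 bytes
```

And since the indexed size also depends on how well each string compresses, even a byte-count check would be only a rough guard.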
You might find pg_column_size(...) useful; it tells you the on-disk, compressed size of a datum. It won't help you in a CHECK constraint, though, because it always reports un-TOASTed datums at their full, uncompressed size.

The PostgreSQL docs could describe this limit a lot better (or at all).

For bigger fields you can index just a prefix (the leftmost n characters), or use a tool like tsearch2 to do full-text search instead.

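A prefix index along those lines could be sketched like this, reusing the bigtext table from above (the prefix length of 100 is illustrative; queries only use this index when they filter on the same left(x, 100) expression):

```sql
-- Index only the first 100 characters of x, not the whole value.
CREATE INDEX bigtext_x_prefix ON bigtext (left(x, 100));

-- A query must match the indexed expression to use it:
SELECT * FROM bigtext WHERE left(x, 100) = left('some search text', 100);
```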