How big can a string in a Django field be if the database is PostgreSQL and db_index=True?

Time: 2022-09-12 19:21:50

I'm working on a Django website that has a PostgreSQL database, and one of my models has a 'description' field that I'd like to give an index. Is there a maximum string size that can be added to this column?

Django's documentation on PostgreSQL indices makes it seem like there is no limit, since you can create indices for TextFields that don't define a max_length. However, I found this post about btree column size errors, which makes me think that 2713 / 4 - 4 = 674.25 is the most UTF-8 characters that would always fit. Can anyone point me to documentation for this or share their experiences with trying to put indices on Django TextFields?

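The arithmetic in the question can be sanity-checked directly: a UTF-8 code point occupies 1 to 4 bytes, so a fixed byte budget only guarantees a character count in the 4-bytes-per-character worst case. A quick sketch (the 2713-byte figure comes from the linked post, not from PostgreSQL documentation):

```python
# UTF-8 encodes each code point in 1 to 4 bytes.
samples = {"x": 1, "é": 2, "漢": 3, "🚀": 4}
for ch, expected in samples.items():
    assert len(ch.encode("utf-8")) == expected

# The asker's worst-case estimate: a 2713-byte btree value budget,
# divided by 4 bytes per character, minus some presumed overhead.
estimate = 2713 / 4 - 4
print(estimate)  # 674.25
```
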
1 solution

#1


There is indeed a limit, but it's not tiny.

ERROR: index row requires 9400 bytes, maximum size is 8191

To trigger this:

CREATE TABLE bigtext(x text);

CREATE INDEX bigtext_x ON bigtext(x);

INSERT INTO bigtext(x) SELECT repeat('x', 819200);

Given the error you'd expect this to fail:

INSERT INTO bigtext(x) SELECT repeat('x', 8192);

but because of compression, it won't; you can tack an extra zero on and it'll still fit.

Smaller, less repetitive, and therefore less compressible texts will fit less before overrunning a page and failing. In theory, if you had totally random garbage, only 8191 bytes should fit, but in practice a UTF-8 database will fit a bit more, because UTF-8 doesn't permit total randomness; probably in the vicinity of 8191 totally random UTF-8 characters, though.

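The compressibility effect is easy to reproduce. PostgreSQL compresses large datums with its own pglz (or LZ4) codec, not zlib, but zlib illustrates the same principle: a repeated run collapses to almost nothing, while random bytes don't shrink at all.

```python
import os
import zlib

# Repetitive text compresses to a tiny fraction of its size;
# incompressible random bytes stay at (or slightly above) full size.
repetitive = b"x" * 8192
random_ish = os.urandom(8192)

print(len(zlib.compress(repetitive)))  # a few dozen bytes
print(len(zlib.compress(random_ish)))  # roughly 8192 or slightly more
```

This is why `repeat('x', 8192)` sails under the 8191-byte index limit while shorter but less repetitive text can fail.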
For this reason you can't have a simple CHECK constraint; the limit isn't expressible as a number of characters.

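A character-count check can't bound the byte size, because the same number of characters can occupy very different numbers of bytes. A sketch using the 674-character estimate from the question as an arbitrary length:

```python
# Same character count, very different UTF-8 byte counts.
ascii_text = "x" * 674   # 1 byte per character
cjk_text = "漢" * 674    # 3 bytes per character

assert len(ascii_text) == len(cjk_text) == 674
print(len(ascii_text.encode("utf-8")))  # 674 bytes
print(len(cjk_text.encode("utf-8")))    # 2022 bytes
```

And since the indexed size also depends on how well each string compresses, even a byte-count check would be only a rough guard.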
You might find pg_column_size(...) useful; it tells you the on-disk, compressed size of a datum. It won't help you in a CHECK constraint, though, because it always reports un-TOASTed datums at their full, uncompressed size.

The PostgreSQL docs could describe this limit a lot better (or at all).

For bigger fields you can index just a prefix (the leftmost n characters), or use a tool like tsearch2 to do full-text search instead.

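A prefix index along those lines could be sketched like this, reusing the bigtext table from above (the prefix length of 100 is illustrative; queries only use this index when they filter on the same left(x, 100) expression):

```sql
-- Index only the first 100 characters of x, not the whole value.
CREATE INDEX bigtext_x_prefix ON bigtext (left(x, 100));

-- A query must match the indexed expression to use it:
SELECT * FROM bigtext WHERE left(x, 100) = left('some search text', 100);
```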