I'm working on a Django website that has a PostgreSQL database, and one of my models has a 'description' field that I'd like to give an index. Is there a maximum string size that can be added to this column?
Django's documentation on PostgreSQL indices makes it seem like there is no limit, since you can create indices for TextFields that don't define a max_length. However, I found this post about btree column size errors, which makes me think that 2713 / 4 - 4 = 674.25 is the most UTF-8 characters that would always fit. Can anyone point me to documentation for this or share their experiences with trying to put indices on Django TextFields?
1 Answer
#1
There is indeed a limit, but it's not tiny.
ERROR: index row requires 9400 bytes, maximum size is 8191
To trigger this:
CREATE TABLE bigtext(x text);
CREATE INDEX bigtext_x ON bigtext(x);
INSERT INTO bigtext(x) SELECT repeat('x', 819200);
Given the error you'd expect this to fail:
INSERT INTO bigtext(x) SELECT repeat('x', 8192);
but because of compression, it won't; you can tack an extra zero on and it'll still fit.
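As a quick sketch of that claim (the exact threshold and error text depend on your PostgreSQL version and its compression), a run of the same character ten times longer still compresses down to almost nothing and fits in the index:

-- Ten times longer than the 8192-char insert above, but highly
-- compressible, so the index entry still fits (version-dependent).
INSERT INTO bigtext(x) SELECT repeat('x', 81920);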
Smaller, less repetitive, and therefore less compressible text will hit the limit sooner. In theory, only 8191 bytes of totally random garbage should fit, but in practice a UTF-8 database will take a bit more than that, because UTF-8 doesn't permit totally random byte sequences; still, expect something in the vicinity of 8191 totally random UTF-8 characters.
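To see the other side of this, here is a sketch of my own (not from the original answer) that builds a long, hard-to-compress hex string; on my understanding it should exceed the index row limit and raise the same kind of error, though the exact limit and message vary by PostgreSQL version:

-- ~19200 bytes of pseudo-random hex: barely compressible, so the
-- btree index entry exceeds the per-row limit and the INSERT fails.
INSERT INTO bigtext(x)
SELECT string_agg(md5(i::text), '') FROM generate_series(1, 600) AS i;
-- ERROR: index row requires ... bytes, maximum size is ...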
For this reason you can't enforce the limit with a simple CHECK constraint; it's not as simple as "number of chars".
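If you only need a hard guarantee rather than an exact cutoff, one conservative workaround (my sketch, not something the answer proposes) is to cap the raw byte length well below the index row limit, at the cost of rejecting some long-but-compressible values that would actually have fit:

-- Conservative cap: 2000 bytes is comfortably below the btree row
-- limit even with zero compression, so indexed inserts can't fail.
ALTER TABLE bigtext
  ADD CONSTRAINT bigtext_x_maxlen CHECK (octet_length(x) <= 2000);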
You might find pg_column_size(...) useful; it tells you the on-disk compressed size of a datum. It won't help you in a CHECK constraint though, because it always shows unTOASTed datums at their full uncompressed size.
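For example (a small illustrative query, assuming the bigtext table from above), you can compare the stored size with the raw character byte count:

-- pg_column_size() reports the stored (possibly compressed/TOASTed)
-- size; octet_length() reports the raw uncompressed byte count.
SELECT octet_length(x) AS raw_bytes,
       pg_column_size(x) AS stored_bytes
FROM bigtext;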
The PostgreSQL docs could describe this limit a lot better (or at all).
For bigger fields you can index the left n bytes, or use a tool like tsearch2 to do full-text search instead.
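A sketch of both approaches (index and prefix-length names are mine; note that in modern PostgreSQL the tsearch2 functionality is built in as to_tsvector/to_tsquery):

-- Index only a prefix of the column via an expression index;
-- queries must use the same left(x, 100) expression to hit it.
CREATE INDEX bigtext_x_prefix ON bigtext (left(x, 100));

-- Or index the column for full-text search with a GIN index.
CREATE INDEX bigtext_x_fts ON bigtext USING gin (to_tsvector('english', x));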