I'm working on a Django website that has a PostgreSQL database, and one of my models has a 'description' field that I'd like to give an index. Is there a maximum string size that can be added to this column?
Django's documentation on PostgreSQL indices makes it seem like there is no limit, since you can create indices for TextFields that don't define a max_length. However, I found this post about btree column size errors, which makes me think that 2713 / 4 - 4 = 674.25 is the most UTF-8 characters that would always fit. Can anyone point me to documentation for this or share their experiences with trying to put indices on Django TextFields?
1 Solution
There is indeed a limit, but it's not tiny. Exceeding it fails with an error like:
ERROR: index row requires 9400 bytes, maximum size is 8191
To trigger this:
CREATE TABLE bigtext(x text);
CREATE INDEX bigtext_x ON bigtext(x);
INSERT INTO bigtext(x) SELECT repeat('x', 819200);
Given the error you'd expect this to fail:
INSERT INTO bigtext(x) SELECT repeat('x', 8192);
but because of compression, it won't; you can tack an extra zero on and it'll still fit.
Smaller, less repetitive, and therefore less compressible texts will fit fewer characters before overrunning a page and failing. In theory, if you had totally random garbage, only 8191 bytes should fit, but in reality a bit more will fit on a UTF-8 database because UTF-8 doesn't permit total randomness; probably in the vicinity of 8191 totally random UTF-8 chars, though.
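To get a feel for why repetitive text slips under the limit while noisy text does not, here is a rough Python sketch. It uses zlib purely as a stand-in: PostgreSQL applies its own compressor, so the byte counts below will not match what the server reports. The `INDEX_ROW_LIMIT` constant and the `compressed_size` helper are names made up for this illustration.

```python
import random
import string
import zlib

# Limit quoted in the error message above; the name is hypothetical.
INDEX_ROW_LIMIT = 8191


def compressed_size(text: str) -> int:
    """Return the zlib-compressed size of text's UTF-8 encoding, in bytes.

    Assumption: zlib only approximates PostgreSQL's own compression.
    """
    return len(zlib.compress(text.encode("utf-8")))


# Highly repetitive input: 8192 with "an extra zero tacked on".
repetitive = "x" * 81920

# Pseudo-random letters and digits of the same length: far less compressible.
rng = random.Random(0)  # fixed seed so repeated runs behave the same
alphabet = string.ascii_letters + string.digits
noisy = "".join(rng.choice(alphabet) for _ in range(81920))

for label, text in (("repetitive", repetitive), ("noisy", noisy)):
    size = compressed_size(text)
    verdict = "under" if size <= INDEX_ROW_LIMIT else "over"
    print(f"{label}: {len(text)} chars -> {size} compressed bytes ({verdict} the limit)")
```

Both strings have the same character count, yet only the repetitive one compresses to less than the limit, which is why a plain length check cannot predict whether an insert will succeed.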
For this reason you can't have a simple CHECK constraint, it's not as simple as "number of chars".
You might find pg_column_size(...) useful; it tells you the on-disk compressed size of a datum. It won't help you in a CHECK constraint though, because it always shows unTOASTed datums at full uncompressed size.
The PostgreSQL docs could describe this limit a lot better (or at all).
For bigger fields you can index the left n bytes, or use a tool like tsearch2 to do fulltext search instead.
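If you go the "left n bytes" route from the Django side, one option is to compute the prefix in Python and store it in a separate, indexed column. The sketch below is only one way to do that; `utf8_prefix` is a hypothetical helper, not a Django or PostgreSQL API. It cuts on a byte budget without leaving half of a multi-byte character behind.

```python
def utf8_prefix(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes can split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(utf8_prefix("héllo", 2))   # "é" needs 2 bytes, so only "h" fits
print(utf8_prefix("héllo", 3))
print(utf8_prefix("日本語", 4))  # each character here is 3 bytes
```

An index on such a prefix column can only answer lookups on the truncated value, so rows that share a prefix still need to be compared on the full field.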