Values larger than 1/3 of a buffer page cannot be indexed

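I am not very good with databases, so please bear with me.

I am trying to put a very long piece of JSON data into a table that was created by the Django framework.

I am using Postgres on Heroku. When I try to insert the data, I get the following error: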

File "/app/.heroku/python/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
psycopg2.OperationalError: index row size 3496 exceeds maximum 2712 for index "editor_contentmodel_content_2192f49c_uniq"
HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

My database and table look like this:

gollahalli-me-django-test::DATABASE=> \dt
                      List of relations
 Schema |            Name            | Type  |     Owner
--------+----------------------------+-------+----------------
 public | auth_group                 | table | ffnyjettujyfck
 public | auth_group_permissions     | table | ffnyjettujyfck
 public | auth_permission            | table | ffnyjettujyfck
 public | auth_user                  | table | ffnyjettujyfck
 public | auth_user_groups           | table | ffnyjettujyfck
 public | auth_user_user_permissions | table | ffnyjettujyfck
 public | django_admin_log           | table | ffnyjettujyfck
 public | django_content_type        | table | ffnyjettujyfck
 public | django_migrations          | table | ffnyjettujyfck
 public | django_session             | table | ffnyjettujyfck
 public | editor_contentmodel        | table | ffnyjettujyfck
(11 rows)


gollahalli-me-django-test::DATABASE=> \d+ editor_contentmodel
                            Table "public.editor_contentmodel"
  Column   |           Type           | Modifiers | Storage  | Stats target | Description
-----------+--------------------------+-----------+----------+--------------+-------------
 ref_id    | character varying(120)   | not null  | extended |              |
 content   | text                     | not null  | extended |              |
 timestamp | timestamp with time zone | not null  | plain    |              |
Indexes:
    "editor_contentmodel_pkey" PRIMARY KEY, btree (ref_id)
    "editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)
    "editor_contentmodel_ref_id_8f74b4f3_like" btree (ref_id varchar_pattern_ops)

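It looks like I have to change "editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id) so that it uses md5(content).

Can anyone help me with this? I have no idea how to do it.

Update:

JSON content: https://gist.github.com/akshaybabloo/0b3dc1fb4d964b10d09ccd6884fe3a40

Update 2:

I created the following UNIQUE index. Which of the existing indexes should I remove?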

gollahalli_me_django=> create unique index on editor_contentmodel (ref_id, md5(content::text));
CREATE INDEX
gollahalli_me_django=> \d editor_contentmodel;
        Table "public.editor_contentmodel"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 ref_id    | character varying(120)   | not null
 content   | jsonb                    | not null
 timestamp | timestamp with time zone | not null
Indexes:
    "editor_contentmodel_pkey" PRIMARY KEY, btree (ref_id)
    "editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id) <---- 1
    "editor_contentmodel_ref_id_md5_idx" UNIQUE, btree (ref_id, md5(content::text))
    "editor_contentmodel_ref_id_8f74b4f3_like" btree (ref_id varchar_pattern_ops) <----2

Should I remove 1 or 2 (see the arrows)?

postgresql

— Akshay

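You are trying to index a TEXT column, and PostgreSQL (like every other database) limits how large an indexed value can be. Here the limit is 2712 bytes per index row, as the error message shows. So yes, you could try switching to an MD5 hash of the value to make it smaller.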

— a_vlad

@a_vlad What should I do? I have no idea how to go about it.

— akshay '17

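What is content? Is it TEXT or JSON?

— Evan Carroll

Also, do you have two contents for the same ref_id? If so, what is the purpose?

— Evan Carroll

I agree with @EvanCarroll. Maybe you do not need this index at all?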

— a_vlad

Answers:

You have a UNIQUE index on (content, ref_id), called editor_contentmodel_content_2192f49c_uniq:

"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)

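I'm not sure why you have that to begin with, so let's step back and look at what it does. It makes sure that the combination of content and ref_id is unique. However, in PostgreSQL a UNIQUE constraint is implemented with a btree, which makes this a poor solution. With this approach you build a btree that contains content, which essentially duplicates the size of this small table and makes for a gigantic index. And, as you have found, that gigantic index is still limited by the size of content. This raises a few questions.

Do you care that content is unique? If you do care that content is unique for a given ref_id, then what you probably want is to index a hash of the content. Something like: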
```
CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,md5(content));
```
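This stores the md5sum of content in the btree instead. As long as each ref_id has content with a unique md5 within that ref_id, you're good.
If you don't care that content is unique, consider removing the constraint entirely.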
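
To see why hashing sidesteps the size limit, here is a minimal Python sketch. It assumes a UTF-8 database encoding, in which case `hashlib.md5(...).hexdigest()` should correspond to what PostgreSQL's `md5(text)` returns. The `long_content` value is a hypothetical stand-in for a long JSON document, and the 2712 figure is simply copied from the error message in the question.

```python
import hashlib

# Limit copied from the error message in the question ("exceeds maximum 2712").
MAX_INDEX_ROW_BYTES = 2712


def md5_hex(content: str) -> str:
    """Return the MD5 hex digest of the UTF-8 bytes of content."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


# Hypothetical stand-in for a long JSON document.
long_content = '{"key": "' + "x" * 5000 + '"}'

digest = md5_hex(long_content)
print(len(long_content.encode("utf-8")))  # size of the raw value in bytes
print(len(digest))                         # length of the hex digest
```

The digest has the same length no matter how long the content is, so an index over (ref_id, md5(content)) stores a small fixed-size value instead of the document itself.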

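It may be worth noting that when a UNIQUE constraint is implemented with a btree (as PostgreSQL does), you get an added index for free. Under normal circumstances this has a fringe benefit.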

CREATE TABLE foo ( ref_id int, content text );
CREATE UNIQUE INDEX ON foo (ref_id,content);

will speed up the query

SELECT *
FROM foo
WHERE ref_id = 5
  AND content = 'This content'

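However, when you opt for the functional md5() variant, there is no longer an index on content itself, so to make use of that index you will have to either

1. query on ref_id only, or
2. add to the ref_id condition a clause such as md5(content) = md5('This content').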
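
The second option can be tried out locally. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption, so that it runs without a server), registers a user-defined `md5` SQL function because SQLite has none built in, and uses a hypothetical index name, `foo_ref_id_md5_idx`.

```python
import hashlib
import sqlite3


def md5_hex(value: str) -> str:
    """MD5 hex digest of the UTF-8 bytes of value."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# SQLite only allows deterministic functions inside index expressions.
conn.create_function("md5", 1, md5_hex, deterministic=True)

conn.execute("CREATE TABLE foo (ref_id INTEGER, content TEXT)")
conn.execute("CREATE UNIQUE INDEX foo_ref_id_md5_idx ON foo (ref_id, md5(content))")
conn.execute("INSERT INTO foo VALUES (5, 'This content')")
conn.execute("INSERT INTO foo VALUES (5, 'Other content')")

# Compare hashes rather than the text itself, matching the indexed expression.
rows = conn.execute(
    "SELECT ref_id, content FROM foo WHERE ref_id = ? AND md5(content) = md5(?)",
    (5, "This content"),
).fetchall()
print(rows)

# The unique index also rejects a second row with the same (ref_id, md5(content)).
try:
    conn.execute("INSERT INTO foo VALUES (5, 'This content')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Whether a planner actually picks the expression index for such a query depends on the database and its statistics; the sketch only shows the shape of the predicate and the uniqueness behaviour.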

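The whole text = text comparison is overrated. It is almost never what you want. If you are looking to speed up queries over text, a btree is pretty useless. You probably want to look into other options, such as the full text indexing mentioned in the error hint.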

Update 1

Based on your JSON, I would suggest storing it as jsonb and then creating the index on md5(content). So perhaps, instead of the above, run this:

ALTER TABLE public.editor_contentmodel
  ALTER COLUMN content
  SET DATA TYPE jsonb
  USING content::jsonb;

CREATE UNIQUE INDEX ON editor_contentmodel (ref_id, md5(content::text));

Update 2

You asked which indexes you should remove:

gollahalli_me_django=> create unique index on editor_contentmodel (ref_id, md5(content::text));
CREATE INDEX
gollahalli_me_django=> \d editor_contentmodel;
        Table "public.editor_contentmodel"
  Column   |           Type           | Modifiers
-----------+--------------------------+-----------
 ref_id    | character varying(120)   | not null
 content   | jsonb                    | not null
 timestamp | timestamp with time zone | not null
Indexes:
    "editor_contentmodel_pkey" PRIMARY KEY, btree (ref_id)
    "editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id) <---- 1
    "editor_contentmodel_ref_id_md5_idx" UNIQUE, btree (ref_id, md5(content::text))
    "editor_contentmodel_ref_id_8f74b4f3_like" btree (ref_id varchar_pattern_ops) <----2

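Here is the surprising answer: you should remove all of them except editor_contentmodel_pkey, which says that every ref_id must be unique.

editor_contentmodel_content_2192f49c_uniq: this index makes sure that the combination of ref_id AND content is UNIQUE, but if you can't have a duplicate ref_id, you can never have duplicate content for a ref_id. So you can never violate this index without also violating editor_contentmodel_pkey. That makes it pointless.
editor_contentmodel_ref_id_md5_idx: this index is pointless for the same reason. You can never have a duplicate md5(content::text) for a ref_id, because whatever the value of md5(content::text) is, you can never have a duplicate ref_id.
editor_contentmodel_ref_id_8f74b4f3_like: this is also a bad idea, because you are duplicating the index on ref_id. It isn't useless, it's just not optimal. Instead, if you need varchar_pattern_ops, use it over just the content field.

As a last note, we don't use varchar much in PostgreSQL, because it is implemented as a varlena with a check constraint. There is nothing gained by it, and nothing lost when you simply use text. So unless there is a concrete reason why ref_id may be 120 characters long but never 121, I would simply use the text type.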

Update 3

Let's go back to your earlier problem.

psycopg2.OperationalError: index row size 3496 exceeds maximum 2712 for index "editor_contentmodel_content_2192f49c_uniq"

This tells you that the problem is specifically with the index "editor_contentmodel_content_2192f49c_uniq". You have defined it as

"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)

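So the problem here is that you are trying to create an index over content. But, again, the index itself stores the actual JSON content of content, and that is what exceeds the limit. This is not the real problem, though, because even without that limit editor_contentmodel_content_2192f49c_uniq would be totally useless. Why? Again, you can't add more uniqueness to a row that is already guaranteed to be 100% unique. This point does not seem to have come across, so let's keep it simple.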

ref_id | content
1      | 1
1      | 1
1      | 2
2      | 1

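In the table above, a sole unique index/constraint (with no other indexes) over (ref_id, content) makes sense, because it would stop the duplication of (1,1). An index over (ref_id, md5(content)) would also make sense, because it would stop the duplication of (1,1) by proxy of stopping the duplication of (1, md5(1)). However, all of this works only because in the example I have given ref_id is NOT guaranteed to be UNIQUE. Your ref_id isn't this ref_id. Your ref_id is a PRIMARY KEY, which means it is guaranteed to be UNIQUE.

That means the duplicate (1,1) and the row (1,2) could NEVER be inserted. It also means that an index over anything more than ref_id cannot guarantee more uniqueness; any such index is necessarily less strict than the one you already have. So your table could only look like this: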

ref_id | content
1      | 1
2      | 1
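
The argument can be checked mechanically. The following is a minimal sketch, not the asker's actual schema: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption, so that it runs without a server), and the index name `foo_content_ref_id_uniq` is hypothetical. It replays the four example rows against a table whose ref_id is a PRIMARY KEY, once with an extra unique index on (content, ref_id) and once without.

```python
import sqlite3

# The four example rows from the first table, in insertion order.
ROWS = [("1", "1"), ("1", "1"), ("1", "2"), ("2", "1")]


def outcomes(with_extra_index: bool) -> list:
    """Insert ROWS one by one and record which inserts are accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (ref_id TEXT PRIMARY KEY, content TEXT NOT NULL)")
    if with_extra_index:
        # Mirrors the composite unique constraint from the question.
        conn.execute(
            "CREATE UNIQUE INDEX foo_content_ref_id_uniq ON foo (content, ref_id)"
        )
    accepted = []
    for row in ROWS:
        try:
            conn.execute("INSERT INTO foo (ref_id, content) VALUES (?, ?)", row)
            accepted.append(True)
        except sqlite3.IntegrityError:
            accepted.append(False)
    conn.close()
    return accepted


print(outcomes(with_extra_index=True))
print(outcomes(with_extra_index=False))
```

Both runs accept and reject exactly the same rows, which is the sense in which the composite unique index adds no uniqueness on top of the primary key.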

— Evan Carroll

Can't I alter the column in the editor_contentmodel table and add md5 uniqueness to it? Or can't we just alter CONSTRAINT editor_contentmodel_content_2192f49c_uniq UNIQUE (content, ref_id)? Why do I have to create a new table for this?

— akshay'2

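You don't have to create a new table; I was just showing you what it would look like with a simplified table. Just ignore the CREATE TABLE command and issue the CREATE UNIQUE INDEX command below it. Then DROP your old index.

— Evan Carroll

One last question: can you take a look at my Update 2?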

— akshay

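@akshay Updated.

— Evan Carroll

Thank you very much, Evan, this helped me a lot. The concept is still a bit shaky for me (this is not my field at all). I will try to learn it.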

— akshay '02

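"editor_contentmodel_pkey" PRIMARY KEY, btree (ref_id)
"editor_contentmodel_content_2192f49c_uniq" UNIQUE CONSTRAINT, btree (content, ref_id)

Since ref_id is the primary key, you can't have duplicate values of it. That means the unique constraint on the combination (content, ref_id) is useless, because anything that violated it would also violate the primary key constraint. Get rid of it.

— jjanes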

Do you mean get rid of it and put in something like create unique index on editor_contentmodel (ref_id, md5(content::text))? Or I could recreate the table and remove the primary key.

— akshay '17

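I don't know what you want. If you want a primary key on ref_id, then keep it. But if you do keep it, then editor_contentmodel_content_2192f49c_uniq is useless, and removing it will solve the problem in your title. Also, if you keep the primary key, your proposed new index would be useless too (useless as a constraint; it might be usable as an index, but that is highly unlikely).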

— jjanes