I wrote a function to create posts for a simple blogging engine:
我写了一个函数为一个简单的博客引擎创建帖子:
CREATE FUNCTION CreatePost(VARCHAR, TEXT, VARCHAR[])
RETURNS INTEGER AS $$
DECLARE
InsertedPostId INTEGER;
TagName VARCHAR;
BEGIN
INSERT INTO Posts (Title, Body)
VALUES ($1, $2)
RETURNING Id INTO InsertedPostId;
FOREACH TagName IN ARRAY $3 LOOP
DECLARE
InsertedTagId INTEGER;
BEGIN
-- I am concerned about this part.
BEGIN
INSERT INTO Tags (Name)
VALUES (TagName)
RETURNING Id INTO InsertedTagId;
EXCEPTION WHEN UNIQUE_VIOLATION THEN
SELECT INTO InsertedTagId Id
FROM Tags
WHERE Name = TagName
FETCH FIRST ROW ONLY;
END;
INSERT INTO Taggings (PostId, TagId)
VALUES (InsertedPostId, InsertedTagId);
END;
END LOOP;
RETURN InsertedPostId;
END;
$$ LANGUAGE 'plpgsql';
Is this prone to race conditions when multiple users delete tags and create posts at the same time?
Specifically, do transactions (and thus functions) prevent such race conditions from happening?
I'm using PostgreSQL 9.2.3.
当多个用户同时删除标签并创建帖子时,这是否容易出现竞争条件?具体来说,交易(以及功能)是否会阻止这种竞争条件的发生?我正在使用PostgreSQL 9.2.3。
3 个解决方案
#1
37
It's the recurring problem of SELECT
or INSERT
under possible concurrent write load, related to (but different from) UPSERT
(which is INSERT
or UPDATE
).
这是可能的并发写入负载下的SELECT或INSERT反复出现的问题,与UPSERT(INSERT或UPDATE)相关(但不同)。
For Postgres 9.5 or later
Using the new UPSERT implementation INSERT ... ON CONFLICT .. DO UPDATE
, we can largely simplify. PL/pgSQL function to INSERT
or SELECT
a single row (tag):
使用新的UPSERT实现INSERT ... ON CONFLICT .. DO UPDATE,我们可以在很大程度上简化。 PL / pgSQL函数要INSERT或SELECT单行(标记):
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int) AS
$func$
BEGIN
SELECT tag_id -- only if row existed before
FROM tag
WHERE tag = _tag
INTO _tag_id;
IF NOT FOUND THEN
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
INTO _tag_id;
END IF;
END
$func$ LANGUAGE plpgsql;
There is still a tiny window for a race condition. To make absolutely sure you get an ID:
竞争条件仍有一个小窗口。要确保你得到一个ID:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int) AS
$func$
BEGIN
LOOP
SELECT tag_id
FROM tag
WHERE tag = _tag
INTO _tag_id;
EXIT WHEN FOUND;
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
INTO _tag_id;
EXIT WHEN FOUND;
END LOOP;
END
$func$ LANGUAGE plpgsql;
This keeps looping until either INSERT
or SELECT
succeeds. Call:
这将保持循环,直到INSERT或SELECT成功。呼叫:
SELECT f_tag_id('possibly_new_tag');
If subsequent commands in the same transaction rely on the existence of the row and it is actually possible that other transactions update or delete it concurrently, you can lock an existing row in the SELECT
statement with FOR SHARE
.
If the row gets inserted instead, it is locked until the end of the transaction anyway.
如果同一事务中的后续命令依赖于行的存在,并且实际上其他事务可能同时更新或删除它,则可以使用FOR SHARE锁定SELECT语句中的现有行。如果该行被插入,则它将被锁定,直到事务结束为止。
If a new row is inserted most of the time, start with INSERT
to make it faster.
如果在大多数时间插入新行,请从INSERT开始以使其更快。
Related:
有关:
- Get Id from a conditional INSERT
- 从条件INSERT获取Id
- How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
- 如何在INSERT ... ON CONFLICT中包含RETURNING中的排除行
Related (pure SQL) solution to INSERT
or SELECT
multiple rows (a set) at once:
一次性INSERT或SELECT多行(一组)的相关(纯SQL)解决方案:
- How to use RETURNING with ON CONFLICT in PostgreSQL?
- 如何在PostgreSQL中使用RETURNING和ON CONFLICT?
What's wrong with this pure SQL solution?
I had previously also suggested this SQL function:
我之前也曾建议过这个SQL函数:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int) AS
$func$
WITH ins AS (
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
)
SELECT tag_id FROM ins
UNION ALL
SELECT tag_id FROM tag WHERE tag = _tag
LIMIT 1
$func$ LANGUAGE sql;
Which isn't entirely wrong, but it fails to seal a loophole, like @FunctorSalad worked out in his added answer. The function can come up with an empty result if a concurrent transaction tries to do the same at the same time. All statements in a query with CTEs are virtually executed at the same time. The manual:
这并不是完全错误的,但它没有堵住漏洞,就像@FunctorSalad在他补充的答案中解决的那样。如果并发事务尝试同时执行相同操作,则该函数可能会产生空结果。具有CTE的查询中的所有语句实际上是同时执行的。手册:
All the statements are executed with the same snapshot
所有语句都使用相同的快照执行
If a concurrent transaction inserts the same new tag a moment earlier, but hasn't committed, yet:
如果并发事务稍早插入相同的新标记,但尚未提交,则:
-
The UPSERT part comes up empty, after waiting for the concurrent transaction to finish. (If the concurrent transaction should roll back, it still inserts the new tag and returns a new ID.)
在等待并发事务完成后,UPSERT部分变空。 (如果并发事务应该回滚,它仍然会插入新标记并返回一个新ID。)
-
The SELECT part also comes up empty, because it's based on the same snapshot, where the new tag from the (yet uncommitted) concurrent transaction is not visible.
SELECT部分也是空的,因为它基于相同的快照,其中来自(但未提交的)并发事务的新标记不可见。
We get nothing. Not as intended. That's counter-intuitive to naive logic (and I got caught there), but that's how the MVCC model of Postgres works - has to work.
我们一无所获。不是预期的。这对于天真的逻辑来说是违反直觉的(我被抓住了),但这就是Postgres的MVCC模型的工作方式 - 必须有效。
So do not use this if multiple transactions can try to insert the same tag at the same time. Or loop until you actually get a row. The loop will hardly ever be triggered in common work loads.
因此,如果多个事务可以尝试同时插入相同的标记,请不要使用此方法。或者循环,直到你真正得到一排。在常见的工作负载中几乎不会触发循环。
Original answer (Postgres 9.4 or older)
Given this (slightly simplified) table:
鉴于此(略微简化)表:
CREATE table tag (
tag_id serial PRIMARY KEY
, tag text UNIQUE
);
... a practically 100% secure function to insert new tag / select existing one, could look like this.
Why not 100%? Consider the notes in the manual for the related UPSERT
example:
...插入新标签/选择现有标签的几乎100%安全功能,可能看起来像这样。为什么不100%?请考虑相关UPSERT示例手册中的注释:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT tag_id int) AS
$func$
BEGIN
LOOP
BEGIN
WITH sel AS (SELECT t.tag_id FROM tag t WHERE t.tag = _tag FOR SHARE)
, ins AS (INSERT INTO tag(tag)
SELECT _tag
WHERE NOT EXISTS (SELECT 1 FROM sel) -- only if not found
RETURNING tag.tag_id) -- qualified so no conflict with param
SELECT sel.tag_id FROM sel
UNION ALL
SELECT ins.tag_id FROM ins
INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- insert in concurrent session?
RAISE NOTICE 'It actually happened!'; -- hardly ever happens
END;
EXIT WHEN tag_id IS NOT NULL; -- else keep looping
END LOOP;
END
$func$ LANGUAGE plpgsql;
SQL小提琴。
Explanation
-
Try the
SELECT
first. This way you avoid the considerably more expensive exception handling 99.99% of the time.首先尝试SELECT。这样,您可以避免99.99%的时间内相当昂贵的异常处理。
-
Use a CTE to minimize the (already tiny) time slot for the race condition.
使用CTE最小化竞争条件的(已经很小的)时隙。
-
The time window between the
SELECT
and theINSERT
within one query is super tiny. If you don't have heavy concurrent load, or if you can live with an exception once a year, you could just ignore the case and use the SQL statement, which is faster.SELECT和INSERT在一个查询中的时间窗口非常小。如果您没有繁重的并发负载,或者您可以每年使用一次异常,那么您可以忽略这种情况并使用更快的SQL语句。
-
No need for
FETCH FIRST ROW ONLY
(=LIMIT 1
). The tag name is obviouslyUNIQUE
.不需要FETCH FIRST行(= LIMIT 1)。标签名称显然是独一无二的。
-
Remove
FOR SHARE
in my example if you don't usually have concurrentDELETE
orUPDATE
on the tabletag
. Costs a tiny bit of performance.如果您通常在表标记上没有并发DELETE或UPDATE,请在我的示例中删除FOR SHARE。耗费一点点性能。
-
Never quote the language name:
'plpgsql'.plpgsql
is an identifier. Quoting may cause problems and is only tolerated for backwards compatibility.永远不要引用语言名称:'plpgsql'。 plpgsql是一个标识符。引用可能会导致问题,只能容忍向后兼容性。
-
Don't use non-descriptive column names like
id
orname
. When joining a couple of tables (which is what you do in a relational DB) you end up with multiple identical names and have to use aliases.不要使用非描述性列名,如id或name。当连接几个表(这是您在关系数据库中执行的操作)时,您最终会得到多个相同的名称,并且必须使用别名。
Built into your function
Using this function you could largely simplify your FOREACH LOOP
to:
使用此功能可以大大简化您的FOREACH LOOP:
...
FOREACH TagName IN ARRAY $3
LOOP
INSERT INTO taggings (PostId, TagId)
VALUES (InsertedPostId, f_tag_id(TagName));
END LOOP;
...
Faster, though, as a single SQL statement with unnest()
:
但是,使用unnest()作为单个SQL语句更快:
INSERT INTO taggings (PostId, TagId)
SELECT InsertedPostId, f_tag_id(tag)
FROM unnest($3) tag;
Replaces the whole loop.
取代整个循环。
Alternative solution
This variant builds on the behavior of UNION ALL
with a LIMIT
clause: as soon as enough rows are found, the rest is never executed:
此变体建立在UNION ALL的行为的基础上,使用LIMIT子句:只要找到足够的行,其余的行就永远不会执行:
- Way to try multiple SELECTs till a result is available?
- 尝试多个SELECT直到结果可用的方法?
Building on this, we can outsource the INSERT
into a separate function. Only there we need exception handling. Just as safe as the first solution.
在此基础上,我们可以将INSERT外包给一个单独的函数。只有在那里我们需要异常处理。和第一个解决方案一样安全。
CREATE OR REPLACE FUNCTION f_insert_tag(_tag text, OUT tag_id int)
RETURNS int AS
$func$
BEGIN
INSERT INTO tag(tag) VALUES (_tag) RETURNING tag.tag_id INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- catch exception, NULL is returned
END
$func$ LANGUAGE plpgsql;
Which is used in the main function:
在主要功能中使用:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int) AS
$func$
BEGIN
LOOP
SELECT tag_id FROM tag WHERE tag = _tag
UNION ALL
SELECT f_insert_tag(_tag) -- only executed if tag not found
LIMIT 1 -- not strictly necessary, just to be clear
INTO _tag_id;
EXIT WHEN _tag_id IS NOT NULL; -- else keep looping
END LOOP;
END
$func$ LANGUAGE plpgsql;
-
This is a bit cheaper if most of the calls only need
SELECT
, because the more expensive block withINSERT
containing theEXCEPTION
clause is rarely entered. The query is also simpler.如果大多数调用只需要SELECT,这会便宜一些,因为很少输入包含EXCEPTION子句的INSERT的更昂贵的块。查询也更简单。
-
FOR SHARE
is not possible here (not allowed inUNION
query).此处不存在FOR SHARE(UNION查询中不允许)。
-
LIMIT 1
would not be necessary (tested in pg 9.4). Postgres derivesLIMIT 1
fromINTO _tag_id
and only executes until the first row is found.不需要LIMIT 1(在第9.4页中测试)。 Postgres从INTO _tag_id派生LIMIT 1,只执行直到找到第一行。
#2
2
There's still something to watch out for even when using the ON CONFLICT
clause introduced in Postgres 9.5. Using the same function and example table as in @Erwin Brandstetter's answer, if we do:
即使使用Postgres 9.5中引入的ON CONFLICT子句,仍有一些需要注意的事项。使用与@Erwin Brandstetter的答案相同的函数和示例表,如果我们这样做:
Session 1: begin;
Session 2: begin;
Session 1: select f_tag_id('a');
f_tag_id
----------
11
(1 row)
Session 2: select f_tag_id('a');
[Session 2 blocks]
Session 1: commit;
[Session 2 returns:]
f_tag_id
----------
NULL
(1 row)
So f_tag_id
returned NULL
in session 2, which would be impossible in a single-threaded world!
所以f_tag_id在会话2中返回NULL,这在单线程世界中是不可能的!
If we raise the transaction isolation level to repeatable read
(or the stronger serializable
), session 2 throws ERROR: could not serialize access due to concurrent update
instead. So no "impossible" results at least, but unfortunately we now need to be prepared to retry the transaction.
如果我们将事务隔离级别提升到可重复读取(或更强的可序列化),则会话2抛出错误:由于并发更新而无法序列化访问。所以至少没有“不可能”的结果,但不幸的是我们现在需要准备重试交易。
Edit: With repeatable read
or serializable
, if session 1 inserts tag a
, then session 2 inserts b
, then session 1 tries to insert b
and session 2 tries to insert a
, one session detects a deadlock:
编辑:使用可重复读取或可序列化,如果会话1插入标记a,则会话2插入b,然后会话1尝试插入b,会话2尝试插入a,一个会话检测到死锁:
ERROR: deadlock detected
DETAIL: Process 14377 waits for ShareLock on transaction 1795501; blocked by process 14363.
Process 14363 waits for ShareLock on transaction 1795503; blocked by process 14377.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (0,3) in relation "tag"
SQL function "f_tag_id" statement 1
After the session that received the deadlock error rolls back, the other session continues. So I guess we should treat deadlock just like serialization_failure
and retry, in a situation like this?
收到死锁错误的会话回滚后,另一个会话继续。所以我想我们应该像serialization_failure一样处理死锁并重试,在这样的情况下?
Alternatively, insert the tags in a consistent order, but this is not easy if they don't all get added in one place.
或者,以一致的顺序插入标签,但如果不是所有标签都添加到一个地方,这并不容易。
#3
-1
I think there is a slight chance that when the tag already existed it might be deleted by another transaction after your transaction has found it. Using a SELECT FOR UPDATE should solve that.
我认为,当标签已经存在时,它很可能会在您的交易找到之后被另一个交易删除。使用SELECT FOR UPDATE可以解决这个问题。
#1
37
It's the recurring problem of SELECT
or INSERT
under possible concurrent write load, related to (but different from) UPSERT
(which is INSERT
or UPDATE
).
这是可能的并发写入负载下的SELECT或INSERT反复出现的问题,与UPSERT(INSERT或UPDATE)相关(但不同)。
For Postgres 9.5 or later
Using the new UPSERT implementation INSERT ... ON CONFLICT .. DO UPDATE
, we can largely simplify. PL/pgSQL function to INSERT
or SELECT
a single row (tag):
使用新的UPSERT实现INSERT ... ON CONFLICT .. DO UPDATE,我们可以在很大程度上简化。 PL / pgSQL函数要INSERT或SELECT单行(标记):
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int) AS
$func$
BEGIN
SELECT tag_id -- only if row existed before
FROM tag
WHERE tag = _tag
INTO _tag_id;
IF NOT FOUND THEN
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
INTO _tag_id;
END IF;
END
$func$ LANGUAGE plpgsql;
There is still a tiny window for a race condition. To make absolutely sure you get an ID:
竞争条件仍有一个小窗口。要确保你得到一个ID:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int) AS
$func$
BEGIN
LOOP
SELECT tag_id
FROM tag
WHERE tag = _tag
INTO _tag_id;
EXIT WHEN FOUND;
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
INTO _tag_id;
EXIT WHEN FOUND;
END LOOP;
END
$func$ LANGUAGE plpgsql;
This keeps looping until either INSERT
or SELECT
succeeds. Call:
这将保持循环,直到INSERT或SELECT成功。呼叫:
SELECT f_tag_id('possibly_new_tag');
If subsequent commands in the same transaction rely on the existence of the row and it is actually possible that other transactions update or delete it concurrently, you can lock an existing row in the SELECT
statement with FOR SHARE
.
If the row gets inserted instead, it is locked until the end of the transaction anyway.
如果同一事务中的后续命令依赖于行的存在,并且实际上其他事务可能同时更新或删除它,则可以使用FOR SHARE锁定SELECT语句中的现有行。如果该行被插入,则它将被锁定,直到事务结束为止。
If a new row is inserted most of the time, start with INSERT
to make it faster.
如果在大多数时间插入新行,请从INSERT开始以使其更快。
Related:
有关:
- Get Id from a conditional INSERT
- 从条件INSERT获取Id
- How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
- 如何在INSERT ... ON CONFLICT中包含RETURNING中的排除行
Related (pure SQL) solution to INSERT
or SELECT
multiple rows (a set) at once:
一次性INSERT或SELECT多行(一组)的相关(纯SQL)解决方案:
- How to use RETURNING with ON CONFLICT in PostgreSQL?
- 如何在PostgreSQL中使用RETURNING和ON CONFLICT?
What's wrong with this pure SQL solution?
I had previously also suggested this SQL function:
我之前也曾建议过这个SQL函数:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int) AS
$func$
WITH ins AS (
INSERT INTO tag AS t (tag)
VALUES (_tag)
ON CONFLICT (tag) DO NOTHING
RETURNING t.tag_id
)
SELECT tag_id FROM ins
UNION ALL
SELECT tag_id FROM tag WHERE tag = _tag
LIMIT 1
$func$ LANGUAGE sql;
Which isn't entirely wrong, but it fails to seal a loophole, like @FunctorSalad worked out in his added answer. The function can come up with an empty result if a concurrent transaction tries to do the same at the same time. All statements in a query with CTEs are virtually executed at the same time. The manual:
这并不是完全错误的,但它没有堵住漏洞,就像@FunctorSalad在他补充的答案中解决的那样。如果并发事务尝试同时执行相同操作,则该函数可能会产生空结果。具有CTE的查询中的所有语句实际上是同时执行的。手册:
All the statements are executed with the same snapshot
所有语句都使用相同的快照执行
If a concurrent transaction inserts the same new tag a moment earlier, but hasn't committed, yet:
如果并发事务稍早插入相同的新标记,但尚未提交,则:
-
The UPSERT part comes up empty, after waiting for the concurrent transaction to finish. (If the concurrent transaction should roll back, it still inserts the new tag and returns a new ID.)
在等待并发事务完成后,UPSERT部分变空。 (如果并发事务应该回滚,它仍然会插入新标记并返回一个新ID。)
-
The SELECT part also comes up empty, because it's based on the same snapshot, where the new tag from the (yet uncommitted) concurrent transaction is not visible.
SELECT部分也是空的,因为它基于相同的快照,其中来自(但未提交的)并发事务的新标记不可见。
We get nothing. Not as intended. That's counter-intuitive to naive logic (and I got caught there), but that's how the MVCC model of Postgres works - has to work.
我们一无所获。不是预期的。这对于天真的逻辑来说是违反直觉的(我被抓住了),但这就是Postgres的MVCC模型的工作方式 - 必须有效。
So do not use this if multiple transactions can try to insert the same tag at the same time. Or loop until you actually get a row. The loop will hardly ever be triggered in common work loads.
因此,如果多个事务可以尝试同时插入相同的标记,请不要使用此方法。或者循环,直到你真正得到一排。在常见的工作负载中几乎不会触发循环。
Original answer (Postgres 9.4 or older)
Given this (slightly simplified) table:
鉴于此(略微简化)表:
CREATE table tag (
tag_id serial PRIMARY KEY
, tag text UNIQUE
);
... a practically 100% secure function to insert new tag / select existing one, could look like this.
Why not 100%? Consider the notes in the manual for the related UPSERT
example:
...插入新标签/选择现有标签的几乎100%安全功能,可能看起来像这样。为什么不100%?请考虑相关UPSERT示例手册中的注释:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT tag_id int) AS
$func$
BEGIN
LOOP
BEGIN
WITH sel AS (SELECT t.tag_id FROM tag t WHERE t.tag = _tag FOR SHARE)
, ins AS (INSERT INTO tag(tag)
SELECT _tag
WHERE NOT EXISTS (SELECT 1 FROM sel) -- only if not found
RETURNING tag.tag_id) -- qualified so no conflict with param
SELECT sel.tag_id FROM sel
UNION ALL
SELECT ins.tag_id FROM ins
INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- insert in concurrent session?
RAISE NOTICE 'It actually happened!'; -- hardly ever happens
END;
EXIT WHEN tag_id IS NOT NULL; -- else keep looping
END LOOP;
END
$func$ LANGUAGE plpgsql;
SQL小提琴。
Explanation
-
Try the
SELECT
first. This way you avoid the considerably more expensive exception handling 99.99% of the time.首先尝试SELECT。这样,您可以避免99.99%的时间内相当昂贵的异常处理。
-
Use a CTE to minimize the (already tiny) time slot for the race condition.
使用CTE最小化竞争条件的(已经很小的)时隙。
-
The time window between the
SELECT
and theINSERT
within one query is super tiny. If you don't have heavy concurrent load, or if you can live with an exception once a year, you could just ignore the case and use the SQL statement, which is faster.SELECT和INSERT在一个查询中的时间窗口非常小。如果您没有繁重的并发负载,或者您可以每年使用一次异常,那么您可以忽略这种情况并使用更快的SQL语句。
-
No need for
FETCH FIRST ROW ONLY
(=LIMIT 1
). The tag name is obviouslyUNIQUE
.不需要FETCH FIRST行(= LIMIT 1)。标签名称显然是独一无二的。
-
Remove
FOR SHARE
in my example if you don't usually have concurrentDELETE
orUPDATE
on the tabletag
. Costs a tiny bit of performance.如果您通常在表标记上没有并发DELETE或UPDATE,请在我的示例中删除FOR SHARE。耗费一点点性能。
-
Never quote the language name:
'plpgsql'.plpgsql
is an identifier. Quoting may cause problems and is only tolerated for backwards compatibility.永远不要引用语言名称:'plpgsql'。 plpgsql是一个标识符。引用可能会导致问题,只能容忍向后兼容性。
-
Don't use non-descriptive column names like
id
orname
. When joining a couple of tables (which is what you do in a relational DB) you end up with multiple identical names and have to use aliases.不要使用非描述性列名,如id或name。当连接几个表(这是您在关系数据库中执行的操作)时,您最终会得到多个相同的名称,并且必须使用别名。
Built into your function
Using this function you could largely simplify your FOREACH LOOP
to:
使用此功能可以大大简化您的FOREACH LOOP:
...
FOREACH TagName IN ARRAY $3
LOOP
INSERT INTO taggings (PostId, TagId)
VALUES (InsertedPostId, f_tag_id(TagName));
END LOOP;
...
Faster, though, as a single SQL statement with unnest()
:
但是,使用unnest()作为单个SQL语句更快:
INSERT INTO taggings (PostId, TagId)
SELECT InsertedPostId, f_tag_id(tag)
FROM unnest($3) tag;
Replaces the whole loop.
取代整个循环。
Alternative solution
This variant builds on the behavior of UNION ALL
with a LIMIT
clause: as soon as enough rows are found, the rest is never executed:
此变体建立在UNION ALL的行为的基础上,使用LIMIT子句:只要找到足够的行,其余的行就永远不会执行:
- Way to try multiple SELECTs till a result is available?
- 尝试多个SELECT直到结果可用的方法?
Building on this, we can outsource the INSERT
into a separate function. Only there we need exception handling. Just as safe as the first solution.
在此基础上,我们可以将INSERT外包给一个单独的函数。只有在那里我们需要异常处理。和第一个解决方案一样安全。
CREATE OR REPLACE FUNCTION f_insert_tag(_tag text, OUT tag_id int)
RETURNS int AS
$func$
BEGIN
INSERT INTO tag(tag) VALUES (_tag) RETURNING tag.tag_id INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN -- catch exception, NULL is returned
END
$func$ LANGUAGE plpgsql;
Which is used in the main function:
在主要功能中使用:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int) AS
$func$
BEGIN
LOOP
SELECT tag_id FROM tag WHERE tag = _tag
UNION ALL
SELECT f_insert_tag(_tag) -- only executed if tag not found
LIMIT 1 -- not strictly necessary, just to be clear
INTO _tag_id;
EXIT WHEN _tag_id IS NOT NULL; -- else keep looping
END LOOP;
END
$func$ LANGUAGE plpgsql;
-
This is a bit cheaper if most of the calls only need
SELECT
, because the more expensive block withINSERT
containing theEXCEPTION
clause is rarely entered. The query is also simpler.如果大多数调用只需要SELECT,这会便宜一些,因为很少输入包含EXCEPTION子句的INSERT的更昂贵的块。查询也更简单。
-
FOR SHARE
is not possible here (not allowed inUNION
query).此处不存在FOR SHARE(UNION查询中不允许)。
-
LIMIT 1
would not be necessary (tested in pg 9.4). Postgres derivesLIMIT 1
fromINTO _tag_id
and only executes until the first row is found.不需要LIMIT 1(在第9.4页中测试)。 Postgres从INTO _tag_id派生LIMIT 1,只执行直到找到第一行。
#2
2
There's still something to watch out for even when using the ON CONFLICT
clause introduced in Postgres 9.5. Using the same function and example table as in @Erwin Brandstetter's answer, if we do:
即使使用Postgres 9.5中引入的ON CONFLICT子句,仍有一些需要注意的事项。使用与@Erwin Brandstetter的答案相同的函数和示例表,如果我们这样做:
Session 1: begin;
Session 2: begin;
Session 1: select f_tag_id('a');
f_tag_id
----------
11
(1 row)
Session 2: select f_tag_id('a');
[Session 2 blocks]
Session 1: commit;
[Session 2 returns:]
f_tag_id
----------
NULL
(1 row)
So f_tag_id
returned NULL
in session 2, which would be impossible in a single-threaded world!
所以f_tag_id在会话2中返回NULL,这在单线程世界中是不可能的!
If we raise the transaction isolation level to repeatable read
(or the stronger serializable
), session 2 throws ERROR: could not serialize access due to concurrent update
instead. So no "impossible" results at least, but unfortunately we now need to be prepared to retry the transaction.
如果我们将事务隔离级别提升到可重复读取(或更强的可序列化),则会话2抛出错误:由于并发更新而无法序列化访问。所以至少没有“不可能”的结果,但不幸的是我们现在需要准备重试交易。
Edit: With repeatable read
or serializable
, if session 1 inserts tag a
, then session 2 inserts b
, then session 1 tries to insert b
and session 2 tries to insert a
, one session detects a deadlock:
编辑:使用可重复读取或可序列化,如果会话1插入标记a,则会话2插入b,然后会话1尝试插入b,会话2尝试插入a,一个会话检测到死锁:
ERROR: deadlock detected
DETAIL: Process 14377 waits for ShareLock on transaction 1795501; blocked by process 14363.
Process 14363 waits for ShareLock on transaction 1795503; blocked by process 14377.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (0,3) in relation "tag"
SQL function "f_tag_id" statement 1
After the session that received the deadlock error rolls back, the other session continues. So I guess we should treat deadlock just like serialization_failure
and retry, in a situation like this?
收到死锁错误的会话回滚后,另一个会话继续。所以我想我们应该像serialization_failure一样处理死锁并重试,在这样的情况下?
Alternatively, insert the tags in a consistent order, but this is not easy if they don't all get added in one place.
或者,以一致的顺序插入标签,但如果不是所有标签都添加到一个地方,这并不容易。
#3
-1
I think there is a slight chance that when the tag already existed it might be deleted by another transaction after your transaction has found it. Using a SELECT FOR UPDATE should solve that.
我认为,当标签已经存在时,它很可能会在您的交易找到之后被另一个交易删除。使用SELECT FOR UPDATE可以解决这个问题。