Is get_or_create() thread-safe?

Date: 2022-05-07 21:06:26

I have a Django model that can only be accessed using get_or_create(session=session), where session is a foreign key to another Django model.

Since I am only accessing through get_or_create(), I would imagine that I would only ever have one instance with a key to the session. However, I have found multiple instances with keys to the same session. What is happening? Is this a race condition, or does get_or_create() operate atomically?

5 answers

#1


12  

Actually, it's not thread-safe. You can look at the code of the get_or_create method of the QuerySet object; basically, what it does is the following:

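try:
    # Step 1: look for an existing row matching the lookup.
    return self.get(**lookup), False
except self.model.DoesNotExist:
    # Step 2: no row was found, so build and insert one. Another thread or
    # process can insert the same row between step 1 and the save() below.
    params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
    params.update(defaults)
    obj = self.model(**params)
    sid = transaction.savepoint(using=self.db)
    obj.save(force_insert=True, using=self.db)
    transaction.savepoint_commit(sid, using=self.db)
    return obj, True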

So two threads might each figure out that the instance does not exist in the DB and each start creating a new one, then save them one after the other.
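
To make that interleaving concrete, here is a minimal sketch that does not use Django at all. The list `rows` stands in for a table without a unique constraint, and `naive_get_or_create` is a hypothetical helper that mimics the get-then-create sequence. A `threading.Barrier` forces both threads to finish the "get" before either performs the "create", which is the unlucky timing described above.

```python
import threading

rows = []                        # stands in for a table with no unique constraint
barrier = threading.Barrier(2)   # forces the unlucky interleaving for the demo


def naive_get_or_create(session_id):
    # "get": look for an existing row for this session
    existing = [r for r in rows if r["session"] == session_id]
    # Both threads wait here, so both have finished the "get" (and found
    # nothing) before either one goes on to the "create".
    barrier.wait()
    if existing:
        return existing[0], False
    # "create": insert a new row
    row = {"session": session_id}
    rows.append(row)
    return row, True


threads = [threading.Thread(target=naive_get_or_create, args=(1,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(rows))  # 2: both threads created a row for the same session
```

In real code there is no barrier, so the duplicate only appears when two requests happen to overlap, which is why the problem tends to show up intermittently.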

#2


34  

No, get_or_create is not atomic.

It first asks the DB whether a matching row exists; the database returns, Python checks the result, and if no row exists, it creates one. Anything can happen between the get and the create, including some other code creating a row that matches the get criteria.

For instance, with regard to your specific issue: if the user has two pages open (or several AJAX requests are performed) at the same time, every get might fail, and every request would then create a new row with the same session.

It is thus important to only use get_or_create when the duplication issue will be caught by the database through some unique/unique_together, so that even though multiple threads can get to the point of save(), only one will succeed, and the others will raise an IntegrityError that you can catch and deal with.
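
As a rough illustration of that pattern, the sketch below uses the standard-library `sqlite3` module instead of Django so that it runs on its own. The `profile` table, the `get_or_create` helper, and the `before_insert` hook are all made up for the example; the `UNIQUE` column plays the role that `unique=True` (or a `OneToOneField`) would play on the Django model. The hook imitates a competing request that inserts the row between our get and our create.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is what lets the database reject the duplicate.
conn.execute(
    "CREATE TABLE profile (id INTEGER PRIMARY KEY, session_id INTEGER UNIQUE)"
)


def get_profile_id(session_id):
    row = conn.execute(
        "SELECT id FROM profile WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None


def get_or_create(session_id, before_insert=None):
    """Return (profile_id, created).

    before_insert is a hypothetical test hook that runs between the get and
    the create, imitating a competing request.
    """
    existing = get_profile_id(session_id)
    if existing is not None:
        return existing, False
    if before_insert is not None:
        before_insert()
    try:
        cur = conn.execute(
            "INSERT INTO profile (session_id) VALUES (?)", (session_id,)
        )
        conn.commit()
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Someone else won the race; the database refused our duplicate,
        # so fall back to fetching the row that now exists.
        conn.rollback()
        return get_profile_id(session_id), False


def competing_request():
    # Imitates another request creating the row first.
    conn.execute("INSERT INTO profile (session_id) VALUES (1)")
    conn.commit()


profile_id, created = get_or_create(1, before_insert=competing_request)
row_count = conn.execute(
    "SELECT COUNT(*) FROM profile WHERE session_id = 1"
).fetchone()[0]
print(created, row_count)  # False 1
```

The losing caller ends up with the existing row rather than a duplicate, which is the behaviour you want from get_or_create in the first place.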

If you use get_or_create with (a set of) fields that are not unique in the database you will create duplicates in your database, which is rarely what you want.

More generally: do not rely on your application to enforce uniqueness and avoid duplicates in your database! That's the database's job! (Well, unless you wrap your critical functions with some OS-level locks, but I would still suggest using the database.)
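
For completeness, this is roughly what the application-side locking alternative looks like. It is only a sketch: `rows` stands in for the table and `locked_get_or_create` is a hypothetical helper. Note that `threading.Lock` only serializes threads inside a single process, so it would not help with several worker processes; that case would need a lock shared across processes, which is one more reason to prefer the database constraint.

```python
import threading

rows = []                       # stands in for the table
create_lock = threading.Lock()  # protects the whole get-then-create sequence
results = []


def locked_get_or_create(session_id):
    # Holding the lock across both steps means no other thread in this
    # process can slip in between the get and the create.
    with create_lock:
        for row in rows:
            if row["session"] == session_id:
                return row, False
        row = {"session": session_id}
        rows.append(row)
        return row, True


def worker():
    results.append(locked_get_or_create(1))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

created_count = sum(1 for _, created in results if created)
print(len(rows), created_count)  # 1 1
```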

With these warnings in mind, get_or_create, used correctly, is an easy-to-read, easy-to-write construct that perfectly complements the database's integrity checks.

#3


7  

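Threading is one problem, but get_or_create is also broken for any serious usage under MySQL's default isolation level (REPEATABLE READ). As far as I understand it, the transaction that loses the insert race may still not see the winning row when it retries the get, because its snapshot predates that row's commit.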

#4


2  

I was having this problem with a view that calls get_or_create.

I was using Gunicorn with multiple workers, so to test it I changed the number of workers to 1, and this made the problem disappear.

The simplest solution I found was to lock the table for access. I used this decorator to do the lock per view (for PostgreSQL):

http://www.caktusgroup.com/blog/2009/05/26/explicit-table-locking-with-postgresql-and-django/

EDIT: I wrapped the lock statement in that decorator in a try/except to deal with DB engines with no support for it (SQLite while unit testing in my case):

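from django.db import DatabaseError

try:
    # cursor, model and lock come from the surrounding decorator.
    cursor.execute('LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock))
except DatabaseError:
    # Backends without LOCK TABLE support (e.g. SQLite) skip the lock.
    pass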

#5


-1  

I don't think this is a race condition. A race condition occurs when two or more threads or processes try to access the same resource to modify it at the same time. You are describing a situation in which you get_or_create many objects using the same session, which is not a problem, since you are not trying to concurrently access the session to modify some attribute.
