I'm writing a reasonably complex web application. The Python backend runs an algorithm whose state depends on data stored in several interrelated database tables, which does not change often, plus user-specific data which does change often. The algorithm's per-user state undergoes many small changes as a user works with the application. This algorithm is used often during each user's work to make certain important decisions.
For performance reasons, re-initializing the state from the (semi-normalized) database data on every request quickly becomes infeasible. It would be highly preferable, for example, to cache the state's Python object in some way so that it can simply be used and/or updated whenever necessary. However, since this is a web application, there are several processes serving requests, so using a global variable is out of the question.
I've tried serializing the relevant object (via pickle) and saving the serialized data to the DB, and am now experimenting with caching the serialized data via memcached. However, this still has the significant overhead of serializing and deserializing the object often.
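For concreteness, the current approach looks roughly like this; pymemcache is used here only as an example client, and `build_state_from_db` is a hypothetical helper that rebuilds the state from the tables:

```python
# Rough sketch of the current setup: pickle the per-user state and cache the
# bytes in memcached (pymemcache is just an example client library).
import pickle
from pymemcache.client.base import Client

client = Client(("127.0.0.1", 11211))

def load_state(user_id, build_state_from_db):
    raw = client.get(f"algo_state:{user_id}")
    if raw is not None:
        return pickle.loads(raw)              # deserialize on every request
    state = build_state_from_db(user_id)      # slow path: rebuild from the DB
    client.set(f"algo_state:{user_id}", pickle.dumps(state))
    return state

def save_state(user_id, state):
    client.set(f"algo_state:{user_id}", pickle.dumps(state))  # serialize again
```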
I've looked at shared memory solutions, but the only relevant thing I've found is POSH. However, POSH doesn't seem to be widely used, and I don't feel comfortable integrating such an experimental component into my application.
I need some advice! This is my first shot at developing a web application, so I'm hoping this is a common enough issue that there are well-known solutions to such problems. At this point solutions which assume the Python back-end is running on a single server would be sufficient, but extra points for solutions which scale to multiple servers as well :)
Notes:
- I have this application working, currently live and with active users. I started out without doing any premature optimization, and then optimized as needed. I've done the measuring and testing to make sure the above-mentioned issue is the actual bottleneck. I'm pretty sure I could squeeze more performance out of the current setup, but I wanted to ask if there's a better way.
- The setup itself is still a work in progress; assume that the system's architecture can be whatever suits your solution.
6 Answers
#1
3
I think the multiprocessing framework has something that might be applicable here, namely the shared ctypes module.
Multiprocessing is fairly new to Python, so it might have some oddities. I am not quite sure whether the solution works with processes not spawned via multiprocessing.
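For illustration, a minimal sketch of sharing state through multiprocessing's shared ctypes support (`Value`/`Array`), assuming the worker processes are spawned by multiprocessing itself:

```python
# Minimal sketch: a counter in shared memory (backed by multiprocessing.sharedctypes),
# updated from several processes spawned by multiprocessing. This does NOT help with
# independently started web-server worker processes.
import multiprocessing

def worker(counter, lock):
    for _ in range(1000):
        with lock:                 # guard the read-modify-write
            counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)   # shared ctypes integer
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=worker, args=(counter, lock))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)           # prints 4000
```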
#2
8
Be cautious of premature optimization.
Addition: The "Python backend runs an algorithm whose state..." is the session in the web framework. That's it. Let the Django framework maintain session state in cache. Period.
"The algorithm's per-user state undergoes many small changes as a user works with the application." Most web frameworks offer a cached session object. Often it is very high performance. See Django's session documentation for this.
Advice. [Revised]
It appears you have something that works. Leverage it to learn your framework, learn the tools, and learn what knobs you can turn without breaking a sweat. Specifically, using session state.
Second, fiddle with caching, session management, and things that are easy to adjust, and see if you have enough speed. Find out whether MySQL socket or named pipe is faster by trying them out. These are the no-programming optimizations.
Third, measure performance to find your actual bottleneck. Be prepared to provide (and defend) measurements that are fine-grained enough to be useful and stable enough to provide a meaningful comparison of alternatives.
For example, show the performance difference between persistent sessions and cached sessions.
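As a concrete illustration of the advice above, a minimal sketch of Django's cache-backed sessions; the memcached backend class name varies by Django version, and the `"algo_state"` key and view below are hypothetical:

```python
# settings.py -- store session data in the cache (assumes django.contrib.sessions
# is enabled and memcached is running locally; backend name depends on Django version).
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
SESSION_ENGINE = "django.contrib.sessions.backends.cache"

# views.py -- the per-user algorithm state is just session data.
from django.http import JsonResponse

def update_state(request):
    state = request.session.get("algo_state", {})          # hypothetical key
    state["step_count"] = state.get("step_count", 0) + 1   # one of the many small changes
    request.session["algo_state"] = state                  # reassign so the session is saved
    return JsonResponse(state)
```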
#3
2
I think you can give ZODB a shot.
"A major feature of ZODB is transparency. You do not need to write any code to explicitly read or write your objects to or from a database. You just put your persistent objects into a container that works just like a Python dictionary. Everything inside this dictionary is saved in the database. This dictionary is said to be the "root" of the database. It's like a magic bag; any Python object that you put inside it becomes persistent."
Initially it was an integral part of Zope, but lately a standalone package is also available.
It has the following limitation:
"Actually there are a few restrictions on what you can store in the ZODB. You can store any objects that can be "pickled" into a standard, cross-platform serial format. Objects like lists, dictionaries, and numbers can be pickled. Objects like files, sockets, and Python code objects, cannot be stored in the database because they cannot be pickled."
I have read about it but haven't given it a shot myself, though.
Another possibility could be an in-memory SQLite db; being an in-memory db it may speed up the process a bit, but you would still have to do the serialization stuff and all. Note: an in-memory db is expensive on resources.
Here is a link: http://www.zope.org/Documentation/Articles/ZODB1
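For reference, a minimal sketch of the usage pattern described in the quotes above (the `UserState` class, file name, and key are illustrative), assuming the standalone ZODB package is installed:

```python
# Minimal ZODB sketch: a persistent per-user state object stored under the root.
import transaction
import persistent
from ZODB import FileStorage, DB

class UserState(persistent.Persistent):
    """Hypothetical per-user algorithm state."""
    def __init__(self):
        self.step_count = 0

db = DB(FileStorage.FileStorage("appstate.fs"))
connection = db.open()
root = connection.root()               # behaves like a Python dictionary

if "user_42" not in root:
    root["user_42"] = UserState()
root["user_42"].step_count += 1        # one small in-place change
transaction.commit()                   # persist the change

connection.close()
db.close()
```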
#4
2
First of all, your approach is not common web development practice. Even when multithreading is used, web applications are designed to be able to run in multi-process environments, for both scalability and easier deployment.
If you just need to initialize a large object and do not need to change it later, you can do that easily with a global variable that is initialized while your WSGI application is being created, or while the module containing the object is being loaded, etc.; multi-processing will do fine for you.
If you need to change the object and access it from every thread, you need to be sure your object is thread safe, and use locks to ensure that. And use a single server context, i.e. one process. Any multi-threaded Python server will serve you well; FCGI is also a good choice for this kind of design.
But if multiple threads are accessing and changing your object, the locks may have a really bad effect on your performance gain, which is likely to make all the benefits go away.
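To illustrate the single-process, lock-protected design described above, a minimal sketch (all names are illustrative):

```python
# Module-level shared state guarded by a lock, for a single multi-threaded
# server process. Initialized once when the WSGI app imports this module.
import threading

class SharedAlgorithmState:
    def __init__(self):
        self._lock = threading.Lock()
        self._per_user = {}            # user_id -> state dict

    def update(self, user_id, key, value):
        # Hold the lock for the whole read-modify-write so concurrent
        # request threads cannot corrupt the state.
        with self._lock:
            state = self._per_user.setdefault(user_id, {})
            state[key] = value

    def snapshot(self, user_id):
        with self._lock:
            return dict(self._per_user.get(user_id, {}))

SHARED_STATE = SharedAlgorithmState()
```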
#5
2
This is Durus, a persistent object system for applications written in the Python programming language. Durus offers an easy way to use and maintain a consistent collection of object instances used by one or more processes. Access and change of persistent instances is managed through a cached Connection instance which includes commit() and abort() methods so that changes are transactional.
http://www.mems-exchange.org/software/durus/
I've used it before in some research code, where I wanted to persist the results of certain computations. I eventually switched to pytables as it met my needs better.
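A rough sketch of the Connection/commit() pattern described above; treat the exact import paths and file name here as assumptions to verify against the linked documentation:

```python
# Hedged Durus sketch: open a file storage, get the persistent root mapping,
# make a change, and commit it transactionally.
from durus.connection import Connection      # import paths assumed from the Durus docs
from durus.file_storage import FileStorage

connection = Connection(FileStorage("state.durus"))
root = connection.get_root()                 # a persistent dictionary
root["user_42"] = {"step_count": 1}          # hypothetical per-user state
connection.commit()                          # or connection.abort() to discard changes
```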
#6
1
Another option is to review the requirement for state; it sounds like, if the serialisation is the bottleneck, the object is very large. Do you really need an object that large?
I know in the * podcast 27 the reddit guys discuss what they use for state, so that may be useful to listen to.