In Django, what is the fastest cache for small, rarely changing data?

Time: 2021-07-07 03:52:33

I am in the process of switching from PHP to Python + Django and am looking for an equivalent of PHP's "array cache".

For small data sets from the DB, such as "categories", that changed very rarely but were accessed very often, I used an array cache.

http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/

The idea was to generate PHP source containing the category tree; with the opcode cache turned on, this worked like embedding the data into the application source. It was the fastest cache imaginable and very helpful under heavy load.

The Django manual (https://docs.djangoproject.com/en/1.4/topics/cache/) states:

By far the fastest, most efficient type of cache available to Django, Memcached..

So the questions are:

  • Would generating a .py file with Python dictionaries/lists make any sense?
  • Will this be faster than Memcached? If not, why?
  • Are there any known implementations of this?
  • Does Python have anything like PHP's var_export() function?

EDIT:

As pointed out in an answer, I can use repr(), and this is easy to benchmark, so I created a simple benchmark:

https://github.com/fsw/pythonCachesBenchmark

The output on my local machine was:

FIRST RUN
get_categories_from_db
6.57282209396
get_categories_from_memcached
(SET CACHE IN 0.000940)
4.88948512077
get_categories_from_pickledfile
(SET CACHE IN 0.000917)
2.87856888771
get_categories_from_pythonsrc
(SET CACHE IN 0.000489)
0.0930788516998
SECOND RUN
get_categories_from_db
6.63035202026
get_categories_from_memcached
4.60877108574
get_categories_from_pickledfile
2.87137699127
get_categories_from_pythonsrc
0.0903170108795

get_categories_from_pythonsrc is a simple implementation of the PHP array cache I was talking about:

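import os
import time

def get_categories_from_pythonsrc():
    # Generate the cache module once; get_categories_from_db() is defined in test.py.
    if not os.path.exists('catcache.py'):
        start = time.time()
        categories = get_categories_from_db()
        # Text mode, so writing a str works on both Python 2 and Python 3.
        with open('catcache.py', 'w') as f:
            f.write('x = ' + repr(categories))
        print('(SET CACHE IN %f)' % (time.time() - start))
    # After the first import, the module is served from sys.modules.
    import catcache
    return catcache.x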
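
One detail that may matter when reading the numbers above: Python keeps every imported module in sys.modules, so a repeated `import catcache` inside the same process does not re-read or re-parse the file. It returns the module object that is already in memory. The following is only a minimal sketch of that behaviour; the module name catcache_demo and the sample data are made up for illustration and are not part of the benchmark.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical sample data standing in for the category tree from the DB.
categories = {1: {"name": "Books", "parent": None},
              2: {"name": "Fiction", "parent": 1}}

# Write the data out as Python source, the same way the function above does.
cache_dir = tempfile.mkdtemp()
with open(os.path.join(cache_dir, "catcache_demo.py"), "w") as f:
    f.write("x = " + repr(categories))

sys.path.insert(0, cache_dir)
importlib.invalidate_caches()  # make sure the freshly written file is visible

first = importlib.import_module("catcache_demo")   # reads and executes the file
second = importlib.import_module("catcache_demo")  # served from sys.modules

same_module = first is second      # the very same module object
same_data = first.x is second.x    # and therefore the very same dict
in_registry = "catcache_demo" in sys.modules
print(same_module, same_data, in_registry)
```

If this is what happens in the benchmark loop, the timing for get_categories_from_pythonsrc mostly measures a lookup of an object already held in RAM, whereas the pickle variant re-reads and deserializes the file on every call.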

This is my simple pickledfile cache implementation:

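import pickle

def get_categories_from_pickledfile():
    path = 'catcache.p'
    # Build the pickle file once from the database result.
    if not os.path.exists(path):
        start = time.time()
        with open(path, 'wb') as f:
            pickle.dump(get_categories_from_db(), f)
        print('(SET CACHE IN %f)' % (time.time() - start))
    # Every call re-reads and unpickles the file.
    with open(path, 'rb') as f:
        return pickle.load(f)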

Complete source:

https://github.com/fsw/pythonCachesBenchmark/blob/master/test.py

I will later add "Django's low-level cache APIs" to this benchmark to see what they are about.

So, as my intuition suggested, caching a dictionary in a Python .py file is the fastest way I could find (over 30 times faster than cPickle + file).

As I said, I am new to Python, so I am probably missing something here?

If not: why isn't this solution widely used?

2 solutions

#1


1  

Python has several solutions that may work here:

  • Memcached (as you already know),
  • pickle (as Blender mentioned), which of course can be used together with e.g. Memcached,
  • several other caching (e.g. for local memory) and serialization (e.g. simplejson) solutions.

In general, pickle is very fast (use cPickle if you need more speed), and in Python you do not need anything like var_export() (although you can use repr() on variables to get a valid literal for them, if they are of one of the primitive types). pickle in Python is more similar to serialize() in PHP.
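
To make the comparison concrete, here is a small sketch of both round trips. The sample data is hypothetical, and using ast.literal_eval() to read the literal back is my own choice for the illustration, not something the answer prescribes.

```python
import ast
import pickle

# Hypothetical sample data built only from primitive types,
# so its repr() is a valid Python literal.
categories = {1: {"name": "Books", "children": [2]},
              2: {"name": "Fiction", "children": []}}

# repr() yields Python source text, roughly what var_export() gives you in PHP.
source_text = repr(categories)
# ast.literal_eval() parses a literal back without executing arbitrary code.
from_repr = ast.literal_eval(source_text)

# pickle yields an opaque byte string, closer to PHP's serialize()/unserialize().
blob = pickle.dumps(categories, protocol=pickle.HIGHEST_PROTOCOL)
from_pickle = pickle.loads(blob)

print(from_repr == categories, from_pickle == categories)
```

The repr() route only works for values whose repr() is a literal (numbers, strings, lists, dicts, tuples, None and the like); pickle also handles most custom classes, but should only be used on data you trust.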

Your question is not very specific, but the above should give you some insight. You also need to take into account that PHP and Python have different philosophies, so solutions to the same problems may look different. In this specific case, the pickle module should solve your issues.

#2


1  

There is one other approach. You could use an async server like gevent and keep live objects in some global namespace. I do not know how familiar you are with such a workflow; it is different from the Apache/PHP model, where "each request starts bare".

Basically, you load your application and use it to serve requests. It is alive all the time and sleeps when there are no requests. Once you load "categories" from the database, store them in a global variable or in some module.

Let's say that you launch a WSGI instance and name it app. Afterwards, you can just keep a dictionary in that app and store the cache there. So there is no serialization and no network protocol; all the data is directly available in RAM.
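
A minimal sketch of that idea, assuming a long-lived worker process. The names _cache, load_categories_from_db, get_categories and invalidate_categories are hypothetical, and the time-based expiry is an addition of mine so that rare changes eventually become visible.

```python
import time

# Module-level cache: it lives as long as the worker process does.
_cache = {}

def load_categories_from_db():
    """Hypothetical stand-in for the real database query."""
    return {1: "Books", 2: "Music"}

def get_categories(max_age=300.0):
    """Return the categories from process memory, reloading after max_age seconds."""
    entry = _cache.get("categories")
    now = time.monotonic()
    if entry is None or now - entry[0] > max_age:
        entry = (now, load_categories_from_db())
        _cache["categories"] = entry
    return entry[1]

def invalidate_categories():
    """Drop the cached value, e.g. after a category has been edited."""
    _cache.pop("categories", None)

first_call = get_categories()    # hits the "database"
second_call = get_categories()   # served straight from RAM
```

Keep in mind that each worker process holds its own copy, so with several workers an invalidation in one process is not seen by the others until their copy expires.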

EDIT1: DO NOT USE globals often; this is just one of the very rare cases where it is OK to store something in the global namespace (in my opinion).
