Storing Python Data on a Linux System

Posted: 2022-01-11 16:56:23

I need to create a system that stores Python data structures on a Linux system, with concurrent read and write access to the data from multiple programs/daemons/scripts. My first thought is to create a Unix socket that listens for connections and serves up requested data as pickled Python data structures. Any writes by the clients would get synced to disk (maybe in batches, though I don't expect high throughput, so plain Linux VFS caching would likely be fine). This ensures that only a single process ever reads and writes the data.
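Something like this minimal, single-threaded sketch (the socket path and the tiny request protocol here are invented for illustration):

    import os
    import pickle
    import socketserver

    SOCK_PATH = "/tmp/pystore.sock"   # hypothetical socket path
    STORE = {}                        # in-memory data, owned solely by this process

    class Handler(socketserver.StreamRequestHandler):
        def handle(self):
            # One pickled request per connection: ("get", key) or ("set", key, value).
            # Only safe with trusted clients -- unpickling can execute code.
            req = pickle.load(self.rfile)
            if req[0] == "get":
                pickle.dump(STORE.get(req[1]), self.wfile)
            else:  # "set"
                STORE[req[1]] = req[2]
                pickle.dump(True, self.wfile)
            # A real version would also sync STORE to disk here (or in batches).

    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)
    # A single-threaded server handles requests one at a time, which is exactly
    # the "only one process touches the data" property I want.
    socketserver.UnixStreamServer(SOCK_PATH, Handler).serve_forever()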

The other idea is to just keep the pickled data structure on disk and allow only one process at a time to access it through a lockfile or token... This requires every accessing client to respect the locking mechanism / use the access module.
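The usual building block for that on Linux is flock(2) via the fcntl module. The lock is advisory, which is exactly the caveat above: it only helps if every client goes through it. A minimal sketch of such an access module (paths hypothetical):

    import fcntl
    import pickle

    DATA_PATH = "/var/lib/myapp/state.pickle"   # hypothetical location
    LOCK_PATH = DATA_PATH + ".lock"

    def update(mutate):
        """Read-modify-write the stored structure under an exclusive advisory lock."""
        with open(LOCK_PATH, "w") as lock:
            fcntl.flock(lock, fcntl.LOCK_EX)    # blocks until we hold the lock
            try:
                with open(DATA_PATH, "rb") as f:
                    data = pickle.load(f)
            except FileNotFoundError:
                data = {}
            mutate(data)
            with open(DATA_PATH, "wb") as f:
                pickle.dump(data, f)
        # the lock is released when the file object is closed

    # e.g. update(lambda d: d.update(jobs=[1, 2, 3]))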

What am I overlooking? SQLite is available, but I'd like to keep this as simple as possible.

What would you do?

5 Answers

#1

I would just use SQLite if it's available.

See this FAQ: http://www.sqlite.org/faq.html#q5 -- SQLite (with pysqlite [0]) should be able to handle your concurrency elegantly.

You can keep the data as simple key-value pairs if you like; there's no need to go all BNF on your data.
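For instance, a tiny key-to-pickled-blob store on top of the standard sqlite3 module might look like this (the file path is made up; SQLite's own file locking handles the multi-process access, and each process just opens its own connection):

    import pickle
    import sqlite3

    conn = sqlite3.connect("/var/lib/myapp/store.db")   # hypothetical path
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

    def put(key, obj):
        with conn:   # commits on success; SQLite serializes concurrent writers
            conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                         (key, pickle.dumps(obj)))

    def get(key):
        row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return pickle.loads(row[0]) if row else None

    put("settings", {"retries": 3})
    print(get("settings"))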

[0] http://trac.edgewall.org/wiki/PySqlite

#2

If you want to store just name/value pairs (e.g. filename to pickled data) you can always use Berkeley DB (http://code.activestate.com/recipes/189060-using-berkeley-db-database/). If your data is numbers-oriented, you might want to check out PyTables (http://www.pytables.org/moin). If you really want to use sockets (I would generally try to avoid that, since there are a lot of minutiae you have to worry about), you may want to look at Twisted Python (good for handling multiple connections via Python with no threading required).
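As a sketch, the standard-library dbm module (anydbm on Python 2) exposes the same dictionary-style interface as the Berkeley DB recipe; one caveat is that dbm files generally don't tolerate concurrent writers, so you'd still want locking around it (path hypothetical):

    import dbm      # `anydbm` on Python 2
    import pickle

    # "c" creates the file if it doesn't exist
    with dbm.open("/var/lib/myapp/store.dbm", "c") as db:
        db["settings"] = pickle.dumps({"retries": 3})   # values must be bytes
        settings = pickle.loads(db["settings"])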

#3

I'd use a database. A real one. This is why they exist (well, one of the reasons). Don't reinvent the wheel if you don't have to.

#4

Leaving backend storage aside (plenty of options here, including ConfigParser, shelve, sqlite and anydbm), the idea of a single process handling storage while the others connect to it is workable. My first thought for doing that is Pyro (Python remote objects). Sockets, while always available, can get tricky.
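A rough sketch of that idea using Pyro4 (the current incarnation of Pyro; the API details here are from memory, and note that Pyro4's default serializer only passes simple types unless you explicitly enable pickle):

    import Pyro4

    @Pyro4.expose
    class Store(object):
        """The one process that owns the data; everyone else calls in remotely."""
        def __init__(self):
            self._data = {}

        def get(self, key):
            return self._data.get(key)

        def put(self, key, value):
            self._data[key] = value

    daemon = Pyro4.Daemon()          # listens on a local TCP port
    uri = daemon.register(Store())   # something like PYRO:obj_...@localhost:...
    print(uri)                       # a client then does: Pyro4.Proxy(uri).put("k", 1)
    daemon.requestLoop()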

#5

You could serialize the data structures and store them as values using ConfigParser. If you created your own access lib/module to access the data, you could do the serialization in the lib, so the client code would just send and receive Python objects. You could also handle concurrency in the lib.
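As a sketch of such an access lib (path hypothetical): pickle produces bytes while ConfigParser stores text, so the lib base64-encodes the values; the concurrency handling (e.g. flock, as in the question) would wrap these functions.

    import base64
    import configparser   # `ConfigParser` on Python 2
    import pickle

    PATH = "/var/lib/myapp/store.ini"   # hypothetical path

    def save(section, key, obj):
        cp = configparser.ConfigParser()
        cp.read(PATH)
        if not cp.has_section(section):
            cp.add_section(section)
        # ConfigParser holds text, so base64-encode the pickled bytes
        cp.set(section, key, base64.b64encode(pickle.dumps(obj)).decode("ascii"))
        with open(PATH, "w") as f:
            cp.write(f)

    def load(section, key):
        cp = configparser.ConfigParser()
        cp.read(PATH)
        return pickle.loads(base64.b64decode(cp.get(section, key)))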
