What I think I'm looking for is a NoSQL, library-embedded, on-disk (i.e. not in-memory) database that's accessible from Java (and preferably runs inside my instance of the JVM). That's not really much of a database, and I'm tempted to roll my own. Basically I'm looking for the "should we keep this in memory or put it on disk" portion of a database.
Our model has grown to several gigabytes. Right now this is all done in memory, meaning we're pushing the JVM to upward of several gigabytes. It's currently all stored in a flat XML file, serialized and deserialized with XStream and compressed with Java's built-in gzip libraries. That worked well while our model stayed under 100MB, but now that it's larger than that it's becoming a problem.
Loosely speaking, that model can be broken down as:
- Project
  - configuration component (directed acyclic graph), not at all database friendly
  - a list of a dozen "experiment" structures
    - each containing a list of about a dozen "run-model" structures
      - each run-model contains hundreds of megs of data. Once written they are never edited.
What I'd like to do is have something that conforms to a map interface, of guid -> run-model. This mini-database would keep a flat table of these objects. On our experiment model, we would replace the list of run-models with a list of guids, and add, at the application layer, a get call to this map, which would pull it off the disk and into memory.
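To make the guid -> run-model idea concrete, here is a minimal roll-your-own sketch using only the JDK. It is an illustration under assumptions, not a recommendation over a library: the class name `RunModelStore`, the one-file-per-GUID layout, the `.bin` suffix, and the use of plain Java serialization (rather than XStream) are all hypothetical choices made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

/** Hypothetical write-once store: one serialized file per GUID under a directory. */
final class RunModelStore<V extends Serializable> {
    private final Path dir;

    RunModelStore(Path dir) {
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.dir = dir;
    }

    /** Writes the value to disk and returns the GUID the experiment should keep. */
    UUID put(V value) {
        UUID id = UUID.randomUUID();
        // CREATE_NEW fails if the file already exists, so an entry is never overwritten.
        try (OutputStream raw = Files.newOutputStream(fileFor(id), StandardOpenOption.CREATE_NEW);
             ObjectOutputStream out = new ObjectOutputStream(raw)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return id;
    }

    /** Reads the value back from disk, or returns null for an unknown GUID. Nothing is cached. */
    @SuppressWarnings("unchecked")
    V get(UUID id) {
        Path file = fileFor(id);
        if (!Files.exists(file)) {
            return null;
        }
        try (InputStream raw = Files.newInputStream(file);
             ObjectInputStream in = new ObjectInputStream(raw)) {
            return (V) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    boolean containsKey(UUID id) {
        return Files.exists(fileFor(id));
    }

    private Path fileFor(UUID id) {
        return dir.resolve(id + ".bin");
    }
}
```

The experiment would then hold a `List<UUID>` and call `store.get(id)` when a run-model is actually needed. Because the sketch keeps no cache, a run-model becomes eligible for garbage collection as soon as the caller drops its reference; what it does not give you is transactions, indexing, or crash safety, which is where an embedded library earns its keep.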
That means we can keep configuration of our program in XML (which I'm very happy with) and keep a table of the big data in a DBMS that will keep us from consuming multi-GB of memory. On program start and exit I could then load and unload the two portions of our model (the config section in XML, and the run-models in the database format) from an archiving format.
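The load/unload step on program start and exit could look roughly like the sketch below, assuming the archiving format is a plain zip file holding the XML config and the on-disk run-model files side by side. The class name `ProjectArchive` and the idea of a scratch working directory are assumptions for illustration; only `java.util.zip` and `java.nio.file` from the JDK are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper that packs a working directory into one zip file and back. */
final class ProjectArchive {

    /** On exit: zip every regular file under workDir. The archive must live outside workDir. */
    static void pack(Path workDir, Path archive) {
        try (OutputStream raw = Files.newOutputStream(archive);
             ZipOutputStream zip = new ZipOutputStream(raw);
             Stream<Path> walk = Files.walk(workDir)) {
            List<Path> files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            for (Path file : files) {
                // Zip entry names always use forward slashes, whatever the platform separator is.
                String name = workDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip); // streams the file, so large run-models are not held in memory
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** On start: extract the archive into workDir, refusing entries that would escape it. */
    static void unpack(Path archive, Path workDir) {
        Path root = workDir.normalize();
        try (InputStream raw = Files.newInputStream(archive);
             ZipInputStream zip = new ZipInputStream(raw)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                // Copies only the current entry; the zip stream stays open for the next one.
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, the config XML and the run-model files stay as ordinary files while the program runs, and the user still sees a single project file on disk. Zip already compresses each entry, so the separate gzip pass would no longer be needed for the archived copy.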
I'm sort of feeling gung-ho about this, and think that I could probably implement it with some of XStream's XML inspection strategies and a custom map implementation, but a voice in the back of my head is telling me I should find a library to do it instead.
Should I roll my own or is there a database that's small enough to fit this bill?
Thanks guys,
-Geoff
2 Solutions
#1
13
http://www.mapdb.org/
Also take a look at this question: Alternative to BerkeleyDB?
#2
2
Since MapDB is a possible solution for your problem, Chronicle Map is also worth considering. It's an embeddable Java key-value store, optionally persistent, offering a programming model very similar to MapDB's: it also works via the vanilla java.util.Map interface and transparent serialization of keys and values.
The major difference is that according to third-party benchmarks, Chronicle Map is times faster than MapDB.
Regarding stability, no bugs have been reported against Chronicle Map's data storage for months now, even though it is in active use in many projects.
Disclaimer: I'm the developer of Chronicle Map.