What is the best way to store graphs in persistent storage?

Time: 2022-07-07 15:22:41

I am wondering what the best ways to store graphs in persistent storage are, for later analysis, search, clustering, etc.

I see Neo4j as one option, and I am curious whether other graph databases are available. Does anyone have any insight into how larger social networks store their graph-based data (or other sites that require the storage of graph-like models, e.g. RDF)?

What about options like Cassandra, or MySQL?

4 answers

#1


14  

Graph Databases:

  1. HyperGraphDB: a general-purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism.
  2. InfoGrid: an Internet Graph Database with many additional software components that make the development of RESTful web applications on a graph foundation easy.
  3. vertexdb: a high-performance graph database server that supports automatic garbage collection.

Source: http://nosql.mypopescu.com/post/498705278/quick-review-of-existing-graph-databases

Graph Libraries:

  1. WebGraph is a framework to study the web graph. From their page - "It provides simple ways to manage very large graphs, exploiting modern compression techniques."
  2. Dex is a high-performance library to manage very large graphs or networks.
  3. This blog post - On Building a Stupidly Fast Graph Database - provides some guidelines on building a graph database. The technique they use is "memory-mapped I/O, disk-based linear-hashing".
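To make the "memory-mapped I/O" idea above concrete, here is a minimal sketch. It is not the blog post's code and it does not implement linear hashing; it assumes a simple fixed-width layout of my own choosing (a node count, an offsets array, then a flat neighbor array, all little-endian uint32). The names `write_csr` and `MappedGraph` are hypothetical.

```python
import mmap
import os
import struct
import tempfile


def write_csr(path, adjacency):
    """Write a graph as: node count, offsets array, neighbor array (all uint32).

    `adjacency` is a list where adjacency[i] holds the neighbor ids of node i.
    This layout is an assumption made for the sketch, not a standard format.
    """
    n = len(adjacency)
    offsets = [0]
    for neighbors in adjacency:
        offsets.append(offsets[-1] + len(neighbors))
    flat = [v for neighbors in adjacency for v in neighbors]
    with open(path, "wb") as f:
        f.write(struct.pack("<I", n))
        f.write(struct.pack(f"<{n + 1}I", *offsets))
        f.write(struct.pack(f"<{len(flat)}I", *flat))


class MappedGraph:
    """Read-only view of the file; the OS pages data in only when touched."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (self.n,) = struct.unpack_from("<I", self._map, 0)
        self._offsets_at = 4                       # offsets start after the count
        self._neighbors_at = 4 + 4 * (self.n + 1)  # neighbors start after offsets

    def neighbors(self, node):
        # Read offsets[node] and offsets[node + 1] to find the neighbor slice.
        start, end = struct.unpack_from("<2I", self._map, self._offsets_at + 4 * node)
        if start == end:
            return []
        return list(
            struct.unpack_from(f"<{end - start}I", self._map, self._neighbors_at + 4 * start)
        )

    def close(self):
        self._map.close()
        self._file.close()


# Usage: write a three-node graph to a temporary file and read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "graph.bin")
write_csr(demo_path, [[1, 2], [2], []])
demo_graph = MappedGraph(demo_path)
print(demo_graph.neighbors(0))
demo_graph.close()
```

A fixed layout like this only suits graphs that are written once and then read; supporting inserts is where something like the on-disk hashing mentioned in the post would come in.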

#2


4  

Disclaimer: I am speaking from the graph analysis standpoint.

There are several file formats for storing graph data: GraphML, GXL and several others. But storage usually is not a problem. Working with the graphs without fully loading them into RAM is the tricky part.
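As a rough illustration of the GraphML option, the sketch below writes an edge list as a minimal GraphML document and reads the edges back as a stream, using only Python's standard library. The helper names `to_graphml` and `iter_edges` are hypothetical, and only the basic `graph`, `node`, and `edge` elements are covered (no attribute keys or nested graphs).

```python
import io
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(edges, directed=True):
    """Serialize (source, target) string pairs as a minimal GraphML document."""
    ET.register_namespace("", GRAPHML_NS)  # emit xmlns="..." without a prefix
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root,
        f"{{{GRAPHML_NS}}}graph",
        id="G",
        edgedefault="directed" if directed else "undirected",
    )
    for node in sorted({n for edge in edges for n in edge}):
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=node)
    for source, target in edges:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


def iter_edges(stream):
    """Yield (source, target) pairs one at a time from a binary GraphML stream.

    clear() drops each element's attributes and children once it is processed,
    but the emptied elements stay attached to their parent, so this reduces
    memory use rather than making it constant.
    """
    edge_tag = f"{{{GRAPHML_NS}}}edge"
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == edge_tag:
            yield elem.get("source"), elem.get("target")
        elem.clear()


# Usage: round-trip a small edge list through GraphML text.
document = to_graphml([("a", "b"), ("b", "c")])
print(list(iter_edges(io.BytesIO(document.encode("utf-8")))))
```

Reading edges incrementally like this helps with the loading step, but it does not by itself solve the harder problem of running an analysis on a graph that does not fit in RAM.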

The RDF model is too generic for serious graph analysis. If you don't mind your analysis being slow and having to program the algorithms yourself, go with the existing graph databases - see Wikipedia on this.

For real analysis, load all data into RAM using an existing graph analysis library such as SNAP, or see this question.

#3


2  

There is no absolutely correct answer here; there is a large variety of options, and the right choice depends heavily on your needs. With large-scale retrievals/traversals (e.g. social networks and similar back-ends) you're quickly going to run into the random I/O bottleneck; I believe storing your graph in RAM is currently the only practical course of action. Less latency-sensitive applications have quite a wide variety of options, including Neo4j (open source with a commercial flavor) and AllegroGraph (commercial with a limited free edition).

At Delver we ended up implementing our own denormalized data model (essentially an adjacency list to represent the graph) in RAM on top of GigaSpaces (some info can be found in this presentation), with custom map-reduce code for queries and data analysis. If you go this route, Cassandra seems to be a viable open source platform to build on.
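To show what an in-RAM adjacency list queried with map-reduce style code can look like, here is a small single-process sketch. It is not Delver's implementation and does not use GigaSpaces or Cassandra; the function names and the sample data are made up for illustration. The query counts, for every pair of users, how many neighbors they have in common.

```python
from collections import defaultdict
from itertools import combinations


def map_common_neighbors(node, neighbors):
    """Map step: every pair of this node's neighbors shares `node` as a neighbor."""
    for a, b in combinations(sorted(neighbors), 2):
        yield (a, b), 1


def reduce_sum(key, values):
    """Reduce step: total the counts emitted for one pair."""
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    """Tiny in-process stand-in for a distributed map-reduce run."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(k, vs) for k, vs in groups.items())


# Denormalized adjacency list: each node maps directly to its neighbor ids.
adjacency = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob", "dave"],
    "dave": ["carol"],
}

common = map_reduce(adjacency.items(), map_common_neighbors, reduce_sum)
print(common[("alice", "dave")])
```

Because each record carries its full neighbor list, the map step needs no joins or lookups, which is the main appeal of denormalizing the graph this way.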

#4


0  

You could look at InfiniteGraph, which will be released for beta very soon (http://www.infinitegraph.com/)

If this is for commercial use then you'll see it's targeted towards sites that will have larger graphs. The social networking sites built custom solutions, which worked for them at the time. But their in-house solutions are more limiting than using something like InfiniteGraph. Products like Cassandra or MySQL weren't designed for this many-to-many problem set. Can you do it? Sure, but it's a lot of hand-written code, and not scalable. Let us know if you have a real project; we could help you figure out your graph requirements. Thanks, Warren wdavidson@objectivity.com
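For readers wondering what the hand-written relational approach looks like, here is a minimal sketch of an edge table with a bounded traversal. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; the table name `edge`, the columns, and the helper names are assumptions made for this example. Whether it scales to your graph is exactly the open question raised above.

```python
import sqlite3


def build_db(edges):
    """Create an in-memory database holding one row per directed edge."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edge ("
        " src TEXT NOT NULL,"
        " dst TEXT NOT NULL,"
        " PRIMARY KEY (src, dst))"
    )
    conn.executemany("INSERT INTO edge (src, dst) VALUES (?, ?)", edges)
    return conn


def reachable(conn, start, max_depth):
    """Return (node, shortest depth) rows for nodes within max_depth hops.

    The depth bound keeps the recursion finite even when the graph has cycles.
    """
    return conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT edge.dst, walk.depth + 1
            FROM edge JOIN walk ON edge.src = walk.node
            WHERE walk.depth < ?
        )
        SELECT node, MIN(depth) FROM walk
        GROUP BY node
        ORDER BY MIN(depth), node
        """,
        (start, max_depth),
    ).fetchall()


# Usage: a small graph with a cycle a -> b -> c -> a and a branch c -> d.
conn = build_db([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
print(reachable(conn, "a", 2))
```

Each extra hop is another self-join on the edge table, so traversal cost grows with depth; that is the practical reason deep traversals tend to be harder on a relational store than simple neighbor lookups.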
