Best practices for storing a neural network in a database

Date: 2021-06-29 17:01:01

I am developing an application that uses a neural network. I am currently deciding whether to store it in a SQL-based relational database (probably SQL Server) or in a graph database.

Performance is a concern, because the neural net will be very large.

My questions:

  1. Do relational databases suffer a performance hit compared with graph databases when dealing with a neural net?
  2. What graph-database technology would be best suited to dealing with a large neural net?
  3. Can a geospatial database such as PostGIS be used to represent a neural net efficiently?

2 answers

#1


5  

That depends on how you intend the model to evolve.

  1. Do you have a fixed idea of an immutable structure for the network, such as a Kohonen map or an off-the-shelf model?
  2. Do you have several relationship structures to test, so that you want to be able to flip a switch to alternate between them?
  3. Does your model treat the nodes as fluid automatons, free to seek their own neighbours, where each automaton develops unique characteristic values for a common set of parameters and you need to analyse how those values affect its "choice" of neighbours?
  4. Do you have a fixed set of parameters for a fixed number of types/classes of nodes? Or is a node expected to develop a unique range of attributes and relationships?
  5. Do you frequently need to access each node, especially those embedded deep in the network layers, to analyse and correlate them?
  6. Is your network perceivable as, or quantizable into, a set of state machines?

Disclaimer
First of all, I should disclose that I am familiar only with Kohonen maps. (I admit I have been derided for this, since Kohonen maps are regarded as entry-level and barely a neural network.) The questions above come from my own thought experiments over the years, prompted by casual, untrained reading about various neural schemes.

Category vs Parameter vs Attribute
Can we classify vehicles by number of wheels or by tonnage? Should wheel count or tonnage be attributes, parameters, or category characteristics?

Understanding this debate is a crucial step in structuring your repository. It is especially relevant to disease and patient vectors. I have seen relational schemata for patient information, designed by medical experts who evidently had little training in information science, that presume a common set of parameters for every patient, with thousands of mostly unused columns in each patient record. When they exceed the column limit for a table, they create a new table with thousands more sparsely used columns.

  • Type 1: All nodes share a common set of parameters, so nodes can be modeled in one table with a known number of columns.

  • Type 2: There is a fixed number of node classes, and each class has a fixed set of parameters. Therefore, there is a characteristic table for each class of node.

  • Type 3: There is no intent to pigeonhole the nodes. Each node is free to develop and acquire its own unique set of attributes.

  • Type 4: There is a fixed number of node classes. Each node within a class is free to develop and acquire its own unique set of attributes, but each class restricts the set of attributes a node is allowed to acquire.

Read up on the EAV (entity-attribute-value) model to understand the issue of parameters vs attributes. In an EAV table, a node needs only three characterising columns:

  • node id
  • attribute name
  • attribute value

However, under the constraints of the technology, an attribute value could be a number, a string, an enumerable, or a category. Therefore, there would be four more attribute tables, one for each value type, plus the node table:

  • node id
  • attribute type
  • attribute name
  • attribute value
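
As a rough illustration of that layout, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and SQL types are my own assumptions (the answer only names the four value types), and the attribute type is expressed by which table a row lives in rather than by a separate column.

```python
import sqlite3

# Assumed mapping from the four value types named above to (table, SQL type).
ATTRIBUTE_TABLES = {
    "number": ("NumericAttributes", "REAL"),
    "string": ("StringAttributes", "TEXT"),
    "enumerable": ("EnumAttributes", "INTEGER"),
    "category": ("CategoryAttributes", "TEXT"),
}

def create_eav_schema(conn):
    """Create the node table and one EAV table per value type."""
    conn.execute("CREATE TABLE Nodes (id INTEGER PRIMARY KEY)")
    for table, sql_type in ATTRIBUTE_TABLES.values():
        conn.execute(
            f"CREATE TABLE {table} ("
            " id INTEGER NOT NULL REFERENCES Nodes(id),"
            " attributeName TEXT NOT NULL,"
            f" value {sql_type},"
            " PRIMARY KEY (id, attributeName))"
        )

def set_attribute(conn, node_id, attr_type, name, value):
    """Insert or overwrite one attribute of a node in the table for its type."""
    table, _ = ATTRIBUTE_TABLES[attr_type]
    conn.execute("INSERT OR IGNORE INTO Nodes (id) VALUES (?)", (node_id,))
    conn.execute(
        f"INSERT OR REPLACE INTO {table} (id, attributeName, value) VALUES (?, ?, ?)",
        (node_id, name, value),
    )

def get_attributes(conn, node_id):
    """Return {attributeName: value} gathered from every typed table."""
    found = {}
    for table, _ in ATTRIBUTE_TABLES.values():
        rows = conn.execute(
            f"SELECT attributeName, value FROM {table} WHERE id = ?", (node_id,)
        )
        found.update(dict(rows.fetchall()))
    return found
```

With this shape, a node can acquire a new attribute without any schema change, which is the point of the EAV approach for Type 3 and Type 4 nodes.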

Sequential/linked access versus hashed/direct-address access
Do you need to reach individual nodes quickly by direct access, rather than by traversing the structural tree to get to them?

Do you need to find a list of nodes that have acquired a particular trait (set of attributes) regardless of where they sit topologically on the network? Do you need to perform classification or dimensionality reduction (for example, principal component analysis) on the nodes of your network?
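
To make the "trait" lookup concrete, here is a small sketch of how such a query might look against a relational attribute table. The table name NumericAttributes, its columns, and the sample attribute names are assumptions for illustration only; the query simply ignores topology and selects nodes that carry every attribute in the requested set.

```python
import sqlite3

def nodes_with_trait(conn, attribute_names):
    """Return ids of nodes that carry every attribute in attribute_names."""
    names = sorted(set(attribute_names))
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        "SELECT id FROM NumericAttributes"
        f" WHERE attributeName IN ({placeholders})"
        " GROUP BY id"
        " HAVING COUNT(DISTINCT attributeName) = ?"
        " ORDER BY id",
        (*names, len(names)),
    )
    return [node_id for (node_id,) in rows]

# Hypothetical sample data: nodes 1 and 3 have both attributes, node 2 has one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NumericAttributes (id INTEGER, attributeName TEXT, value REAL)")
conn.executemany(
    "INSERT INTO NumericAttributes VALUES (?, ?, ?)",
    [(1, "bias", 0.1), (1, "decay", 0.9), (2, "bias", 0.4), (3, "bias", 0.2), (3, "decay", 0.5)],
)
print(nodes_with_trait(conn, ["bias", "decay"]))  # prints [1, 3]
```

This kind of set-based lookup is where a relational store tends to be convenient, since no traversal of the network is involved.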

State-machine
Do you wish to perceive the regions of your network as a collection of state machines? State machines are very useful quantization entities. State-machine quantization helps you form empirical entities over a range of nodes based on neighbourhood similarities and relationships.

Instead of trying to understand and track the individual behaviour of millions of nodes, why not clump them into regions of similarity and track the state-machine flow of those regions?
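
One possible reading of this idea, sketched in Python under heavy assumptions: nodes are clumped into regions by bucketing a single characteristic value, each region is quantized to one of two states from its mean activation, and only the state changes per region are recorded. The bucketing rule, the "active"/"idle" states, and the threshold are all hypothetical choices of mine, not something the answer specifies.

```python
from collections import defaultdict

def assign_regions(node_values, bucket_width):
    """Clump nodes whose characteristic value falls in the same bucket."""
    regions = defaultdict(list)
    for node_id, value in node_values.items():
        regions[int(value // bucket_width)].append(node_id)
    return dict(regions)

def region_state(node_ids, activations, threshold=0.5):
    """Quantize a region to one state from the mean activation of its nodes."""
    mean = sum(activations[n] for n in node_ids) / len(node_ids)
    return "active" if mean >= threshold else "idle"

def track_transitions(regions, activation_history):
    """Return, per region, the list of (old_state, new_state) changes over time."""
    transitions = {region: [] for region in regions}
    previous = {}
    for activations in activation_history:
        for region, node_ids in regions.items():
            state = region_state(node_ids, activations)
            if region in previous and previous[region] != state:
                transitions[region].append((previous[region], state))
            previous[region] = state
    return transitions
```

The stored history then grows with the number of regions and state changes rather than with the number of nodes and time steps.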

Conclusion

This is my recommendation. Start with a fully relational database. The reason is that a relational database and its SQL give you a very liberal view of relationships: with SQL on a relational model, you can query or correlate relationships that you did not know existed.

As your experiments progress, you might find that certain relationships are better modeled in a network-graph repository; you should then move those parts of the schema to such a repository.

In the final state of affairs, I would maintain a dual-mode information repository. You store the dynamically mutating structure in a network-graph repository, but each node there refers to a node id in a relational database, which keeps track of the nodes and their attributes. The relational database then lets you query nodes based on attributes and their values. For example,

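-- $rangeLow and $rangeHigh stand in for the bounds of the original $range placeholder
SELECT a.id
FROM Nodes a
JOIN NumericAttributes b ON a.id = b.id
WHERE a.attributeName = $name
  AND b.value BETWEEN $rangeLow AND $rangeHigh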
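
A minimal sketch of how the two repositories might cooperate, using sqlite3 for the relational side and a plain Python dict as a stand-in for the network-graph repository. Everything here is assumed for illustration: the sample data, the helper names, and in particular the placement of attributeName in NumericAttributes next to the value (following the EAV layout described earlier), which differs slightly from the column placement in the query above.

```python
import sqlite3

# Stand-in for the network-graph repository: node id -> set of neighbour ids.
graph = {1: {2, 3}, 2: {3}, 3: set(), 4: {1}}

# Relational side: nodes and their numeric attributes (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nodes (id INTEGER PRIMARY KEY);
    CREATE TABLE NumericAttributes (
        id INTEGER REFERENCES Nodes(id), attributeName TEXT, value REAL);
""")
conn.executemany("INSERT INTO Nodes VALUES (?)", [(n,) for n in graph])
conn.executemany(
    "INSERT INTO NumericAttributes VALUES (?, ?, ?)",
    [(1, "weight", 0.2), (2, "weight", 0.7), (3, "weight", 0.9), (4, "bias", 0.8)],
)

def find_nodes(conn, name, low, high):
    """Relational side: ids of nodes whose named attribute lies in [low, high]."""
    rows = conn.execute(
        "SELECT a.id FROM Nodes a JOIN NumericAttributes b ON a.id = b.id"
        " WHERE b.attributeName = ? AND b.value BETWEEN ? AND ?"
        " ORDER BY a.id",
        (name, low, high),
    )
    return [node_id for (node_id,) in rows]

def neighbours_of_matches(conn, graph, name, low, high):
    """Graph side: look up the neighbours of every node the SQL query selects."""
    return {n: sorted(graph[n]) for n in find_nodes(conn, name, low, high)}
```

The shared node id is the only coupling between the two stores, so the graph side could later be swapped for a real graph database without touching the attribute queries.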

I am thinking that perhaps Hadoop could be used instead of a traditional network-graph database, but I don't know how well Hadoop adapts to dynamically changing relationships. My understanding is that Hadoop is good for write-once, read-many workloads, so a dynamic neural network with frequent relationship changes may not perform well on it. On the other hand, a relational table that models network relationships is not efficient either.

Still, I believe I have only raised questions you need to consider rather than given you a definite answer, especially since my knowledge of many of these concepts is rusty.

#2


0  

Trees can be stored in a table by using self-referencing foreign keys. I'm assuming the only two things that need to be stored are the topology and the weights; both can be stored in a flattened tree structure. Of course, this can require a lot of recursive selects, which, depending on your RDBMS, may be a pain to implement natively (thus requiring many SQL queries to achieve). I cannot comment on the comparison, but hopefully that helps with the relational point of view :)
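
For what it's worth, a sketch of that idea with Python's sqlite3 module. The table name Neurons, its columns, and the sample rows are assumptions; the layout takes the answer's premise that the topology is a tree, so each row has at most one parent and carries the weight of the link to it. SQLite happens to support recursive common table expressions, so the recursive select fits in a single query here; other RDBMSs may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Neurons (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES Neurons(id),  -- NULL for the root
        weight REAL                                -- weight of the link to the parent
    );
    INSERT INTO Neurons VALUES
        (1, NULL, NULL), (2, 1, 0.5), (3, 1, -0.2), (4, 2, 0.8), (5, 4, 0.1);
""")

def subtree(conn, root_id):
    """Return (id, depth) for root_id and all its descendants in one recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE descendants(id, depth) AS (
            SELECT id, 0 FROM Neurons WHERE id = ?
            UNION ALL
            SELECT n.id, d.depth + 1
            FROM Neurons n JOIN descendants d ON n.parent_id = d.id
        )
        SELECT id, depth FROM descendants ORDER BY depth, id
        """,
        (root_id,),
    )
    return rows.fetchall()

print(subtree(conn, 2))  # prints [(2, 0), (4, 1), (5, 2)]
```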
