Curious what is recognized as a solid algorithm/approach for judging the strength of a directed acyclic graph - particularly the strength of certain nodes. The main question I have about this can be boiled down to the following two graphs:
好奇被认为是一种用于判断有向无环图强度的可靠算法/方法 - 特别是某些节点的强度。我对此的主要问题可以归结为以下两个图:
(if graph doesn't show up, click here or visit this link: http://www.flickr.com/photos/86396568@N00/2893003041/
(如果图表没有显示,请点击此处或访问此链接:http://www.flickr.com/photos/86396568@N00/2893003041/
To my eyes, A is in a stronger position than a. I am judging strength by how strong a node can remain if a link is knocked out. I'd call the first a thin "stilt", and the second a thick "stalk".
对我而言,A处于比...更强的位置。如果链接被淘汰,节点可以保持多强,我判断力量。我称之为第一个薄的“高跷”,第二个是厚厚的“秆”。
Here are the approaches I've considered so far for judging the strength of a node:
以下是我迄今为止用于判断节点强度的方法:
1) Counting the number of nodes below, subtracting the number of nodes above.
1)计算下面的节点数,减去上面的节点数。
- A=7, a=7, B=5, b=1
A = 7,a = 7,B = 5,b = 1
2) Counting the number of complete paths (to termination) for each node, summing their lengths.
2)计算每个节点的完整路径(到终止)的数量,将它们的长度相加。
- A=17 (1+5+5+5+1), B=12 (4+4+4), a=9 (3+3+3), b=2
- This make the stilt stronger, rather than the stalk.
A = 17(1 + 5 + 5 + 5 + 1),B = 12(4 + 4 + 4),a = 9(3 + 3 + 3),b = 2
这使得高跷更强,而不是茎。
3) Counting every possible path, treating every node as a destination.
3)计算每个可能的路径,将每个节点视为目的地。
- A=9 (A->B, A->C, A->D, A->E, A->G, 2xA->F, 2xA->H), B=6, a=9, b=2
A = 9(A-> B,A-> C,A-> D,A-> E,A-> G,2xA-> F,2xA-> H),B = 6,a = 9,b = 2
3 seems like the best option so far, but is there one that is better, generalized for DAGs? Is this something that has a known best approach? The principles are to use as much information in the graph as possible, and for the solution to be explainable in an intuitive way.
到目前为止,3似乎是最好的选择,但有没有一个更好的,广泛用于DAG?这是一种具有已知最佳方法的东西吗?原则是尽可能多地使用图表中的信息,并以直观的方式解释解决方案。
4 个解决方案
#1
2
This really depends on what you mean by strength. Because of the versatility of the DAG in representing information, you could be discussing anything from a multiple-outcome control flow to argument clauses of non-adverbial discourse connectives or even the full set of dependencies between different words in a sentence.
这实际上取决于你的力量意味着什么。由于DAG在表示信息方面的多功能性,您可以讨论从多结果控制流程到非状语话语连接词的参数从句甚至句子中不同词之间的完整依赖关系。
All of these would view node strength in a different way. For instance, a control flow may consider the node with the most amount of outcomes (therefore the most outward arcs) as the strongest, because it has the most power over the eventual outcome of the diagram. In discourse, the strongest node is the discourse connective, but it is found in speech and text after the first connective and before the third. Selection of the lexical "head" of a sentence is not directly related to the amount of arcs directly interacting with it.
所有这些都将以不同的方式查看节点强度。例如,控制流可以认为具有最多结果的节点(因此最外向弧)是最强的,因为它对图的最终结果具有最大的功率。在话语中,最强的节点是话语连词,但是在第一个连词和第三连词之后的语音和文本中找到它。选择句子的词汇“头”与直接与之相互作用的弧的数量没有直接关系。
What I'm getting at is that there is no real panacea for computing "strength" in this data type because of the polysemous character of the term "strength" and the kind of data a DAG fits. I would say that in a machine learning problem, all three of these approaches would be very informative in selecting particular types of nodes for a classification or ranking problem, but in the end, the answer depends upon the data type's practical application.
我所得到的是,由于术语“强度”的多义性和DAG适合的数据种类,在这种数据类型中计算“强度”并没有真正的灵丹妙药。我要说的是,在机器学习问题中,所有这三种方法在为分类或排序问题选择特定类型的节点时都会提供非常丰富的信息,但最终,答案取决于数据类型的实际应用。
#2
1
I think you need to define "strength" more clearly. Is this related to the maximum flow problem?
我认为你需要更清楚地定义“力量”。这与最大流量问题有关吗?
#3
0
Okay, the practical application is sports teams. Each node is a team, each link is a victory over another team. Assume there are no circular victory paths, like A->B->C->A. The objective is to get a power ranking that doesn't conflict with the graph, and ranks the teams in order of a team's strength. The site in question is my (somewhat tongue-in-cheek) football site, http://beatpaths.com/ , where you can see full graphs of the entire NFL season each week. (And other sports.) I'm basically looking for ranking algorithms other than the ones I listed above that might make even more sense and can be defended as using all the information in the graph. The goal isn't necessarily to be more accurate in terms of future picks (although a stronger algorithm probably would be), but instead to be as reasonably descriptive as possible of the season so far.
好的,实际应用是运动队。每个节点都是一个团队,每个链接都胜过另一个团队。假设没有圆形胜利路径,如A-> B-> C-> A.目标是获得与图表不冲突的权力排名,并按照团队的力量对团队进行排名。有问题的网站是我的(有点诙谐的)足球网站http://beatpaths.com/,在那里你可以看到每周整个NFL赛季的完整图表。 (以及其他运动。)我基本上都在寻找排名算法,而不是我上面列出的那些可能更有意义的排名算法,可以使用图表中的所有信息进行辩护。目标未必在未来选择方面更准确(尽管可能会采用更强的算法),而是尽可能合理地描述到目前为止的赛季。
You can see Week 3 of the NFL Season up on the site. I've removed two "beatloops" (long story) that were ambiguous, relying on the rest of the graph to determine the pecking order.
你可以在网站上看到NFL Season的第3周。我删除了两个含糊不清的“beatloops”(长篇故事),依靠图表的其余部分来确定啄食顺序。
#4
0
If you're looking to make predictions, your best bet is probably going to be a maximum entropy ranking algorithm. The problem is going to be developing a large enough data set for your learner--the more, the better. It sounds like you can use the standings at each week of games as a single ranking instance.
如果您正在寻找预测,那么您最好的选择可能是最大熵排名算法。问题是为学习者开发足够大的数据集 - 越多越好。听起来你可以将每周游戏中的排名用作单个排名实例。
#1
2
This really depends on what you mean by strength. Because of the versatility of the DAG in representing information, you could be discussing anything from a multiple-outcome control flow to argument clauses of non-adverbial discourse connectives or even the full set of dependencies between different words in a sentence.
这实际上取决于你的力量意味着什么。由于DAG在表示信息方面的多功能性,您可以讨论从多结果控制流程到非状语话语连接词的参数从句甚至句子中不同词之间的完整依赖关系。
All of these would view node strength in a different way. For instance, a control flow may consider the node with the most amount of outcomes (therefore the most outward arcs) as the strongest, because it has the most power over the eventual outcome of the diagram. In discourse, the strongest node is the discourse connective, but it is found in speech and text after the first connective and before the third. Selection of the lexical "head" of a sentence is not directly related to the amount of arcs directly interacting with it.
所有这些都将以不同的方式查看节点强度。例如,控制流可以认为具有最多结果的节点(因此最外向弧)是最强的,因为它对图的最终结果具有最大的功率。在话语中,最强的节点是话语连词,但是在第一个连词和第三连词之后的语音和文本中找到它。选择句子的词汇“头”与直接与之相互作用的弧的数量没有直接关系。
What I'm getting at is that there is no real panacea for computing "strength" in this data type because of the polysemous character of the term "strength" and the kind of data a DAG fits. I would say that in a machine learning problem, all three of these approaches would be very informative in selecting particular types of nodes for a classification or ranking problem, but in the end, the answer depends upon the data type's practical application.
我所得到的是,由于术语“强度”的多义性和DAG适合的数据种类,在这种数据类型中计算“强度”并没有真正的灵丹妙药。我要说的是,在机器学习问题中,所有这三种方法在为分类或排序问题选择特定类型的节点时都会提供非常丰富的信息,但最终,答案取决于数据类型的实际应用。
#2
1
I think you need to define "strength" more clearly. Is this related to the maximum flow problem?
我认为你需要更清楚地定义“力量”。这与最大流量问题有关吗?
#3
0
Okay, the practical application is sports teams. Each node is a team, each link is a victory over another team. Assume there are no circular victory paths, like A->B->C->A. The objective is to get a power ranking that doesn't conflict with the graph, and ranks the teams in order of a team's strength. The site in question is my (somewhat tongue-in-cheek) football site, http://beatpaths.com/ , where you can see full graphs of the entire NFL season each week. (And other sports.) I'm basically looking for ranking algorithms other than the ones I listed above that might make even more sense and can be defended as using all the information in the graph. The goal isn't necessarily to be more accurate in terms of future picks (although a stronger algorithm probably would be), but instead to be as reasonably descriptive as possible of the season so far.
好的,实际应用是运动队。每个节点都是一个团队,每个链接都胜过另一个团队。假设没有圆形胜利路径,如A-> B-> C-> A.目标是获得与图表不冲突的权力排名,并按照团队的力量对团队进行排名。有问题的网站是我的(有点诙谐的)足球网站http://beatpaths.com/,在那里你可以看到每周整个NFL赛季的完整图表。 (以及其他运动。)我基本上都在寻找排名算法,而不是我上面列出的那些可能更有意义的排名算法,可以使用图表中的所有信息进行辩护。目标未必在未来选择方面更准确(尽管可能会采用更强的算法),而是尽可能合理地描述到目前为止的赛季。
You can see Week 3 of the NFL Season up on the site. I've removed two "beatloops" (long story) that were ambiguous, relying on the rest of the graph to determine the pecking order.
你可以在网站上看到NFL Season的第3周。我删除了两个含糊不清的“beatloops”(长篇故事),依靠图表的其余部分来确定啄食顺序。
#4
0
If you're looking to make predictions, your best bet is probably going to be a maximum entropy ranking algorithm. The problem is going to be developing a large enough data set for your learner--the more, the better. It sounds like you can use the standings at each week of games as a single ranking instance.
如果您正在寻找预测,那么您最好的选择可能是最大熵排名算法。问题是为学习者开发足够大的数据集 - 越多越好。听起来你可以将每周游戏中的排名用作单个排名实例。