I've been trying to solve an archived ITA Software puzzle known as Sling Blade Runner for a few days now. The gist of the puzzle is as follows:
"How long a chain of overlapping movie titles, like Sling Blade Runner, can you find?"
Use the following listing of movie titles: MOVIES.TXT. Multi-word overlaps, as in "License to Kill a Mockingbird," are allowed. The same title may not be used more than once in a solution. Heuristic solutions that may not always produce the greatest number of titles will be accepted: seek a reasonable tradeoff of efficiency and optimality.
The file MOVIES.TXT contains 6561 movie titles in alphabetical order.
My attempt at a solution has several parts.
Graph Construction:
What I did was map every movie title to every other movie title it could chain to (on its right). What I end up with as my graph is a Map[String, List[String]]. You can see the graph that was built using this process here.
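As a rough illustration of this step (not my actual code), here is a minimal sketch. The helper names `overlaps` and `buildGraph` are made up for this example, and it assumes that titles are split on single spaces and that any non-empty run of whole words may overlap.

```scala
// Hypothetical helper: true if some non-empty suffix of `a` (in whole words)
// equals a prefix of `b`. Multi-word overlaps are allowed.
def overlaps(a: String, b: String): Boolean = {
  val aw = a.split(" ").toVector
  val bw = b.split(" ").toVector
  (1 to math.min(aw.length, bw.length)).exists(k => aw.takeRight(k) == bw.take(k))
}

// Map every title to the list of titles that can follow it on its right.
// This compares every pair of titles, so it is quadratic in the list size.
def buildGraph(titles: List[String]): Map[String, List[String]] =
  titles.map(a => a -> titles.filter(b => b != a && overlaps(a, b))).toMap

val sample = List("SLING BLADE", "BLADE RUNNER", "LICENSE TO KILL", "TO KILL A MOCKINGBIRD")
val graph = buildGraph(sample)
```

With this sample, "SLING BLADE" maps to "BLADE RUNNER" and "LICENSE TO KILL" maps to "TO KILL A MOCKINGBIRD".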
Graph Traversal:
I did a Depth First Search using every node (every key in the map) as a starting node of the search. I kept track of the depth at which each node was visited during the search, and this was tagged in the nodes returned by the DFS. What I ended up with was a List[List[Node]], where every List[Node] in the List was the DFS tree from a particular search.
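Roughly, the traversal could look like the sketch below. The `Node` fields (`title`, `depth`, `parent`) and the function name `dfs` are assumptions made for this illustration, not necessarily what my code uses.

```scala
import scala.collection.mutable

// Assumed shape of a tagged node: the title, the depth at which the DFS
// first reached it, and the title it was reached from.
final case class Node(title: String, depth: Int, parent: Option[String])

// Depth-first search from `start`; every title is visited at most once.
// Recursive, so very deep graphs may need an explicit stack instead.
def dfs(graph: Map[String, List[String]], start: String): List[Node] = {
  val visited = mutable.LinkedHashMap.empty[String, Node]
  def visit(title: String, depth: Int, parent: Option[String]): Unit =
    if (!visited.contains(title)) {
      visited(title) = Node(title, depth, parent)
      graph.getOrElse(title, Nil).foreach(next => visit(next, depth + 1, Some(title)))
    }
  visit(start, 0, None)
  visited.values.toList
}

val g = Map("a" -> List("b", "c"), "b" -> List("c"), "c" -> List.empty[String])
// One DFS tree per starting key, giving a List[List[Node]].
val forest: List[List[Node]] = g.keys.toList.map(k => dfs(g, k))
```

Note that "c" is tagged with depth 2 (reached via "b") when starting from "a", because the first visit wins.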
Finding the longest chain:
I took the results of all the graph traversals in the previous step, and for every List[Node] I sorted the list by the depth values I tagged the nodes with previously, in descending order. Then, starting with the head of the list (which gives me the deepest node visited in the DFS), I backtrack through the nodes to build a chain. This gave me a List[List[String]], where every List[String] in the List was the longest chain for that particular DFS. Sorting the List[List[String]] by the size of each List[String] and grabbing the head then gave me the largest chain.
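A sketch of this extraction step, again only illustrative: `Node`, `chainFrom` and `longest` are names assumed for the example, and backtracking is assumed to follow a stored parent link.

```scala
// Assumed node shape: title, DFS depth, and the title it was reached from.
final case class Node(title: String, depth: Int, parent: Option[String])

// Take the deepest node of one DFS tree and walk parent links back to the root.
def chainFrom(tree: List[Node]): List[String] = {
  val byTitle = tree.map(n => n.title -> n).toMap
  @annotation.tailrec
  def back(title: String, acc: List[String]): List[String] =
    byTitle(title).parent match {
      case Some(p) => back(p, title :: acc)
      case None    => title :: acc
    }
  // Sort by depth, descending, and start from the head (the deepest node).
  tree.sortBy(-_.depth).headOption.map(n => back(n.title, Nil)).getOrElse(Nil)
}

// Longest chain over all DFS trees: sort by size, descending, and take the head.
def longest(trees: List[List[Node]]): List[String] =
  trees.map(chainFrom).sortBy(-_.length).headOption.getOrElse(Nil)

val tree = List(
  Node("a", 0, None), Node("b", 1, Some("a")),
  Node("c", 2, Some("b")), Node("d", 1, Some("a")))
val best = longest(List(tree, List(Node("x", 0, None))))
```

Here `best` is the chain a, b, c. Every chain produced this way is a root-to-leaf path of some DFS tree.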
Results:
The longest chain found with my algorithm was 217 titles long. The output can be viewed here.
I've only been able to find a few other attempts by Googling, and it seems every other attempt has produced longer chains than what I was able to accomplish. For example this post states that Eric Burke found a chain 245 titles long, and a user by the name of icefox on Reddit found a chain that was 312 titles long.
I can't think of where my algorithm is failing to find the longest chain, given other people have found longer chains. Any help/guidance is much appreciated. If you'd like to review my code, it can be found here (it's written in Scala and I just started learning Scala so forgive me if I made some noob mistakes).
Update:
I made some changes to my algorithm, which now finds chains of length 240+. See here
1 Answer
The issue is that, since the movie graph (I'm assuming) has cycles, no matter how you assign depths to the vertices of the cycle, there exists a subpath that is not monotone in the depth and thus is not considered by your algorithm. Sling Blade Runner is NP-hard, so no known polynomial-time strategy is going to produce optimal solutions on every input.
(Sling Blade Runner isn't quite the NP-hard longest path problem, which specifies paths with no repeated vertices instead of no repeated arcs, but there is an easy polynomial-time reduction from the latter to the former. Split each vertex v into v_in -> v_out, moving arc heads to the in vertex and arc tails to the out vertex. Add an arc from a source vertex s1 to a second source vertex s2 and from s2 to each in vertex; likewise, add an arc from each out vertex to a sink vertex t1 and from t1 to a second sink vertex t2.
To find the longest path on the graph a->b, b->c, c->a, c->d, the input to Sling Blade Runner would be
s1->s2,
s2->a_in, s2->b_in, s2->c_in, s2->d_in,
a_in->a_out, b_in->b_out, c_in->c_out, d_in->d_out,
a_out->b_in, b_out->c_in, c_out->a_in, c_out->d_in,
a_out->t1, b_out->t1, c_out->t1, d_out->t1,
t1->t2.
The longest path problem forbids repeated vertices, so the optimal solution is a->b->c->d rather than c->a->b->c->d. The corresponding chain in Sling Blade Runner is s1->s2->a_in->a_out->b_in->b_out->c_in->c_out->d_in->d_out->t1->t2. The corresponding transformation of the path with a repeated vertex would repeat the arc c_in->c_out and thus be infeasible for Sling Blade Runner.)
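For concreteness, the construction can be written out as a small sketch. The function name `reduce` is made up here, and it assumes that no original vertex is already named s1, s2, t1 or t2.

```scala
// Build the Sling Blade Runner arc list from a longest-path instance.
// Each resulting arc plays the role of one movie title.
def reduce(vertices: List[String], arcs: List[(String, String)]): List[(String, String)] = {
  // s1 -> s2, then s2 -> every in vertex
  val source = ("s1" -> "s2") :: vertices.map(v => "s2" -> s"${v}_in")
  // every vertex v becomes the single arc v_in -> v_out
  val split = vertices.map(v => s"${v}_in" -> s"${v}_out")
  // an original arc u -> v now runs from u_out to v_in
  val moved = arcs.map { case (u, v) => s"${u}_out" -> s"${v}_in" }
  // every out vertex -> t1, then t1 -> t2
  val sink = vertices.map(v => s"${v}_out" -> "t1") :+ ("t1" -> "t2")
  source ++ split ++ moved ++ sink
}

val instance = reduce(
  List("a", "b", "c", "d"),
  List("a" -> "b", "b" -> "c", "c" -> "a", "c" -> "d"))
```

For the four-vertex example this yields the 18 arcs listed above. Visiting c twice in the original graph would mean using the arc c_in -> c_out twice, which is what makes a repeated vertex infeasible.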
Suppose that the movie titles are
S A
S B
A B
A E
B C
C D
D A
E F
so that the graph looks like
    F
    ^
    |
    E
    ^
    |
S-->A-->B<--
|   ^   |   \
|   |   v   |
|   D<--C   |
\___________/
We start the DFS from S and get the following tree (because I said so; this is not the only possible DFS tree).
S-->A-->B-->C-->D
     \
      ->E-->F
The depths are
S 0
A 1
B 2
C 3
D 4
E 2
F 3
so the longest depth-monotone path is S A B C D. The longest path is S B C D A E F, which is not depth-monotone because it steps from D (depth 4) back to A (depth 1). If you start the DFS elsewhere, then you won't even assign S a depth.
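These claims are easy to check mechanically. The sketch below hard-codes the titles and depths from this example; the helper names are made up, and "depth-monotone" is taken to mean that the depth strictly increases at every step.

```scala
// Titles of the example as (first word, last word) pairs.
val titles = Set(
  "S" -> "A", "S" -> "B", "A" -> "B", "A" -> "E",
  "B" -> "C", "C" -> "D", "D" -> "A", "E" -> "F")

// Depths assigned by the DFS tree shown above.
val depth = Map("S" -> 0, "A" -> 1, "B" -> 2, "C" -> 3, "D" -> 4, "E" -> 2, "F" -> 3)

// Consecutive word pairs of a path, i.e. the titles it uses.
def steps(path: List[String]): List[(String, String)] = path.zip(path.drop(1))

// A valid chain uses only existing titles and uses none of them twice.
def isChain(path: List[String]): Boolean = {
  val s = steps(path)
  s.forall(titles.contains) && s.distinct.length == s.length
}

// Depth must strictly increase along every step.
def isDepthMonotone(path: List[String]): Boolean =
  steps(path).forall { case (a, b) => depth(a) < depth(b) }

val monotone = List("S", "A", "B", "C", "D")
val longer = List("S", "B", "C", "D", "A", "E", "F")
val longerStill = List("S", "A", "B", "C", "D", "A", "E", "F")
```

Both `monotone` and `longer` are valid chains, but only the first is depth-monotone. If a word may recur as long as no title is reused, `longerStill` is valid too (it uses seven titles), and it is not depth-monotone either.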
A simpler example is
A B
B C
C D
D A
where, no matter where you start, the optimal path, which goes all the way around the cycle, is not depth-monotone: A B C D A, B C D A B, C D A B C, or D A B C D.
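For inputs this small, the optimum can be confirmed by exhaustive search. The sketch below is a brute-force illustration only (the name `longestChain` is made up); it tries every extension of every starting title, so its running time is exponential in the worst case, in line with the NP-hardness noted above.

```scala
// Keep the longer of two lists, preferring the first on ties.
def longer[A](x: List[A], y: List[A]): List[A] = if (y.length > x.length) y else x

// Exhaustive search for the longest chain that uses no title twice.
// Titles are (first word, last word) pairs.
def longestChain(titles: List[(String, String)]): List[(String, String)] = {
  // `chain` is kept in reverse order, so its head is the most recent title.
  def extend(chain: List[(String, String)], unused: Set[(String, String)]): List[(String, String)] = {
    val lastWord = chain.head._2
    unused.toList
      .filter(_._1 == lastWord)
      .map(t => extend(t :: chain, unused - t))
      .foldLeft(chain)((best, c) => longer(best, c))
  }
  titles
    .map(t => extend(List(t), titles.toSet - t))
    .foldLeft(List.empty[(String, String)])((best, c) => longer(best, c))
    .reverse
}

val cycle = List("A" -> "B", "B" -> "C", "C" -> "D", "D" -> "A")
val around = longestChain(cycle)
```

On the four-title cycle, `around` uses all four titles and ends on the word it started with, so whatever depths a DFS assigns, the last step goes from the deepest vertex back to the shallowest one.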