Most efficient data structure to represent threaded comments in Java?

I want to represent threaded comments in Java. This would look similar to the way comments are threaded on reddit.com.

hello
   hello
      hello
      hello
   hello
   hello
      hello

As in the example above, responses are nested in the HTML with appropriate indentation to reflect their relationship to prior comments.

What would be an efficient way to represent this in Java?

I'm thinking some kind of tree data structure would be appropriate.

But is there one in particular that would minimize tree traversals?

This would be important if I have voting on each comment. Because then the tree would need to be reordered after each vote - a potentially expensive operation computationally.

By the way, if anyone knows of an existing open-source implementation of this in Java, that would help too.

3 Answers

#1

I would use levels of linked lists.

message1
    message2
        message3
        message4
    message5
    message6
        message7

Each node would have a pointer to its:

- forward sibling  (2->5, 3->4, 5->6,                   1/4/6/7->NULL).
- backward sibling (4->3, 5->2, 6->5,                   1/2/3/7->NULL).
- first child      (1->2, 2->3, 6->7,                   3/4/5/7->NULL).
- parent           (2->1, 3->2, 4->2, 5->1, 6->1, 7->6,       1->NULL).

Within each level, messages would be sorted in the list by vote count (or whatever other score you wanted to use).
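
For illustration, here is a minimal Java sketch of such a node, with the four links listed above and an insert that keeps each sibling list sorted by descending score. The names `ThreadSketch`, `Node`, `addChild`, `render` and `demo`, and the scores, are made up for this example (they are not from any library), and `String.repeat` assumes Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadSketch {
    /** One comment. Field and method names are illustrative only. */
    static final class Node {
        final String text;
        int score;
        Node parent, firstChild, prev, next; // the four links described above

        Node(String text, int score) {
            this.text = text;
            this.score = score;
        }

        /** Links child under this node, keeping the sibling list sorted by descending score. */
        void addChild(Node child) {
            child.parent = this;
            Node before = null;
            Node cur = firstChild;
            // Walk past every sibling whose score is at least the new child's score.
            while (cur != null && cur.score >= child.score) {
                before = cur;
                cur = cur.next;
            }
            child.prev = before;
            child.next = cur;
            if (before == null) {
                firstChild = child;
            } else {
                before.next = child;
            }
            if (cur != null) {
                cur.prev = child;
            }
        }
    }

    /** Depth-first walk that emits one indented line per comment. */
    static void render(Node n, int depth, List<String> out) {
        for (; n != null; n = n.next) {
            out.add("    ".repeat(depth) + n.text);
            render(n.firstChild, depth + 1, out);
        }
    }

    /** Builds the seven-message example; children are added out of order on purpose. */
    static List<String> demo() {
        Node m1 = new Node("message1", 0);
        Node m2 = new Node("message2", 30);
        Node m5 = new Node("message5", 20);
        Node m6 = new Node("message6", 10);
        m1.addChild(m6);
        m1.addChild(m2);
        m1.addChild(m5);
        m2.addChild(new Node("message4", 3));
        m2.addChild(new Node("message3", 5));
        m6.addChild(new Node("message7", 1));
        List<String> out = new ArrayList<>();
        render(m1, 0, out);
        return out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Running `main` should print the same indented layout as the example tree above, since the sorted insert puts message2, message5 and message6 in score order regardless of the order they were added.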

That would give you maximum flexibility for moving things around and you could move whole sub-trees (e.g., message2) just by changing the links at the parent and that level.

For example, say message6 gets an influx of votes that makes it more popular than message5. The changes are (adjusting both the next and previous sibling pointers):

message2 -> message6

message6 -> message5

message5 -> NULL.

to get:

message1
    message2
        message3
        message4
    message6
        message7
    message5

If it continues until it garners more votes than message2, the following occurs:

message6 -> message2

message2 -> message5

and the first-child pointer of message1 is set to message6 (it was message2). That is still relatively easy, and gives:

message1
    message6
        message7
    message2
        message3
        message4
    message5

Re-ordering only needs to occur when a score change leaves a message with a higher score than its upper sibling or a lower score than its lower sibling. You don't need to re-order after every score change.
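
A rough Java sketch of that rule, assuming the same four-link node as described earlier: a vote changes the score, and sibling links are touched only if a neighbour has actually been overtaken. `VoteSketch`, `append`, `swapWithPrev`, `vote` and `childOrder` are hypothetical names, and the scores are invented for the demo.

```java
public class VoteSketch {
    /** Minimal comment node; names are illustrative only. */
    static final class Node {
        final String text;
        int score;
        Node parent, firstChild, prev, next;

        Node(String text, int score) {
            this.text = text;
            this.score = score;
        }
    }

    /** Appends child at the end of parent's list (the demo supplies children in score order). */
    static Node append(Node parent, Node child) {
        child.parent = parent;
        if (parent.firstChild == null) {
            parent.firstChild = child;
            return child;
        }
        Node last = parent.firstChild;
        while (last.next != null) {
            last = last.next;
        }
        last.next = child;
        child.prev = last;
        return child;
    }

    /** Swaps node with its upper sibling. Subtrees move along because child links are untouched. */
    static void swapWithPrev(Node node) {
        Node up = node.prev;    // sibling currently above node
        Node above = up.prev;   // may be null
        Node below = node.next; // may be null
        node.prev = above;
        node.next = up;
        up.prev = node;
        up.next = below;
        if (above != null) {
            above.next = node;
        } else if (node.parent != null) {
            node.parent.firstChild = node; // node is now first in its level
        }
        if (below != null) {
            below.prev = up;
        }
    }

    /** Applies a vote and relinks only while a neighbour is out of order. Returns the swap count. */
    static int vote(Node node, int delta) {
        node.score += delta;
        int swaps = 0;
        while (node.prev != null && node.prev.score < node.score) {
            swapWithPrev(node);
            swaps++;
        }
        while (node.next != null && node.next.score > node.score) {
            swapWithPrev(node.next);
            swaps++;
        }
        return swaps;
    }

    /** Space-separated texts of parent's direct children, in list order. */
    static String childOrder(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.firstChild; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node m1 = new Node("message1", 0);
        append(m1, new Node("message2", 30));
        append(m1, new Node("message5", 20));
        Node m6 = append(m1, new Node("message6", 10));
        append(m6, new Node("message7", 1));
        System.out.println(vote(m6, 5) + " swaps: " + childOrder(m1));  // 15 points, no relinking
        System.out.println(vote(m6, 10) + " swaps: " + childOrder(m1)); // 25 points, passes message5
        System.out.println(vote(m6, 10) + " swaps: " + childOrder(m1)); // 35 points, passes message2
    }
}
```

Each swap is a constant number of pointer updates, so a vote that does not change the ranking costs a comparison with at most two neighbours. A top-level list with no parent node would need its head pointer kept somewhere else; this sketch does not cover that case.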

#2

The tree is right (with getLastSibling and getNextSibling), but if you're storing/querying the data, you probably want to store a lineage for each entry, or number by a preorder traversal:

http://www.sitepoint.com/article/hierarchical-data-database/2/

At the cost of no longer being able to derive the exact number of subnodes from the numbers, you can leave gaps in the numbering to minimise renumbering. Still, I'm not certain that this will be noticeably faster than traversing the tree each time. I guess it depends on how deep your tree grows.
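
As a rough sketch of the preorder-numbering idea, assuming each node stores a left and a right number and the counter advances by a fixed gap: the Java below numbers the seven-message tree from answer #1 and tests ancestry by comparing numbers alone. `PreorderSketch`, `number`, `isDescendant` and the gap of 10 are assumptions made for this example, not something taken from the linked article.

```java
import java.util.ArrayList;
import java.util.List;

public class PreorderSketch {
    /** Tree node carrying the two preorder numbers; names are illustrative only. */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        int left, right;

        Node(String text) {
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return child;
        }
    }

    /**
     * Assigns left on the way down and right on the way back up,
     * advancing the counter by gap each time. Returns the next free value.
     */
    static int number(Node n, int counter, int gap) {
        n.left = counter;
        counter += gap;
        for (Node c : n.children) {
            counter = number(c, counter, gap);
        }
        n.right = counter;
        return counter + gap;
    }

    /** True if candidate lies anywhere below ancestor, decided from the numbers alone. */
    static boolean isDescendant(Node candidate, Node ancestor) {
        return candidate.left > ancestor.left && candidate.right < ancestor.right;
    }

    public static void main(String[] args) {
        Node m1 = new Node("message1");
        Node m2 = m1.add(new Node("message2"));
        m2.add(new Node("message3"));
        Node m4 = m2.add(new Node("message4"));
        Node m5 = m1.add(new Node("message5"));
        Node m6 = m1.add(new Node("message6"));
        m6.add(new Node("message7"));

        number(m1, 10, 10);
        System.out.println(m2.text + " " + m2.left + ".." + m2.right); // message2 20..70
        System.out.println(isDescendant(m4, m2)); // true
        System.out.println(isDescendant(m5, m2)); // false
    }
}
```

With a gap of 10, message5 ends up numbered 80..90, so a new reply under it could take something like 83 and 86 without renumbering any other node. In a database, the same comparison becomes a range condition on the two columns.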

#3

This would be important if I have voting on each comment. Because then the tree would need to be reordered after each vote - a potentially expensive operation computationally.

Sounds like a premature optimization to me, possibly even a faulty optimization.

Your tree data structure sounds logical for representing your data. I say stick with it. Optimize it later only if a performance problem is detected and measured, and can be compared with alternatives.
