Most efficient database design for a blog (posts and comments)

Time: 2022-08-01 12:53:54

What would be the best way of designing a database to store blog posts and comments? I am currently thinking of one table for posts and another for comments, each with a post ID.

It seems to me, however, that trawling through a large table of comments to find those for the relevant post would be expensive, and it would happen every time a blog post is loaded (perhaps with some amount of caching).

Is there a better way?

5 answers

#1


17  

It seems to me, however, trawling through a large table of comments

All the database vendors agree with you.

They offer "indexes" to limit this.
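
A minimal sketch of what that buys you, assuming SQLite through Python's built-in sqlite3 module (table, column and index names are illustrative): EXPLAIN QUERY PLAN reports a full table scan for "the comments of post N" until an index on post_id exists, after which it reports a search of that index. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# In-memory database; table, column and index names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)")

QUERY = "SELECT * FROM comments WHERE post_id = ?"

def plan(sql):
    """Return the human-readable 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]

before = plan(QUERY)  # no index on post_id yet: the planner must scan every row
conn.execute("CREATE INDEX idx_comments_post_id ON comments(post_id)")
after = plan(QUERY)   # now the planner can search the index instead

print(before)  # a SCAN step over comments
print(after)   # a SEARCH step using idx_comments_post_id
```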

#2


13  

Every database system you might use to implement your blog supports indexing. This means that, rather than "trawling through a large table", your database system consults a separate list it maintains of comments and the posts they are associated with, much like the index at the back of a book. This allows the database system to load the comments for a post extremely quickly, and I don't see any problems with your proposed design for a blog of any size.
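
The book-index analogy can be made concrete with a toy in-memory sketch. It is not how any particular database lays out an index on disk, and the comment data and names are hypothetical. The "index" is a separately maintained, sorted list of (post_id, comment_id) pairs, so finding one post's comments is a binary search for a contiguous range rather than a pass over every comment.

```python
from bisect import bisect_left, bisect_right

# Hypothetical comment rows (comment_id, post_id, text), stored in insertion order.
comments = [
    (1, 7, "First!"),
    (2, 3, "Nice post"),
    (3, 7, "Agreed"),
    (4, 5, "Typo in para 2"),
    (5, 7, "Thanks"),
]

# The "index at the back of the book": (post_id, comment_id) pairs kept sorted,
# maintained separately from the rows themselves.
index = sorted((post_id, comment_id) for comment_id, post_id, _ in comments)
by_id = {row[0]: row for row in comments}  # stands in for a primary-key lookup

def comments_for_post(post_id):
    """Binary-search the index for the range of entries belonging to post_id."""
    lo = bisect_left(index, (post_id,))
    hi = bisect_right(index, (post_id, float("inf")))
    return [by_id[comment_id] for _, comment_id in index[lo:hi]]

print(comments_for_post(7))  # [(1, 7, 'First!'), (3, 7, 'Agreed'), (5, 7, 'Thanks')]
```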

Indexes are routinely used to relate tables with millions of rows to other tables with millions of rows. You would need an exceptionally large blog before comments had to be denormalized, and even then, caching would probably serve you far better than denormalizing the database.

You will need to define an index on your comments table over whatever column holds the post ID. How that's done depends on which database system you are using.
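
In SQLite, for example (chosen here only for illustration; other systems use similar CREATE INDEX statements with their own options), the index is a single statement. Table and index names are illustrative. Note that SQLite does not create an index for a foreign-key column on its own, so the index on post_id has to be declared explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE comments (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        body    TEXT NOT NULL
    );
    -- SQLite does not index a foreign-key column automatically, so add it explicitly.
    CREATE INDEX idx_comments_post_id ON comments(post_id);
""")

# List the indexes on the comments table; column 1 of each PRAGMA row is the index name.
indexes = [row[1] for row in conn.execute("PRAGMA index_list(comments)")]
print(indexes)  # ['idx_comments_post_id']
```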

#3


7  

Try something like this:

Blog
BlogID     int auto number PK
BlogName   string
...

BlogPost
BlogPostID   int auto number PK
BlogID       int FK to Blog.BlogID, index
BlogContent  string
...

Comment
CommentID       int auto number PK
BlogPostID      int FK to BlogPost.BlogPostID, index   
ReplyToCommentID int FK to Comment.CommentID  <<for comments on comments
...
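
A sketch of the layout above as runnable SQLite DDL, driven from Python's sqlite3 module. Mapping "int auto number PK" to INTEGER PRIMARY KEY, using NULL in ReplyToCommentID for top-level comments, and the CommentText column are all assumptions; the elided "..." columns are left out. Because every comment carries BlogPostID, one indexed query fetches a whole thread, and the replies can then be nested in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
    CREATE TABLE Blog (
        BlogID   INTEGER PRIMARY KEY,
        BlogName TEXT
    );
    CREATE TABLE BlogPost (
        BlogPostID  INTEGER PRIMARY KEY,
        BlogID      INTEGER NOT NULL REFERENCES Blog(BlogID),
        BlogContent TEXT
    );
    CREATE INDEX IX_BlogPost_BlogID ON BlogPost(BlogID);
    CREATE TABLE Comment (
        CommentID        INTEGER PRIMARY KEY,
        BlogPostID       INTEGER NOT NULL REFERENCES BlogPost(BlogPostID),
        ReplyToCommentID INTEGER REFERENCES Comment(CommentID),  -- NULL for top-level comments
        CommentText      TEXT                                    -- hypothetical column
    );
    CREATE INDEX IX_Comment_BlogPostID ON Comment(BlogPostID);
""")

conn.execute("INSERT INTO Blog VALUES (1, 'My blog')")
conn.execute("INSERT INTO BlogPost VALUES (10, 1, 'Hello world')")
conn.executemany(
    "INSERT INTO Comment VALUES (?, ?, ?, ?)",
    [(100, 10, None, "Nice post"), (101, 10, 100, "Agreed"), (102, 10, None, "Typo?")],
)

def comment_tree(post_id):
    """Fetch a post's comments with one indexed query, then nest replies in Python."""
    rows = conn.execute(
        "SELECT CommentID, ReplyToCommentID, CommentText FROM Comment "
        "WHERE BlogPostID = ? ORDER BY CommentID",
        (post_id,),
    ).fetchall()
    nodes = {cid: {"text": text, "replies": []} for cid, _, text in rows}
    roots = []
    for cid, parent, _ in rows:
        if parent is None:
            roots.append(nodes[cid])                     # top-level comment
        else:
            nodes[parent]["replies"].append(nodes[cid])  # reply to another comment
    return roots

tree = comment_tree(10)
print(tree)
```

Nesting in application code keeps the database work to a single indexed lookup per post, however deep the reply chains go.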

#4


1  

trawling through a large table of comments to find those for the relevant post would be expensive,

An index is always there to rescue you! Put the first index on postId and another on commentdate (desc).
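
Read literally, this suggests two separate indexes; another common reading is a single composite index on (postId, commentdate DESC), which is what this sketch assumes (SQLite via Python's sqlite3, with illustrative snake_case names). With only a post_id index, the planner finds the rows quickly but still reports a separate sort step for the ORDER BY; with the composite index, one post's rows already come out newest first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, comment_date TEXT, body TEXT)"
)

QUERY = "SELECT * FROM comments WHERE post_id = ? ORDER BY comment_date DESC"

def plan():
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row for QUERY."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,))]

# Index on post_id alone: the rows are located via the index, then sorted separately.
conn.execute("CREATE INDEX idx_post ON comments(post_id)")
single = plan()

# Composite index: entries for one post_id are stored already ordered by date.
conn.execute("DROP INDEX idx_post")
conn.execute("CREATE INDEX idx_post_date ON comments(post_id, comment_date DESC)")
composite = plan()

print(single)     # expected to include a 'USE TEMP B-TREE FOR ORDER BY' step
print(composite)  # expected to have no temp B-tree: the index supplies the order
```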

#5


1  

Okay, let's see.

trawling through a large table of comments to find those for the relevant post would be expensive

Why do you think it'd be expensive? Presumably because you believe a linear search would be done every time, taking O(n) time: for a billion comments, that is a billion iterations.

Now suppose a (balanced) binary search tree is built on comment_ID. To look up any comment, you need about log(n) [base 2] steps. So even for 1 billion comments, only around 30 iterations are needed (log2 of 10^9 is about 29.9).
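
To put numbers on the linear-versus-logarithmic difference, this small sketch (the sorted list of IDs is a hypothetical stand-in for comment IDs) counts comparisons for a front-to-back scan and for a binary search, which follows the same path as a lookup in a balanced binary search tree. It uses one million IDs so it runs quickly, and computes the billion-comment case from log2 directly.

```python
import math

def linear_search_steps(sorted_ids, target):
    """Count comparisons a front-to-back scan makes before it finds target."""
    for steps, value in enumerate(sorted_ids, start=1):
        if value == target:
            return steps
    return len(sorted_ids)

def binary_search_steps(sorted_ids, target):
    """Count halvings a binary search makes (the path of a balanced-BST lookup)."""
    lo, hi, steps = 0, len(sorted_ids) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_ids[mid] == target:
            return steps
        if sorted_ids[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

ids = list(range(1_000_000))  # hypothetical comment IDs, already sorted
target = 999_999
print(linear_search_steps(ids, target))  # 1000000
print(binary_search_steps(ids, target))  # at most 20, since log2(10**6) is about 19.9
print(math.ceil(math.log2(10**9)))       # 30 levels for a billion comments
```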

Now consider a slightly modified BST, where each node contains k elements (kept in a sorted list) instead of 1, and has k+1 child nodes. The same ordering property as a BST holds in this data structure as well. What we've got here is called a B-tree. Further reading: GeeksForGeeks - B Tree Introduction
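
A minimal sketch of B-tree search in Python, to make the "k keys, k+1 children" description concrete. It covers lookup only (insertion and node splitting are omitted), and the tiny hand-built tree is hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list                                    # sorted keys in this node (up to k of them)
    children: list = field(default_factory=list)  # len(keys) + 1 children, or [] for a leaf

def btree_search(node, key):
    """Return (found, nodes_visited) when searching for key from node downward."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, key)  # binary search within this node's sorted keys
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        # Not in this node: descend into child i, or stop if this is a leaf.
        node = node.children[i] if node.children else None
    return False, visited

# A tiny hand-built tree with k = 2 keys per node and 3 children per internal node.
root = Node([20, 40], [Node([5, 10]), Node([25, 30]), Node([45, 50])])
print(btree_search(root, 30))  # (True, 2)
print(btree_search(root, 33))  # (False, 2)
```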

For a B-tree, the number of nodes visited in a lookup is about log(n) [base k] (more precisely base k+1, since each node has k+1 children). Hence, if k=10, for 1 billion entries only about 9 node visits are needed.
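
The arithmetic can be checked with a short helper. It assumes every node is completely full, so it gives the minimum height; a real B-tree keeps nodes only partly full and may need a level or so more. With k = 1 the same formula describes a full binary tree.

```python
def btree_levels(n, k):
    """Minimum levels for n keys when every node holds k keys and has k + 1 children."""
    levels, capacity = 0, 0
    while capacity < n:
        levels += 1
        capacity = (k + 1) ** levels - 1  # keys held by a full tree of this height
    return levels

print(btree_levels(10**9, 1))    # 30 (binary tree)
print(btree_levels(10**9, 10))   # 9
print(btree_levels(10**9, 100))  # 5
```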

Relational databases typically store their indexes, including primary-key indexes, as B-trees. Hence, with an index on the post ID column, the stated task would not be expensive, and you should go ahead and design the database the way that seemed obvious.

PS: You can build an index on any column of a table, and primary-key indexes are created by default. But be careful not to create unnecessary indexes: each one takes up disk space and has to be maintained on writes.
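
A rough way to see that disk cost, assuming SQLite via Python's sqlite3 (the row count and names are illustrative, and exact page numbers depend on page size and data), is to compare PRAGMA page_count before and after creating an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    ((i % 1000, "comment text %d" % i) for i in range(50_000)),
)
conn.commit()

def pages():
    """Number of database pages currently allocated."""
    return conn.execute("PRAGMA page_count").fetchone()[0]

before = pages()
conn.execute("CREATE INDEX idx_comments_post_id ON comments(post_id)")
conn.commit()
after = pages()
print(before, after)  # the index occupies pages of its own, so after > before
```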
