What is the most efficient way to store tags in a database?

I am implementing a tagging system on my website similar to the one * uses. My question is: what is the most effective way to store tags so that they can be searched and filtered?

My idea is this:

Table: Items
Columns: Item_ID, Title, Content

Table: Tags
Columns: Title, Item_ID

Is this too slow? Is there a better way?

8 Solutions

#1

167

One item is going to have many tags, and one tag will belong to many items. That implies you'll quite possibly need an intermediary table to handle the many-to-many relationship.

Something like:

Table: Items
Columns: Item_ID, Item_Title, Content

Table: Tags
Columns: Tag_ID, Tag_Title

Table: Items_Tags
Columns: Item_ID, Tag_ID
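
As a rough illustration of the three tables above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names come from the answer; the sample rows, the UNIQUE constraint on Tag_Title, and the helper name items_with_tag are assumptions made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (
    Item_ID    INTEGER PRIMARY KEY,
    Item_Title TEXT NOT NULL,
    Content    TEXT
);
CREATE TABLE Tags (
    Tag_ID    INTEGER PRIMARY KEY,
    Tag_Title TEXT NOT NULL UNIQUE   -- assumption: each tag name is stored once
);
CREATE TABLE Items_Tags (
    Item_ID INTEGER NOT NULL REFERENCES Items(Item_ID),
    Tag_ID  INTEGER NOT NULL REFERENCES Tags(Tag_ID)
);
""")

# Hypothetical sample data: item 1 has two tags, item 2 has one.
conn.executemany("INSERT INTO Items VALUES (?, ?, ?)",
                 [(1, "First post", "..."), (2, "Second post", "...")])
conn.executemany("INSERT INTO Tags VALUES (?, ?)",
                 [(1, "sql"), (2, "python")])
conn.executemany("INSERT INTO Items_Tags VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def items_with_tag(conn, tag_title):
    """Return the titles of all items carrying the given tag, ordered by Item_ID."""
    rows = conn.execute(
        """
        SELECT i.Item_Title
        FROM Items AS i
        JOIN Items_Tags AS it ON it.Item_ID = i.Item_ID
        JOIN Tags AS t ON t.Tag_ID = it.Tag_ID
        WHERE t.Tag_Title = ?
        ORDER BY i.Item_ID
        """,
        (tag_title,),
    ).fetchall()
    return [row[0] for row in rows]


print(items_with_tag(conn, "sql"))     # ['First post', 'Second post']
print(items_with_tag(conn, "python"))  # ['First post']
```

Filtering by tag is then two joins through the intermediary table, and renaming a tag touches a single row in Tags.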

It might be that your web app becomes insanely popular and needs denormalising down the road, but it's pointless muddying the waters too early.

#2

You should read Philipp Keller's blog posts about tagging database schemas. He tries a few and reports his results, both in terms of ease of constructing common queries, and in terms of performance. Number of tags, number of tagged items, and number of tags per item were all factors. The posts are from 2005; I'm not aware of any updates since then.

#3

Actually I believe de-normalising the tags table might be a better way forward, depending on scale.

This way, the tags table simply has tagid, itemid, tagname.

You'll get duplicate tagnames, but it makes adding/removing/editing tags for specific items MUCH simpler. You don't have to create a new tag, remove the allocation of the old one, and re-allocate a new one; you just edit the tagname.

For displaying a list of tags, you simply use DISTINCT or GROUP BY, and of course you can count how many times a tag is used easily, too.
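
A minimal sketch of this denormalised layout, again with Python's sqlite3 module and an in-memory database. The column names tagid, itemid, and tagname are from the answer; the table name tags and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags ("
    "tagid INTEGER PRIMARY KEY, "
    "itemid INTEGER NOT NULL, "
    "tagname TEXT NOT NULL)"
)

# Hypothetical sample rows; the same tagname repeats once per tagged item.
conn.executemany(
    "INSERT INTO tags (itemid, tagname) VALUES (?, ?)",
    [(1, "sql"), (1, "python"), (2, "sql"), (3, "sqlite")],
)

# Editing a tag on one item is a single UPDATE of that item's row.
conn.execute(
    "UPDATE tags SET tagname = 'sql' WHERE itemid = 3 AND tagname = 'sqlite'"
)

# List of distinct tags for display.
tag_names = [
    row[0]
    for row in conn.execute("SELECT DISTINCT tagname FROM tags ORDER BY tagname")
]

# Usage count per tag.
tag_counts = conn.execute(
    "SELECT tagname, COUNT(*) FROM tags GROUP BY tagname ORDER BY tagname"
).fetchall()

print(tag_names)   # ['python', 'sql']
print(tag_counts)  # [('python', 1), ('sql', 3)]
```

The trade-off is visible in the data: renaming a tag everywhere means updating every row that carries it, rather than one row in a separate tag table.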

#4

I'd suggest using an intermediary third table for storing tags<=>items associations, since we have a many-to-many relation between tags and items, i.e. one item can be associated with multiple tags and one tag can be associated with multiple items. HTH, Valve.

#5

If you don't mind using a bit of non-standard stuff, Postgres version 9.4 and up has an option of storing a record of type JSON text array.

Your schema would be:

Table: Items
Columns: Item_ID:int, Title:text, Content:text

Table: Tags
Columns: Item_ID:int, Tag_Title:text[]

For more info, see this excellent post by Josh Berkus: http://www.databasesoup.com/2015/01/tag-all-things.html

The post thoroughly compares several options for performance, and the one suggested above comes out best overall.

#6

You can't really talk about slowness based on the data you provided in the question. And I don't think you should even worry too much about performance at this stage of development. It's called premature optimization.

However, I'd suggest that you include a Tag_ID column in the Tags table. It's usually good practice for every table to have an ID column.

#7

If space is going to be an issue, have a 3rd table Tags(Tag_Id, Title) to store the text for the tag and then change your Tags table to be (Tag_Id, Item_Id). Those two values should provide a unique composite primary key as well.

#8

Items should have an "ID" field, and Tags should have an "ID" field (Primary Key, Clustered).

Then make an intermediate table of ItemID/TagID and put the "Perfect Index" on there.
