What is the more efficient way to handle tags in a database?

Time: 2022-10-17 23:25:48

Is it more efficient to use a taglist field, with all the tags separated by a space, or use 2 more tables (tag: tagid tagtext, tagitem: tagid, itemid)?
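
For reference, the two layouts being compared can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the `item_flat` and `item` table names, the `title` column, and the sample rows are assumptions, while `tag`, `tagitem`, `tagid`, `tagtext`, `itemid` and `taglist` come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: one space-separated taglist column on the item row.
# (item_flat and title are hypothetical names for illustration.)
conn.execute(
    "CREATE TABLE item_flat (itemid INTEGER PRIMARY KEY, title TEXT, taglist TEXT)"
)
conn.execute("INSERT INTO item_flat VALUES (1, 'First post', 'sql python')")

# Option 2: a tag table plus a tagitem link table.
conn.executescript("""
    CREATE TABLE item    (itemid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag     (tagid  INTEGER PRIMARY KEY, tagtext TEXT UNIQUE);
    CREATE TABLE tagitem (tagid  INTEGER, itemid INTEGER,
                          PRIMARY KEY (tagid, itemid));
    INSERT INTO item    VALUES (1, 'First post');
    INSERT INTO tag     VALUES (1, 'sql'), (2, 'python');
    INSERT INTO tagitem VALUES (1, 1), (2, 1);
""")

# Tags for item 1 under option 1: read one row, split the string in the app.
flat_tags = conn.execute(
    "SELECT taglist FROM item_flat WHERE itemid = 1"
).fetchone()[0].split()

# Tags for item 1 under option 2: join through the link table.
joined_tags = [row[0] for row in conn.execute(
    "SELECT t.tagtext FROM tag t "
    "JOIN tagitem ti ON ti.tagid = t.tagid "
    "WHERE ti.itemid = 1 ORDER BY t.tagtext"
)]

print(flat_tags, joined_tags)
```

Both layouts return the same set of tags for an item; the difference lies in which queries the database can answer by itself and which have to be finished in application code.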

4 solutions

#1


3  

The efficiency largely depends on what you are doing. If you want to query by tag name, it is probably faster to have a separate tag table, with the tag ID keyed in both the tag table and the table that links tags to items (i.e. option #2). However, unless you have thousands of rows in either table, it probably won't make a difference. If you don't have many tags at all, the difference will be even smaller.

If you want to get tags by item IDs, though, the first method is ever so slightly faster. Again, I doubt you will notice.

There are other considerations to make: data integrity and normalization. If you use two tables and foreign keys, it is much easier for you to have your set of tags be consistent with the items. If a tag is removed and you are only using one table, old items will still have the old tags. Additionally, it's much easier to get a list of unique tags and keep it consistent. If you have tags in another table, this opens up a whole new world of organization: you can make timestamps for tag creation and modification, mark tags as active or inactive (and possibly other statuses), etc.
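
The integrity point can be sketched as below, again with Python's sqlite3 module. The `active` and `created_at` columns, the `ON DELETE CASCADE` rules, and the sample rows are assumptions chosen for illustration, not something the question specifies; note also that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE item (itemid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (
        tagid      INTEGER PRIMARY KEY,
        tagtext    TEXT NOT NULL UNIQUE,
        active     INTEGER NOT NULL DEFAULT 1,              -- hypothetical status flag
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP  -- hypothetical timestamp
    );
    CREATE TABLE tagitem (
        tagid  INTEGER NOT NULL REFERENCES tag(tagid)   ON DELETE CASCADE,
        itemid INTEGER NOT NULL REFERENCES item(itemid) ON DELETE CASCADE,
        PRIMARY KEY (tagid, itemid)
    );
    INSERT INTO item VALUES (1, 'First post'), (2, 'Second post');
    INSERT INTO tag (tagid, tagtext) VALUES (1, 'sql'), (2, 'python');
    INSERT INTO tagitem VALUES (1, 1), (2, 1), (1, 2);
""")

# Removing a tag also removes its links, so no item keeps a stale tag.
conn.execute("DELETE FROM tag WHERE tagtext = 'python'")
remaining_links = conn.execute(
    "SELECT tagid, itemid FROM tagitem ORDER BY tagid, itemid"
).fetchall()

# The list of unique tags is simply the tag table.
unique_tags = [row[0] for row in conn.execute(
    "SELECT tagtext FROM tag ORDER BY tagtext"
)]

# Linking an item to a tag that does not exist is rejected.
try:
    conn.execute("INSERT INTO tagitem VALUES (99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(remaining_links, unique_tags, rejected)
```

With a single taglist column, the equivalent of the `DELETE` would be a string rewrite of every affected row.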

#2


1  

The second option. Store the tags separately. You won't be able to write good queries to search on a specific tag if you store them in a single field. You don't want to use MATCH or LIKE to filter on tags. By storing them in a separate table, you can easily find the tags you need, and the related articles too. Your tables do need to be properly indexed, though.

Never store comma/space/otherwise separated values in a database if you need to query for those values. The whole essence of a database is to store the data in a structured way. This way the database can optimize the retrieval of that data to a great extent.
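
A small sketch of why filtering a separated field with LIKE is awkward, using Python's sqlite3 module. The `item_flat` table name and the sample tags are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-field layout: tags separated by spaces.
conn.execute("CREATE TABLE item_flat (itemid INTEGER PRIMARY KEY, taglist TEXT)")
conn.executemany(
    "INSERT INTO item_flat VALUES (?, ?)",
    [(1, "java spring"), (2, "javascript react"), (3, "sql java")],
)

# A naive substring match also hits 'javascript'.
naive = [row[0] for row in conn.execute(
    "SELECT itemid FROM item_flat WHERE taglist LIKE '%java%' ORDER BY itemid"
)]

# Padding with the separator fixes the false positive, but the pattern still
# starts with a wildcard, so an ordinary index on taglist cannot narrow it down.
padded = [row[0] for row in conn.execute(
    "SELECT itemid FROM item_flat "
    "WHERE ' ' || taglist || ' ' LIKE '% java %' ORDER BY itemid"
)]

# Separate tables: the same search is a plain equality match on tagtext.
conn.executescript("""
    CREATE TABLE tag     (tagid INTEGER PRIMARY KEY, tagtext TEXT UNIQUE);
    CREATE TABLE tagitem (tagid INTEGER, itemid INTEGER,
                          PRIMARY KEY (tagid, itemid));
    INSERT INTO tag VALUES (1, 'java'), (2, 'spring'), (3, 'javascript'),
                           (4, 'react'), (5, 'sql');
    INSERT INTO tagitem VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (1, 3);
""")
exact = [row[0] for row in conn.execute(
    "SELECT ti.itemid FROM tagitem ti "
    "JOIN tag t ON t.tagid = ti.tagid "
    "WHERE t.tagtext = ? ORDER BY ti.itemid",
    ("java",),
)]

print(naive, padded, exact)
```

The naive pattern returns item 2 as well, because `javascript` contains `java`; the padded pattern and the join both return only items 1 and 3.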

#3


0  

The second version, splitting the data into two additional tables, is a lot more efficient, because it lets the database use indexes for the queries you will need most: getting all texts with a certain tag, counting how often each tag is used (sorted by count, for the tag cloud), and getting all tags for a given text.
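
The three queries mentioned here might look roughly like this with the two-table layout. It is a sketch using Python's sqlite3 module; the `item` table, its `title` column, the extra `tagitem_by_item` index, and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item    (itemid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag     (tagid  INTEGER PRIMARY KEY, tagtext TEXT UNIQUE);
    CREATE TABLE tagitem (tagid  INTEGER, itemid INTEGER,
                          PRIMARY KEY (tagid, itemid));
    -- The primary key serves lookups by tag; this index serves lookups by item.
    CREATE INDEX tagitem_by_item ON tagitem (itemid, tagid);

    INSERT INTO item VALUES (1, 'Intro to SQL'), (2, 'SQL from Python'),
                            (3, 'Python tips');
    INSERT INTO tag VALUES (1, 'sql'), (2, 'python'), (3, 'beginner');
    INSERT INTO tagitem VALUES (1, 1), (3, 1), (1, 2), (2, 2), (2, 3);
""")

# 1. All texts with a certain tag.
texts_with_sql = [row[0] for row in conn.execute(
    "SELECT i.title FROM item i "
    "JOIN tagitem ti ON ti.itemid = i.itemid "
    "JOIN tag t ON t.tagid = ti.tagid "
    "WHERE t.tagtext = ? ORDER BY i.itemid",
    ("sql",),
)]

# 2. Usage count per tag, most used first (input for a tag cloud).
tag_cloud = conn.execute(
    "SELECT t.tagtext, COUNT(*) AS uses FROM tag t "
    "JOIN tagitem ti ON ti.tagid = t.tagid "
    "GROUP BY t.tagid ORDER BY uses DESC, t.tagtext"
).fetchall()

# 3. All tags for a given text.
tags_of_item_2 = [row[0] for row in conn.execute(
    "SELECT t.tagtext FROM tag t "
    "JOIN tagitem ti ON ti.tagid = t.tagid "
    "WHERE ti.itemid = ? ORDER BY t.tagtext",
    (2,),
)]

print(texts_with_sql, tag_cloud, tags_of_item_2)
```

With a single taglist field, the second query in particular would require splitting every row's string before anything could be counted.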

#4


-1  

One table will be more efficient, but having two tables is generally the proper way to store simple tags.
