Best practices for storing tags in a database?

I developed a site that uses tags (keywords) to categorize photographs. Right now, my MySQL database has a table with the following structure:

image_id (int)
tag      (varchar(32))

Every time someone tags an image (if the tag is valid and has enough votes), a row is added to this table. I don't think this is the optimal approach: with 5000+ tagged images, the tags table already has over 40000 entries, and I fear this will begin to affect performance (if it isn't already).
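To make the current layout concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows are hypothetical, and the composite primary key plus the secondary index on `tag` are assumptions (the question lists only the two columns). It shows that fetching one image's tags and finding the most popular tags are each a single SQL statement on this table.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image_tags (
        image_id INTEGER NOT NULL,
        tag      VARCHAR(32) NOT NULL,
        PRIMARY KEY (image_id, tag)          -- assumed: one row per (image, tag)
    );
    CREATE INDEX idx_image_tags_tag ON image_tags (tag);  -- assumed index for per-tag queries
""")
conn.executemany(
    "INSERT INTO image_tags (image_id, tag) VALUES (?, ?)",
    [(1, "beach"), (1, "sunset"), (2, "beach"), (3, "city"), (3, "beach")],
)

# Tags of a single image: uses the leading image_id column of the primary key.
tags_of_1 = [row[0] for row in conn.execute(
    "SELECT tag FROM image_tags WHERE image_id = ? ORDER BY tag", (1,))]

# Most popular tags: a plain GROUP BY over the same table.
popular = conn.execute(
    "SELECT tag, COUNT(*) AS uses FROM image_tags "
    "GROUP BY tag ORDER BY uses DESC, tag LIMIT 2").fetchall()

print(tags_of_1)  # ['beach', 'sunset']
print(popular)    # [('beach', 3), ('city', 1)]
```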

I considered the structure below, thinking it would be faster for fetching the tags associated with a particular image, but it looks horrible when I want to get all the tags or, for instance, the most popular one:

image_id (int)
tags     (text) //comma delimited list of tags for the image
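For contrast, a sketch of the comma-delimited alternative under the same assumptions (sqlite3 standing in for MySQL, hypothetical sample rows). Reading one image's tags is a single-row fetch plus a split, but counting tag popularity has to read and split every row in application code, because a standard index or `GROUP BY` cannot see the individual items inside the string.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_tags (image_id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO image_tags (image_id, tags) VALUES (?, ?)",
    [(1, "beach,sunset"), (2, "beach"), (3, "city,beach")],
)

# One image's tags: fetch a single row, then split it in application code.
(raw,) = conn.execute("SELECT tags FROM image_tags WHERE image_id = ?", (1,)).fetchone()
tags_of_1 = raw.split(",")

# Tag popularity: every row is read and split outside SQL before counting.
counts = Counter(
    tag
    for (row_tags,) in conn.execute("SELECT tags FROM image_tags")
    for tag in row_tags.split(",")
)
popular = counts.most_common(1)

print(tags_of_1)  # ['beach', 'sunset']
print(popular)    # [('beach', 3)]
```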

Is there a correct way of doing this, or are both approaches more or less the same? Thoughts?

4 solutions

#1

Take a look at these questions/posts:

* question
* Phillip Keller's blog post on implementing tagging

#2

Use a many-to-many table to link a TAG record to an IMAGE record:

IMAGE

DROP TABLE IF EXISTS `example`.`image`;
CREATE TABLE  `example`.`image` (
  `image_id` int(10) unsigned NOT NULL auto_increment,
  PRIMARY KEY  (`image_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

TAG

DROP TABLE IF EXISTS `example`.`tag`;
CREATE TABLE  `example`.`tag` (
 `tag_id` int(10) unsigned NOT NULL auto_increment,
 `description` varchar(45) NOT NULL default '',
 PRIMARY KEY  (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

IMAGE_TAG_MAP

DROP TABLE IF EXISTS `example`.`image_tag_map`;
CREATE TABLE  `example`.`image_tag_map` (
 `image_id` int(10) unsigned NOT NULL default '0',
 `tag_id` int(10) unsigned NOT NULL default '0',
 PRIMARY KEY  (`image_id`,`tag_id`),
 KEY `tag_fk` (`tag_id`),
 CONSTRAINT `image_fk` FOREIGN KEY (`image_id`) REFERENCES `image` (`image_id`),
 CONSTRAINT `tag_fk` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
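A minimal sketch of querying this many-to-many layout, using Python's sqlite3 as a stand-in for MySQL. The DDL is translated loosely (SQLite has no `unsigned`, `ENGINE`, or `CHARSET`, and `INTEGER PRIMARY KEY` plays the role of `auto_increment`); the helper names `tags_for_image` and `images_for_tag` and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE image (image_id INTEGER PRIMARY KEY);
    CREATE TABLE tag (
        tag_id      INTEGER PRIMARY KEY,
        description VARCHAR(45) NOT NULL DEFAULT ''
    );
    CREATE TABLE image_tag_map (
        image_id INTEGER NOT NULL REFERENCES image (image_id),
        tag_id   INTEGER NOT NULL REFERENCES tag (tag_id),
        PRIMARY KEY (image_id, tag_id)
    );
    CREATE INDEX tag_fk ON image_tag_map (tag_id);
""")
conn.executemany("INSERT INTO image (image_id) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO tag (tag_id, description) VALUES (?, ?)",
                 [(10, "beach"), (20, "sunset")])
conn.executemany("INSERT INTO image_tag_map (image_id, tag_id) VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])

def tags_for_image(image_id):
    """Tag descriptions attached to one image, joined through the map table."""
    rows = conn.execute(
        "SELECT t.description FROM image_tag_map m "
        "JOIN tag t ON t.tag_id = m.tag_id "
        "WHERE m.image_id = ? ORDER BY t.description", (image_id,))
    return [r[0] for r in rows]

def images_for_tag(description):
    """Ids of the images that carry a given tag description."""
    rows = conn.execute(
        "SELECT m.image_id FROM tag t "
        "JOIN image_tag_map m ON m.tag_id = t.tag_id "
        "WHERE t.description = ? ORDER BY m.image_id", (description,))
    return [r[0] for r in rows]

print(tags_for_image(1))        # ['beach', 'sunset']
print(images_for_tag("beach"))  # [1, 2]
```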

#3

You can make a tags table that holds just an id and the tag text, with a unique constraint on the tag, and then a photo_tags table that holds tag_id and photo_id. Insert a tag into the tags table only if it doesn't already exist.
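A sketch of the insert-only-if-missing step, assuming sqlite3 as a stand-in for MySQL. SQLite's `INSERT OR IGNORE` relies on the unique constraint to skip duplicate tags; MySQL offers `INSERT IGNORE` and `INSERT ... ON DUPLICATE KEY UPDATE` for a similar purpose (note that `INSERT IGNORE` also turns some other errors into warnings). The helpers `get_or_create_tag` and `tag_photo` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (
        id  INTEGER PRIMARY KEY,
        tag VARCHAR(32) NOT NULL UNIQUE      -- the unique constraint on the tag text
    );
    CREATE TABLE photo_tags (
        photo_id INTEGER NOT NULL,
        tag_id   INTEGER NOT NULL REFERENCES tags (id),
        PRIMARY KEY (photo_id, tag_id)
    );
""")

def get_or_create_tag(tag):
    """Return the id for `tag`, inserting it only if it is not already stored."""
    # A duplicate tag violates UNIQUE, so OR IGNORE turns the insert into a no-op.
    conn.execute("INSERT OR IGNORE INTO tags (tag) VALUES (?)", (tag,))
    (tag_id,) = conn.execute("SELECT id FROM tags WHERE tag = ?", (tag,)).fetchone()
    return tag_id

def tag_photo(photo_id, tag):
    """Link a photo to a tag, creating the tag row on first use."""
    conn.execute("INSERT OR IGNORE INTO photo_tags (photo_id, tag_id) VALUES (?, ?)",
                 (photo_id, get_or_create_tag(tag)))

tag_photo(1, "beach")
tag_photo(2, "beach")   # reuses the existing "beach" row
tag_photo(2, "sunset")

print(conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0])  # 2
```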

Then, for queries such as how many photos are tagged with a certain tag, you will be querying by a primary key instead of comparing varchar text.
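A sketch of that per-tag count under the same assumptions (sqlite3 standing in for MySQL, hypothetical data, and an assumed index on `photo_tags.tag_id`). The tag text is resolved to its id once through the unique column, and the count itself filters on the integer column; `photos_with_tag` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag VARCHAR(32) NOT NULL UNIQUE);
    CREATE TABLE photo_tags (
        photo_id INTEGER NOT NULL,
        tag_id   INTEGER NOT NULL REFERENCES tags (id),
        PRIMARY KEY (photo_id, tag_id)
    );
    CREATE INDEX idx_photo_tags_tag ON photo_tags (tag_id);  -- assumed index
    INSERT INTO tags (id, tag) VALUES (1, 'beach'), (2, 'sunset');
    INSERT INTO photo_tags (photo_id, tag_id) VALUES (1, 1), (2, 1), (2, 2);
""")

def photos_with_tag(tag):
    """Number of photos carrying `tag` (0 if the tag does not exist)."""
    # One lookup on the unique tag text to find its id...
    row = conn.execute("SELECT id FROM tags WHERE tag = ?", (tag,)).fetchone()
    if row is None:
        return 0
    # ...then the count compares integer tag_id values, not varchar text.
    return conn.execute(
        "SELECT COUNT(*) FROM photo_tags WHERE tag_id = ?", (row[0],)).fetchone()[0]

print(photos_with_tag("beach"))    # 2
print(photos_with_tag("unknown"))  # 0
```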

#4

In a multi-tag search query, an image has to match every requested tag. Hence the image's tag set I has to be a superset of the requested tag set U:

I ⊇ U
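As a short worked statement of why a count can stand in for the superset test (assuming the requested tags are distinct, so that n = |U|, and that no image carries the same tag twice), the number of an image's rows matching `tag IN (...)` is exactly |I ∩ U|, and:

```latex
I \supseteq U \iff |I \cap U| = |U| = n
```

For example, with I = {beach, sunset, city} and U = {beach, city}, |I ∩ U| = 2 = |U|, so the image qualifies; with I = {beach} it would be 1 ≠ 2, so it does not.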

Implementing this comparison in SQL is a bit of a challenge, because each image has to be qualified individually. Given that each image's tags are a unique set (no tag repeats for the same image):

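-- {n} = number of distinct requested tags; {tag1} ... {tagn} = the requested tags
SELECT i.* FROM images AS i WHERE {n} = (
  -- how many of the requested tags this image has
  SELECT COUNT(*)
  FROM image_tags AS t
  WHERE t.image_id = i.image_id
    AND t.tag IN ({tag1}, {tag2}, ... {tagn})
);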

Schema:

-- MySQL requires a length for VARCHAR; 32 is an assumed value
CREATE TABLE images (
  image_id VARCHAR(32) NOT NULL,
  PRIMARY KEY (image_id)
);

CREATE TABLE image_tags (
  image_id VARCHAR(32) NOT NULL,
  tag VARCHAR(32) NOT NULL,
  PRIMARY KEY (image_id, tag)
);
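A runnable sketch of the query above using Python's sqlite3 (standing in for MySQL; the sample data and the helper `images_with_all_tags` are hypothetical). Instead of pasting `{n}` and `{tag1} ... {tagn}` into the SQL text, it generates one `?` placeholder per tag and binds the values, which also avoids SQL injection. The tag list is de-duplicated first so that n equals the number of distinct requested tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (image_id VARCHAR(32) NOT NULL, PRIMARY KEY (image_id));
    CREATE TABLE image_tags (
        image_id VARCHAR(32) NOT NULL,
        tag      VARCHAR(32) NOT NULL,
        PRIMARY KEY (image_id, tag)
    );
    INSERT INTO images VALUES ('a'), ('b'), ('c');
    INSERT INTO image_tags VALUES
        ('a', 'beach'), ('a', 'sunset'), ('a', 'city'),
        ('b', 'beach'),
        ('c', 'city'), ('c', 'sunset');
""")

def images_with_all_tags(tags):
    """Ids of images whose tag set is a superset of `tags`."""
    wanted = sorted(set(tags))                      # de-duplicate so that n == |U|
    placeholders = ", ".join("?" for _ in wanted)   # one bound parameter per tag
    sql = (
        "SELECT i.image_id FROM images AS i WHERE ? = ("
        "  SELECT COUNT(*) FROM image_tags AS t"
        "  WHERE t.image_id = i.image_id"
        f"    AND t.tag IN ({placeholders})"
        ") ORDER BY i.image_id"
    )
    # The first ? is n; the remaining ones are the tags, in the same order.
    return [r[0] for r in conn.execute(sql, [len(wanted), *wanted])]

print(images_with_all_tags(["beach", "sunset"]))  # ['a']
print(images_with_all_tags(["city", "sunset"]))   # ['a', 'c']
```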
