I'm implementing a tag system similar to * tag system but I just wonder How-to get related tags and define the relationships weights between tags like the list of "Related Tags" in any tag page like this https://*.com/questions/tagged/php they define the relationship weight by the co-occurrence between 2 or more tags
我实现一个标签系统类似于*标签系统,但我只是想知道如何得到相关标记和定义标签之间的权重关系中的“相关标签”列表这样的任何标签页面https://*.com/questions/tagged/php定义重量之间的共生关系的2个或更多的标签
How I can do this in PHP/MySQl to define the most related tags for tag "X" and keep all weights up to date as users add more and more posts/questions ?
如何在PHP/MySQl中定义标记“X”最相关的标记,并在用户添加越来越多的帖子/问题时保持所有权重的最新?
3 个解决方案
#1
2
You probably want to look into statistics for this:
你可能想看看统计数据:
- given a tag X
- 给定一个标记X
- check all other tags Y
- 检查所有其他标签Y
- count how often Y and X show up at the same time
- 计算Y和X同时出现的频率
- divide by how often Y shows up
- 除以Y出现的频率。
- ???
- ? ? ?
- Profit!!!
- 利润! ! !
As for more information on step 5: This information only changes very slowly, so you can really cache this stuff and only recreate it when you have time.
关于第5步的更多信息:这些信息的变化非常缓慢,因此您可以真正地缓存这些内容,并且只有在您有时间时才能重新创建它。
What you want in the end is a relation
你最终想要的是一种关系
conditional_probability(X, Y, P)
Which tells you how probable (P) tag Y is, given X. P was calculated in step 4.
它告诉你,给定x。P在步骤4中计算,标记Y的可能性有多大。
#2
1
I used this blog entry for calculating relative tag size within a cloud. You can use this algorithm on the entire could or a particular found set.
我使用这个博客条目来计算云中的相对标记大小。你可以对整个can或者一个特定的找到集使用这个算法。
Instead of storing the denormalized weights for all tags in the database, I cache them in my (Ruby) process, and rebuild them when tags are added/removed or when the process restarts.
我没有为数据库中的所有标记存储非规范化的权重,而是在(Ruby)进程中缓存它们,并在添加/删除标记或进程重新启动时重新构建它们。
As for how to store them, you generally want:
至于如何储存,你一般希望:
- A tags table associating unique tag names with row IDs, and
- 一个标记表,将惟一的标记名与行id关联起来。
- A tags_items table providing you with your n-to-n mapping between tags and items.
- 一个tags_items表,为您提供标记和项之间的n- n映射。
Once you have that, and once you have a found set of items on a results page, it's a simple join and unique to find out the set of 'related' tags.
一旦你有了它,一旦你在结果页面上找到了一组条目,它就是一个简单的连接,并且是唯一找到“相关”标签的集合。
#3
0
1 Each post id can be tagged with one or more tags (PHP + other tags)
每个post id都可以使用一个或多个标签(PHP +其他标签)进行标记
2 Going back the same way each tag has associated post id
返回的方式与每个标签有关联的post id相同
3 Foreach post id get all tags other than PHP
每个post id获取除PHP之外的所有标记
4 Show only those which has count more than a prticular Number (say 4000)
4 .只显示比实际数字多的数字(如4000)
Think about it this question has been tagged "Mysql" "Database-design" "Tags" and "Tagging" Do you see how you have related PHP with other tags.
想想这个问题被标记为“Mysql”“数据库设计”“标签”和“标签”你看到你是如何把PHP和其他标签联系起来的吗?
#1
2
You probably want to look into statistics for this:
你可能想看看统计数据:
- given a tag X
- 给定一个标记X
- check all other tags Y
- 检查所有其他标签Y
- count how often Y and X show up at the same time
- 计算Y和X同时出现的频率
- divide by how often Y shows up
- 除以Y出现的频率。
- ???
- ? ? ?
- Profit!!!
- 利润! ! !
As for more information on step 5: This information only changes very slowly, so you can really cache this stuff and only recreate it when you have time.
关于第5步的更多信息:这些信息的变化非常缓慢,因此您可以真正地缓存这些内容,并且只有在您有时间时才能重新创建它。
What you want in the end is a relation
你最终想要的是一种关系
conditional_probability(X, Y, P)
Which tells you how probable (P) tag Y is, given X. P was calculated in step 4.
它告诉你,给定x。P在步骤4中计算,标记Y的可能性有多大。
#2
1
I used this blog entry for calculating relative tag size within a cloud. You can use this algorithm on the entire could or a particular found set.
我使用这个博客条目来计算云中的相对标记大小。你可以对整个can或者一个特定的找到集使用这个算法。
Instead of storing the denormalized weights for all tags in the database, I cache them in my (Ruby) process, and rebuild them when tags are added/removed or when the process restarts.
我没有为数据库中的所有标记存储非规范化的权重,而是在(Ruby)进程中缓存它们,并在添加/删除标记或进程重新启动时重新构建它们。
As for how to store them, you generally want:
至于如何储存,你一般希望:
- A tags table associating unique tag names with row IDs, and
- 一个标记表,将惟一的标记名与行id关联起来。
- A tags_items table providing you with your n-to-n mapping between tags and items.
- 一个tags_items表,为您提供标记和项之间的n- n映射。
Once you have that, and once you have a found set of items on a results page, it's a simple join and unique to find out the set of 'related' tags.
一旦你有了它,一旦你在结果页面上找到了一组条目,它就是一个简单的连接,并且是唯一找到“相关”标签的集合。
#3
0
1 Each post id can be tagged with one or more tags (PHP + other tags)
每个post id都可以使用一个或多个标签(PHP +其他标签)进行标记
2 Going back the same way each tag has associated post id
返回的方式与每个标签有关联的post id相同
3 Foreach post id get all tags other than PHP
每个post id获取除PHP之外的所有标记
4 Show only those which has count more than a prticular Number (say 4000)
4 .只显示比实际数字多的数字(如4000)
Think about it this question has been tagged "Mysql" "Database-design" "Tags" and "Tagging" Do you see how you have related PHP with other tags.
想想这个问题被标记为“Mysql”“数据库设计”“标签”和“标签”你看到你是如何把PHP和其他标签联系起来的吗?