I am developing a PHP-based system where users can create, post, and view pieces of content using a MySQL database, each piece of content being stored in a table row. When a user posts content, a PHP script extracts common words or tags (removing any stop words like ‘and’ and ‘or’), orders them by occurrence, and stores them all as an array within the database.
As well as viewing each piece of content and the tags generated, I would like a feature that displays a list of similar pieces of content posted which have one or more tags that the content being displayed has (similar to YouTube's related videos or related stories on news websites). Furthermore, I would like the list to be ordered based on how many of those tags each piece of content has.
I have done some research and developed two different scripts that select rows from the database based on whether any of the tags are present. However, both scripts have problems.
The first I tried was a LIKE query:
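$tags = $row['tags'];
$tags2 = explode(",", $tags);
// Start the query once, then append one LIKE condition per tag.
$sql = "SELECT DISTINCT * FROM table WHERE ";
foreach ($tags2 as $key => $keyword) {
    $sql .= "tags LIKE '%$keyword%'";
    // Join the conditions with OR, except after the last one.
    if ($key != (sizeof($tags2) - 1)) {
        $sql .= " OR ";
    }
}
// Apply the limit once, after all conditions have been added.
$sql .= " LIMIT 20";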
The problem with this query is that it does not order the results. I then tried a MATCH AGAINST query:
$tags = $row['tags'];
$tags2 = explode(",", $tags);
$searchstring = "";
foreach ($tags2 as $word) {
    $searchstring .= ' +' . $word;
}
$sql = "SELECT * FROM table WHERE MATCH (tags) AGAINST ('$searchstring' IN BOOLEAN MODE)";
While the results are ordered by relevance, the query only retrieves rows in which all of the tags are present; a row that lacks even a single tag is not returned.
What I want is to combine the best of both: select rows that contain one or more of the tags, and then order them by how many of the tags are present. For example, if row1 contains 10 tags, row2 has 20 tags with 9 of them found in row1, and row3 has 50 tags with 8 of them found in row1, then both row2 and row3 should be retrieved, with row2 ranked as more relevant than row3.
Any help would be appreciated.
1 solution
#1
2
The inclusion of the + character in your $searchstring is what forces all tags to be present. If you put in just the words and omit the +, the engine will rank its results without requiring each word to be present.
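A rough sketch of that suggestion follows; it is not a drop-in implementation. It assumes the tags column has a FULLTEXT index and that tags are stored comma-separated, as in the question. The helper name buildSearchString, the score alias, and the placeholder names :search1 and :search2 are made up for illustration. As far as I know, boolean mode does not sort rows by relevance on its own, so the sketch orders by the MATCH score explicitly. Note that this score is MySQL's relevance value, which may not equal the exact number of shared tags.

```php
<?php
// Hypothetical helper: turn the stored comma-separated tag list into a
// search string with no operators, so every tag is optional.
function buildSearchString(string $tags): string
{
    // Split on commas, trim whitespace, and drop empty entries.
    $words = array_filter(array_map('trim', explode(',', $tags)), 'strlen');
    // Strip characters that act as boolean-mode operators so that a tag
    // cannot change the meaning of the search.
    $words = array_map(
        fn ($w) => preg_replace('/[+\-~<>()"*@]/', '', $w),
        $words
    );
    return implode(' ', array_filter($words, 'strlen'));
}

// "table" and "tags" are the placeholder names from the question.
// Two distinct placeholders are used because some PDO configurations
// reject a named placeholder that appears twice.
$sql = "SELECT *, MATCH (tags) AGAINST (:search1 IN BOOLEAN MODE) AS score
        FROM `table`
        WHERE MATCH (tags) AGAINST (:search2 IN BOOLEAN MODE)
        ORDER BY score DESC
        LIMIT 20";

// Usage with an existing PDO connection (assumed to be in $pdo):
// $search = buildSearchString($row['tags']);
// $stmt = $pdo->prepare($sql);
// $stmt->execute([':search1' => $search, ':search2' => $search]);
```

Binding the search string as a parameter, rather than interpolating it into the SQL as the question's code does, also keeps a stray quote in a tag from breaking the query.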
Take a look at the docs for fulltext searching in MySQL.
You have many options with each word: + will force the word to be found somewhere in the result, - will force the word to not be found anywhere in the result, and ~ will allow a word to be found but lower the result's ranking if it is found. Read the docs; they're very useful.
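To make the operators concrete, here is a small illustration of a search string that mixes them. The tag words are invented for the example, and the query reuses the placeholder table and column names from the question.

```php
<?php
// Hypothetical tags; each operator changes how that one word is treated.
$searchstring = implode(' ', [
    '+php',     // must be present in the row
    'mysql',    // optional; raises the rank when present
    '-java',    // must not be present in the row
    '~legacy',  // may be present, but lowers the rank when it is
]);

// The search string would be bound to the ? placeholder when executing.
$sql = "SELECT * FROM `table` WHERE MATCH (tags) AGAINST (? IN BOOLEAN MODE)";
```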