How to implement a recommendation system?

I have the Collective Intelligence book, but I'm not sure how to apply it in practice.

Let's say I have a PHP website with a MySQL database. Users can insert articles with a title and content into the database. For the sake of simplicity, we only compare titles.

How to Make Coffee?

15 Things About Coffee.

The Big Question.

How to Sharpen A Pencil?

Guy Getting Hit in Balls

When we open the 'How to Make Coffee?' article, the second and fourth titles share words with it, so they should be displayed in the Related Articles section.

How can I implement this using PHP and MySQL? It's OK if I have to use Python. Thanks in advance.

3 Solutions

#1

Store a set of keywords alongside each product: essentially everything in the title except a set of stop words. When a title is displayed, find any other products that share keywords with it, giving priority to those that share more.
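Since the asker said Python would be fine, here is a minimal sketch of that idea in Python, using the titles from the question. The stop-word list and the function names `keywords` and `related` are assumptions made up for illustration, not part of any library.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "about", "how", "in", "the", "to"}

def keywords(title):
    """Lowercase the title, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w for w in words if w not in STOP_WORDS}

def related(current, titles):
    """Return other titles sharing at least one keyword, most shared first."""
    current_kw = keywords(current)
    scored = []
    for title in titles:
        if title == current:
            continue
        shared = len(current_kw & keywords(title))
        if shared:
            scored.append((shared, title))
    scored.sort(key=lambda pair: -pair[0])
    return [title for _, title in scored]

titles = [
    "How to Make Coffee?",
    "15 Things About Coffee.",
    "The Big Question.",
    "How to Sharpen A Pencil?",
    "Guy Getting Hit in Balls",
]
print(related("How to Make Coffee?", titles))
```

With "how" and "to" treated as stop words, only "15 Things About Coffee." comes back. Dropping them from the stop-word list would also match "How to Sharpen A Pencil?", which is a trade-off you would have to decide on.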

You could further enhance this by assigning a score to each keyword based on its scarcity (scarcer words get a higher score, since a match on 'PHP', for instance, is going to be more relevant than a match on 'programming'), or by tracking the number of times a user navigates manually between a set of products.
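One way to sketch the scarcity idea is to weight each word by the logarithm of (number of titles / number of titles containing the word), which is roughly the inverse-document-frequency trick from text retrieval. The titles below are hypothetical and only there to mirror the 'PHP' versus 'programming' example; `scarcity_weights` and `score` are illustrative names.

```python
import math
import re
from collections import Counter

def tokenize(title):
    """Return the set of lowercase words in a title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def scarcity_weights(titles):
    """Weight each word by log(N / number of titles containing it)."""
    n = len(titles)
    doc_freq = Counter(word for title in titles for word in tokenize(title))
    return {word: math.log(n / df) for word, df in doc_freq.items()}

def score(a, b, weights):
    """Sum the weights of the words two titles have in common."""
    return sum(weights.get(word, 0.0) for word in tokenize(a) & tokenize(b))

# Hypothetical titles, chosen so 'programming' is more common than 'php'.
titles = [
    "PHP programming basics",
    "Python programming tips",
    "Java programming patterns",
    "PHP arrays explained",
]
weights = scarcity_weights(titles)
print(weights["php"] > weights["programming"])
```

Here a shared 'php' contributes more to the score than a shared 'programming', because 'php' appears in fewer titles. On a very small database the weights will be noisy, so treat this as something to add later.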

Regardless, you'd best start off simple and enhance it as you go. Depending on the size of your database, more advanced techniques may not be all that fruitful.

#2

You're best off using a set of tags which are parsed and stored in the db when the title is inserted, and then querying based on that.
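As a rough sketch of the tag approach, written in Python since the asker allows it: the example below uses an in-memory SQLite database as a stand-in for MySQL. The `entries` and `entry_tags` tables, the helper names, and the "words of four or more letters" tagging rule are all assumptions for illustration.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE entry_tags (
        entry_id INTEGER NOT NULL REFERENCES entries(id),
        tag TEXT NOT NULL,
        PRIMARY KEY (entry_id, tag)
    );
""")

def insert_entry(title):
    """Insert a title and store its tags (words of 4+ characters)."""
    cur = conn.execute("INSERT INTO entries (title) VALUES (?)", (title,))
    entry_id = cur.lastrowid
    tags = {w for w in re.findall(r"[a-z0-9]+", title.lower()) if len(w) >= 4}
    conn.executemany(
        "INSERT INTO entry_tags (entry_id, tag) VALUES (?, ?)",
        [(entry_id, tag) for tag in tags],
    )
    return entry_id

def related_titles(entry_id):
    """Return titles sharing at least one tag, most shared tags first."""
    rows = conn.execute(
        """
        SELECT e.title, COUNT(*) AS shared
        FROM entry_tags AS mine
        JOIN entry_tags AS theirs
          ON theirs.tag = mine.tag AND theirs.entry_id != mine.entry_id
        JOIN entries AS e ON e.id = theirs.entry_id
        WHERE mine.entry_id = ?
        GROUP BY e.id, e.title
        ORDER BY shared DESC, e.id
        """,
        (entry_id,),
    ).fetchall()
    return [title for title, _ in rows]

ids = [insert_entry(t) for t in [
    "How to Make Coffee?",
    "15 Things About Coffee.",
    "The Big Question.",
    "How to Sharpen A Pencil?",
]]
print(related_titles(ids[0]))
```

Because the tags are computed once at insert time, the lookup is a plain indexed join rather than a scan with LIKE over every title.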

If you have to parse the title though, you'd basically be doing a LIKE query:

SELECT * FROM ENTRIES WHERE TITLE LIKE '%<keyword>%';

For a more verbose answer though:

// You need some test to see if the word is valid.
// "is" should not be considered a valid match.
// This is a simple one based on length; a
// "blacklist" would be better, but that's up to you.
function isValidEntry( $word )
{
    return strlen( $word ) >= 4;
}

// To hold all relevant search strings:
$terms = array();
$postTitleWords = explode( ' ', strtolower( 'How to Make Coffee' ) );

foreach( $postTitleWords as $index => $word )
{
    if( isValidEntry( $word ) ) $terms[] = $word;
    else
    {
        // A short word only counts as part of a two-word phrase
        // with a neighbouring short word (e.g. "how to").
        // The ?? operator needs PHP 7 or later.
        $bef = $postTitleWords[ $index - 1 ] ?? '';
        if( $bef && !isValidEntry( $bef ) ) $terms[] = "$bef $word";
        $aft = $postTitleWords[ $index + 1 ] ?? '';
        if( $aft && !isValidEntry( $aft ) ) $terms[] = "$word $aft";
    }
}
$terms = array_unique( $terms );
if( !count( $terms ) )
{
    // This is a completely unique title!
}
// Note: escape or bind the terms before running this on real user input.
$search = 'SELECT * FROM ENTRIES WHERE lower( TITLE ) LIKE \'%' . implode( '%\' OR lower( TITLE ) LIKE \'%', $terms ) . '%\'';
// Either pump that through your mysql_search or PDO.
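If you go the Python route instead, the same OR-of-LIKE query can be built with placeholders so the terms are never pasted into the SQL string. This is only a sketch under assumptions: it uses an in-memory SQLite table named `entries` as a stand-in for the MySQL table, and `build_like_query` is a made-up helper name.

```python
import sqlite3

def build_like_query(terms):
    """Build an OR-of-LIKE query with one placeholder per search term."""
    if not terms:
        raise ValueError("no search terms: the title has nothing to match on")
    clause = " OR ".join("lower(title) LIKE ?" for _ in terms)
    sql = "SELECT title FROM entries WHERE " + clause + " ORDER BY id"
    params = ["%" + term.lower() + "%" for term in terms]
    return sql, params

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL database
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO entries (title) VALUES (?)",
    [
        ("How to Make Coffee?",),
        ("15 Things About Coffee.",),
        ("The Big Question.",),
        ("How to Sharpen A Pencil?",),
        ("Guy Getting Hit in Balls",),
    ],
)

# The terms the title 'How to Make Coffee' yields under the length rule.
sql, params = build_like_query(["how to", "make", "coffee"])
matches = [row[0] for row in conn.execute(sql, params)]
print(matches)
```

The result includes the opened article itself, so in practice you would also exclude the current entry's id from the results.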

#3

This can be achieved simply by using wildcards in SQL queries. If you have longer texts and the wildcard pattern fails to match the middle part of the text, check whether a substring of one matches the other. I hope this helps.

By the way, your question title asks about implementing a recommendation system, while the question description only asks about matching a field across database records. Recommendation systems are a broad topic with many interesting algorithms (e.g., collaborative filtering, content-based methods, matrix factorization, neural networks). Feel free to explore these more advanced topics if your project reaches that scale.
