文件名称:基于协同过滤的推荐算法研究.caj
文件大小:10.22MB
文件格式:CAJ
更新时间:2023-07-25 09:44:20
推荐系统 协同过滤; 社交网络 数据稀疏问题 用户相似度模型
【摘要】 W.eb2.0技术将互联网带入了一个崭新的时代,互联网用户在互联网生活中发挥着越来越主动的作用,用户不再只是被动地从互联网上接受信息,而是主动地创造信息,并利用Web2.0平台与其他用户进行交互和分享。随着互联网用户的飞速增长,以用户为中心的信息生产模式造成了互联网信息的爆炸式增长,人们正面临着越来越严重的“信息过载”问题。“信息过载”问题是指,人们无法从海量的信息中快速准确的定位到自己所需要的信息。目前,解决信息过载问题的技术主要分两类,第一类是以搜索引擎为代表的信息检索技术,第二类是以推荐系统为代表的信息过滤技术。两者最重要的区别在于用户通过搜索引擎获取的信息的质量的好坏在很大程度上依赖于用户对于信息求描述的准确程度,而推荐系统不需要用户提供明确的需求,而是从用户的历史行为和数据中出发,建立相关的模型从而挖掘出用户的需求和兴趣,从而以此为依据从海量的信息中为用户筛选出用户感兴趣的信息。由此可见,在用户需求不明确时,推荐系统的作用显得尤为重要。到目前为止,已经有许多推荐算法被提出,协同过滤是这些算法中应用最多且最为有效的推荐算法。虽然协同过滤算法已经被成功地应用到许多商业推荐系统中,但是仍然存在着诸如数据稀疏问题、冷启动问题等亟待解决。随着互联网的飞速发展,以微博为代表的各种社交媒体纷纷涌现,以用户为中心的社交网站产生了海量的和用户兴趣相关的数据,如何有效的利用这些数据来改进推荐算法的性能已经成为一个重要的研究领域。针对以上关键问题,本文展开了如下几个方面的研究。第一,协同过滤中相似度模型的研究。用户(项目)相似度计算是基于内存的协同过滤算法中最为关键的问题,正负标注信息不对称和数据稀疏性导致了传统的相似度模型不准确从而影响推荐精度。本文针对这两个问题,提出了基于变权重和罚函数的用户相似度模型。实验结果表明,本文提出的算法能够有效缓解上述两个问题,从而提高推荐精度。第二,融合社交网络信息的协同过滤算法研究。丰富的社交网络信息给推荐系统带来的新的机遇也提出了更大的挑战,如何有效地挖掘海量的社交网络信息以提高推荐算法的精度是社交网络推荐系统研究的核心问题。本文基于腾讯微博用户的真实社交网络信息,构建有效的用户相似度模型,并将该相似度模型与基于评价矩阵信息的用户相似度模型相结合,提出了融合社交网络信息的协同过滤算法。实验结果表明,通过融合社交网络信息,数据稀疏问题得到了明显缓解且推荐精度显著提高。第三,基于用户与基于项目的融合协同过滤算法的研究。根据不同的假设,协同算法可以分为基于用户的方法与基于项目的方法。本文研究了两种方法在推荐性能与效果上的本质差别,并在此基础上针对两种方法的优缺点进行模型融合,提出了融合基于用户和基于项目的融合协同过滤算法。实验结果表明,基于用户的方法更擅长于热门推荐而基于项目的方法更擅长于长尾推荐,本文提出的模型融合算法能有效的缓解数据稀疏问题并提高算法精度。第四,协同过滤算法中的全局模型融合与局部模型融合研究。目前存在着许多有效的协同过滤算法(例如基于内存的方法与基于模型的方法、基于用户的方法与基于项目的方法),不同的算都具有各自的优势和缺陷。本文提出了不同的方法对于不同的用户(项目)的适用程度不一致的观点。基于上述观点,本文通过机器学习的方法,自动发现用户(项目)对于各种方法的适应程度,并进行局部模型融合。实验结果表明,局部融合模型比全局融合模型具有更高的推荐精度。 还原 【Abstract】 The fast development of Web2.0technology sparked a new revolution of the in-ternet. Users now play a new role in the world of internet, they take the initiative to generate information instead of simply getting information from the web. As the rapid growth of the users’population, the user-centric information generation mode leads to the exponential growth of the available information in internet, which cause the infor-mation overload problem. The information overload problem refers that people can not quickly and accurately locate the information they need. Currently, the technology to solve information overload problem can be classified into to two categories. The first technology is information retrieval represented by the search engine and the second is information filtering represented by recommender systems. The most important differ-ence between these two technologies is that search engines need queries formatted by the user and recommender systems need no queries. Thus the quality of the results of search engines depend on how users describe their information needs. Recommender systems however, filter out the information that the user is interested in by exploiting users’profile data and historical activities(watching,listening,buying etc.). So, recom-mender systems can play an very important role in the situation that uses’can not tell their information need precisely.Many recommendation algorithms have been proposed by both academia and in-dustry, collaborative filtering is one of the most effective recommendation algorithms. Collaborative filtering algorithm has been successfully applied to many commercial recommender system, but there are still issues such as the data sparsity problem and the cold start problem to be solved. With the rapid rise of social media, user-centric social networking web sites generate vast amounts of data which may reflects users’interests, how to leverage these data to improve the performance of the recommendation algorith-m has become a very hot research area. In view of the above key issues, this dissertation launched a study of the following aspects.First, research on the similarity model of collaborative filtering. User/item simi-larity calculation is the most critical issue in the memory-based collaborative filtering algorithms, sparsity of the rating matrix and unbalance of negative and positive ratings causes inaccurate similarity computation, thus limit the recommendation quality. In this dissertation, we introduce a weighting scheme and a penalty function to address the above issue. Experiment results show that improved similarity model can significantly improve the recommendation accuracy.Second, Integrating social information into collaborative filtering. The rich social information brings great opportunities for recommendation system. How to effectively leverage the abundant social network information to improve the accuracy of recom-mendation systems is the core issue of the research on social recommendation systems. In this dissertation, we build an user similarity model based on Tencent micro-blogging users’ real social network information, and effectively combine the social information based similarity model and the rating information based similarity model. Experiment results show that the proposed approach can effectively ease the data sparsity problem and improve the recommendation quality.Third, combining user-based and item-based collaborative algorithms using stacked regression. Collaborative filtering algorithms can be classified in