近年Recsys论文

时间:2024-07-28 16:05:56

2015年~2017年SIGIR,SIGKDD,ICML三大会议的Recsys论文:

【转载请注明出处:https://www.cnblogs.com/shenxiaolin/p/8321722.html】

SIGIR-2015

【Title】WEMAREC: Accurate and Scalable Recommendation through Weighted and Ensemble Matrix
Approximation

【Abstract】Matrix approximation is one of the most effective
methods for collaborative filtering-based recommender systems. However, the
high computation complexity of matrix factorization on large datasets limits
its scalability. Prior solutions have adopted co-clustering methods to
partition a large matrix into a set of smaller submatrices, which can then be
processed in parallel to improve scalability. The drawback is that the
recommendation accuracy is lower as the submatrices only contain subsets of the
user-item rating information. This paper presents WEMAREC, a weighted and
ensemble matrix approximation method for accurate and scalable recommendation.
It builds upon the intuition that (sub)matrices containing more frequent samples
of certain user/item/rating tend to make more reliable rating predictions for
these specific user/item/rating. WEMAREC consists of two important components:
(1) a weighting strategy that is computed based on the rating distribution in
each submatrix and applied to approximate a single matrix containing those
submatrices; and (2) an ensemble strategy that leverages user-specific and
item-specific rating distributions to combine the approximation matrices of
multiple sets of co-clustering results. Evaluations using real-world datasets
demonstrate that WEMAREC outperforms state-of-the-art matrix approximation
methods in recommendation accuracy (0.5?11.9% on the MovieLens dataset and
2.2--13.1% on the Netflix dataset) with 3--10X improvement on scalability.

【Title】Effective
Latent Models for Binary Feedback in Recommender Systems

【Abstract】In many collaborative filtering (CF)
applications, latent approaches are the preferred model choice due to their
ability to generate real-time recommendations efficiently. However, the
majority of existing latent models are not designed for implicit binary
feedback (views, clicks, plays etc.) and perform poorly on data of this type.
Developing accurate models from implicit feedback is becoming increasingly
important in CF since implicit feedback can often be collected at lower cost
and in much larger quantities than explicit preferences. The need for accurate
latent models for implicit data was further emphasized by the recently
conducted Million Song Dataset Challenge organized by Kaggle. In this
challenge, the results for the best latent model were orders of magnitude worse
than neighbor-based approaches, and all the top performing teams exclusively
used neighbor-based models. We address this problem and propose a new latent
approach for binary feedback in CF. In our model, neighborhood similarity
information is used to guide latent factorization and derive accurate latent
representations. We show that even with simple factorization methods like SVD,
our approach outperforms existing models and produces state-of-the-art results.

【Title】Personalized Recommendation via Parameter-Free
Contextual Bandits

【Abstract】Personalized recommendation services have gained
increasing popularity and attention in recent years as most useful information
can be accessed online in real-time. Most online recommender systems try to
address the information needs of users by virtue of both user and content
information. Despite extensive recent advances, the problem of personalized
recommendation remains challenging for at least two reasons. First, the user
and item repositories undergo frequent changes, which makes traditional recommendation
algorithms ineffective. Second, the so-called cold-start problem is difficult
to address, as the information for learning a recommendation model is limited
for new items or new users. Both challenges are formed by the dilemma of
exploration and exploitation. In this paper, we formulate personalized
recommendation as a contextual bandit problem to solve the
exploration/exploitation dilemma. Specifically in our work, we propose a
parameter-free bandit strategy, which employs a principled resampling approach
called online bootstrap, to derive the distribution of estimated models in an
online manner. Under the paradigm of probability matching, the proposed
algorithm randomly samples a model from the derived distribution for every
recommendation. Extensive empirical experiments on two real-world collections
of web data (including online advertising and news recommendation) demonstrate
the effectiveness of the proposed algorithm in terms of the click-through rate.
The experimental results also show that this proposed algorithm is robust in
the cold-start situation, in which there is no sufficient data or knowledge to
tune the hyper-parameters.

一篇有关于预测用户的注意力模型

【Title】Inferring Searcher Attention by Jointly Modeling
User Interactions and Content Salience

【Abstract】Modeling and predicting user attention is crucial
for interpreting search behavior. The numerous applications include quantifying
web search satisfaction, estimating search quality, and measuring and
predicting online user engagement. While prior research has demonstrated the
value of mouse cursor data and other interactions as a rough proxy of user
attention, precisely predicting where a user is looking on a page remains a
challenge, exacerbated in Web pages beyond the traditional search results. To
improve attention prediction on a wider variety of Web pages, we propose a new
way of modeling searcher behavior data by connecting the user interactions to
the underlying Web page content. Specifically, we propose a principled model
for predicting a searcher's gaze position on a page, that we call Mixture of
Interactions and Content Salience (MICS). To our knowledge, our model is the
first to effectively combine user interaction data, such as mouse cursor and
scrolling positions, with the visual prominence, or salience, of the page
content elements. Extensive experiments on multiple popular types of Web
content demonstrate that the proposed MICS model significantly outperforms
previous approaches to searcher gaze prediction that use only the interaction
information. Grounding the observed interactions to the underlying page content
provides a general and robust approach to user attention modeling, enabling
more powerful tool for search behavior interpretation and ultimately search
quality improvements.

SIGIR-2016

【Title】Query to
Knowledge: Unsupervised Entity Extraction from Shopping Queries using Adaptor
Grammars

【Abstract】Web search queries provide a surprisingly large
amount of information, which can be potentially organized and converted into a
knowledgebase. In this paper, we focus on the problem of automatically
identifying brand and product entities from a large collection of web queries
in online shopping domain. We propose an unsupervised approach based on adaptor
grammars that does not require any human annotation efforts nor rely on any
external resources. To reduce the noise and normalize the query patterns, we
introduce a query standardization step, which groups multiple search patterns
and word orderings together into their most frequent ones. We present three
different sets of grammar rules used to infer query structures and extract
brand and product entities. To give an objective assessment of the performance
of our approach, we conduct experiments on a large collection of online
shopping queries and intrinsically evaluate the knowledgebase generated by our
method qualitatively and quantitatively. In addition, we also evaluate our
framework on extrinsic tasks on query tagging and chunking. Our empirical
studies show that the knowledgebase discovered by our approach is highly
accurate, has good coverage and significantly improves the performance on the
external tasks.

【Title】Learning to Rank Features for Recommendation over
Multiple Categories

【Abstract】Incorporating phrase-level sentiment analysis on
users' textual reviews for recommendation has became a popular method due to
its explainable property for latent features and high prediction accuracy.
However, the inherent limitations of the existing model make it difficult to
(1) effectively distinguish the features that are most interesting to users,
(2) maintain the recommendation performance especially when the set of items is
scaled up to multiple categories, and (3) model users' implicit feedbacks on
the product features. In this paper, motivated by these shortcomings, we first
introduce a tensor matrix factorization algorithm to Learn to Rank user
Preferences based on Phrase-level sentiment analysis across Multiple categories
(LRPPM for short), and then by combining this technique with Collaborative
Filtering (CF) method, we propose a novel model called LRPPM-CF to boost the
performance of recommendation. Thorough experiments on two real-world datasets
demonstrate that our proposed model is able to improve the performance in the
tasks of capturing users' interested features and item recommendation by about
17%-24% and 7%-13%, respectively, as compared with several state-of-the-art
methods.

【Title】How Much
Novelty is Relevant?: It Depends on Your Curiosity

【Abstract】Traditional recommendation systems (RS's) aim to
recommend items that are relevant to the user's interest. Unfortunately, the
recommended items will soon become too familiar to the user and hence fail to
arouse her interest. Discovery-oriented recommendation systems (DORS's)
complement accuracy with "discover utilities" (DU's) such as novelty
and diversity and optimize the tradeoff between the DU's and accuracy of the
recommendations. Unfortunately, DORS's ignore an important fact that different
users have different appetites for DU's. That is, highly curious users can
accept highly novel and diversified recommendations whereas conservative users
would behave in the opposite manner. In this paper, we propose a
curiosity-based recommendation system (CBRS) framework which generates
recommendations with a personalized amount of DU's to fit the user's curiosity
level. The major contribution of this paper is a computational model of user
curiosity, called Probabilistic Curiosity Model (PCM), which is based on the curiosity
arousal theory and Wundt curve in psychology research. In PCM, we model a
user's curiosity with a curiosity distribution function learnt from the user's
access history and compute a curiousness score for each item representing how
curious the user is about the item. CBRS then selects items which are both
relevant and have high curiousness score, bounded by the constraint that the
amount of DU's fits the user's DU appetite. We use joint optimization and
co-factorization approaches to incorporate the curiosity signal into the
recommendations. Extensive experiments have been performed to evaluate the
performance of CBRS against the baselines using a music dataset from last.fm.
The results show that compared to the baselines CBRS not only provides more
personalized recommendations that adapt to the user's curiosity level but also
improves the recommendation accuracy.

【Title】Discrete Collaborative Filtering

【Abstract】We address
the efficiency problem of Collaborative Filtering (CF) by hashing users and
items as latent vectors in the form of binary codes, so that user-item affinity
can be efficiently calculated in a Hamming space. However, existing hashing
methods for CF employ binary code learning procedures that most suffer from the
challenging discrete constraints. Hence, those methods generally adopt a
two-stage learning scheme composed of relaxed optimization via discarding the
discrete constraints, followed by binary quantization. We argue that such a
scheme will result in a large quantization loss, which especially compromises
the performance of large-scale CF that resorts to longer binary codes. In this
paper, we propose a principled CF hashing framework called Discrete
Collaborative Filtering (DCF), which directly tackles the challenging discrete
optimization that should have been treated adequately in hashing. The
formulation of DCF has two advantages: 1) the Hamming similarity induced loss
that preserves the intrinsic user-item similarity, and 2) the balanced and
uncorrelated code constraints that yield compact yet informative binary codes.
We devise a computationally efficient algorithm with a rigorous convergence
proof of DCF. Through extensive experiments on several real-world benchmarks,
we show that DCF consistently outperforms state-of-the-art CF hashing
techniques, e.g, though using only 8 bits, DCF is even significantly better
than other methods using 128 bits.

【Title】Contextual Bandits in a Collaborative Environment

【Abstract】Contextual bandit algorithms provide principled
online learning solutions to find optimal trade-offs between exploration and
exploitation with companion side-information. They have been extensively used
in many important practical scenarios, such as display advertising and content
recommendation. A common practice estimates the unknown bandit parameters pertaining
to each user independently. This unfortunately ignores dependency among users
and thus leads to suboptimal solutions, especially for the applications that
have strong social components.

In this
paper, we develop a collaborative contextual bandit algorithm, in which the
adjacency graph among users is leveraged to share context and payoffs among
neighboring users while online updating. We rigorously prove an improved upper
regret bound of the proposed collaborative bandit algorithm comparing to conventional
independent bandit algorithms. Extensive experiments on both synthetic and
three large-scale real-world datasets verified the improvement of our proposed
algorithm against several state-of-the-art contextual bandit algorithms.

【Title】Collaborative
Filtering Bandits

【Abstract】Classical collaborative filtering, and
content-based filtering methods try to learn a static recommendation model
given training data. These approaches are far from ideal in highly dynamic
recommendation domains such as news recommendation and computational
advertisement, where the set of items and users is very fluid. In this work, we
investigate an adaptive clustering technique for content recommendation based
on exploration-exploitation strategies in contextual multi-armed bandit
settings. Our algorithm takes into account the collaborative effects that arise
due to the interaction of the users with the items, by dynamically grouping
users based on the items under consideration and, at the same time, grouping
items based on the similarity of the clusterings induced over the users. The
resulting algorithm thus takes advantage of preference patterns in the data in
a way akin to collaborative filtering methods. We provide an empirical analysis
on medium-size real-world datasets, showing scalability and increased
prediction performance (as measured by click-through rate) over
state-of-the-art methods for clustering bandits. We also provide a regret
analysis within a standard linear stochastic noise setting.

【Title】Fast Matrix Factorization for Online
Recommendation with Implicit Feedback

【Abstract】This paper contributes improvements on both the
effectiveness and efficiency of Matrix Factorization (MF) methods for implicit
feedback. We highlight two critical issues of existing works. First, due to the
large space of unobserved feedback, most existing works resort to assign a
uniform weight to the missing data to reduce computational complexity. However,
such a uniform assumption is invalid in real-world settings. Second, most
methods are also designed in an offline setting and fail to keep up with the
dynamic nature of online data. We address the above two issues in learning MF
models from implicit feedback. We first propose to weight the missing data
based on item popularity, which is more effective and flexible than the
uniform-weight assumption. However, such a non-uniform weighting poses
efficiency challenge in learning the model. To address this, we specifically
design a new learning algorithm based on the element-wise Alternating Least
Squares (eALS) technique, for efficiently optimizing a MF model with
variably-weighted missing data. We exploit this efficiency to then seamlessly
devise an incremental update strategy that instantly refreshes a MF model given
new feedback. Through comprehensive experiments on two public datasets in both
offline and online protocols, we show that our implemented, open-source
(https://github.com/hexiangnan/sigir16-eals) eALS consistently outperforms
state-of-the-art implicit MF methods.

Short Research
Paper

【Title】A Dynamic Recurrent Model for Next Basket
Recommendation

【Abstract】Next basket
recommendation becomes an increasing concern. Most conventional models explore
either sequential transaction features or general interests of users. Further,
some works treat users' general interests and sequential behaviors as two
totally divided matters, and then combine them in some way for next basket
recommendation. Moreover, the state-of-the-art models are based on the
assumption of Markov Chains (MC), which only capture local sequential features
between two adjacent baskets. In this work, we propose a novel model, Dynamic
REcurrent bAsket Model (DREAM), based on Recurrent Neural Network (RNN). DREAM
not only learns a dynamic representation of a user but also captures global
sequential features among baskets. The dynamic representation of a specific
user can reveal user's dynamic interests at different time, and the global
sequential features reflect interactions of all baskets of the user over time.
Experiment results on two public datasets indicate that DREAM is more effective
than the state-of-the-art models for next basket recommendation.

Short
Research Paper

【Title】Collaborative
Ranking with Social Relationships for Top-N Recommendations

【Abstract】Recommendation
systems have gained a lot of attention because of their importance for handling
the unprecedentedly large amount of available content on the Web, such as
movies, music, books, etc. Although Collaborative Ranking (CR) models can produce
accurate recommendation lists, in practice several real-world problems decrease
their ranking performance, such as the sparsity and cold start problems. Here,
to account for the fact that the selections of social friends can leverage the
recommendation accuracy, we propose SCR, a Social CR model. Our model learns
personalized ranking functions collaboratively, using the notion of Social
Reverse Height, that is, considering how well the relevant items of users and
their social friends have been ranked at the top of the list. The reason that
we focus on the top of the list is that users mainly see the top-N
recommendations, and not the whole ranked list. In our experiments with a
benchmark data set from Epinions, we show that our SCR model performs better than
state-of-the-art CR models that either consider social relationships or focus
on the ranking performance at the top of the list.

SIGIR-2017

【Title】Multi-site User Behavior Modeling and Its Application in
Video Recommendation

【Abstract】As online
video service continues to grow in popularity, video content providers compete
hard for more eyeball engagement. Some users visit multiple video sites to
enjoy videos of their interest while some visit exclusively one site. However,
due to the isolation of data, mining and exploiting user behaviors in multiple
video websites remain unexplored so far. In this work, we try to model user
preferences in six popular video websites with user viewing records obtained
from a large ISP in China. The empirical study shows that users exhibit both
consistent cross-site interests as well as site-specific interests. To
represent this dichotomous pattern of user preferences, we propose a generative
model of Multi-site Probabilistic Factorization (MPF) to capture both the
cross-site as well as site-specific preferences. Besides, we discuss the design
principle of our model by analyzing the sources of the observed site-specific
user preferences, namely, site peculiarity and data sparsity. Through
conducting extensive recommendation validation, we show that our MPF model
achieves the best results compared to several other state-of-the-art
factorization models with significant improvements of F-measure by 12.96%,
8.24% and 6.88%, respectively. Our findings provide insights on the value of
integrating user data from multiple sites, which stimulates collaboration
between video service providers.

【Title】Item Silk Road: Recommending Items from Information
Domains to Social Users

【Abstract】Online platforms can be divided into
information-oriented and social-oriented domains. The former refers to forums
or E-commerce sites that emphasize user-item interactions, like Trip.com and
Amazon; whereas the latter refers to social networking services (SNSs) that
have rich user-user connections, such as Facebook and Twitter. Despite their
heterogeneity, these two domains can be bridged by a few overlapping users,
dubbed as bridge users. In this work, we address the problem of cross-domain
social recommendation, i.e., recommending relevant items of information domains
to potential users of social networks. To our knowledge, this is a new problem
that has rarely been studied before.

Existing
cross-domain recommender systems are unsuitable for this task since they have
either focused on homogeneous information domains or assumed that users are
fully overlapped. Towards this end, we present a novel Neural Social
Collaborative Ranking (NSCR) approach, which seamlessly sews up the user-item
interactions in information domains and user-user connections in SNSs. In the
information domain part, the attributes of users and items are leveraged to
strengthen the embedding learning of users and items. In the SNS part, the
embeddings of bridge users are propagated to learn the embeddings of other
non-bridge users. Extensive experiments on two real-world datasets demonstrate
the effectiveness and rationality of our NSCR method.

【Title】Cross-Domain
Recommendation via Clustering on Multi-Layer Graphs

【Abstract】Venue category recommendation is an essential
application for the tourism and advertisement industries, wherein it may
suggest attractive localities within close proximity to users' current
location. Considering that many adults use more than three social networks
simultaneously, it is reasonable to leverage on this rapidly growing
multi-source social media data to boost venue recommendation performance.
Another approach to achieve higher recommendation results is to utilize group
knowledge, which is able to diversify recommendation output. Taking into
account these two aspects, we introduce a novel cross-network collaborative
recommendation framework C3R, which utilizes both individual and group
knowledge, while being trained on data from multiple social media sources.
Group knowledge is derived based on new cross-source user community detection
approach, which utilizes both inter-source relationship and the ability of
sources to complement each other. To fully utilize multi-source multi-view data,
we process user-generated content by employing state-of-the-art text, image,
and location processing techniques. Our experimental results demonstrate the
superiority of our multi-source framework over state-of-the-art baselines and
different data source combinations. In addition, we suggest a new approach for
automatic construction of inter-network relationship graph based on the data,
which eliminates the necessity of having pre-defined domain knowledge.

【Title】Optimizing Trade-offs Among Stakeholders in
Real-Time Bidding by Incorporating Multimedia Metrics

【Abstract】Displaying banner advertisements (in short, ads)
on webpages has usually been discussed as an Internet economics topic where a
publisher uses auction models to sell an online user's page view to advertisers
and the one with the highest bid can have her ad displayed to the user. This is
also called real-time bidding (RTB) and the ad displaying process ensures that
the publisher's benefit is maximized or there is an equilibrium in ad auctions.
However, the benefits of the other two stakeholders - the advertiser and the
user - have been rarely discussed. In this paper, we propose a two-stage
computational framework that selects a banner ad based on the optimized
trade-offs among all stakeholders. The first stage is still auction based and
the second stage re-ranks ads by considering the benefits of all stakeholders.
Our metric variables are: the publisher's revenue, the advertiser's utility,
the ad memorability, the ad click-through rate (CTR), the contextual relevance,
and the visual saliency. To the best of our knowledge, this is the first work
that optimizes trade-offs among all stakeholders in RTB by incorporating
multimedia metrics. An algorithm is also proposed to determine the optimal
weights of the metric variables. We use both ad auction datasets and multimedia
datasets to validate the proposed framework. Our experimental results show that
the publisher can significantly improve the other stakeholders' benefits by
slightly reducing her revenue in the short-term. In the long run, advertisers
and users will be more engaged, the increased demand of advertising and the
increased supply of page views can then boost the publisher's revenue.

【Title】A
Probabilistic Reformulation of Memory-Based Collaborative Filtering:
Implications on Popularity Biases

【Abstract】We develop a probabilistic formulation giving
rise to a formal version of heuristic k nearest-neighbor (kNN) collaborative
filtering. Different independence assumptions in our scheme lead to user-based,
item-based, normalized and non-normalized variants that match in structure the
traditional formulations, while showing equivalent empirical effectiveness. The
probabilistic formulation provides a principled explanation why kNN is an
effective recommendation strategy, and identifies a key condition for this to
be the case. Moreover, a natural explanation arises for the bias of kNN towards
recommending popular items. Thereupon the kNN variants are shown to fall into
two groups with similar trends in behavior, corresponding to two different
notions of item popularity. We show experiments where the comparative
performance of the two groups of algorithms changes substantially, which
suggests that the performance measurements and comparison may heavily depend on
statistical properties of the input data sample.

【Title】Personalized
Key Frame Recommendation

【Abstract】Key frames are playing a very important role for
many video applications, such as on-line movie preview and video information
retrieval. Although a number of key frame selection methods have been proposed
in the past, existing technologies mainly focus on how to precisely summarize
the video content, but seldom take the user preferences into consideration.
However, in real scenarios, people may cast diverse interests on the contents
even for the same video, and thus they may be attracted by quite different key
frames, which makes the selection of key frames an inherently personalized
process. In this paper, we propose and investigate the problem of personalized
key frame recommendation to bridge the above gap. To do so, we make use of
video images and user time-synchronized comments to design a novel key frame
recommender that can simultaneously model visual and textual features in a
unified framework. By user personalization based on her/his previously reviewed
frames and posted comments, we are able to encode different user interests in a
unified multi-modal space, and can thus select key frames in a personalized manner,
which, to the best of our knowledge, is the first time in the research field of
video content analysis. Experimental results show that our method performs
better than its competitors on various measures.

【Title】Personalized
Itinerary Recommendation with Queuing Time Awareness

【Abstract】Personalized itinerary recommendation is a
complex and time-consuming problem, due to the need to recommend popular
attractions that are aligned to the interest preferences of a tourist, and to
plan these attraction visits as an itinerary that has to be completed within a
specific time limit. Furthermore, many existing itinerary recommendation
systems do not automatically determine and consider queuing times at
attractions in the recommended itinerary, which varies based on the time of
visit to the attraction, e.g., longer queuing times at peak hours. To solve
these challenges, we propose the PersQ algorithm for recommending personalized
itineraries that take into consideration attraction popularity, user interests
and queuing times. We also implement a framework that utilizes geo-tagged
photos to derive attraction popularity, user interests and queuing times, which
PersQ uses to recommend personalized and queue-aware itineraries. We
demonstrate the effectiveness of PersQ in the context of five major theme parks,
based on a Flickr dataset spanning nine years. Experimental results show that
PersQ outperforms various state-of-the-art baselines, in terms of various
queuing-time related metrics, itinerary popularity, user interest alignment,
recall, precision and F1-score.

【Title】Attentive Collaborative Filtering: Multimedia
Recommendation with Item- and Component-Level Attention

【Abstract】Multimedia content is dominating today's Web
information. The nature of multimedia user-item interactions is 1/0 binary
implicit feedback (e.g., photo likes, video views, song downloads, etc.), which
can be collected at a larger scale with a much lower cost than explicit
feedback (e.g., product ratings). However, the majority of existing
collaborative filtering (CF) systems are not well-designed for multimedia
recommendation, since they ignore the implicitness in users' interactions with
multimedia content. We argue that, in multimedia recommendation, there exists
item- and component-level implicitness which blurs the underlying users'
preferences. The item-level implicitness means that users' preferences on items
(e.g. photos, videos, songs, etc.) are unknown, while the component-level implicitness
means that inside each item users' preferences on different components (e.g.
regions in an image, frames of a video, etc.) are unknown. For example, a
'view'' on a video does not provide any specific information about how the user
likes the video (i.e.item-level) and which parts of the video the user is
interested in (i.e.component-level). In this paper, we introduce a novel
attention mechanism in CF to address the challenging item- and component-level
implicit feedback in multimedia recommendation, dubbed Attentive Collaborative
Filtering (ACF). Specifically, our attention model is a neural network that
consists of two attention modules: the component-level attention module,
starting from any content feature extraction network (e.g. CNN for images/videos),
which learns to select informative components of multimedia items, and the
item-level attention module, which learns to score the item preferences. ACF
can be seamlessly incorporated into classic CF models with implicit feedback,
such as BPR and SVD++, and efficiently trained using SGD. Through extensive
experiments on two real-world multimedia Web services: Vine and Pinterest, we
show that ACF significantly outperforms state-of-the-art CF methods.

【Title】Neural Rating Regression
with Abstractive Tips Generation for Recommendation

【Abstract】Recently, some E-commerce sites launch a new
interaction box called Tips on their mobile apps. Users can express their
experience and feelings or provide suggestions using short texts typically
several words or one sentence. In essence, writing some tips and giving a
numerical rating are two facets of a user's product assessment action,
expressing the user experience and feelings. Jointly modeling these two facets
is helpful for designing a better recommendation system. While some existing
models integrate text information such as item specifications or user reviews
into user and item latent factors for improving the rating prediction, no
existing works consider tips for improving recommendation quality. We propose a
deep learning based framework named NRT which can simultaneously predict
precise ratings and generate abstractive tips with good linguistic quality
simulating user experience and feelings. For abstractive tips generation, gated
recurrent neural networks are employed to "translate'' user and item
latent representations into a concise sentence. Extensive experiments on
benchmark datasets from different domains show that NRT achieves significant
improvements over the state-of-the-art methods. Moreover, the generated tips
can vividly predict the user experience and feelings.

【Title】Neural Factorization Machines for Sparse
Predictive Analytics

【Abstract】Many predictive tasks of web applications need to
model categorical variables, such as user IDs and demographics like genders and
occupations. To apply standard machine learning techniques, these categorical
predictors are always converted to a set of binary features via one-hot
encoding, making the resultant feature vector highly sparse. To learn from such
sparse data effectively, it is crucial to account for the interactions between
features.

Factorization
Machines (FMs) are a popular solution for efficiently using the second-order
feature interactions. However, FM models feature interactions in a linear way,
which can be insufficient for capturing the non-linear and complex inherent
structure of real-world data. While deep neural networks have recently been
applied to learn non-linear feature interactions in industry, such as the
Wide&Deep by Google and DeepCross by Microsoft, the deep structure
meanwhile makes them difficult to train.

In this
paper, we propose a novel model Neural Factorization Machine (NFM) for
prediction under sparse settings. NFM seamlessly combines the linearity of FM
in modelling second-order feature interactions and the non-linearity of neural
network in modelling higher-order feature interactions. Conceptually, NFM is
more expressive than FM since FM can be seen as a special case of NFM without
hidden layers. Empirical results on two regression tasks show that with one
hidden layer only, NFM significantly outperforms FM with a 7.3% relative
improvement. Compared to the recent deep learning methods Wide&Deep and
DeepCross, our NFM uses a shallower structure but offers better performance,
being much easier to train and tune in practice.

【Title】Content Recommendation for Viral Social Influence

【Abstract】How do we create content that will become viral
in a whole network after we share it with friends or followers' Significant
research activity has been dedicated to the problem of strategically selecting
a seed set of initial adopters so as to maximize a meme's spread in a network.
This line of work assumes that the success of such a campaign depends solely on
the choice of a tunable seed set of adopters, while the way users perceive the
propagated meme is fixed. Yet, in many real-world settings, the opposite holds:
a meme's propagation depends on users' perceptions of its tunable
characteristics, while the set of initiators is fixed.

In this
paper, we address the natural problem that arises in such circumstances:
Suggest content, expressed as a limited set of attributes, for a creative
promotion campaign that starts out from a given seed set of initiators, so as
to maximize its expected spread over a social network. To our knowledge, no
previous work addresses this problem. We find that the problem is NP-hard and
inapproximable. As a tight approximation guarantee is not admissible, we design
an efficient heuristic, Explore-Update, as well as a conventional Greedy
solution. Our experimental evaluation demonstrates that Explore-Update selects
near-optimal attribute sets with real data, achieves 30% higher spread than
baselines, and runs an order of magnitude faster than the Greedy solution.

【Title】Exploiting
Food Choice Biases for Healthier Recipe Recommendation

【Abstract】By incorporating healthiness into the food
recommendation / ranking process we have the potential to improve the eating
habits of a growing number of people who use the Internet as a source of food
inspiration. In this paper, using insights gained from various data sources, we
explore the feasibility of substituting meals that would typically be
recommended to users with similar, healthier dishes. First, by analysing a
recipe collection sourced from Allrecipes.com, we quantify the potential for
finding replacement recipes, which are comparable but have different
nutritional characteristics and are nevertheless highly rated by users.
Building on this, we present two controlled user studies (n=107, n=111)
investigating how people perceive and select recipes. We show participants are
unable to reliably identify which recipe contains most fat due to their answers
being biased by lack of information, misleading cues and limited nutritional
knowledge on their part. By applying machine learning techniques to predict the
preferred recipes, good performance can be achieved using low-level image
features and recipe meta-data as predictors. Despite not being able to
consciously determine which of two recipes contains most fat, on average,
participants select the recipe with the most fat as their preference. The
importance of image features reveals that recipe choices are often visually
driven. A final user study (n=138) investigates to what extent the predictive
models can be used to select recipe replacements such that users can be
``nudged'' towards choosing healthier recipes. Our findings have important implications
for online food systems.

【Title】Embedding Factorization Models for Jointly Recommending Items and User
Generated Lists

【Abstract】Existing recommender algorithms mainly focused on
recommending individual items by utilizing user-item interactions. However,
little attention has been paid to recommend user generated lists (e.g.,
playlists and booklists). On one hand, user generated lists contain rich signal
about item co-occurrence, as items within a list are usually gathered based on
a specific theme. On the other hand, a user's preference over a list also
indicate her preference over items within the list. We believe that 1) if the
rich relevance signal within user generated lists can be properly leveraged, an
enhanced recommendation for individual items can be provided, and 2) if
user-item and user-list interactions are properly utilized, and the
relationship between a list and its contained items is discovered, the
performance of user-item and user-list recommendations can be mutually
reinforced.

Towards
this end, we devise embedding factorization models, which extend traditional
factorization method by incorporating item-item (item-item-list) co-occurrence
with embedding-based algorithms. Specifically, we employ factorization model to
capture users' preferences over items and lists, and utilize embedding-based
models to discover the co-occurrence information among items and lists. The gap
between the two types of models is bridged by sharing items' latent factors.
Remarkably, our proposed framework is capable of solving the new-item
cold-start problem, where items have never been consumed by users but exist in
user generated lists. Overall performance comparisons and micro-level analyses
demonstrate the promising performance of our proposed approaches.

-----------------------

SIGKDD-2015

【Title】Real Time Recommendations from
Connoisseurs

【Abstract】The information overload problem remains serious
for both consumers and service/content providers, leading to heightened demands
for personalized recommendations. For recommender systems, updating user models
is one of the most important tasks to keep up with their changing preferences
and trends. Especially since new consumers and items emerge every day, which
are promptly rated or reviewed, updating lists of items and rankings is
crucial. In this paper, we set the goal of real time recommendation, to present
these items instantly. Unlike standard collaborative filtering algorithms, our
offline approach focuses only innovative consumers for these predictions, and
then uses as few consumers as possible while keeping the same precision. Since
innovators exist in many communities, and their opinions will spread and then
stimulate their followers to adopt the same behavior, our approach is based on
the hypothesis that a set of innova- tive consumers is sufficient to represent
the most representative opinions in each community. Following this hypothesis,
we derive a scalable method to detect both communities and innovative consumers
in each community from a web- scale data from a behavior log. Our evaluation
shows that our proposed weighting method can accurately sample given logs, and
be compatible only with previous algorithms for real time recommendations.

Complementary

【Title】Inferring Networks of Substitutable and
Complementary Products

【Abstract】To design a useful recommender system, it is
important to understand how products relate to each other. For example, while a
user is browsing mobile phones, it might make sense to recommend other phones,
but once they buy a phone, we might instead want to recommend batteries, cases,
or chargers. In economics, these two types of recommendations are referred to
as substitutes and complements: substitutes are products that can be purchased
instead of each other, while complements are products that can be purchased in
addition to each other. Such relationships are essential as they help us to
identify items that are relevant to a user's search.

Our goal in
this paper is to learn the semantics of substitutes and complements from the
text of online reviews. We treat this as a supervised learning problem, trained
using networks of products derived from browsing and co-purchasing logs.
Methodologically, we build topic models that are trained to automatically discover
topics from product reviews that are successful at predicting and explaining
such relationships. Experimentally, we evaluate our system on the Amazon
product catalog, a large dataset consisting of 9 million products, 237 million
links, and 144 million reviews.

【Title】SCRAM: A Sharing Considered Route Assignment Mechanism
for Fair Taxi Route Recommendations

【Abstract】Recommending routes for a group of competing taxi
drivers is almost untouched in most route recommender systems. For this kind of
problem, recommendation fairness and driving efficiency are two fundamental
aspects. In the paper, we propose SCRAM, a sharing considered route assignment
mechanism for fair taxi route recommendations. SCRAM aims to provide
recommendation fairness for a group of competing taxi drivers, without
sacrificing driving efficiency. By designing a concise route assignment
mechanism, SCRAM achieves better recommendation fairness for competing taxis.
By considering the sharing of road sections to avoid unnecessary competition,
SCRAM is more efficient in terms of driving cost per customer (DCC). We test
SCRAM based on a large number of historical taxi trajectories and validate the
recommendation fairness and driving efficiency of SCRAM with extensive
evaluations. Experimental results show that SCRAM achieves better
recommendation fairness and higher driving efficiency than three compared
approaches.

【Title】Matrix Completion with Queries

【Abstract】In many applications, e.g., recommender systems
and traffic monitoring, the data comes in the form of a matrix that is only
partially observed and low rank. A fundamental data-analysis task for these
datasets is matrix completion, where the goal is to accurately infer the
entries missing from the matrix. Even when the data satisfies the low-rank
assumption, classical matrix-completion methods may output completions with
significant error -- in that the reconstructed matrix differs significantly
from the true underlying matrix. Often, this is due to the fact that the
information contained in the observed entries is insufficient. In this work, we
address this problem by proposing an active version of matrix completion, where
queries can be made to the true underlying matrix. Subsequently, we design
Order&Extend, which is the first algorithm to unify a matrix-completion
approach and a querying strategy into a single algorithm. Order&Extend is
able identify and alleviate insufficient information by judiciously querying a
small number of additional entries. In an extensive experimental evaluation on
real-world datasets, we demonstrate that our algorithm is efficient and is able
to accurately reconstruct the true matrix while asking only a small number of
queries.

【Title】Collaborative Deep Learning for Recommender Systems

【Abstract】Collaborative filtering (CF) is a successful
approach commonly used by many recommender systems. Conventional CF-based
methods use the ratings given to items by users as the sole source of
information for learning to make recommendation. However, the ratings are often
very sparse in many applications, causing CF-based methods to degrade
significantly in their recommendation performance. To address this sparsity
problem, auxiliary information such as item content information may be
utilized. Collaborative topic regression (CTR) is an appealing recent method
taking this approach which tightly couples the two components that learn from
two different sources of information. Nevertheless, the latent representation
learned by CTR may not be very effective when the auxiliary information is very
sparse. To address this problem, we generalize recently advances in deep
learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this
paper a hierarchical Bayesian model called collaborative deep learning (CDL),
which jointly performs deep representation learning for the content information
and collaborative filtering for the ratings (feedback) matrix. Extensive
experiments on three real-world datasets from different domains show that CDL
can significantly advance the state of the art.

【Title】Life-stage Prediction for Product
Recommendation in E-commerce

【Abstract】Although marketing researchers and sociologists
have recognized the large impact of life stage on consumer's purchasing
behaviors, existing recommender systems have not taken this impact into
consideration. In this paper, we found obvious correlation between life stage
and purchasing behavior in many E-commerce categories. For example, a mum may
look for different suitable products when her baby is at different ages.
Motivated by this, we introduce the conception of life stage into recommender
systems and propose to predict a user's current life-stage and recommend
products correspondingly. We propose a new Maximum Entropy Semi Markov Model to
segment and label consumer life stage based on the observed purchasing data
over time. In the mom-baby product category where the life stage transition is
deterministic, we develop an efficient approximate solution using large scale
logistic regression and a Viterbi-like algorithm. We also propose a Gaussian
mixture model to efficiently handle multi-kids life stage prediction problem.
We integrate the life stage information predicted into the recommender system
behind the largest online shopping website taobao.com. Both offline and online
experiments demonstrate the effectiveness of the proposed life-stage based
recommendation approach.

【Title】An Architecture for Agile Machine Learning
in Real-Time Applications

【Abstract】Machine learning techniques have proved effective
in recommender systems and other applications, yet teams working to deploy them
lack many of the advantages that those in more established software disciplines
today take for granted. The well-known Agile methodology advances projects in a
chain of rapid development cycles, with subsequent steps often informed by
production experiments. Support for such workflow in machine learning
applications remains primitive.

The
platform developed at if(we) embodies a specific machine learning approach and
a rigorous data architecture constraint, so allowing teams to work in rapid
iterative cycles. We require models to consume data from a time-ordered event
history, and we focus on facilitating creative feature engineering. We make it
practical for data scientists to use the same model code in development and in
production deployment, and make it practical for them to collaborate on complex
models.

We deliver
real-time recommendations at scale, returning top results from among 10,000,000
candidates with sub-second response times and incorporating new updates in just
a few seconds. Using the approach and architecture described here, our team can
routinely go from ideas for new models to production-validated results within two
weeks.

【Title】Stock Constrained Recommendation in Tmall

【Abstract】A large number of recommender systems have been
developed to serve users with interesting news, ads, products or other
contents. One main limitation with the existing work is that they do not take
into account the inventory size of of items to be recommended. As a result,
popular items are likely to be out of stock soon as they have been recommended
and sold to many users, significantly affecting the impact of recommendation
and user experience. This observation motivates us to develop a novel aware
recommender system. It jointly optimizes the recommended items for all users
based on both user preference and inventory sizes of different items. It
requires solving a non-smooth optimization involved estimating a matrix of nxn,
where n is the number of items. With the proliferation of items, this approach
can quickly become computationally infeasible. We address this challenge by
developing a dual method that reduces the number of variables from n^2 to n,
significantly improving the computational efficiency. We also extend this
approach to the online setting, which is particularly important for big
promotion events. Our empirical studies based on a real benchmark data with 100
millions of user visits from Tmall verify the effectiveness of the proposed
approach.

【Title】Web Personalization and Recommender
Systems

【Abstract】The quantity of accessible information has been
growing rapidly and far exceeded human processing capabilities. The sheer
abundance of information often prevents users from discovering the desired
information, or aggravates making informed and correct choices. This highlights
the pressing need for intelligent personalized applications that simplify
information access and discovery by taking into account users' preferences and
needs. One type of personalized application that has recently become
tremendously popular in research and industry is recommender systems. These
provide to users personalized recommendations about information and products
they may be interested to examine or purchase. Extensive research into
recommender systems has yielded a variety of techniques, which have been
published at a variety of conferences and adopted by numerous Web-sites. This
tutorial will provide the participants with broad overview and thorough
understanding of algorithms and practically deployed Web and mobile applications
of personalized technologies.

SIGKDD-2016

【Title】An Empirical Study on Recommendation with
Multiple Types of Feedback

【Abstract】User feedback like clicks and ratings on
recommended items provides important information for recommender systems to
predict users' interests in unseen items. Most systems rely on models trained
using a single type of feedback, e.g., ratings for movie recommendation and
clicks for online news recommendation. However, in addition to the primary
feedback, many systems also allow users to provide other types of feedback,
e.g., liking or sharing an article, or hiding all articles from a source. These
additional feedback potentially provides extra information for the
recommendation models. To optimize user experience and business objectives, it
is important for a recommender system to use both the primary feedback and
additional feedback. This paper presents an empirical study on various training
methods for incorporating multiple user feedback types based on LinkedIn
recommendation products. We study three important problems that we face at
LinkedIn: (1) Whether to send an email based on clicks and complaints, (2) how
to rank updates in LinkedIn feeds based on clicks and hides and (3) how jointly
optimize for viral actions and clicks in LinkedIn feeds. Extensive offline
experiments on historical data show the effectiveness of these methods in
different situations. Online A/B testing results further demonstrate the impact
of these methods on LinkedIn production systems.

【Title】Collaborative Knowledge
Base Embedding for Recommender Systems

【Abstract】Among
different recommendation techniques, collaborative filtering usually suffer
from limited performance due to the sparsity of user-item interactions. To
address the issues, auxiliary information is usually used to boost the
performance. Due to the rapid collection of information on the web, the
knowledge base provides heterogeneous information including both structured and
unstructured data with different semantics, which can be consumed by various
applications. In this paper, we investigate how to leverage the heterogeneous
information in a knowledge base to improve the quality of recommender systems.
First, by exploiting the knowledge base, we design three components to extract
items' semantic representations from structural content, textual content and
visual content, respectively. To be specific, we adopt a heterogeneous network
embedding method, termed as TransR, to extract items' structural
representations by considering the heterogeneity of both nodes and
relationships. We apply stacked denoising auto-encoders and stacked
convolutional auto-encoders, which are two types of deep learning based
embedding techniques, to extract items' textual representations and visual
representations, respectively. Finally, we propose our final integrated
framework, which is termed as Collaborative Knowledge Base Embedding (CKE), to
jointly learn the latent representations in collaborative filtering as well as
items' semantic representations from the knowledge base. To evaluate the
performance of each embedding component as well as the whole system, we conduct
extensive experiments with two real-world datasets from different scenarios.
The results reveal that our approaches outperform several widely adopted
state-of-the-art recommendation methods.

【Title】Compute Job
Memory Recommender System Using Machine Learning

【Abstract】This paper
presents a machine learning approach to predict the amount of compute memory
needed by jobs which are submitted to Load Sharing Facility (LSF® ) with a high
level of accuracy. LSF® is the compute resource manager and job scheduler for
Qualcomm chip design process. It schedules the jobs based on available
resources: CPU, memory, storage, and software licenses. Memory is one of the
key resources and its proper utilization leads to a substantial improvement in
saving machine resources which in turn results in a significant reduction in
overall job pending time. In addition, efficient memory utilization helps to
reduce the operations cost by decreasing the number of servers needed for the
end-to-end design process. In this paper, we explored a suite of statistical
and machine learning techniques to develop a Compute Memory Recommender System
for the Qualcomm chip design process with over 90% accuracy in predicting the
amount of memory a job needs. Moreover, it demonstrates the potential to
significantly reduce job pending time.

【Title】Scalable
Time-Decaying Adaptive Prediction Algorithm

【Abstract】Online
learning is used in a wide range of real applications, e.g., predicting ad
click-through rates (CTR) and personalized recommendations. Based on the
analysis of users' behaviors in Video-On-Demand (VoD) recommender systems,we
discover that the most recent users' actions can better reflect users' current
intentions and preferences. Under this observation, we thereby propose a novel
time-decaying online learning algorithm derived from the state-of-the-art
FTRL-proximal algorithm, called Time-Decaying Adaptive Prediction (TDAP)
algorithm.

To scale Big Data, we further parallelize our
algorithm following the data parallel scheme under both BSP and SSP consistency
model. We experimentally evaluate our TDAP algorithm on real IPTV VoD datasets
using two state-of-the-art distributed computing platforms, TDAP achieves good
accuracy: it improves at least 5.6% in terms of prediction accuracy, compared
to FTRL-proximal algorithm; and TDAP scales well: it runs 4 times faster when
the number of machines increases from 2 to 10.

【Title】The Limits of
Popularity-Based Recommendations, and the Role of Social Ties

【Abstract】In this
paper we introduce a mathematical model that captures some of the salient
features of recommender systems that are based on popularity and that try to
exploit social ties among the users. We show that, under very general
conditions, the market always converges to a steady state, for which we are
able to give an explicit form. Thanks to this we can tell rather precisely how
much a market is altered by a recommendation system, and determine the power of
users to influence others. Our theoretical results are complemented by
experiments with real world social networks showing that social graphs prevent
large market distortions in spite of the presence of highly influential users.

【Title】Towards Conversational Recommender Systems

【Abstract】People
often ask others for restaurant recommendations as a way to discover new dining
experiences. This makes restaurant recommendation an exciting scenario for
recommender systems and has led to substantial research in this area. However,
most such systems behave very differently from a human when asked for a
recommendation. The goal of this paper is to begin to reduce this gap. In
particular, humans can quickly establish preferences when asked to make a
recommendation for someone they do not know. We address this cold-start
recommendation problem in an online learning setting. We develop a preference
elicitation framework to identify which questions to ask a new user to quickly
learn their preferences. Taking advantage of latent structure in the recommendation
space using a probabilistic latent factor model, our experiments with both
synthetic and real world data compare different types of feedback and question
selection strategies. We find that our framework can make very effective use of
online user feedback, improving personalized recommendations over a static
model by 25% after asking only 2 questions. Our results demonstrate dramatic
benefits of starting from offline embeddings, and highlight the benefit of
bandit-based explore-exploit strategies in this setting.

【Title】Point-of-Interest Recommendations: Learning
Potential Check-ins from Friends

【Abstract】The emergence of Location-based Social Network
(LBSN) services provides a wonderful opportunity to build personalized
Point-of-Interest (POI) recommender systems. Although a personalized POI
recommender system can significantly facilitate users' outdoor activities, it
faces many challenging problems, such as the hardness to model user's POI
decision making process and the difficulty to address data sparsity and
user/location cold-start problem. To cope with these challenges, we define
three types of friends (i.e., social friends, location friends, and neighboring
friends) in LBSN, and develop a two-step framework to leverage the information
of friends to improve POI recommendation accuracy and address cold-start
problem. Specifically, we first propose to learn a set of potential locations
that each individual's friends have checked-in before and this individual is
most interested in. Then we incorporate three types of check-ins (i.e.,
observed check-ins, potential check-ins and other unobserved check-ins) into
matrix factorization model using two different loss functions (i.e., the square
error based loss and the ranking error based loss). To evaluate the proposed
model, we conduct extensive experiments with many state-of-the-art baseline
methods and evaluation metrics on two real-world data sets. The experimental results
demonstrate the effectiveness of our methods.

【Title】Continuous Experience-aware Language Model

【Abstract】Online review communities are dynamic as users join
and leave, adopt new vocabulary, and adapt to evolving trends. Recent work has
shown that recommender systems benefit from explicit consideration of user
experience. However, prior work assumes a fixed number of discrete experience
levels, whereas in reality users gain experience and mature continuously over
time. This paper presents a new model that captures the continuous evolution of
user experience, and the resulting language model in reviews and other posts.
Our model is unsupervised and combines principles of Geometric Brownian Motion,
Brownian Motion, and Latent Dirichlet Allocation to trace a smooth temporal
progression of user experience and language model respectively. We develop
practical algorithms for estimating the model parameters from data and for
inference with our model (e.g., to recommend items). Extensive experiments with
five real-world datasets show that our model not only fits data better than
discrete-model baselines, but also outperforms state-of-the-art methods for
predicting item ratings.

SIGKDD-2017

【Title】Unsupervised P2P Rental Recommendations via
Integer Programming

【Abstract】Due to the sparseness of quality rating data,
unsupervised recommender systems are used in many applications in Peer to Peer
(P2P) rental marketplaces such as Airbnb, FlipKey, and HomeAway. We present an
integer programming based recommender systems, where both accommodation
benefits and community risks of lodging places are measured and incorporated
into an objective function as utility measurements. More specifically, we first
present an unsupervised fused scoring method for quantifying the accommodation benefits
and community risks of a lodging with crowd-sourced geo-tagged data. In order
to the utility of recommendations, we formulate the unsupervised P2P rental
recommendations as a constrained integer programming problem, where the
accommodation benefits of recommendations are maximized and the community risks
of recommendations are minimized, while maintaining constraints on
personalization. Furthermore, we provide an efficient solution for the
optimization problem by developing a learning-to-integer-programming method for
combining aggregated listwise learning to rank into branching variable
selection. We apply the proposed approach to the Airbnb data of New York City
and provide lodging recommendations to travelers. In our empirical experiments,
we demonstrate both the efficiency and effectiveness of our method in terms of
striving a trade-off between the user satisfaction, time on market, and the
number of reviews, and achieving a balance between positive and negative sides.

【Title】Collaborative Variational Autoencoder for
Recommender Systems

【Abstract】Modern recommender systems usually employ
collaborative filtering with rating information to recommend items to users due
to its successful performance. However, because of the drawbacks of
collaborative-based methods such as sparsity, cold start, etc., more attention
has been drawn to hybrid methods that consider both the rating and content
information. Most of the previous works in this area cannot learn a good
representation from content for recommendation task or consider only text
modality of the content, thus their methods are very limited in current
multimedia scenario. This paper proposes a Bayesian generative model called
collaborative variational autoencoder (CVAE) that considers both rating and
content for recommendation in multimedia scenario. The model learns deep latent
representations from content data in an unsupervised manner and also learns
implicit relationships between items and users from both content and rating.
Unlike previous works with denoising criteria, the proposed CVAE learns a
latent distribution for content in latent space instead of observation space
through an inference network and can be easily extended to other multimedia
modalities other than text. Experiments show that CVAE is able to significantly
outperform the state-of-the-art recommendation methods with more robust
performance.

【Title】HoORaYs: High-order Optimization of Rating Distance for
Recommender Systems

【Abstract】Latent factor models have become a prevalent
method in recommender systems, to predict users' preference on items based on
the historical user feedback. Most of the existing methods, explicitly or
implicitly, are built upon the first-order rating distance principle, which
aims to minimize the difference between the estimated and real ratings. In this
paper, we generalize such first-order rating distance principle and propose a
new latent factor model (HoORaYs) for recommender systems. The core idea of the
proposed method is to explore high-order rating distance, which aims to minimize
not only (i) the difference between the estimated and real ratings of the same
(user, item) pair (i.e., the first-order rating distance), but also (ii) the
difference between the estimated and real rating difference of the same user
across different items (i.e., the second-order rating distance). We formulate
it as a regularized optimization problem, and propose an effective and scalable
algorithm to solve it. Our analysis from the geometry and Bayesian perspectives
indicate that by exploring the high-order rating distance, it helps to reduce
the variance of the estimator, which in turns leads to better generalization
performance (e.g., smaller prediction error). We evaluate the proposed method
on four real-world data sets, two with explicit user feedback and the other two
with implicit user feedback. Experimental results show that the proposed method
consistently outperforms the state-of-the-art methods in terms of the
prediction accuracy.

【Title】Meta-Graph Based Recommendation Fusion over Heterogeneous
Information Networks

【Abstract】Heterogeneous Information Network (HIN) is a
natural and general representation of data in modern large commercial
recommender systems which involve heterogeneous types of data. HIN based
recommenders face two problems: how to represent the high-level semantics of
recommendations and how to fuse the heterogeneous information to make
recommendations. In this paper, we solve the two problems by first introducing
the concept of meta-graph to HIN-based recommendation, and then solving the
information fusion problem with a "matrix factorization (MF) +
factorization machine (FM)" approach. For the similarities generated by each
meta-graph, we perform standard MF to generate latent features for both users
and items. With different meta-graph based features, we propose to use FM with
Group lasso (FMG) to automatically learn from the observed ratings to
effectively select useful meta-graph based features. Experimental results on
two real-world datasets, Amazon and Yelp, show the effectiveness of our
approach compared to state-of-the-art FM and other HIN-based recommendation
algorithms.

【Title】Post
Processing Recommender Systems for Diversity

【Abstract】Collaborative filtering is a broad and powerful
framework for building recommendation systems that has seen widespread
adoption. Over the past decade, the propensity of such systems for favoring
popular products and thus creating echo chambers have been observed. This has
given rise to an active area of research that seeks to diversify
recommendations generated by such algorithms. We address the problem of
increasing diversity in recom- mendation systems that are based on
collaborative filtering that use past ratings to predict a rating quality for
potential recommendations. Following our earlier work, we formulate
recommendation system design as a subgraph selection problem from a candidate
super-graph of potential recommendations where both diversity and rating
quality are explicitly optimized: (1) On the modeling side, we define a new
flexible notion of diversity that allows a system designer to prescribe the
number of recommendations each item should receive, and smoothly penalizes
deviations from this distribution. (2) On the algorithmic side, we show that
minimum-cost network flow methods yield fast algorithms in theory and practice
for designing recommendation subgraphs that optimize this notion of diversity.
(3) On the empirical side, we show the effectiveness of our new model and
method to increase diversity while maintaining high rating quality in standard
rating data sets from Netflix and MovieLens.

【Title】End-to-end Learning
for Short Text Expansion

【Abstract】Effectively making sense of short texts is a
critical task for many real world applications such as search engines, social
media services, and recommender systems. The task is particularly challenging as
a short text contains very sparse information, often too sparse for a machine
learning algorithm to pick up useful signals. A common practice for analyzing
short text is to first expand it with external information, which is usually
harvested from a large collection of longer texts. In literature, short text
expansion has been done with all kinds of heuristics. We propose an end-to-end
solution that automatically learns how to expand short text to optimize a given
learning task. A novel deep memory network is proposed to automatically find
relevant information from a collection of longer documents and reformulate the
short text through a gating mechanism. Using short text classification as a
demonstrating task, we show that the deep memory network significantly
outperforms classical text expansion methods with comprehensive experiments on
real world data sets.

【Title】Bridging
Collaborative Filtering and Semi-Supervised Learning: A Neural Approach for POI
Recommendation

【Abstract】Recommender system is one of the most popular
data mining topics that keep drawing extensive attention from both academia and
industry. Among them, POI (point of interest) recommendation is extremely
practical but challenging: it greatly benefits both users and businesses in
real-world life, but it is hard due to data scarcity and various context. While
a number of algorithms attempt to tackle the problem w.r.t. specific data and
problem settings, they often fail when the scenarios change. In this work, we
propose to devise a general and principled SSL (semi-supervised learning)
framework, to alleviate data scarcity via smoothing among neighboring users and
POIs, and treat various context by regularizing user preference based on
context graphs. To enable such a framework, we develop PACE (Preference And
Context Embedding), a deep neural architecture that jointly learns the
embeddings of users and POIs to predict both user preference over POIs and
various context associated with users and POIs. We show that PACE successfully
bridges CF (collaborative filtering) and SSL by generalizing the de facto
methods matrix factorization of CF and graph Laplacian regularization of SSL.
Extensive experiments on two real location-based social network datasets
demonstrate the effectiveness of PACE.

【Title】A Data-driven Process Recommender Framework

【Abstract】We present an approach for improving the
performance of complex knowledge-based processes by providing data-driven
step-by-step recommendations. Our framework uses the associations between
similar historic process performances and contextual information to determine
the prototypical way of enacting the process. We introduce a novel similarity
metric for grouping traces into clusters that incorporates temporal information
about activity performance and handles concurrent activities. Our data-driven recommender
system selects the appropriate prototype performance of the process based on
user-provided context attributes. Our approach for determining the prototypes
discovers the commonly performed activities and their temporal relationships.
We tested our system on data from three real-world medical processes and
achieved recommendation accuracy up to an F1 score of 0.77 (compared to an F1
score of 0.37 using ZeroR) with 63.2% of recommended enactments being within
the first five neighbors of the actual historic enactments in a set of 87
cases. Our framework works as an interactive visual analytic tool for process
mining. This work shows the feasibility of data-driven decision support system
for complex knowledge-based processes.

-----------------------

ICML-2015

【Title】Counterfactual
Risk Minimization: Learning from Logged Bandit Feedback

【Abstract】We
develop a learning principle and an efficient algorithm for batch learning from
logged bandit feedback. This learning setting is ubiquitous in online systems
(e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad
ranking) for a given input (e.g., query) and observes bandit feedback (e.g.,
user clicks on presented ads). We first address the counterfactual nature of
the learning problem through propensity scoring. Next, we prove generalization
error bounds that account for the variance of the propensity-weighted empirical
risk estimator. These constructive bounds give rise to the Counterfactual Risk
Minimization (CRM) principle. We show how CRM can be used to derive a new
learning method – called Policy Optimizer for Exponential Models (POEM) – for
learning stochastic linear rules for structured output prediction. We present a
decomposition of the POEM objective that enables efficient stochastic gradient
optimization. POEM is evaluated on several multi-label classification problems
showing substantially improved robustness and generalization performance
compared to the state-of-the-art.

【Title】Safe
Exploration for Optimization with Gaussian Processes

【Abstract】We
consider sequential decision problems under uncertainty, where we seek to
optimize an unknown function from noisy samples. This requires balancing
exploration (learning about the objective) and exploitation (localizing the
maximum), a problem well-studied in the multi-armed bandit literature. In many
applications, however, we require that the sampled function values exceed some
prespecified "safety" threshold, a requirement that existing
algorithms fail to meet. Examples include medical applications where patient
comfort must be guaranteed, recommender systems aiming to avoid user
dissatisfaction, and robotic control, where one seeks to avoid controls causing
physical harm to the platform. We tackle this novel, yet rich, set of problems
under the assumption that the unknown function satisfies regularity conditions
expressed via a Gaussian process prior. We develop an efficient algorithm
called SafeOpt, and theoretically guarantee its convergence to a natural notion
of optimum reachable under safety constraints. We evaluate SafeOpt on synthetic
data, as well as two real applications: movie recommendation, and therapeutic spinal cord
stimulation.

【Title】Non-Linear
Cross-Domain Collaborative Filtering via Hyper-Structure Transfer

【Abstract】The
Cross Domain Collaborative Filtering (CDCF) exploits the rating matrices from
multiple domains to make better recommendations. Existing CDCF methods adopt the sub-structure
sharing technique that can only transfer linearly correlated knowledge between
domains. In this paper, we propose the notion of Hyper-Structure Transfer (HST)
that requires the rating matrices to be explained by the projections of some
more complex structure, called the hyper-structure, shared by all domains, and
thus allows the non-linearly correlated knowledge between domains to be
identified and transferred. Extensive experiments are conducted and the results
demonstrate the effectiveness of our HST models empirically.

【Title】MRA-based
Statistical Learning from Incomplete Rankings

【Abstract】Statistical
analysis of rank data describing preferences over small and variable subsets of
a potentially large ensemble of items 1, ..., n is a very challenging problem.
It is motivated by a wide variety of modern applications, such as recommender systems or
search engines. However, very few inference methods have been documented in the
literature to learn a ranking model from such incomplete rank data. The goal of
this paper is twofold: it develops a rigorous mathematical framework for the
problem of learning a ranking model from incomplete rankings and introduces a
novel general statistical method to address it. Based on an original concept of
multi-resolution analysis (MRA) of incomplete rankings, it finely adapts to any
observation setting, leading to a statistical accuracy and an algorithmic
complexity that depend directly on the complexity of the observed data. Beyond
theoretical guarantees, we also provide experimental results that show its
statistical performance.

【Title】PU Learning
for Matrix Completion

【Abstract】In this
paper, we consider the matrix completion problem when the observations are
one-bit measurements of some underlying matrix M , and in particular the
observed samples consist only of ones and no zeros. This problem is motivated
by modern applications such as recommender systems and social networks where only “likes” or
“friendships” are observed. The problem is an instance of PU
(positive-unlabeled) learning, i.e. learning from only positive and unlabeled
examples that has been studied in the context of binary classification. Under
the assumption that M has bounded nuclear norm, we provide recovery guarantees
for two different observation models: 1) M parameterizes a distribution that
generates a binary matrix, 2) M is thresholded to obtain a binary matrix. For
the first case, we propose a “shifted matrix completion” method that recovers M
using only a subset of indices corresponding to ones; for the second case, we
propose a “biased matrix completion” method that recovers the (thresholded)
binary matrix. Both methods yield strong error bounds — if M ∈R^n
\times n, the error is bounded as O(1-ρ) , where 1-ρdenotes the fraction of
ones observed. This implies a sample complexity of O(n log n) ones to achieve a
small error, when M is dense and n is large. We extend our analysis to the
inductive matrix completion problem, where rows and columns of M have
associated features. We develop efficient and scalable optimization procedures
for both the proposed methods and demonstrate their effectiveness for link
prediction (on real-world networks consisting of over 2 million nodes and 90
million links) and semi-supervised clustering tasks.

ICML-2016

【Title】Minimum
Regret Search for Single- and Multi-Task Optimization

【Abstract】We
propose minimum regret search (MRS), a novel acquisition function for Bayesian
optimization. MRS bears similarities with information-theoretic approaches such
as entropy search (ES). However, while ES aims in each query at maximizing the
information gain with respect to the global maximum, MRS aims at minimizing the
expected simple regret of its ultimate recommendation for the optimum. While empirically
ES and MRS perform similar in most of the cases, MRS produces fewer outliers
with high simple regret than ES. We provide empirical results both for a
synthetic single-task optimization problem as well as for a simulated
multi-task robotic control problem.

【Title】Low-Rank
Matrix Approximation with Stability

【Abstract】Low-rank
matrix approximation has been widely adopted in machine learning applications
with sparse data, such as recommender
systems. However, the sparsity of the data, incomplete and noisy, introduces
challenges to the algorithm stability – small changes in the training data may
significantly change the models. As a result, existing low-rank matrix
approximation solutions yield low generalization performance, exhibiting high
error variance on the training dataset, and minimizing the training error may
not guarantee error reduction on the testing dataset. In this paper, we
investigate the algorithm stability problem of low-rank matrix approximations.
We present a new algorithm design framework, which (1) introduces new
optimization objectives to guide stable matrix approximation algorithm design,
and (2) solves the optimization problem to obtain stable low-rank approximation
solutions with good generalization performance. Experimental results on
real-world datasets demonstrate that the proposed work can achieve better
prediction accuracy compared with both state-of-the-art low-rank matrix
approximation methods and ensemble methods in recommendation task.

【Title】Online
Stochastic Linear Optimization under One-bit Feedback

【Abstract】In this
paper, we study a special bandit setting of online stochastic linear
optimization, where only one-bit of information is revealed to the learner at
each round. This problem has found many applications including online
advertisement and online recommendation.
We assume the binary feedback is a random variable generated from the logit
model, and aim to minimize the regret defined by the unknown linear function.
Although the existing method for generalized linear bandit can be applied to
our problem, the high computational cost makes it impractical for real-world
applications. To address this challenge, we develop an efficient online
learning algorithm by exploiting particular structures of the observation
model. Specifically, we adopt online Newton step to estimate the unknown
parameter and derive a tight confidence region based on the exponential
concavity of the logistic loss. Our analysis shows that the proposed algorithm
achieves a regret bound of O(d\sqrtT), which matches the optimal result of
stochastic linear bandits.

【Title】Actively
Learning Hemimetrics with Applications to Eliciting User Preferences

【Abstract】Motivated
by an application of eliciting users’ preferences, we investigate the problem
of learning hemimetrics, i.e., pairwise distances among a set of n items that
satisfy triangle inequalities and non-negativity constraints. In our
application, the (asymmetric) distances quantify private costs a user incurs
when substituting one item by another. We aim to learn these distances (costs)
by asking the users whether they are willing to switch from one item to another
for a given incentive offer. Without exploiting structural constraints of the hemimetric
polytope, learning the distances between each pair of items requires Θ(n^2)
queries. We propose an active learning algorithm that substantially reduces
this sample complexity by exploiting the structural constraints on the version
space of hemimetrics. Our proposed algorithm achieves provably-optimal sample
complexity for various instances of the task. For example, when the items are
embedded into K tight clusters, the sample complexity of our algorithm reduces
to O(n K). Extensive experiments on a restaurant recommendation data set support the conclusions of
our theoretical analysis.

【Title】Polynomial
Networks and Factorization Machines: New Insights and Efficient Training
Algorithms

【Abstract】Polynomial
networks and factorization machines are two recently-proposed models that can
efficiently use feature interactions in classification and regression tasks. In
this paper, we revisit both models from a unified perspective. Based on this new
view, we study the properties of both models and propose new efficient training
algorithms. Key to our approach is to cast parameter learning as a low-rank
symmetric tensor estimation problem, which we solve by multi-convex
optimization. We demonstrate our approach on regression and recommender system tasks.

【Title】DCM
Bandits: Learning to Rank with Multiple Clicks

【Abstract】A
search engine recommends
to the user a list of web pages. The user examines this list, from the first
page to the last, and clicks on all attractive pages until the user is
satisfied. This behavior of the user can be described by the dependent click
model (DCM). We propose DCM bandits, an online learning variant of the DCM
where the goal is to maximize the probability of recommending satisfactory
items, such as web pages. The main challenge of our learning problem is that we
do not observe which attractive item is satisfactory. We propose a
computationally-efficient learning algorithm for solving our problem,
dcmKL-UCB; derive gap-dependent upper bounds on its regret under reasonable
assumptions; and also prove a matching lower bound up to logarithmic factors.
We evaluate our algorithm on synthetic and real-world problems, and show that
it performs well even when our model is misspecified. This work presents the
first practical and regret-optimal online algorithm for learning to rank with
multiple clicks in a cascade-like click model.

【Title】Copeland
Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and
Computationally Efficient Algorithm

【Abstract】We
study the K-armed dueling bandit problem, a variation of the standard
stochastic bandit problem where the feedback is limited to relative comparisons
of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest
number of other arms, is characterized by deriving an asymptotic regret bound.
We propose Copeland Winners Deterministic Minimum Empirical Divergence
(CW-RMED), an algorithm inspired by the DMED algorithm (Honda and Takemura,
2010), and derive an asymptotically optimal regret bound for it. However, it is
not known whether the algorithm can be efficiently computed or not. To address
this issue, we devise an efficient version (ECW-RMED) and derive its asymptotic
regret bound. Experimental comparisons of dueling bandit algorithms show that
ECW-RMED significantly outperforms existing ones.

【Title】Contextual
Combinatorial Cascading Bandits

【Abstract】We
propose the contextual combinatorial cascading bandits, a combinatorial online
learning game, where at each time step a learning agent is given a set of
contextual information, then selects a list of items, and observes stochastic
outcomes of a prefix in the selected items by some stopping criterion. In
online recommendation,
the stopping criterion might be the first item a user selects; in network
routing, the stopping criterion might be the first edge blocked in a path. We
consider position discounts in the list order, so that the agent’s reward is
discounted depending on the position where the stopping criterion is met. We
design a UCB-type algorithm, C^3-UCB, for this problem, prove an n-step regret
bound \tildeO(\sqrtn) in the general setting, and give finer analysis for two
special cases. Our work generalizes existing studies in several directions,
including contextual information, position discounts, and a more general
cascading bandit model. Experiments on synthetic and real datasets demonstrate the
advantage of involving contextual information and position discounts.

【Title】Fast
Constrained Submodular Maximization: Personalized Data Summarization

【Abstract】Can we
summarize multi-category data based on user preferences in a scalable manner?
Many utility functions used for data summarization satisfy submodularity, a
natural diminishing returns property. We cast personalized data summarization
as an instance of a general submodular maximization problem subject to multiple
constraints. We develop the first practical and FAst coNsTrained submOdular
Maximization algorithm, FANTOM, with strong theoretical guarantees. FANTOM
maximizes a submodular function (not necessarily monotone) subject to
intersection of a p-system and l knapsacks constrains. It achieves a (1 + ε)(p
+ 1)(2p + 2l + 1)/p approximation guarantee with only O(nrp log(n)/ε) query
complexity (n and r indicate the size of the ground set and the size of the
largest feasible solution, respectively). We then show how we can use FANTOM
for personalized data summarization. In particular, a p-system can model
different aspects of data, such as categories or time stamps, from which the
users choose. In addition, knapsacks encode users’ constraints including budget
or time. In our set of experiments, we consider several concrete applications:
movie recommendation
over 11K movies, personalized image summarization with 10K images, and revenue
maximization on the YouTube social networks with 5000 communities. We observe
that FANTOM constantly provides the highest utility against all the baselines.

【Title】Predictive
Entropy Search for Multi-objective Bayesian Optimization

【Abstract】We
present \small PESMO, a Bayesian method for identifying the Pareto set of
multi-objective optimization problems, when the functions are expensive to
evaluate. \small PESMO chooses the evaluation points to maximally reduce the
entropy of the posterior distribution over the Pareto set. The \small PESMO
acquisition function is decomposed as a sum of objective-specific acquisition
functions, which makes it possible to use the algorithm in \emphdecoupled
scenarios in which the objectives can be evaluated separately and perhaps with
different costs. This decoupling capability is useful to identify difficult
objectives that require more evaluations. \small PESMO also offers gains in
efficiency, as its cost scales linearly with the number of objectives, in
comparison to the exponential cost of other methods. We compare \small PESMO
with other methods on synthetic and real-world problems. The results show that
\small PESMO produces better recommendations
with a smaller number of evaluations, and that a decoupled evaluation can lead
to improvements in performance, particularly when the number of objectives is
large.

【Title】Bounded
Off-Policy Evaluation with Missing Data for Course Recommendation and Curriculum Design

【Abstract】Successfully
recommending personalized course schedules is a difficult problem given the
diversity of students knowledge, learning behaviour, and goals. This paper
presents personalized course recommendation and curriculum design algorithms
that exploit logged student data. The algorithms are based on the regression
estimator for contextual multi-armed bandits with a penalized variance term.
Guarantees on the predictive performance of the algorithms are provided using
empirical Bernstein bounds. We also provide guidelines for including expert
domain knowledge into the recommendations. Using undergraduate engineering
logged data from a post-secondary institution we illustrate the performance of
these algorithms.

【Title】Gaussian
process nonparametric tensor estimator and its minimax optimality

【Abstract】We
investigate the statistical efficiency of a nonparametric Gaussian process
method for a nonlinear tensor estimation problem. Low-rank tensor estimation
has been used as a method to learn higher order relations among several data
sources in a wide range of applications, such as multi-task learning, recommendation systems,
and spatiotemporal analysis. We consider a general setting where a common
linear tensor learning is extended to a nonlinear learning problem in
reproducing kernel Hilbert space and propose a nonparametric Bayesian method
based on the Gaussian process method. We prove its statistical convergence rate
without assuming any strong convexity, such as restricted strong convexity.
Remarkably, it is shown that our convergence rate achieves the minimax optimal
rate. We apply our proposed method to multi-task learning and show that our
method significantly outperforms existing methods through numerical experiments
on real-world data sets.

【Title】Recommendations as Treatments:
Debiasing Learning and Evaluation

【Abstract】Most
data for evaluating and training recommender systems is subject to selection
biases, either through self-selection by the users or through the actions of
the recommendation system itself. In this paper, we provide a principled
approach to handle selection biases by adapting models and estimation
techniques from causal inference. The approach leads to unbiased performance
estimators despite biased data, and to a matrix factorization method that
provides substantially improved prediction performance on real-world data. We
theoretically and empirically characterize the robustness of the approach, and
find that it is highly practical and scalable.

【Title】Dictionary
Learning for Massive Matrix Factorization

【Abstract】Sparse
matrix factorization is a popular tool to obtain interpretable data
decompositions, which are also effective to perform data completion or denoising.
Its applicability to large datasets has been addressed with online and
randomized methods, that reduce the complexity in one of the matrix dimension,
but not in both of them. In this paper, we tackle very large matrices in both
dimensions. We propose a new factorization method that scales gracefully to
terabyte-scale datasets. Those could not be processed by previous algorithms in
a reasonable amount of time. We demonstrate the efficiency of our approach on
massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix
completion problems for recommender
systems, where we obtain significant speed-ups compared to state-of-the art
coordinate descent methods.

【Title】Hierarchical
Compound Poisson Factorization

【Abstract】Non-negative
matrix factorization models based on a hierarchical Gamma-Poisson structure
capture user and item behavior effectively in extremely sparse data sets,
making them the ideal choice for collaborative filtering applications.
Hierarchical Poisson factorization (HPF) in particular has proved successful
for scalable recommendation
systems with extreme sparsity. HPF, however, suffers from a tight coupling of
sparsity model (absence of a rating) and response model (the value of the
rating), which limits the expressiveness of the latter. Here, we introduce
hierarchical compound Poisson factorization (HCPF) that has the favorable
Gamma-Poisson structure and scalability of HPF to high-dimensional extremely
sparse matrices. More importantly, HCPF decouples the sparsity model from the
response model, allowing us to choose the most suitable distribution for the
response. HCPF can capture binary, non-negative discrete, non-negative
continuous, and zero-inflated continuous responses. We compare HCPF with HPF on
nine discrete and three continuous data sets and conclude that HCPF captures
the relationship between sparsity and response better than HPF.

ICML-2017

【Title】Dueling
Bandits with Weak Regret

【Abstract】We
consider online content
recommendation with implicit feedback through pairwise comparisons,
formalized as the so-called dueling bandit problem. We study the dueling bandit
problem in the Condorcet winner setting, and consider two notions of regret:
the more well-studied strong regret, which is 0 only when both arms pulled are
the Condorcet winner; and the less well-studied weak regret, which is 0 if
either arm pulled is the Condorcet winner. We propose a new algorithm for this
problem, Winner Stays (WS), with variations for each kind of regret: WS for
weak regret (WS-W) has expected cumulative weak regret that is , and  if arms have a total order; WS for strong
regret (WS-S) has expected cumulative strong regret of , and  if arms have a total order. WS-W is the first
dueling bandit algorithm with weak regret that is constant in time. WS is
simple to compute, even for problems with many arms, and we demonstrate through
numerical experiments on simulated and real data that WS has significantly
smaller regret than existing algorithms in both the weak- and strong-regret
settings.

【Title】Zonotope
Hit-and-run for Efficient Sampling from Projection DPPs

【Abstract】Determinantal
point processes (DPPs) are distributions over sets of items that model
diversity using kernels. Their applications in machine learning include summary
extraction and recommendation
systems. Yet, the cost of sampling from a DPP is prohibitive in
large-scale applications, which has triggered an effort towards efficient
approximate samplers. We build a novel MCMC sampler that combines ideas from
combinatorial geometry, linear programming, and Monte Carlo methods to sample
from DPPs with a fixed sample cardinality, also called projection DPPs. Our
sampler leverages the ability of the hit-and-run MCMC kernel to efficiently
move across convex bodies. Previous theoretical results yield a fast mixing
time of our chain when targeting a distribution that is close to a projection
DPP, but not a DPP in general. Our empirical results demonstrate that this
extends to sampling projection DPPs, i.e., our sampler is more sample-efficient
than previous approaches which in turn translates to faster convergence when
dealing with costly-to-evaluate functions, such as summary extraction in our
experiments.

【Title】On
Context-Dependent Clustering of Bandits

【Abstract】We
investigate a novel cluster-of-bandit algorithm CAB for collaborative recommendation tasks that
implements the underlying feedback sharing mechanism by estimating user
neighborhoods in a context-dependent manner. CAB makes sharp departures from
the state of the art by incorporating collaborative effects into inference, as
well as learning processes in a manner that seamlessly interleaves
explore-exploit tradeoffs and collaborative steps. We prove regret bounds for
CAB under various data-dependent assumptions which exhibit a crisp dependence
on the expected number of clusters over the users, a natural measure of the
statistical difficulty of the learning task. Experiments on production and
real-world datasets show that CAB offers significantly increased prediction
performance against a representative pool of state-of-the-art methods.

【Title】Preferential
Bayesian Optimization

【Abstract】Bayesian
optimization (BO) has emerged during the last few years as an effective
approach to optimize black-box functions where direct queries of the objective
are expensive. We consider the case where direct access to the function is not
possible, but information about user preferences is. Such scenarios arise in
problems where human preferences are modeled, such as A/B tests or recommender systems. We
present a new framework for this scenario that we call Preferential Bayesian
Optimization (PBO) and that allows to find the optimum of a latent function
that can only be queried through pairwise comparisons, so-called duels. PBO
extend the applicability of standard BO ideas and generalizes previous discrete
dueling approaches by modeling the probability of the the winner of each duel
by means of Gaussian process model with a Bernoulli likelihood. The latent
preference function is used to define a family of acquisition functions that
extend usual policies used in BO. We illustrate the benefits of PBO in a
variety of experiments in which we show how the way correlations are modeled is
the key ingredient to drastically reduce the number of comparisons to find the
optimum of the latent function of interest.

【Title】The Sample
Complexity of Online One-Class Collaborative Filtering

【Abstract】We
consider the online one-class collaborative filtering (CF) problem that consist
of recommending items
to users over time in an online fashion based on positive ratings only. This
problem arises when users respond only occasionally to a recommendation with a
positive rating, and never with a negative one. We study the impact of the
probability of a user responding to a recommendation, ,
on the sample complexity, and ask whether receiving positive and negative
ratings, instead of positive ratings only, improves the sample complexity. Both
questions arise in the design of recommender systems. We introduce a simple
probabilistic user model, and analyze the performance of an online user-based
CF algorithm. We prove that after an initial cold start phase, where
recommendations are invested in exploring the user’s preferences, this
algorithm makes—up to a fraction of the recommendations required for updating
the user’s preferences—perfect recommendations. The number of ratings required
for the cold start phase is nearly proportional to ,,
and that for updating the user’s preferences is essentially independent of ,.
As a consequence we find that, receiving positive and negative ratings instead
of only positive ones improves the number of ratings required for initial
exploration by a factor of , which
can be significant.

【Title】Provably
Optimal Algorithms for Generalized Linear Contextual Bandits

【Abstract】Contextual
bandits are widely used in Internet services from news recommendation to advertising, and to Web
search. Generalized linear models (logistical regression in particular) have
demonstrated stronger performance than linear models in many applications where
rewards are binary. However, most theoretical analyses on contextual bandits so
far are on linear bandits. In this work, we propose an upper confidence bound
based algorithm for generalized linear contextual bandits, which achieves an )
regret
over T rounds with d dimensional feature vectors. This regret matches the minimax
lower bound, up to logarithmic terms, and improves on the best previous result
by a  factor, assuming the number of arms is fixed.
A key component in our analysis is to establish a new, sharp finite-sample
confidence bound for maximum likelihood estimates in generalized linear models,
which may be of independent interest. We also analyze a simpler upper
confidence bound algorithm, which is useful in practice, and prove it to have
optimal regret for certain cases.

【Title】Deletion-Robust
Submodular Maximization: Data Summarization with “the Right to be Forgotten”

【Abstract】How can
we summarize a dynamic data stream when elements selected for the summary can
be deleted at any time? This is an important challenge in online services,
where the users generating the data may decide to exercise their right to
restrict the service provider from using (part of) their data due to privacy
concerns. Motivated by this challenge, we introduce the dynamic deletion-robust
submodular maximization problem. We develop the first resilient streaming
algorithm, called ROBUST-STREAMING, with a constant factor approximation
guarantee to the optimum solution. We evaluate the effectiveness of our
approach on several real-world applications, including summarizing (1) streams
of geo-coordinates (2); streams of images; and (3) click-stream log data,
consisting of 45 million feature vectors from a news recommendation task.

【Title】Probabilistic
Submodular Maximization in Sub-Linear Time

【Abstract】In this
paper, we consider optimizing submodular functions that are drawn from some
unknown distribution. This setting arises, e.g., in recommender systems, where the utility of a
subset of items may depend on a user-specific submodular utility function. In
modern applications, the ground set of items is often so large that even the
widely used (lazy) greedy algorithm is not efficient enough. As a remedy, we
introduce the problem of sublinear time probabilistic submodular maximization:
Given training examples of functions (e.g., via user feature vectors), we seek
to reduce the ground set so that optimizing new functions drawn from the same
distribution will provide almost as much value when restricted to the reduced
ground set as when using the full set. We cast this problem as a two-stage
submodular maximization and develop a novel efficient algorithm for this
problem which offers approximation ratio for general monotone
submodular functions and general matroid constraints. We demonstrate the
effectiveness of our approach on several real-world applications where running
the maximization problem on the reduced ground set leads to two orders of
magnitude speed-up while incurring almost no loss.