Paper Reading 11: Reinforcement-Learning-Based Recommender Systems

邵维奇

Last modified 2021-01-20 10:33:29

Large-Scale Interactive Recommendation with Tree-Structured Policy Gradient

Abstract: Reinforcement learning (RL) has recently been introduced to interactive recommender systems (IRS) because of its nature of learning from dynamic interactions and planning for long-run performance.

RL suits IRS because it learns from dynamic interactions and plans for long-term performance.

As IRS is always with thousands of items to recommend (i.e., thousands of actions), most existing RL-based methods, however, fail to handle such a large discrete action space problem and thus become inefficient. The existing work that tries to deal with the large discrete action space problem by utilizing the deep deterministic policy gradient framework suffers from the inconsistency between the continuous action representation (the output of the actor network) and the real discrete action.

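There are many items to recommend. To apply RL to recommendation at this scale, a DDPG-style framework is commonly used, but it introduces a gap between the real discrete action and the continuous action output by the actor network (the output is usually mapped to the closest item by cosine similarity or Euclidean distance).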
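To make the gap concrete, here is a minimal sketch of the nearest-item mapping step, not the code of any particular DDPG-based recommender. The item embeddings, the proto-action vector, and the helper name nearest_item are all hypothetical values chosen for illustration. The sketch shows that the actor's continuous output rarely coincides with an actual item embedding, and that the item finally recommended can even depend on which similarity measure is used.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_item(proto_action, item_embeddings, metric="cosine"):
    """Map a continuous actor output to a discrete item id (hypothetical helper)."""
    if metric == "cosine":
        return max(item_embeddings,
                   key=lambda i: cosine_similarity(proto_action, item_embeddings[i]))
    return min(item_embeddings,
               key=lambda i: euclidean_distance(proto_action, item_embeddings[i]))

# Toy 2-d item embeddings (assumed values, not from the paper).
item_embeddings = {
    "item_a": [1.0, 0.0],
    "item_b": [0.0, 1.0],
    "item_c": [3.0, 3.0],
}

# Continuous "proto-action" as an actor network might output it (assumed value).
proto_action = [0.6, 0.5]

chosen_cos = nearest_item(proto_action, item_embeddings, "cosine")
chosen_euc = nearest_item(proto_action, item_embeddings, "euclidean")

# Distance between what the actor produced and what is actually recommended.
gap = euclidean_distance(proto_action, item_embeddings[chosen_euc])

print(chosen_cos, chosen_euc, round(gap, 3))
```

In this toy setup the cosine rule and the Euclidean rule pick different items, and in both cases the executed action is not the vector the actor produced, which is the inconsistency the abstract refers to.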

To avoid such inconsistency and achieve high efficiency and recommendation effectiveness, in this paper,

The paper removes the inconsistency between the two and improves efficiency.

we propose a Tree-structured Policy Gradient Recommendation (TPGR) framework, where a balanced hierarchical clustering tree is built over the items and picking an item is formulated as seeking a path from the root to a certain leaf of the tree.

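In other words, the method uses a hierarchical clustering tree. Put simply, it walks down from the root one level at a time; the leaf nodes are the actions, and at each level a policy trained with policy gradient picks the node on the next level, until a leaf is reached.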
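The root-to-leaf selection can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: items are placed on the leaves in list order rather than by hierarchical clustering, and node_probs is a hypothetical stand-in that returns fixed softmax probabilities where a state-conditioned policy would go. The helper names build_balanced_tree, sample_item, and leaf_probabilities are also made up for this example.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def build_balanced_tree(items, branching):
    """Nested lists with items on the leaves.
    Assumes len(items) is a power of `branching`; no clustering is done here."""
    if len(items) <= branching:
        return list(items)
    size = len(items) // branching
    return [build_balanced_tree(items[i * size:(i + 1) * size], branching)
            for i in range(branching)]

def node_probs(path, n_children):
    """Hypothetical per-node policy: fixed toy logits keyed by the node's path.
    A real model would compute these from the user state."""
    logits = [math.sin(len(path) + sum(path) + k) for k in range(n_children)]
    return softmax(logits)

def sample_item(tree, rng):
    """Walk from the root to a leaf, sampling one child per level."""
    node, path, log_prob = tree, (), 0.0
    while isinstance(node, list):
        probs = node_probs(path, len(node))
        idx = rng.choices(range(len(node)), weights=probs)[0]
        log_prob += math.log(probs[idx])  # log-probability of the whole path
        path += (idx,)
        node = node[idx]
    return node, path, log_prob

def leaf_probabilities(node, path=(), prob=1.0):
    """Probability of each item = product of the choices along its path."""
    if not isinstance(node, list):
        return {node: prob}
    probs = node_probs(path, len(node))
    out = {}
    for idx, child in enumerate(node):
        out.update(leaf_probabilities(child, path + (idx,), prob * probs[idx]))
    return out

items = [f"item_{i}" for i in range(8)]
tree = build_balanced_tree(items, branching=2)
rng = random.Random(0)
item, path, log_prob = sample_item(tree, rng)
all_probs = leaf_probabilities(tree)
print(item, path)
```

With 8 items and a branching factor of 2, each recommendation takes 3 two-way decisions instead of one 8-way decision, and the selected action is always a real item, so no continuous-to-discrete mapping is needed. The accumulated log-probability of the path is the quantity a policy-gradient update would use.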

Extensive experiments on carefully-designed environments based on two real-world datasets demonstrate that our model provides superior recommendation performance and significant efficiency improvement over state-of-the-art methods.

The experiments show that the method works very well.