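An earlier article covered a Python implementation of user-based collaborative filtering (see "User-Based Collaborative Filtering"). This one looks at how to implement item-based collaborative filtering in Python.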
1
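Review of the Principle
The core idea of item-based collaborative filtering is to recommend to a user items that are similar to the ones they already like. The implementation proceeds in the following steps: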
(Two figures; images from the internet)
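In the figure above, matrix C records, for each pair of items, the number of users who like both. From C we can derive the item-to-item similarity matrix W.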
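To make the step from C to W concrete, here is a minimal sketch on a hand-made toy dataset. The data, the function names `cooccurrence` and `similarity`, and the variable names are all assumptions for illustration. It uses the plain cosine-style formula W[i][j] = C[i][j] / sqrt(N(i) * N(j)), where N(i) is the number of users who like item i; the article's own code further down additionally down-weights very active users and rescales the result.

```python
import math
from collections import defaultdict
from itertools import permutations

# Hypothetical toy data: each user maps to the set of items they like.
likes = {
    "u1": {"a", "b", "d"},
    "u2": {"b", "c"},
    "u3": {"a", "b"},
}


def cooccurrence(likes):
    """Return (C, N): C[i][j] counts users who like both i and j,
    N[i] counts users who like i."""
    C = defaultdict(lambda: defaultdict(int))
    N = defaultdict(int)
    for items in likes.values():
        for i in items:
            N[i] += 1
        # every ordered pair of distinct items liked by the same user
        for i, j in permutations(items, 2):
            C[i][j] += 1
    return C, N


def similarity(C, N):
    """W[i][j] = C[i][j] / sqrt(N[i] * N[j])."""
    return {
        i: {j: c / math.sqrt(N[i] * N[j]) for j, c in row.items()}
        for i, row in C.items()
    }


C, N = cooccurrence(likes)
W = similarity(C, N)
print(C["a"]["b"], round(W["a"]["b"], 4))
```

Items "a" and "b" are liked together by two users, while "a" is liked by two users and "b" by three, so their similarity is 2 / sqrt(2 * 3).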
2
Python Example
The example below uses data of users' movie ratings:
######## Load the initial data and build the item similarity matrix
import math
from operator import itemgetter


class ItemBasedCF:  # class name assumed; the original listing does not show the class statement
    def __init__(self, data):
        # data must provide itertuples() (e.g. a pandas DataFrame); in each row tuple,
        # line[1] is the user, line[4] the movie and line[2] the rating
        data_dic = {}
        for line in data.itertuples():
            if line[1] not in data_dic:
                data_dic[line[1]] = {line[4]: line[2]}
            else:
                data_dic[line[1]][line[4]] = line[2]
        self.data = data_dic  # {user: {movie: rating}}
        self.ItemSimilarity()

    def ItemSimilarity(self):
        self.itemSim = dict()  # {i: {j: weight}}, built from users who rated both i and j
        movie_popular = dict()  # {movie: number of users who rated it}
        for user, movies in self.data.items():
            for movie in movies:
                if movie not in movie_popular:
                    movie_popular[movie] = 0
                movie_popular[movie] += 1
        movie_count = len(movie_popular)
        print('Total movies: %d' % movie_count)
        for user, movies in self.data.items():
            for m1 in movies:
                for m2 in movies:
                    if m1 == m2:
                        continue
                    self.itemSim.setdefault(m1, {})
                    self.itemSim[m1].setdefault(m2, 0)
                    # a co-rating from a very active user counts for less
                    self.itemSim[m1][m2] += 1 / math.log(1 + len(movies))
        print('Build co-rated users matrix success!')
        for m1, related_movies in self.itemSim.items():
            for m2, co_count in related_movies.items():
                if movie_popular[m1] == 0 or movie_popular[m2] == 0:
                    self.itemSim[m1][m2] = 0
                else:
                    self.itemSim[m1][m2] = co_count / math.sqrt(movie_popular[m1] * movie_popular[m2])
        print('Calculate movie similarity matrix success!')
        # rescale by the largest similarity so that every weight lies in [0, 1]
        max_w = 0
        for m1, related_movies in self.itemSim.items():
            for m2, _ in related_movies.items():
                if self.itemSim[m1][m2] > max_w:
                    max_w = self.itemSim[m1][m2]
        for m1, related_movies in self.itemSim.items():
            for m2, _ in related_movies.items():
                self.itemSim[m1][m2] = self.itemSim[m1][m2] / max_w

    def Recomand(self, user, n_sim_movie=10, n_rec_movie=5):
        K = n_sim_movie  # number of most similar movies considered per watched movie
        N = n_rec_movie  # number of recommendations returned
        rank = {}
        watched_movies = self.data[user]
        for movie, rating in watched_movies.items():
            for related_movie, w in sorted(self.itemSim[movie].items(), key=itemgetter(1), reverse=True)[:K]:
                if related_movie in watched_movies:
                    continue
                rank.setdefault(related_movie, 0)
                # score = similarity weighted by the user's rating of the watched movie
                rank[related_movie] += w * float(rating)
        return sorted(rank.items(), key=itemgetter(1), reverse=True)[0:N]
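The scoring step in the recommendation method can be followed by hand on a tiny example. The sketch below is a standalone illustration, not the article's class: the function name `recommend`, the similarity weights, and the ratings are all made up. For each watched movie it takes the K most similar movies, skips those already watched, and adds similarity times rating to the candidate's score.

```python
from operator import itemgetter

# Hypothetical, hand-written similarity weights and ratings (not from the article's data).
item_sim = {
    "A": {"B": 0.9, "C": 0.4, "D": 0.1},
    "B": {"A": 0.9, "C": 0.6},
}
watched = {"A": 5.0, "B": 3.0}


def recommend(watched, item_sim, k=2, n=5):
    """Score unwatched movies by summing similarity * rating over the
    k nearest neighbours of each watched movie, then return the top n."""
    rank = {}
    for movie, rating in watched.items():
        neighbours = sorted(item_sim.get(movie, {}).items(),
                            key=itemgetter(1), reverse=True)[:k]
        for other, w in neighbours:
            if other in watched:
                continue  # never recommend something already watched
            rank[other] = rank.get(other, 0.0) + w * float(rating)
    return sorted(rank.items(), key=itemgetter(1), reverse=True)[:n]


result = recommend(watched, item_sim)
print(result)
```

With k=2, movie "D" never appears: the two nearest neighbours of "A" are "B" and "C", and "B" is dropped only after the top-K cut, so already-watched movies still take up neighbour slots. Raising k to 3 lets "D" into the candidate list.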
The final result is as follows:
Reply "协同过滤物品" to the official account to get the data and the complete code.