I have a dataset with two columns: user_id and product_name. The DataFrame looks like this:
index user_id product_name
0 user1 A
1 user1 A
2 user2 A
3 user3 B

I am looking for a way to convert this table into an interaction matrix for a recommender system:
A B
user1 2 0
user2 1 0
user3 0 1

Posted on 2021-06-13 17:05:20
The given answers do not work for large event DataFrames, so I suggest storing the result as a sparse matrix. You can do the following:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from pandas.api.types import CategoricalDtype

frame = pd.DataFrame.from_records(
    [('user1', 'A'), ('user1', 'A'), ('user2', 'A'), ('user3', 'B')],
    columns=['user_id', 'product_name'])

def incident_to_sparse_interaction_matrix(frame, user_column, item_column):
    # Create categorical dtypes to index the categorical data (e.g. user_id, item_id)
    users = CategoricalDtype(sorted(frame[user_column].unique()), ordered=True)
    items = CategoricalDtype(sorted(frame[item_column].unique()), ordered=True)
    # Value stored per event: a plain 1 as an indicator here; it could instead be a score such as a movie rating
    scores = np.ones(len(frame), dtype=int)
    row = frame[user_column].astype(users).cat.codes
    col = frame[item_column].astype(items).cat.codes
    # Duplicate (row, col) pairs are summed, which yields the interaction counts
    sparse_matrix = csr_matrix((scores, (row, col)),
                               shape=(users.categories.size, items.categories.size))
    return sparse_matrix

collab_sparse = incident_to_sparse_interaction_matrix(frame, 'user_id', 'product_name')
print(collab_sparse.toarray())

Converting the sparse matrix to a dense matrix gives:
[[2 0]
 [1 0]
 [0 1]]

https://stackoverflow.com/questions/60776407
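The sparse matrix itself carries no row or column labels, so it can help to keep the sorted user and item labels alongside it. The following is a minimal sketch of one way to do that, not the answer's own method: it swaps in `pd.factorize(..., sort=True)` in place of `CategoricalDtype`, and the names `counts`, `labeled`, and `expected` are illustrative. Converting to a dense, labeled DataFrame is only reasonable for small data such as the example from the question; the `pd.crosstab` call is there purely as a cross-check.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

frame = pd.DataFrame.from_records(
    [("user1", "A"), ("user1", "A"), ("user2", "A"), ("user3", "B")],
    columns=["user_id", "product_name"],
)

# factorize returns integer codes plus the unique labels; sort=True sorts the labels
row_codes, user_labels = pd.factorize(frame["user_id"], sort=True)
col_codes, item_labels = pd.factorize(frame["product_name"], sort=True)

# One indicator value per event; duplicate (row, col) pairs are summed into counts
counts = csr_matrix(
    (np.ones(len(frame), dtype=int), (row_codes, col_codes)),
    shape=(len(user_labels), len(item_labels)),
)

# Densify only for small data, attaching the labels back to rows and columns
labeled = pd.DataFrame(counts.toarray(), index=user_labels, columns=item_labels)
print(labeled)

# Cross-check against a dense pandas count table (fine for small frames only)
expected = pd.crosstab(frame["user_id"], frame["product_name"])
assert (labeled.values == expected.values).all()
```

For large event logs, one would keep `counts` sparse and use `user_labels` and `item_labels` only to translate row and column indices back to IDs.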