Vectorized Representation of Documents in Python

Stack Overflow user
Asked on 2016-08-03 00:36:17
Answers: 1 | Views: 603 | Followers: 0 | Votes: 1

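I am trying to do sentiment analysis in Python 3, using the TF-IDF vectorizer with a bag-of-words model to vectorize the documents.

As anyone familiar with this will know, the resulting matrix representation is sparse.

Here are my code snippets. First, the documents.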

tweets = [('Once you get inside you will be impressed with the place.',1),('I got home to see the driest damn wings ever!',0),('An extensive menu provides lots of options for breakfast.',1),('The flair bartenders are absolutely amazing!',1),('My first visit to Hiro was a delight!',1),('Poor service, the waiter made me feel like I was stupid every time he came to the table.',0),('Loved this place.',1),('This restaurant has great food',1),
      ('Honeslty it did not taste THAT fresh :(',0),('Would not go back.',0),
       ('I was shocked because no signs indicate cash only.',0),
        ('Waitress was a little slow in service.',0),
        ('did not like at all',0),('The food, amazing.',1),
        ('The burger is good beef, cooked just right.',1),
        ('They have horrible attitudes towards customers, and talk down to each one when customers do not enjoy their food.',0),
        ('The cocktails are all handmade and delicious.',1),('This restaurant has terrible food',0),
        ('Both of the egg rolls were fantastic.',1),('The WORST EXPERIENCE EVER.',0),
        ('My friend loved the salmon tartar.',1),('Which are small and not worth the price.',0),
        ('This is the place where I first had pho and it was amazing!!',1),
        ('Horrible - do not waste your time and money.',0),('Seriously flavorful delights, folks.',1),
        ('I loved the bacon wrapped dates.',1),('I dressed up to be treated so rudely!',0),
        ('We literally sat there for 20 minutes with no one asking to take our order.',0),
        ('you can watch them preparing the delicious food! :)',1),('In the summer, you can dine in a charming outdoor patio - so very delightful.',1)]

X_train, y_train = zip(*tweets)

And here is the code that vectorizes the documents.

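from sklearn.feature_extraction.text import TfidfVectorizer

# Learn the vocabulary and IDF weights from the training texts, then transform them
tfidfvec = TfidfVectorizer(lowercase=True)
vectorized = tfidfvec.fit_transform(X_train)

print(vectorized)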

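When I print vectorized, it does not output an ordinary matrix. Instead, I get this:

If I am not mistaken, this must be a sparse matrix representation. However, I cannot make sense of its format or of what each term means.

Also, there are 30 documents, which explains the 0-29 in the first column. If that pattern holds, then I guess the second column is the index of the word and the last value is the tf-idf score? This occurred to me while I was typing the question, but please correct me if I am wrong.

Could someone with experience help me understand it better?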


1 Answer

Stack Overflow user

Accepted answer

Posted on 2016-08-03 00:51:51

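Yes. Technically, the tuple of the first two numbers gives the (row, column) position, and the third number is the value at that position. The printout simply lists the positions and values of the nonzero entries.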
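To make that reading concrete, here is a minimal sketch that maps each nonzero entry back to its document and word. It assumes scikit-learn is installed; the three-sentence corpus is a shortened stand-in for the tweets in the question, and the names `docs`, `index_to_term`, `triples`, and `dense` are introduced here for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# A small stand-in corpus (three of the sentences from the question).
docs = [
    "Loved this place.",
    "Would not go back.",
    "This restaurant has great food",
]

tfidfvec = TfidfVectorizer(lowercase=True)
vectorized = tfidfvec.fit_transform(docs)  # SciPy sparse matrix, one row per document

# vocabulary_ maps each term to its column index; invert it to look terms up by column.
index_to_term = {idx: term for term, idx in tfidfvec.vocabulary_.items()}

# COO format exposes the nonzero entries as parallel row / col / data arrays,
# which is the same (row, column) value layout that print(vectorized) shows.
coo = vectorized.tocoo()
triples = [(int(r), int(c), float(v)) for r, c, v in zip(coo.row, coo.col, coo.data)]

for r, c, v in triples:
    print(f"doc {r}, column {c} ({index_to_term[c]!r}): tf-idf = {v:.3f}")

# For a small corpus, the sparse matrix can also be expanded into a regular dense array.
dense = vectorized.toarray()
print(dense.shape)  # (number of documents, vocabulary size)
```

Calling `toarray()` is only practical for small inputs such as this one; for a large vocabulary the dense array is mostly zeros, which is why the vectorizer returns a sparse matrix in the first place.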

Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/38732561
