
Q: How to load data from txt with pandas?

Stack Overflow user

Asked on 2018-03-23 00:51:52

Answers 3 · Views 315 · Followers 0 · Votes 0

I have already read the question Load data from txt with pandas. However, my data is in a slightly different format. Here is a sample of the data:

product/productId: B003AI2VGA
review/userId: A141HP4LYPWMSR
review/profileName: Brian E. Erland "Rainbow Sphinx"
review/helpfulness: 7/7
review/score: 3.0
review/time: 1182729600
review/summary: "There Is So Much Darkness Now ~ Come For The Miracle"
review/text: Synopsis: On the daily trek from Juarez, Mexico to ... 

product/productId: B003AI2VGA
review/userId: A328S9RN3U5M68
review/profileName: Grady Harp
review/helpfulness: 4/4
review/score: 3.0
review/time: 1181952000
review/summary: Worthwhile and Important Story Hampered by Poor Script and Production
review/text: THE VIRGIN OF JUAREZ is based on true events...

.
.

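I plan to do sentiment analysis, so I only want the text and score lines from each section. Does anyone know how to do this with pandas? Or do I need to read the file and parse each line myself to extract the review and the rating?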

Tags: python, pandas, dataframe, loaddata

3 Answers

Stack Overflow user

Answered on 2018-03-23 01:01:45

Here is one way:

import pandas as pd
from io import StringIO

mystr = StringIO("""product/productId: B003AI2VGA
review/userId: A141HP4LYPWMSR
review/profileName: Brian E. Erland "Rainbow Sphinx"
review/helpfulness: 7/7
review/score: 3.0
review/time: 1182729600
review/summary: "There Is So Much Darkness Now ~ Come For The Miracle"
review/text: Synopsis: On the daily trek from Juarez, Mexico to ... 

product/productId: B003AI2VGA
review/userId: A328S9RN3U5M68
review/profileName: Grady Harp
review/helpfulness: 4/4
review/score: 3.0
review/time: 1181952000
review/summary: Worthwhile and Important Story Hampered by Poor Script and Production
review/text: THE VIRGIN OF JUAREZ is based on true events...""")

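# replace mystr with 'file.txt'
# sep='|' is a character assumed not to occur in the data, so each whole line lands in column 0;
# blank lines are skipped by default, and a line that does contain '|' is dropped
# (on_bad_lines needs pandas >= 1.3; older versions used error_bad_lines=False)
df = pd.read_csv(mystr, header=None, sep='|', on_bad_lines='skip')

# split on the first ':' only, so colons inside the review text are kept
df = pd.DataFrame(df[0].str.split(':', n=1).values.tolist())
df[1] = df[1].str.strip()  # remove the space after the colon
df = df.loc[df[0].isin({'review/text', 'review/score'})]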

Result:

               0                                                  1
4   review/score                                                3.0
7    review/text   Synopsis: On the daily trek from Juarez, Mexi...
12  review/score                                                3.0
15   review/text    THE VIRGIN OF JUAREZ is based on true events...
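The result above is in long format: score and text sit on separate rows. For sentiment analysis it is usually handier to have one row per review. A minimal sketch of one way to get there, assuming every record starts with a product/productId line (true for the sample, not guaranteed for every file): number the records with a running count of that key, then pivot. The variable names (text, kv, wide) are illustrative only.

```python
import pandas as pd

# Shortened sample in the question's "key: value" layout (fields trimmed for brevity).
text = """product/productId: B003AI2VGA
review/score: 3.0
review/text: Synopsis: On the daily trek from Juarez, Mexico to ...

product/productId: B003AI2VGA
review/score: 3.0
review/text: THE VIRGIN OF JUAREZ is based on true events..."""

lines = pd.Series(text.splitlines())
lines = lines[lines.str.strip() != '']          # drop the blank separator lines

kv = lines.str.split(':', n=1, expand=True)     # split on the first ':' only
kv.columns = ['key', 'value']
kv['value'] = kv['value'].str.strip()

# Assumption: every record starts with product/productId, so a running count
# of that key gives each line the number of the record it belongs to.
kv['record'] = (kv['key'] == 'product/productId').cumsum()

wide = (kv[kv['key'].isin(['review/score', 'review/text'])]
        .pivot(index='record', columns='key', values='value'))
wide['review/score'] = pd.to_numeric(wide['review/score'])  # '3.0' -> 3.0
print(wide)
```

Converting the score with pd.to_numeric matters because everything parsed from the text file starts out as a string.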

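Votes: 0

Stack Overflow user

Answered on 2018-03-23 01:04:13

Actually, I don't know of a way for pandas to read that file directly.

I suggest writing a Python program that reads your file and outputs a CSV file, which we could name sentiment.csv: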

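productId,userId,profileName,helpfulness,score,time,summary,text
B003AI2VGA,A141HP4LYPWMSR,"Brian E. Erland ""Rainbow Sphinx""",7/7,3.0,1182729600,"""There Is So Much Darkness Now ~ Come For The Miracle""","Synopsis: On the daily trek from Juarez, Mexico to ..."
B003AI2VGA,A328S9RN3U5M68,Grady Harp,4/4,3.0,1181952000,Worthwhile and Important Story Hampered by Poor Script and Production,THE VIRGIN OF JUAREZ is based on true events...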

Then simply use: df = pd.read_csv('sentiment.csv')
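A minimal sketch of such a conversion program, using only Python's csv module. It assumes records are separated by blank lines and that each line looks like prefix/key: value; the column list FIELDS and the helper names parse_records and convert are hypothetical. csv.DictWriter quotes any field containing commas or quotes (such as "Juarez, Mexico"), so the file reads back cleanly with pd.read_csv.

```python
import csv
from io import StringIO

import pandas as pd

# Hypothetical column order for the output CSV (the part after the '/' in each key).
FIELDS = ['productId', 'userId', 'profileName', 'helpfulness',
          'score', 'time', 'summary', 'text']


def parse_records(lines):
    """Yield one dict per blank-line-separated block of 'prefix/key: value' lines."""
    record = {}
    for line in lines:
        line = line.strip()
        if not line:                          # a blank line ends the current record
            if record:
                yield record
                record = {}
            continue
        name, _, value = line.partition(':')  # split on the first ':' only
        key = name.split('/', 1)[-1]          # 'review/score' -> 'score'
        record[key] = value.strip()
    if record:                                # the last record may lack a trailing blank line
        yield record


def convert(src, dst):
    """Write the records read from src to dst as CSV; missing fields become empty cells."""
    writer = csv.DictWriter(dst, fieldnames=FIELDS, extrasaction='ignore')
    writer.writeheader()
    for rec in parse_records(src):
        writer.writerow(rec)


sample = StringIO("""product/productId: B003AI2VGA
review/userId: A141HP4LYPWMSR
review/profileName: Brian E. Erland "Rainbow Sphinx"
review/score: 3.0
review/text: Synopsis: On the daily trek from Juarez, Mexico to ...

product/productId: B003AI2VGA
review/userId: A328S9RN3U5M68
review/profileName: Grady Harp
review/score: 3.0
review/text: THE VIRGIN OF JUAREZ is based on true events...
""")

out = StringIO()
convert(sample, out)

# For a real file (placeholder names):
# with open('your_text_file.txt', encoding='utf-8') as src, \
#         open('sentiment.csv', 'w', newline='', encoding='utf-8') as dst:
#     convert(src, dst)

df = pd.read_csv(StringIO(out.getvalue()))
print(df[['score', 'text']])
```

Opening the output file with newline='' follows the csv module's documented recommendation, so the writer controls line endings itself.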

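Votes: 0

Stack Overflow user

Answered on 2018-03-23 04:54:00

I think the answer given by @sanrio is probably the most straightforward, but here is an option that does the string manipulation in pandas: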

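import numpy as np
import pandas as pd

with open('your_text_file.txt') as f:
    text_lines = f.readlines()

# create a pandas Series where each value is one text line from your file
s = pd.Series(text_lines)

# remove the newlines (and any other surrounding whitespace)
s = s.str.strip()

# split every line with a regex whose capture groups become columns:
# 'review/score: 3.0' -> ['', 'review', 'score', '3.0', '']; a blank line -> ['']
# ([^/]* stops at the first '/', so slashes or colons inside the value are kept)
df = s.str.split(r'([^/]*)/([^:]*):\s*(.*)', expand=True)

# remove the columns that are empty in every row (the leading/trailing pieces)
df = df.replace('', np.nan).dropna(how='all', axis=1)

def gb_organize(df_):
    """
    Turn one record (a group of consecutive lines) into a one-row DataFrame
    whose columns are the field names (column 2) and whose values come from column 3.
    """
    df_ = df_.dropna()  # drop the blank separator row at the start of each later group
    return pd.DataFrame(df_[3].values, index=df_[2].values).T

# blank lines are all-NaN rows, so their running count gives each record its own group id
df_result = df.groupby(df.isna().all(axis=1).cumsum(), group_keys=False).apply(gb_organize)

df_result = df_result.set_index(['productId', 'userId'])

# then you can access the records you want with the following:
df_result[['score', 'text']]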
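The grouping key passed to groupby above is the least obvious step. A small sketch on a hypothetical toy frame (shaped like df after the regex split, with column 2 holding field names and column 3 holding values) shows what it produces: each blank line becomes an all-NaN row, and the cumulative sum of those rows steps up by one at every record boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two records separated by the all-NaN row a blank line turns into.
df = pd.DataFrame({2: ['productId', 'score', np.nan, 'productId', 'score'],
                   3: ['B003AI2VGA', '3.0', np.nan, 'B003AI2VGA', '3.0']})

# True only on the blank row; the running sum increases by one at each blank row.
group_id = df.isna().all(axis=1).cumsum()
print(group_id.tolist())                                   # [0, 0, 1, 1, 1]

# Group 1 starts with the blank row itself, which is why gb_organize calls dropna().
print(df.groupby(group_id).size().tolist())                # [2, 3]
print([len(g.dropna()) for _, g in df.groupby(group_id)])  # [2, 2]
```

As in any text-based parse, the score column in df_result holds strings; pd.to_numeric(df_result['score']) converts it to numbers before analysis.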

Votes: 0

The original page content is provided by Stack Overflow. Translation support is provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/49434247
