Q: Creating columns from row values and filling them in pandas

Stack Overflow user

Asked 2022-03-04 09:14:59

Answers 1 · Views 750 · Followers 0 · Votes 2

I have a DataFrame like this:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/amanaroratc/hello-world/master/ask_git.csv')

    Channel_ID  Video_Category_Name score_pct
0   UC--bUZc5c9WseZNqGR6KLxA    Autos & Vehicles    0.213702
1   UC--bUZc5c9WseZNqGR6KLxA    Entertainment   0.786298
2   UC-B1L3oT81XgeeGh6S12qgQ    People & Blogs  1.000000
3   UC-N_7HFKrSsYxCSA_kfdRSA    People & Blogs  0.137261
4   UC-N_7HFKrSsYxCSA_kfdRSA    Pets & Animals  0.862739
... ... ... ...
819 UCzsNLZ9GrGXRjt0QmvWFm2Q    Entertainment   0.945243
820 UCzsNLZ9GrGXRjt0QmvWFm2Q    Film & Animation    0.002046
821 UCzsNLZ9GrGXRjt0QmvWFm2Q    Music   0.002797
822 UCzsNLZ9GrGXRjt0QmvWFm2Q    News & Politics 0.000433
823 UCzsNLZ9GrGXRjt0QmvWFm2Q    People & Blogs  0.000358

There are 15 distinct values in Video_Category_Name:

df.Video_Category_Name.unique()

gives

array(['Autos & Vehicles', 'Entertainment', 'People & Blogs',
       'Pets & Animals', 'Howto & Style', 'Education', 'Gaming', 'Music',
       'Comedy', 'Travel & Events', 'Science & Technology',
       'Nonprofits & Activism', 'Sports', 'Film & Animation',
       'News & Politics'], dtype=object)

 In [3]: iwantthis
  Out[3]:
     Channel_ID  Autos & Vehicles Entertainment People & Blogs ...
  0  UC--bUZc5c9WseZNqGR6KLxA  0.213702 0.786298 0 ...
  1  UC-B1L3oT81XgeeGh6S12qgQ  0        0        1.0000 ...

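How can I create one column for each of these 15 values and fill it with the value from score_pct (0 if no value exists)? I'm not sure whether to use unstack, melt, pivot, or something else.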

pandas

dataframe

python

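Stack Overflow user

Accepted answer

Posted 2022-03-04 09:39:21

I think pivot() is the right function for this problem. It takes the categorical values of Video_Category_Name and creates new columns from them, filled with the corresponding score_pct values. Missing values are replaced with zero using fillna(0):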

df = df.pivot(index='Channel_ID', columns='Video_Category_Name', values='score_pct').fillna(0).reset_index()

Output:

Video_Category_Name Channel_ID  Autos & Vehicles    Comedy  Education   Entertainment   Film & Animation    Gaming  Howto & Style   Music   News & Politics Nonprofits & Activism   People & Blogs  Pets & Animals  Science & Technology    Sports  Travel & Events
0   UC--bUZc5c9WseZNqGR6KLxA    0.213702    0.0 0.0 0.786298    0.0 0.0 0.0 0.0 0.0 0.0 0.000000    0.000000    0.0 0.0 0.0
1   UC-B1L3oT81XgeeGh6S12qgQ    0.000000    0.0 0.0 0.000000    0.0 0.0 0.0 0.0 0.0 0.0 1.000000    0.000000    0.0 0.0 0.0
2   UC-N_7HFKrSsYxCSA_kfdRSA    0.000000    0.0 0.0 0.000000    0.0 0.0 0.0 0.0 0.0 0.0 0.137261    0.862739    0.0 0.0 0.0
3   UC-T4JheeuNl2DVg-B-v7McA    0.000000    0.0 0.0 0.000000    0.0 0.0 1.0 0.0 0.0 0.0 0.000000    0.000000    0.0 0.0 0.0
4   UC-WG1VP4am6NaUtANEJxRQw    0.000000    0.0 0.0 0.000000    0.0 0.0 0.0 0.0 0.0 0.0 1.000000    0.000000    0.0 0.0 0.0
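To try the reshape without downloading the CSV, here is a minimal sketch on a hand-built miniature of the data. The frame `df_small` and the names `wide` and `wide_unstack` are illustrative assumptions, not part of the original post; the rows are copied from the first five lines of the question. The second variant uses set_index() plus unstack(), one of the alternatives the question mentions, and should give the same table.

```python
import pandas as pd

# Hypothetical miniature of the question's data (its first five rows).
df_small = pd.DataFrame({
    "Channel_ID": [
        "UC--bUZc5c9WseZNqGR6KLxA", "UC--bUZc5c9WseZNqGR6KLxA",
        "UC-B1L3oT81XgeeGh6S12qgQ",
        "UC-N_7HFKrSsYxCSA_kfdRSA", "UC-N_7HFKrSsYxCSA_kfdRSA",
    ],
    "Video_Category_Name": [
        "Autos & Vehicles", "Entertainment",
        "People & Blogs",
        "People & Blogs", "Pets & Animals",
    ],
    "score_pct": [0.213702, 0.786298, 1.0, 0.137261, 0.862739],
})

# Long -> wide: one column per category, zeros where a channel has no score.
wide = (
    df_small
    .pivot(index="Channel_ID", columns="Video_Category_Name", values="score_pct")
    .fillna(0)
    .reset_index()
)

# Alternative via a two-level index and unstack(); fill_value plays the
# role of fillna(0) here.
wide_unstack = (
    df_small
    .set_index(["Channel_ID", "Video_Category_Name"])["score_pct"]
    .unstack(fill_value=0)
    .reset_index()
)

print(wide)
```

Both variants need every Channel_ID / Video_Category_Name combination to occur at most once.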

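Edit 1: As noted in the comments, this only works if the Channel_ID entries are unique (more precisely, if no Channel_ID / Video_Category_Name combination occurs more than once). If that is not the case (or just to be safe), you can also include the row index in the pivot operation and then restore the index afterwards: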

df = df.reset_index().pivot(index=['index', 'Channel_ID'], columns='Video_Category_Name', values='score_pct').fillna(0).reset_index(level=1)
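Note that the index-based variant above produces one output row per input row, because every original row index is unique. If the goal is instead one row per channel even when duplicates exist, one alternative that is not part of the original answer is pivot_table(), which aggregates duplicate entries. The sketch below uses made-up data, and the choice of aggfunc="sum" is an assumption; whether summing, averaging, or something else is appropriate depends on what the duplicates mean.

```python
import pandas as pd

# Hypothetical data with a duplicated (Channel_ID, Video_Category_Name) pair.
dup = pd.DataFrame({
    "Channel_ID": ["A", "A", "A", "B"],
    "Video_Category_Name": ["Music", "Music", "Gaming", "Music"],
    "score_pct": [0.25, 0.25, 0.5, 1.0],
})

# Plain pivot() cannot reshape duplicated index/column pairs.
try:
    dup.pivot(index="Channel_ID", columns="Video_Category_Name", values="score_pct")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the duplicates instead (summed here, by assumption).
wide_dup = dup.pivot_table(
    index="Channel_ID",
    columns="Video_Category_Name",
    values="score_pct",
    aggfunc="sum",
    fill_value=0,
).reset_index()

print(pivot_failed)
print(wide_dup)
```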

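Edit 2: The Video_Category_Name shown in the DataFrame is just the name of the columns axis and should not affect anything. However, you can easily remove it with the following line: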

df = df.rename_axis(None, axis=1)

For the Edit 1 solution, you may also want to remove the index name, which can be done with the same operation on the other axis:

df = df.rename_axis(None, axis=0)
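As a quick check of what rename_axis() changes, the following sketch inspects the name of the columns axis before and after the call. The small frame and the variable names are illustrative assumptions rather than part of the original answer.

```python
import pandas as pd

# Hypothetical two-channel sample in the same layout as the question's data.
df_small = pd.DataFrame({
    "Channel_ID": ["UC--bUZc5c9WseZNqGR6KLxA", "UC--bUZc5c9WseZNqGR6KLxA",
                   "UC-B1L3oT81XgeeGh6S12qgQ"],
    "Video_Category_Name": ["Autos & Vehicles", "Entertainment", "People & Blogs"],
    "score_pct": [0.213702, 0.786298, 1.0],
})

wide = (
    df_small
    .pivot(index="Channel_ID", columns="Video_Category_Name", values="score_pct")
    .fillna(0)
    .reset_index()
)

# The columns axis still carries the name of the pivoted column.
name_before = wide.columns.name

# Drop the axis name; the column labels and the data stay the same.
wide = wide.rename_axis(None, axis=1)
name_after = wide.columns.name

print(name_before, name_after)
```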

Votes 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/71349085
