Q: Reshaping column values of a DataFrame in Python

Stack Overflow user

Asked 2018-07-31 05:50:09

Answers: 2 | Views: 54 | Followers: 0 | Votes: -2

I have a DataFrame without a header, like the one below.

Current df:

Col 0    Col 1   Col 2   Col3  
2345      abcd  mobile    oneplus
4567      abbb  internet  explorer 
mozilla   2345  cccc      dddd 
eeee      bbbb  1234      hello

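I want the numeric value (which is an ID) to be in the first column (column index 0).

If the numeric value has shifted to Col 1 in any row, combine the Col 1 and Col 2 values and put the result in Col 1, then put the Col3 value in Col 2, and then use the Col 0 value of the next row as the Col3 value of the previous row.

The expected output is: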

Col 0     Col 1          Col 2    Col3  
2345      abcd           mobile    oneplus 
4567      abbbinternet   explorer  mozilla
2345      ccccdddd        eeee     bbbb    
1234      hello

Any help is much appreciated! Thanks.

python

pandas

dataframe

2 Answers

Stack Overflow user

Accepted answer

Posted 2018-07-31 06:02:28

You can do this with stack, set_index, and unstack.
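The idea behind this approach is that every numeric token starts a new record, so a running count of numeric tokens labels each token with the record it belongs to. A minimal sketch of just that step, using a hypothetical token list rather than the question's data:

```python
import pandas as pd

# Hypothetical flattened token stream: every numeric token is an ID
# that starts a new record.
tokens = pd.Series(
    ["2345", "abcd", "mobile", "4567", "abbb", "mozilla", "1234", "hello"]
)

# True where the token is an ID; the running sum then gives each token
# the number of the record it belongs to.
record_id = tokens.str.isnumeric().cumsum()

# Position of each token inside its record (0 is the ID itself).
position = tokens.groupby(record_id).cumcount()

print(record_id.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3]
print(position.tolist())   # [0, 1, 2, 0, 1, 2, 0, 1]
```

The answer's code below applies the same labelling to the question's data and then pivots the positions into columns.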

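from io import StringIO

import pandas as pd

txt = StringIO("""2345      abcd  mobile    oneplus
4567      abbb  internet  explorer 
mozilla   2345  cccc      dddd 
eeee      bbbb  1234      hello""")

# Read every cell as a string so that .str.isnumeric() works on all of them
df = pd.read_csv(txt, header=None, sep=r'\s+', dtype=str)

# Flatten the frame row by row into a single Series of tokens
df = df.stack().reset_index(drop=True)

# Each numeric token starts a new record; the running count labels the records
df1 = df.to_frame().set_index(df.str.isnumeric().cumsum())

# Number the tokens within each record and pivot them into columns
df_out = df1.set_index(df1.groupby(level=0).cumcount(), append=True)[0].unstack()
df_out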

Output:

      0      1         2         3        4
1  2345   abcd    mobile   oneplus      NaN
2  4567   abbb  internet  explorer  mozilla
3  2345   cccc      dddd      eeee     bbbb
4  1234  hello       NaN       NaN      NaN
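This output has five columns and does not yet merge the surplus token into column 1 as the question's expected output does. One possible follow-up step is sketched below. The `squeeze` helper and its `width` parameter are hypothetical names introduced for illustration, and the frame is rebuilt by hand so that the example runs on its own.

```python
import pandas as pd

# Hand-built copy of the wide frame shown above (None marks missing cells).
df_out = pd.DataFrame(
    [
        ["2345", "abcd", "mobile", "oneplus", None],
        ["4567", "abbb", "internet", "explorer", "mozilla"],
        ["2345", "cccc", "dddd", "eeee", "bbbb"],
        ["1234", "hello", None, None, None],
    ]
)


def squeeze(row, width=4):
    """Hypothetical helper: merge surplus tokens into column 1 so that
    the row has at most `width` cells, then pad short rows."""
    tokens = row.dropna().tolist()
    extra = max(len(tokens) - width, 0)
    # ID first, then column 1 joined with the surplus tokens, then the rest.
    merged = [tokens[0], "".join(tokens[1:2 + extra])] + tokens[2 + extra:]
    return merged + [None] * (width - len(merged))


result = pd.DataFrame([squeeze(row) for _, row in df_out.iterrows()])
print(result)
```

Under these assumptions the second row becomes 4567, abbbinternet, explorer, mozilla and the third becomes 2345, ccccdddd, eeee, bbbb, which matches the layout requested in the question.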

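Votes: 1

Stack Overflow user

Posted 2018-07-31 08:51:43

It may be easier to do some cleanup before reading the data into pandas. Assuming your data is a CSV, the following is not the prettiest code, but it should do the job: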

import numpy as np
import pandas as pd
import re

filename = "<path to file>.csv"
new_file = "<path to where fixed csv should go>.csv"

with open(filename, "r") as infile:
    text = infile.read()

# get rid of existing new line characters
text = text.replace("\n", ",")

# put a new line before every number
out = re.sub("([0-9]+)", "\n\\1", text)

# write out
with open(new_file, "w+") as outfile:
    outfile.write(out)

# read in the fixed csv -- need to provide a number of columns
# greater than you'll need (using 50 here), and then cut the excess
df = pd.read_csv(new_file, header=None, names=range(50)).dropna(how="all", axis=1)

# jam as many columns into column1 as necessary to get just 3 after ID
df["cols_to_jam"] = df[df.columns[1:]].notnull().sum(axis=1) - 3

def jam(row):
    if row["cols_to_jam"] > 0:
        new = ""
        for col in range(1, row["cols_to_jam"] + 2):
            new += str(row[col])
    else:
        new = row[1]
    return new

idx = df[0]
col1 = df.apply(jam, axis=1)

# blank out jammed values
for i, row in df.iterrows():
    if row["cols_to_jam"] > 0:
        for col in range(1, row["cols_to_jam"] + 2):
            df.loc[i, col] = np.nan
    else:
        df.loc[i, 1] = np.nan

del df["cols_to_jam"], df[0]

remaining_cols = df.apply(lambda x: list(x.dropna().tail(2).values), axis=1).apply(pd.Series)
remaining_cols.columns = ["col2", "col3"]

# put it all together
output = idx.to_frame("id").join(col1.to_frame("col1")).join(remaining_cols)
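The same pre-cleaning idea can also be sketched without an intermediate file. The version below is a variation on this answer, not its original method: it assumes whitespace-separated tokens (as in the question's sample) instead of a CSV, and the `raw` string is a hypothetical stand-in for the file contents.

```python
import pandas as pd

# Hypothetical raw text standing in for the file contents.
raw = """2345 abcd mobile oneplus
4567 abbb internet explorer
mozilla 2345 cccc dddd
eeee bbbb 1234 hello"""

records = []
for token in raw.split():        # split on any whitespace, including newlines
    if token.isdigit():          # a purely numeric token is treated as an ID
        records.append([token])  # and opens a new record
    elif records:                # tokens before the first ID are ignored
        records[-1].append(token)

# Rows of unequal length are padded with missing values by the constructor.
df = pd.DataFrame(records)
print(df)
```

Unlike the regular expression above, which puts a line break before any run of digits, this sketch only treats a token as an ID when the whole token is numeric, so a value such as abc123 would stay in one piece.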

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/51602750
