My df looks like this:
Date Description Debit Credit Balance originalIdx
0 01-03-19 AAAA NaN NaN 49Cr 0
1 01-03-19 ASSS NaN 6,000.00 55Cr 1
2 NaN XYZ ABC saa NaN 1
3 01-03-19 ABZ 289.00 NaN 55Cr 3
I want this:
Date Description Debit Credit Balance originalIdx
0 01-03-19 AAAA NaN NaN 49Cr 0
1 01-03-19 ASSSXYZABCsaa NaN 6,000.00 55Cr 1
3 01-03-19 ABZ 289.00 NaN 55Cr 3
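I want to merge rows that have the same originalIdx, concatenating the text of the merged rows into the Description column.
This is my real data.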
Posted on 2021-03-11 11:49:35
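Assuming that every row that needs to be merged has NaN in Date, here is the code.
First, create a temporary column merged. It concatenates the values of Description, Debit, and Credit, but only the non-float ones, so NaN and numeric amounts are skipped and only the text cells are joined.
Then use groupby with transform (and a lambda) to replace Description with the concatenated merged text of each originalIdx group.
Finally, use dropna to remove the rows whose Date is NaN, and drop the temporary column merged.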
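# Temporary column: join the non-null, non-float (text) cells of each row
df['merged'] = df[['Description','Debit','Credit']].apply(lambda x: ''.join([str(a) for a in x if pd.notnull(a) and not isinstance(a, float)]), axis=1)
# Concatenate the merged text of all rows that share the same originalIdx
df['Description'] = df.groupby("originalIdx")['merged'].transform(lambda x: "".join(x))
# Drop the continuation rows (Date is NaN) and the helper column
df.dropna(subset=['Date'], inplace=True)
df.drop(columns='merged', inplace=True)
print(df)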
This gives you:
Date Description Debit Credit Balance originalIdx
0 01-03-19 AAAA NaN NaN 49Cr 0
1 01-03-19 ASSSXYZABCsaa NaN 6000.0 55Cr 1
3 01-03-19 ABZ 289.0 NaN 55Cr 3
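The dropna step only works if continuation rows really have NaN in Date. A minimal sketch of one alternative, assuming the parent row always appears before its continuation rows, keeps the first row of each originalIdx group with duplicated instead. The frame below is a hand-typed copy of the question's example, with the Debit and Credit amounts assumed to be floats:

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the question's sample (amounts assumed to be floats)
df = pd.DataFrame(
    {
        "Date": ["01-03-19", "01-03-19", np.nan, "01-03-19"],
        "Description": ["AAAA", "ASSS", "XYZ", "ABZ"],
        "Debit": [np.nan, np.nan, "ABC", 289.00],
        "Credit": [np.nan, 6000.00, "saa", np.nan],
        "Balance": ["49Cr", "55Cr", np.nan, "55Cr"],
        "originalIdx": [0, 1, 1, 3],
    }
)

# Same row-wise filter as the answer: keep non-null, non-float cells
merged = df[["Description", "Debit", "Credit"]].apply(
    lambda row: "".join(str(a) for a in row if pd.notnull(a) and not isinstance(a, float)),
    axis=1,
)
# Concatenate the merged text within each originalIdx group
df["Description"] = merged.groupby(df["originalIdx"]).transform(lambda s: "".join(s))

# Keep the first row of each group instead of relying on Date being NaN
df = df[~df["originalIdx"].duplicated(keep="first")]
print(df)
```

On this sample, row 1 ends up with Description ASSSXYZABCsaa, which matches the desired output in the question.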
Below is the full code, with sample data and output:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_colwidth', 250)
# Sample data: row 2 (Date is NaN) continues row 1 (same originalIdx)
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
c = ['Date','Description','Debit','Credit','Balance','originalIdx']
d = [['01-03-19','FORTAP-MUMBAI/',np.nan, np.nan, '49656.25Cr',0],
     ['01-03-19','FORTAP-MUMBAI/******',np.nan,6000.00,'55656.25Cr',1],
     [np.nan,'UP/*ABC*/*DEF*','UPI/*PQR*/*XYZ*','paytm/NA',np.nan,1],
     ['01-03-19','MBK/*ABCDEF*/*ZZZ*',289.00,np.nan,'55357.25Cr',3]]
df = pd.DataFrame(d,columns=c)
print (df)
# Temporary column: join the non-null, non-float (text) cells of each row
df['merged'] = df[['Description','Debit','Credit']].apply(lambda x: ''.join([str(a) for a in x if pd.notnull(a) and not isinstance(a, float)]) ,axis=1)
# Concatenate the merged text of all rows that share the same originalIdx
df['Description'] = df.groupby("originalIdx")['merged'].transform(lambda x: "".join(x))
# Drop the continuation rows (Date is NaN) and the helper column
df.dropna(subset=['Date'],inplace=True)
df.drop(columns='merged',inplace=True)
print (df)
Output before and after:
Before:
Date Description Debit Credit Balance originalIdx
0 01-03-19 FORTAP-MUMBAI/ NaN NaN 49656.25Cr 0
1 01-03-19 FORTAP-MUMBAI/****** NaN 6000.0 55656.25Cr 1
2 NaN UP/*ABC*/*DEF* UPI/*PQR*/*XYZ* paytm/NA NaN 1
3 01-03-19 MBK/*ABCDEF*/*ZZZ* 289.0 NaN 55357.25Cr 3
After:
Date Description Debit Credit Balance originalIdx
0 01-03-19 FORTAP-MUMBAI/ NaN NaN 49656.25Cr 0
1 01-03-19 FORTAP-MUMBAI/******UP/*ABC*/*DEF*UPI/*PQR*/*XYZ*paytm/NA NaN 6000.0 55656.25Cr 1
3 01-03-19 MBK/*ABCDEF*/*ZZZ* 289.0 NaN 55357.25Cr 3
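The `not isinstance(a, float)` test keeps any value that is not a float, so an integer amount such as 289 in an object column would be converted with str() and glued into Description. A minimal variant, assuming only Python strings should ever be joined, tests for str directly and uses assign instead of inplace=True so the original df is left unchanged. text_only is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# Same sample data as the full code above
c = ["Date", "Description", "Debit", "Credit", "Balance", "originalIdx"]
d = [
    ["01-03-19", "FORTAP-MUMBAI/", np.nan, np.nan, "49656.25Cr", 0],
    ["01-03-19", "FORTAP-MUMBAI/******", np.nan, 6000.00, "55656.25Cr", 1],
    [np.nan, "UP/*ABC*/*DEF*", "UPI/*PQR*/*XYZ*", "paytm/NA", np.nan, 1],
    ["01-03-19", "MBK/*ABCDEF*/*ZZZ*", 289.00, np.nan, "55357.25Cr", 3],
]
df = pd.DataFrame(d, columns=c)


def text_only(row: pd.Series) -> str:
    """Join only the str cells of a row; NaN, floats and ints are skipped."""
    return "".join(v for v in row if isinstance(v, str))


merged = df[["Description", "Debit", "Credit"]].apply(text_only, axis=1)
# assign returns a new frame; dropna then removes the continuation rows
out = df.assign(
    Description=merged.groupby(df["originalIdx"]).transform(lambda s: "".join(s))
).dropna(subset=["Date"])
print(out)
```

With this data the result is the same as the "After" output; the two filters only differ when a non-float, non-string value (such as an int) appears in those columns.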
Posted on 2021-03-11 11:29:42
I see you're using pandas!
df.fillna("").groupby("originalIdx").agg(lambda x: "".join(x))
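You can do this easily with this one line.
We group by originalIdx and use agg to apply a lambda that concatenates the values of each column within a group.
Edit:
To handle missing values (None/NaN), we first call .fillna(""), which replaces them with empty strings.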
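Note that this one-liner joins each column separately: in the question's sample, XYZ is appended to Description, but ABC and saa are appended to Debit and Credit rather than to Description, and originalIdx becomes the index. It also assumes every cell is a string after fillna; "".join only accepts strings, so a numeric cell such as 289.0 breaks the aggregation for that column. A minimal sketch that converts everything to text first, assuming the sample below (typed from the question, with 289.00 as a float):

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the question's sample; 289.00 is assumed to be a float
df = pd.DataFrame(
    {
        "Date": ["01-03-19", "01-03-19", np.nan, "01-03-19"],
        "Description": ["AAAA", "ASSS", "XYZ", "ABZ"],
        "Debit": [np.nan, np.nan, "ABC", 289.00],
        "Credit": [np.nan, "6,000.00", "saa", np.nan],
        "Balance": ["49Cr", "55Cr", np.nan, "55Cr"],
        "originalIdx": [0, 1, 1, 3],
    }
)

out = (
    df.drop(columns="originalIdx")  # keep the key out of the joined columns
    .fillna("")                     # NaN -> empty string
    .astype(str)                    # numbers such as 289.0 -> "289.0"
    .groupby(df["originalIdx"])     # group by the original key Series
    .agg(lambda x: "".join(x))      # concatenate each column within a group
    .reset_index()                  # turn originalIdx back into a column
)
print(out)
```

If the goal is the exact output from the question (all continuation text in Description), the row-wise merged approach in the other answer fits better.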
https://stackoverflow.com/questions/66580650