I'm trying to merge two pandas DataFrames, one called DAILY and the other called SF1.
DAILY csv:
ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps
A,2020-09-14,2020-09-14,31617.1,36.3,26.8,30652.1,6.2,44.4,5.9
SF1 csv:
ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital
A,ARQ,2020-09-14,2020-09-14,2020-09-14,2020-09-14,53000000,7107000000,,4982000000,2125000000,,10.219,-30000000,1368000000,1368000000,1160000000,131000000,2.41,0.584,665000000,111000000,554000000,665000000,281000000,96000000,0,0.0,0.0,202000000,298000000,0.133,298000000,202000000,202000000,0.3,0.3,0.3,4486000000,,4486000000,50960600000,,,354000000,0.806,1.0,1086000000,0.484,0,0,4337000000,,1567000000,42000000,42000000,0,2621000000,2067000000,554000000,51663600000,1368000000,-160000000,2068000000,111000000,0,1192000000,-208000000,-42000000,384000000,0,131000000,131000000,131000000,0,0,0.058,915000000,171000000,635000000,0.0,11.517,,,1408000000,0,114.3,,,1445000000,131000000,2246000000,2246000000,290000000,,,,,0,625000000,1.0,452000000,439000000,440000000,5.116,7107000000,0,71000000,113000000,16.189,2915000000
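A minimal sketch of loading files like these so the date columns arrive as datetimes rather than strings. It assumes trimmed stand-ins for the real files, read through `io.StringIO`; only a few SF1 columns are kept, with values copied from the sample rows above.

```python
import io
import pandas as pd

# Trimmed stand-ins for the real CSV files (assumption: only a few SF1 columns kept)
daily_csv = """ticker,date,lastupdated,ev,evebit,evebitda,marketcap,pb,pe,ps
A,2020-09-14,2020-09-14,31617.1,36.3,26.8,30652.1,6.2,44.4,5.9
"""
sf1_csv = """ticker,dimension,calendardate,datekey,revenue
A,ARQ,2020-09-14,2020-09-14,2246000000
"""

# parse_dates converts the listed columns to datetime64 while reading
daily = pd.read_csv(io.StringIO(daily_csv), parse_dates=["date", "lastupdated"])
sf1 = pd.read_csv(io.StringIO(sf1_csv), parse_dates=["calendardate", "datekey"])

print(daily.dtypes["date"], sf1.dtypes["calendardate"])
```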
Data sorting/cleaning code:
sf1 = sf1.drop(columns=['number','dimension', 'datekey', 'reportperiod','lastupdated', 'ev', 'evebit', 'evebitda', 'marketcap', 'pb', 'pe', 'ps'])
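The sample SF1 header above has no `number` column, and `DataFrame.drop` raises `KeyError` for labels that are missing. If the real file may or may not contain that column (an assumption), `errors="ignore"` skips missing labels. A small sketch with a hypothetical three-column frame:

```python
import pandas as pd

# Hypothetical frame that, like the sample header, has no 'number' column
sf1 = pd.DataFrame({"ticker": ["A"], "dimension": ["ARQ"], "revenue": [2246000000]})

# errors="ignore" drops the labels that exist and silently skips the rest
sf1 = sf1.drop(columns=["number", "dimension"], errors="ignore")
print(list(sf1.columns))
```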
daily = daily.sort_values('date', ascending=True)
sf1 = sf1.sort_values('calendardate', ascending=True)
daily = daily.sort_values('ticker')
sf1 = sf1.sort_values('ticker')
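`merge_asof` requires both frames to be sorted by their `on` key (`date` / `calendardate`) as a whole, even when `by` is used. Sorting by `ticker` after sorting by date throws that order away once there is more than one ticker. The sketch below uses hypothetical dates for two tickers to show the effect; if you want rows grouped by ticker for display, regroup after the merge.

```python
import pandas as pd

# Hypothetical rows for two tickers
daily = pd.DataFrame({
    "ticker": ["B", "A", "B", "A"],
    "date": pd.to_datetime(["2020-09-14", "2020-09-15", "2020-09-10", "2020-09-11"]),
})

# Sorting by date, then by ticker: the second sort discards the global date order
by_ticker_last = daily.sort_values("date").sort_values("ticker")
print(by_ticker_last["date"].is_monotonic_increasing)

# merge_asof checks the whole 'on' column, so sort by the date key only
by_date = daily.sort_values("date")
print(by_date["date"].is_monotonic_increasing)
```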
Code to merge the DataFrames:
df = pd.merge_asof(daily, sf1, by = 'ticker', left_on='date', right_on='calendardate', tolerance=pd.Timedelta(value=100, unit='D'), direction='backward')
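A minimal reproduction of the failure, assuming both key columns still hold date strings (which is what `read_csv` returns without `parse_dates`). With a `Timedelta` tolerance, pandas accepts only integer, float or datetime-like `on` keys, so the call raises; pandas uses `MergeError`, a `ValueError` subclass, and the exact message varies between versions, so the sketch catches broadly.

```python
import pandas as pd

# Hypothetical minimal frames: the key columns are plain strings, not datetimes
daily = pd.DataFrame({"ticker": ["A"], "date": ["2020-09-14"]})
sf1 = pd.DataFrame({"ticker": ["A"], "calendardate": ["2020-09-14"], "revenue": [2246000000]})

try:
    pd.merge_asof(daily, sf1, by="ticker", left_on="date", right_on="calendardate",
                  tolerance=pd.Timedelta(value=100, unit="D"), direction="backward")
    failed = False
except (ValueError, TypeError) as exc:
    # MergeError subclasses ValueError; the wording differs across pandas versions
    failed = True
    print(type(exc).__name__, exc)
```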
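I think the error might be caused by the DataFrames being merged on the ticker column. I'm not sure whether it has to be an int, a date format, or some other specific format. Right now it is just the company's ticker symbol, as shown above.
The DataFrames should be merged on the date column in the DAILY csv and the calendardate column in the SF1 csv.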
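The `by` column does not need to be an int or a date: `merge_asof` only matches rows whose `by` values are equal, so plain string tickers work. The `on` keys are the ones that must be numeric or datetime. A sketch with hypothetical values, where SF1 has data only for ticker `A`:

```python
import pandas as pd

# Hypothetical values: SF1 has a report only for ticker A
daily = pd.DataFrame({
    "ticker": ["A", "C"],  # string tickers are fine for `by`
    "date": pd.to_datetime(["2020-09-14", "2020-09-15"]),
})
sf1 = pd.DataFrame({
    "ticker": ["A"],
    "calendardate": pd.to_datetime(["2020-09-01"]),
    "revenue": [100],
})

# Ticker C gets NaN: `by` never borrows another ticker's row, even an earlier one
df = pd.merge_asof(daily, sf1, by="ticker", left_on="date", right_on="calendardate",
                   direction="backward")
print(df)
```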
It would also help if someone could explain what happens during the merge, and how it differs if you only have one … and …
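With `direction="backward"`, each DAILY row takes the most recent SF1 row of the same ticker whose `calendardate` is on or before its `date` (exact matches count by default), and `tolerance` discards matches older than 100 days. A sketch with hypothetical values for a single ticker:

```python
import pandas as pd

# Hypothetical values; one SF1 report for ticker A
daily = pd.DataFrame({
    "ticker": ["A", "A", "A"],
    "date": pd.to_datetime(["2020-09-01", "2020-09-14", "2021-03-01"]),
})
sf1 = pd.DataFrame({
    "ticker": ["A"],
    "calendardate": pd.to_datetime(["2020-09-14"]),
    "revenue": [100],
})

df = pd.merge_asof(daily, sf1, by="ticker", left_on="date", right_on="calendardate",
                   tolerance=pd.Timedelta(value=100, unit="D"), direction="backward")
print(df)
```

The first row has no earlier report, so it gets NaN; the second is an exact date match; the third would match the 2020-09-14 report, but that report is 168 days old, beyond the 100-day tolerance, so it also gets NaN.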
Posted on 2021-02-28 00:26:45
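You are facing this problem because the date column in 'daily' and the calendardate column in 'sf1' are of type object (i.e. strings), not datetime. Just change their type to datetime with the pd.to_datetime() method. So add these two lines to your data sorting/cleaning code: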
daily['date']=pd.to_datetime(daily['date'])
sf1['calendardate']=pd.to_datetime(sf1['calendardate'])
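A quick way to confirm the conversion worked is to compare the column dtype before and after. The sketch assumes a one-row stand-in for the DAILY csv; the explicit `format` argument is optional, but it makes `pd.to_datetime` fail loudly on strings that do not match the ISO layout used in the sample data.

```python
import io
import pandas as pd

# One-row stand-in for the DAILY csv (assumption: only the two relevant columns)
daily = pd.read_csv(io.StringIO("ticker,date\nA,2020-09-14\n"))
before = daily["date"].dtype  # a string column: read_csv does not parse dates unless asked

daily["date"] = pd.to_datetime(daily["date"], format="%Y-%m-%d")
after = daily["date"].dtype

print(before, "->", after)
```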
Now run:
df = pd.merge_asof(daily, sf1, by = 'ticker', left_on='date', right_on='calendardate', tolerance=pd.Timedelta(value=100, unit='D'), direction='backward')
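Putting the steps together, a minimal end-to-end sketch on hypothetical trimmed frames whose date columns start out as strings: convert the keys, sort each frame by its own `on` key, then merge. The tickers and revenue values are made up for illustration.

```python
import pandas as pd

# Hypothetical trimmed frames; date columns are strings, as after a plain read_csv
daily = pd.DataFrame({
    "ticker": ["A", "B", "A"],
    "date": ["2020-09-14", "2020-09-15", "2020-10-01"],
    "marketcap": [30652.1, 1000.0, 31000.0],
})
sf1 = pd.DataFrame({
    "ticker": ["B", "A"],
    "calendardate": ["2020-06-30", "2020-09-14"],
    "revenue": [200, 100],
})

# 1. Convert the merge keys to datetime
daily["date"] = pd.to_datetime(daily["date"])
sf1["calendardate"] = pd.to_datetime(sf1["calendardate"])

# 2. Sort each frame by its 'on' key (and not by ticker afterwards)
daily = daily.sort_values("date")
sf1 = sf1.sort_values("calendardate")

# 3. For each daily row, take the latest SF1 row of the same ticker dated
#    on or before 'date' and at most 100 days earlier
df = pd.merge_asof(daily, sf1, by="ticker", left_on="date", right_on="calendardate",
                   tolerance=pd.Timedelta(value=100, unit="D"), direction="backward")
print(df[["ticker", "date", "calendardate", "revenue"]])
```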
https://stackoverflow.com/questions/66400763