首页
学习
活动
专区
工具
TVP
发布
社区首页 >问答首页 >基于一列添加多个数据帧

基于一列添加多个数据帧
EN

Stack Overflow用户
提问于 2018-09-16 20:42:26
回答 3查看 80关注 0票数 -1

我有数百个具有相同列名的数据帧,如下所示:

df1

代码语言:javascript
复制
        wave  num  stlines     fwhm       EWs  MeasredWave  
0    4050.32    3  0.28269  0.07365  22.16080  4050.311360   
1    4208.98    5  0.48122  0.08765  44.90035  4208.972962   
2    4374.94    9  0.71483  0.11429  86.96497  4374.927110   
3    4379.74    9  0.31404  0.09107  30.44271  4379.760601   
4    4398.01   14  0.50415  0.09845  52.83236  4398.007473 
5    5520.50    1  0.06148  0.12556   8.21685  5520.484742 

df2

代码语言:javascript
复制
        wave  num  stlines     fwhm       EWs  MeasredWave  
0    4050.32    3  0.28616  0.07521  22.91064  4050.327388   
1    4208.98    6  0.48781  0.08573  44.51609  4208.990029   
2    4374.94    9  0.71548  0.11437  87.10152  4374.944513   
3    4379.74   10  0.31338  0.09098  30.34791  4379.778009   
4    4398.01   15  0.49950  0.08612  45.78707  4398.020367   
5    4502.21    9  0.56362  0.10114  60.67868  4502.223123   
6    4508.28    3  0.69554  0.11600  85.88428  4508.291777   
7    4512.99    2  0.20486  0.08891  19.38745  4512.999332
8    5520.50    1  0.06148  0.12556   8.21685  5520.484742

我就是这么读的

代码语言:javascript
复制
path_to_files = '/home/Desktop/computed_2d/'
lst = []

for filen in dir1:
   df = pd.read_table(path_to_files+filen, skiprows=0, usecols=(0,1,2,3,4,8),names=['wave','num','stlines','fwhm','EWs','MeasredWave'],delimiter=r'\s+')

   lst.append(df)

所需的结果应如下所示:

代码语言:javascript
复制
      wave   num   stlines      fwhm        EWs  MeasredWave
0  4050.32   3.0  0.284425  0.074430  22.535720  4050.319374
1  4208.98   5.5  0.484515  0.086690  44.708220  4208.981496
2  4374.94   9.0  0.715155  0.114330  87.033245  4374.935812
3  4379.74   9.5  0.313710  0.091025  30.395310  4379.769305
4  4398.01  14.5  0.501825  0.092285  49.309715  4398.013920
5  4502.21    9   0.56362   0.10114   60.67868   4502.223123   
6  4508.28    3   0.69554   0.11600   85.88428   4508.291777   
7  4512.99    2   0.20486   0.08891   19.38745   4512.999332
8  5520.50   1.0  0.061480  0.125560  8.216850   5520.484742

正如你所看到的,行数是不一样的。现在,我想要取基于column1 wave的所有数据帧的平均值,并确保将df1的列wave的每个索引添加到df2的正确索引中

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2018-09-16 21:22:29

您可以使用pd.concat将所有数据帧堆叠在一个数据帧中,其中轴=1并取各列的平均值

代码语言:javascript
复制
df3 = pd.merge(df1,df2,on=['wave'],how ='outer',)
df4 = df3.rename(columns = lambda x: x.split('_')[0]).T
df4.groupby(df4.index).mean().T

输出:

代码语言:javascript
复制
    EWs         MeasredWave fwhm        num stlines      wave
0   22.535720   4050.319374 0.074430    3.0 0.284425    4050.32
1   44.708220   4208.981496 0.086690    5.5 0.484515    4208.98
2   87.033245   4374.935812 0.114330    9.0 0.715155    4374.94
3   30.395310   4379.769305 0.091025    9.5 0.313710    4379.74
4   49.309715   4398.013920 0.092285    14.5 0.501825   4398.01
5   8.216850    5520.484742 0.125560    1.0 0.061480    5520.50
6   60.678680   4502.223123 0.101140    9.0 0.563620    4502.21
7   85.884280   4508.291777 0.116000    3.0 0.695540    4508.28
8   19.387450   4512.999332 0.088910    2.0 0.204860    4512.9
票数 1
EN

Stack Overflow用户

发布于 2018-09-16 21:06:26

下面是一个执行您所需操作的示例:

代码语言:javascript
复制
import pandas as pd

df1 = pd.DataFrame({'A': [0, 1, 2, 3],
                    'B': [0, 1, 2, 3],
                    'C': [0, 1, 2, 3],
                    'D': [0, 1, 2, 3]},
                    index=[0, 1, 2, 3])

df2 = pd.DataFrame({'A': [4, 5, 6, 7],
                    'B': [4, 5, 6, 7],
                    'C': [4, 5, 6, 7],
                    'D': [4, 5, 6, 7]},
                    index=[0, 1, 2, 3])

df3 = pd.DataFrame({'A': [8, 9, 10, 11],
                    'B': [8, 9, 10, 11],
                    'C': [8, 9, 10, 11],
                    'D': [8, 9, 10, 11]},
                    index=[0, 1, 2, 3])

df4 =  pd.concat([df1, df2, df3])
df5 = pd.concat([df1, df2, df3], ignore_index=True)
print(df4)
print('\n\n')
print(df5)

print(f"Average of column A = {df4['A'].mean()}")

你将会有

代码语言:javascript
复制
    A   B   C   D
0   0   0   0   0
1   1   1   1   1
2   2   2   2   2
3   3   3   3   3
0   4   4   4   4
1   5   5   5   5
2   6   6   6   6
3   7   7   7   7
0   8   8   8   8
1   9   9   9   9
2  10  10  10  10
3  11  11  11  11



     A   B   C   D
0    0   0   0   0
1    1   1   1   1
2    2   2   2   2
3    3   3   3   3
4    4   4   4   4
5    5   5   5   5
6    6   6   6   6
7    7   7   7   7
8    8   8   8   8
9    9   9   9   9
10  10  10  10  10
11  11  11  11  11

Average of column A = 5.5
票数 1
EN

Stack Overflow用户

发布于 2018-09-17 01:06:49

@Naga Kiran的回答很棒。我在这里更新了整个解决方案:

代码语言:javascript
复制
import pandas as pd

df1 = pd.DataFrame(
  {'wave'        : [4050.32, 4208.98, 4374.94, 4379.74, 4398.01, 5520.50],
   'num'         : [3, 5, 9, 9, 14, 1],
   'stlines'     : [0.28269, 0.48122, 0.71483, 0.31404, 0.50415, 0.06148],
   'fwhm'        : [0.07365, 0.08765, 0.11429, 0.09107, 0.09845, 0.12556],
   'EWs'         : [22.16080, 44.90035, 86.96497, 30.44271, 52.83236, 8.21685],
   'MeasredWave' : [4050.311360, 4208.972962, 4374.927110, 4379.760601, 4398.007473, 5520.484742]},
   index=[0, 1, 2, 3, 4, 5])

df2 = pd.DataFrame(
  {'wave'        : [4050.32, 4208.98, 4374.94, 4379.74, 4398.01, 4502.21, 4508.28, 4512.99, 5520.50],
   'num'         : [3, 6, 9, 10, 15, 9, 3, 2, 1],
   'stlines'     : [0.28616, 0.48781, 0.71548, 0.31338, 0.49950, 0.56362, 0.69554, 0.20486, 0.06148],
   'fwhm'        : [0.07521, 0.08573, 0.11437, 0.09098, 0.08612, 0.10114, 0.11600, 0.08891, 0.12556],
   'EWs'         : [22.91064, 44.51609, 87.10152, 30.34791, 45.78707, 60.67868, 85.88428, 19.38745, 8.21685],
   'MeasredWave' : [4050.327388, 4208.990029, 4374.944513, 4379.778009, 4398.020367, 4502.223123, 4508.291777, 4512.999332, 5520.484742]},
   index=[0, 1, 2, 3, 4, 5, 6, 7, 8])

df3 = pd.merge(df1, df2, on='wave', how='outer')
df4 = df3.rename(columns = lambda x: x.split('_')[0]).T
df5 = df4.groupby(df4.index).mean().T
df6 = df5[['wave', 'num', 'stlines', 'fwhm', 'EWs', 'MeasredWave']]
df7 = df6.sort_values('wave', ascending = True).reset_index(drop=True)
df7
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/52354246

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档