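Background
I want to identify column names in a dataframe that partially match a string, and replace them with the original name plus some new elements appended to it. The new elements are integers defined by a list. Here is a similar question, but I fear that the suggested solution is not flexible enough for my particular case. And here is another post with several great answers that come close to the problem I'm facing.
Some research
I know I can combine two lists of strings, map them pairwise into a dictionary, and rename the columns by passing that dictionary to the df.rename function. But this seems a bit too complicated, and not very flexible, given that the number of existing columns will vary. The same goes for the number of columns to be renamed.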
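As a rough illustration of the dictionary approach described above, the sketch below pairs only the columns that start with a prefix with the new integers and passes the resulting dict to df.rename. The frame, the Volume column, and the names prefix, new_elements, and mapping are made up for this example, and it assumes the number of matching columns equals the length of the list.

```python
import pandas as pd

# Hypothetical frame: one leading column, three 'obs_' columns, one trailing column
df = pd.DataFrame(
    [[100, 1, 2, 3, 7], [101, 4, 5, 6, 8]],
    columns=["Price", "obs_1", "obs_2", "obs_3", "Volume"],
)

prefix = "obs_"             # assumed prefix to match
new_elements = [5, 10, 15]  # one integer per matching column (assumption)

# Collect only the columns that match, in their existing order
matching = [col for col in df.columns if col.startswith(prefix)]
if len(matching) != len(new_elements):
    raise ValueError("number of matching columns and new elements differ")

# Pair old names with new names; columns absent from the dict are left untouched
mapping = {old: f"{old} = {new}" for old, new in zip(matching, new_elements)}
renamed = df.rename(columns=mapping)

print(list(renamed.columns))
```

Because the dict is built from whichever columns happen to match, the same lines keep working when the number of columns changes, as long as new_elements has a matching length.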
The following snippet will produce an input example:
# Libraries
import numpy as np
import pandas as pd
import itertools
# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                  columns = ['Price','obs_1','obs_2','obs_3','obs_4'])
# pd.datetime is no longer available in recent pandas releases; use pd.Timestamp
datelist = pd.date_range(pd.Timestamp.today().strftime('%Y-%m-%d'),
                         periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print(df)
Input

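I'd like to identify the column names that start with obs_ and append the elements (integers) from the list newElements = [5, 10, 15, 20] after an = sign. The column named Price should remain unchanged. Other columns that appear after the obs_ columns should also remain unchanged.
The following snippet demonstrates the desired output: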
# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                   columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])
df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print(df2)
Output

My attempt
# Define the partial string I'm looking for
stringMatch = 'Obs_'
# Put existing column names in a list
oldnames = list(df)
# Put elements that should be added to the column names
# where the name starts with 'obs_'
newElements = [5, 10, 15, 20]
oldElements = [1, 2, 3, 4]
# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()
# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']
# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:
    #print(repr(oldElement) + repr(str_newElements[i]))
    newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
    i = i + 1
# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames, newnames)), inplace = True)
print('My attempt: ', df)

Since I already have the complete list of new column names, I could also just use df.columns = newnames, but I'm hoping one of you can suggest a more pythonic way of using df.rename.
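For comparison, here is a small sketch of how the two options mentioned above differ, using a made-up three-column frame: assigning to df.columns needs a name for every column, while df.rename only needs the entries that change. The column names and the partial mapping are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[100, 1, 2]], columns=["Price", "obs_1", "obs_2"])

# df.rename: only the columns listed in the mapping change, and a new frame is returned
partial = {"obs_1": "obs_1 = 5"}
renamed = df.rename(columns=partial)
print(list(renamed.columns))

# df.columns = ...: the list must cover every column, otherwise pandas raises
try:
    df.columns = ["Price", "obs_1 = 5"]  # one name short on purpose
    length_error = None
except ValueError as exc:
    length_error = exc
print(type(length_error).__name__)
```

This is why df.rename tends to be the more forgiving choice when the number of columns is not known in advance.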
Thank you for any suggestions!
Here is the complete code for an easy copy-paste:
# Libraries
import numpy as np
import pandas as pd
import itertools
# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                  columns = ['Price','obs_1','obs_2','obs_3','obs_4'])
# pd.datetime is no longer available in recent pandas releases; use pd.Timestamp
datelist = pd.date_range(pd.Timestamp.today().strftime('%Y-%m-%d'),
                         periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print('Input: ', df)
# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                   columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])
df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print('Desired output: ', df2)
# My attempts
# Define the partial string I'm looking for
stringMatch = 'Obs_'
# Put existing column names in a list
oldnames = list(df)
# Put elements that should be added to the column names
# where the name starts with 'obs_'
newElements = [5, 10, 15, 20]
oldElements = [1, 2, 3, 4]
# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()
# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']
# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:
    #print(repr(oldElement) + repr(str_newElements[i]))
    newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
    i = i + 1
# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames, newnames)), inplace = True)
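print('My attempt: ', df)
Edit: Aftermath
It's amazing to see so many great answers after only one day! That makes it hard to decide which answer to accept. I don't know whether the following adds much value to the post as a whole, but I went ahead and wrapped all the suggestions in functions and tested them with %timeit.
Here are the results: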

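The suggestion from HH1 was the first one posted, and is also one of the fastest in terms of execution time. I'll include the code later if anyone is interested.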
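Since the timing code itself is not shown, here is a minimal sketch of how such a comparison could be set up outside IPython with the standard timeit module. The two functions, their names, and the small test frame are assumptions made for illustration, not the functions that were actually timed, and no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

new_elements = [5, 10, 15, 20]
base = pd.DataFrame(
    np.random.randint(90, 110, size=(5, 5)),
    columns=["Price", "obs_1", "obs_2", "obs_3", "obs_4"],
)

def rename_with_dict(df):
    """Build an old -> new mapping for 'obs_' columns and pass it to df.rename."""
    names = [c for c in df.columns if c.startswith("obs_")]
    mapping = {c: f"{c} = {n}" for c, n in zip(names, new_elements)}
    return df.rename(columns=mapping)

def rename_with_comprehension(df):
    """Rebuild the full column list on a copy, consuming integers from an iterator."""
    numbers = iter(new_elements)
    out = df.copy()
    out.columns = [
        f"{c} = {next(numbers)}" if c.startswith("obs_") else c for c in out.columns
    ]
    return out

# Time each candidate on the same frame; both return a new frame and leave base intact
for func in (rename_with_dict, rename_with_comprehension):
    seconds = timeit.timeit(lambda: func(base), number=200)
    print(f"{func.__name__}: {seconds / 200 * 1e6:.1f} microseconds per call")
```

Wrapping each suggestion in a function that returns a new frame keeps the runs independent, so one candidate cannot rename the columns out from under the next.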
Edit 2
The suggestion from suvy rendered the following results when I tried it:

The snippet works fine until the last line. After running the line df = df.rename(columns=dict(zip(names,renames))), the dataframe looks like this:

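Posted on 2017-06-15 20:51:06
Would this work?
# Note: pop(0) consumes newElements in place, and stringMatch must match the case of the column names
df.columns = [col + ' = ' + str(newElements.pop(0)) if col.startswith(stringMatch) else col for col in df.columns]
Posted on 2017-06-15 20:56:27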
You can use a list comprehension:
df.columns = [i if "_" not in i else i + "=" + str(newElements[int(i[-1])-1]) for i in df.columns]
Output
   Price  obs_1=5  obs_2=10  obs_3=15  obs_4=20
0    103       92        92        96       107
1    109      100        91        90       107
2    105       99        90       104        90
3    105      109       104        94        90
4    106       94       107        93        92
Posted on 2017-06-15 21:12:16
Starting from your input dataframe, here called df
            Price  obs_1  obs_2  obs_3  obs_4
Dates
2017-06-15    103     92     92     96    107
2017-06-16    109    100     91     90    107
2017-06-17    105     99     90    104     90
2017-06-18    105    109    104     94     90
2017-06-19    106     94    107     93     92
newElements = [5, 10, 15, 20]
names = list(filter(lambda x: x.startswith('obs'), df.columns.values))
renames = list(map(lambda x,y: ' = '.join([x,str(y)]), names, newElements))
df = df.rename(columns=dict(zip(names,renames)))
returns
            Price  obs_1 = 5  obs_2 = 10  obs_3 = 15  obs_4 = 20
Dates
2017-06-19    103         92          92          96         107
2017-06-20    109        100          91          90         107
2017-06-21    105         99          90         104          90
2017-06-22    105        109         104          94          90
2017-06-23    106         94         107          93          92
https://stackoverflow.com/questions/44567821