Q: Replace column names in a pandas dataframe that partially match a string

Asked by a Stack Overflow user on 2017-06-15 20:42:00

5 answers, 4.5K views, 0 followers, score 5

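Background

I would like to identify the column names in a dataframe that partially match a string, and replace them with the original name plus some new elements appended to it. The new elements are integers defined by a list. There is a similar question, but I am afraid the suggested solution is not flexible enough for my particular case. Another post has several great answers that come very close to the problem I am facing.

Some research

I know that I can combine two lists of strings, map them pairwise into a dictionary, and pass that dictionary to df.rename to rename the columns. But that seems a bit too complicated and not very flexible, given that the number of existing columns will vary. The same goes for the number of columns to be renamed.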
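As a minimal sketch of that dictionary approach, the mapping can be built from only the columns that match the prefix, so the number of other columns does not matter. The trailing Volume column and the snake_case variable names are made up for illustration; they are not part of the original post.

```python
import pandas as pd

# Small stand-in frame; 'Volume' is a hypothetical extra column
df = pd.DataFrame([[100, 1, 2, 3, 4, 7]],
                  columns=['Price', 'obs_1', 'obs_2', 'obs_3', 'obs_4', 'Volume'])
new_elements = [5, 10, 15, 20]

# Only the columns that start with the prefix take part in the mapping
matching = [col for col in df.columns if col.startswith('obs_')]
mapping = {col: f'{col} = {val}' for col, val in zip(matching, new_elements)}

# rename leaves every label that is not a key of the mapping untouched
renamed = df.rename(columns=mapping)
print(list(renamed.columns))
```

Note that zip stops at the shorter of its inputs, so if there are more matching columns than new elements, the surplus columns simply keep their old names.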

The following snippet produces an input example:

# Libraries
import numpy as np
import pandas as pd
import itertools

# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
              columns = ['Price','obs_1','obs_2','obs_3','obs_4'])

# pd.datetime is deprecated and absent from recent pandas releases; pd.Timestamp works instead
datelist = pd.date_range(pd.Timestamp.today().strftime('%Y-%m-%d'),
                     periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print(df)

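Input

I would like to identify the column names that start with obs_ and append the elements (integers) from the list newElements = [5, 10, 15, 20] after an = sign. The column named Price should stay the same. Any other columns that appear after the obs_ columns should remain unchanged as well.

The following snippet demonstrates the desired output: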

# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
              columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])

df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print(df2)

Output

My attempt

# Define the partial string I'm looking for
stringMatch = 'Obs_'

# Put existing column names in a list
oldnames = list(df)

# Put elements that should be added to the column names
# where the first four characters match 'obs_'
newElements = [5, 10, 15, 20]
oldElements = [1, 2, 3, 4]

# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()

# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']

# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:   
    #print(repr(oldElement) + repr(str_newElements[i]))
    newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
    i = i + 1

# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames, newnames)), inplace = True)

print('My attempt: ', df)

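Since I have already built the complete list of new column names, I could also just use df.columns = newnames, but I am hoping that one of you can suggest a more pythonic way of using df.rename.

Thank you for any suggestions!

Here is the complete code for an easy copy-paste: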

# Libraries
import numpy as np
import pandas as pd
import itertools

# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                  columns = ['Price','obs_1','obs_2','obs_3','obs_4'])

# pd.datetime is deprecated and absent from recent pandas releases; pd.Timestamp works instead
datelist = pd.date_range(pd.Timestamp.today().strftime('%Y-%m-%d'),
                         periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print('Input: ', df)

# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations, Columns)),
                  columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])

df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print('Desired output: ', df2)

# My attempts
# Define the partial string I'm looking for
stringMatch = 'Obs_'

# Put existing column names in a list
oldnames = list(df)

# Put elements that should be added to the column names
# where the first four characters match 'obs_'
newElements = [5, 10, 15, 20]
oldElements = [1, 2, 3, 4]

# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()


# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']

# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:

    #print(repr(oldElement) + repr(str_newElements[i]))
    newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
    i = i + 1

# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames, newnames)), inplace = True)

print('My attempt: ', df)

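Edit: Aftermath

It is amazing to have so many good answers after only one day! That makes it hard to decide which answer to accept. I do not know whether the following adds much value to the post as a whole, but I went ahead and wrapped all of the suggestions into functions and tested them with %timeit.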
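%timeit is an IPython magic, so outside a notebook a similar comparison can be run with the standard library's timeit module. The sketch below is not the benchmark from the post: the two candidate functions, their names, and the repeat counts are assumptions chosen only to show the shape of such a test.

```python
import timeit
import pandas as pd

new_elements = [5, 10, 15, 20]
df = pd.DataFrame([[100, 1, 2, 3, 4]],
                  columns=['Price', 'obs_1', 'obs_2', 'obs_3', 'obs_4'])

def rename_with_dict(frame):
    # Build a mapping for the matching columns and let rename apply it
    matching = [c for c in frame.columns if c.startswith('obs_')]
    return frame.rename(columns={c: f'{c} = {v}' for c, v in zip(matching, new_elements)})

def rename_with_assignment(frame):
    # Rebuild the full list of labels and assign it to a copy
    values = iter(new_elements)
    out = frame.copy()
    out.columns = [f'{c} = {next(values)}' if c.startswith('obs_') else c
                   for c in frame.columns]
    return out

candidates = {'dict + rename': rename_with_dict,
              'column assignment': rename_with_assignment}

timings = {}
for name, func in candidates.items():
    # Best of 3 runs, each run calling the function 200 times on the same input
    timings[name] = min(timeit.repeat(lambda f=func: f(df), number=200, repeat=3))

for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s per 200 calls')
```

Both functions return a new frame and leave df untouched, so every timed call sees the same input.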

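Here are the results:

The suggestion from HH1 was the first one to be posted, and it is also one of the fastest in terms of execution time. I will include the code later if anyone is interested.

Edit 2

The suggestion from suvy rendered the following results when I tried it:

The snippet works fine until the last line. After running the line df = df.rename(columns=dict(zip(names,renames))), the dataframe looks like this: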

Tags: pandas, dictionary, dataframe, python, python-3.x

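Answer from a Stack Overflow user

Posted on 2017-06-15 21:52:21

Select the columns you need, make the required changes, and then join them back with the rest of the original df's columns: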

obs_cols = df.columns[df.columns.str.startswith('obs')]

obs_cols = [col + ' = ' + str(val) for col, val in zip(obs_cols, newElements)]

df.columns = list(df.columns[~df.columns.str.startswith('obs')]) + obs_cols


    Price   obs_1 = 5   obs_2 = 10  obs_3 = 15  obs_4 = 20
0   103     92          92          96          107
1   109     100         91          90          107
2   105     99          90          104         90
3   105     109         104         94          90
4   106     94          107         93          92
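One caveat, given the question's requirement that columns after the obs_ columns stay unchanged: the assignment above lists all non-matching names first, so the labels only line up with the data when every obs column comes last. A position-preserving variant might look like the sketch below; the extra Volume column is a hypothetical addition used to show the difference.

```python
import pandas as pd

# 'Volume' is a hypothetical column placed after the obs_ columns
df = pd.DataFrame([[100, 1, 2, 3, 4, 7]],
                  columns=['Price', 'obs_1', 'obs_2', 'obs_3', 'obs_4', 'Volume'])
newElements = [5, 10, 15, 20]

# Boolean mask marking which labels start with the prefix
is_obs = df.columns.str.startswith('obs')

# Walk the labels in their original order and consume one new element per match
values = iter(newElements)
df.columns = [f'{col} = {next(values)}' if match else col
              for col, match in zip(df.columns, is_obs)]

print(list(df.columns))
```

Because each label is rewritten in place, Volume keeps both its position and its data. If there are more matching columns than entries in newElements, next(values) raises StopIteration instead of silently mislabelling a column.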

Score: 2

Original question on Stack Overflow:

https://stackoverflow.com/questions/44567821
