I have a DataFrame column containing strings that hold values from two other DataFrames, as shown below.
Sample view of df, whose column col contains the following strings:
col
Highest Sales was for Mobile Scott
Lowest Returns was for Mobile Phone Steve
Low Returns was for Paul
I am trying to extract values from the DataFrame above to create new columns containing the product name (taken from prod_df) and the rep name (taken from sales_rep_df).
Data in the prod_df DataFrame:
prod_df
Laptop
Computer
Mobile
Mobile Phone
Data in the sales_rep_df DataFrame:
sales_rep_df
Scott
Steve
Paul
Expected output:
col, prod, rep
Highest Sales was for Mobile Scott, Mobile, Scott
Lowest Returns was for Mobile Phone Steve, Mobile Phone, Steve
Low Returns was for Paul,,Paul
Answered on 2020-09-29 20:08:36
I believe you need Series.str.extract to get the first matching value from each list:
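import re
# Try longer names first; otherwise 'Mobile' wins over 'Mobile Phone' in the alternation
prods = sorted(prod_df['col'], key=len, reverse=True)
pat1 = '|'.join(r"\b{}\b".format(re.escape(x)) for x in prods)
pat2 = '|'.join(r"\b{}\b".format(re.escape(x)) for x in sales_rep_df['col'])
df['prod'] = df['col'].str.extract('(' + pat1 + ')', expand=False)
df['rep'] = df['col'].str.extract('(' + pat2 + ')', expand=False)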
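For anyone who wants to try this end to end, here is a minimal runnable sketch built from the sample data in the question. It assumes the lookup column in prod_df and sales_rep_df is named 'col' (the question does not state the real column names), and build_pattern is a hypothetical helper, not a pandas function. It sorts the names by length before joining them, because regex alternation takes the first alternative that matches, so 'Mobile' would otherwise win over 'Mobile Phone'.

```python
import re
import pandas as pd

# Sample data from the question; the column name 'col' is an assumption.
df = pd.DataFrame({"col": [
    "Highest Sales was for Mobile Scott",
    "Lowest Returns was for Mobile Phone Steve",
    "Low Returns was for Paul",
]})
prod_df = pd.DataFrame({"col": ["Laptop", "Computer", "Mobile", "Mobile Phone"]})
sales_rep_df = pd.DataFrame({"col": ["Scott", "Steve", "Paul"]})


def build_pattern(names):
    """Hypothetical helper: join names into one whole-word alternation.

    Longer names come first so that 'Mobile Phone' is tried before 'Mobile'.
    re.escape guards against names that contain regex metacharacters.
    """
    ordered = sorted(names, key=len, reverse=True)
    return "|".join(r"\b{}\b".format(re.escape(n)) for n in ordered)


pat1 = build_pattern(prod_df["col"])
pat2 = build_pattern(sales_rep_df["col"])

# str.extract needs a capture group; expand=False returns a Series.
df["prod"] = df["col"].str.extract("(" + pat1 + ")", expand=False)
df["rep"] = df["col"].str.extract("(" + pat2 + ")", expand=False)

print(df)
```

Rows with no matching product, such as the third one, get NaN in the prod column.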
Or, to get all matches, use Series.str.findall with Series.str.join:
df['prod'] = df['col'].str.findall(pat1).str.join(',')
df['rep'] = df['col'].str.findall(pat2).str.join(',')
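The difference from extract only shows up when a string mentions more than one name. The sketch below illustrates that with two made-up rows (the second and third strings are hypothetical and not from the question); build_pattern is again a hypothetical helper that puts longer names first.

```python
import re
import pandas as pd

# First row is from the question; the other two are hypothetical examples.
df = pd.DataFrame({"col": [
    "Lowest Returns was for Mobile Phone Steve",
    "Laptop and Computer were sold by Scott and Paul",
    "No known product or person here",
]})
products = ["Laptop", "Computer", "Mobile", "Mobile Phone"]
reps = ["Scott", "Steve", "Paul"]


def build_pattern(names):
    """Hypothetical helper: whole-word alternation, longest names first."""
    ordered = sorted(names, key=len, reverse=True)
    return "|".join(r"\b{}\b".format(re.escape(n)) for n in ordered)


# findall returns a list of every match per row; str.join flattens each list.
df["prod"] = df["col"].str.findall(build_pattern(products)).str.join(",")
df["rep"] = df["col"].str.findall(build_pattern(reps)).str.join(",")

print(df[["prod", "rep"]])
```

Note that with findall plus join, a row without any match ends up as an empty string rather than NaN, which may matter if you later filter with isna().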
https://stackoverflow.com/questions/64119204