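I am extracting strings of data from comma-separated strings in another df, so each row has a different number of strings. The string in column A may or may not be the one I want to copy. If it is not, the string in column B will be. Some rows include data in columns D and E, but we don't have to use those columns. (In the real world these are website URLs, and I am trying to collect only the URLs from a specific domain, which may be the first or the second one in the row.) I am trying to use np.where, but I am not getting consistent results, particularly when the correct string is in column A but is not repeated in column B. np.where seems to apply only "y" and never "x". I have also tried variations of if/where in loops, but without good results.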
import numpy as np
import pandas as pd
df = pd.DataFrame({"A": ["blue lorry", "yellow cycle", "red car", "blue lorry", "red truck", "red bike", "blue jeep", "yellow skate", "red bus"], "B": ["red train", "red cart", "red car", "red moto",'', "red bike", "red diesel", "red carriage",''], "C": ['','','', "red moto",'', "red bike", "red diesel", "red carriage",''], "D": ['','','', "red moto",'', "red bike", '','','']})
This produces the df:
              A             B             C         D
0    blue lorry     red train
1  yellow cycle      red cart
2       red car       red car
3    blue lorry      red moto      red moto  red moto
4     red truck
5      red bike      red bike      red bike  red bike
6     blue jeep    red diesel    red diesel
7  yellow skate  red carriage  red carriage
8       red bus
When I run:
df['Red'] = np.where("red" in df['A'], df['A'], df['B'])
it returns:
              A             B             C         D           Red
0    blue lorry     red train                             red train
1  yellow cycle      red cart                              red cart
2       red car       red car                               red car
3    blue lorry      red moto      red moto  red moto      red moto
4     red truck
5      red bike      red bike      red bike  red bike      red bike
6     blue jeep    red diesel    red diesel              red diesel
7  yellow skate  red carriage  red carriage            red carriage
8       red bus
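I understand that the basic structure is: numpy.where(condition, x, y)
I tried to apply the code so that the condition looks for "red": if "red" is found, copy the string from column A, and if not, copy the string from column B. But I only seem to get the string from column B. Any help is much appreciated.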
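For reference, here is a minimal sketch of what an elementwise version of that condition might look like. It assumes a plain substring match on column A is what is wanted, and it uses only columns A and B of the sample data. The names scalar_condition and has_red are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["blue lorry", "yellow cycle", "red car", "blue lorry", "red truck",
          "red bike", "blue jeep", "yellow skate", "red bus"],
    "B": ["red train", "red cart", "red car", "red moto", "",
          "red bike", "red diesel", "red carriage", ""],
})

# `in` on a Series tests the index labels (0..8), not the values, so this
# is one scalar bool that np.where would broadcast to every row.
scalar_condition = "red" in df["A"]
print(scalar_condition)

# Elementwise condition: one True/False per row.
# regex=False treats "red" as a literal substring (assumed intent).
has_red = df["A"].str.contains("red", regex=False)

# Take A where it contains "red", otherwise fall back to B.
df["Red"] = np.where(has_red, df["A"], df["B"])
print(df)
```

With a per-row boolean mask, rows 4 and 8 (where B is empty) should pick up the value from A instead of an empty string. For the real URL data the same pattern would apply with the domain name in place of "red", though that is untested here.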
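Obviously I am new to this. I gathered some help on np.where from the following threads, but I think there are some differences between using numeric values and strings, and with my multiple columns: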
np.where Not Working in my Pandas
Efficiently replace values from a column to another column Pandas DataFrame
Update Value in one column, if string in other column contains something in list
https://stackoverflow.com/questions/56549432