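I have a DataFrame, Df2. I am trying to check each of the last 10 rows of the column Lead_Lag below: if any of those rows holds a value other than null, I want a new column Position to equal 'Y'.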
def run_HG_AUDUSD_15M_Aggregate():
    Df1 = pd.read_csv(max(glob.iglob(r"C:\Users\cost9\OneDrive\Documents\PYTHON\Daily Tasks\Pairs Trading\HG_AUDUSD\CSV\15M\Lead_Lag\*.csv"), key=os.path.getctime))
    Df2 = Df1[['Date', 'Close_HG', 'Close_AUDUSD', 'Lead_Lag']]
    Df2['Position'] = ''
    for index,row in Df2.iterrows():
        if Df2.loc[Df2.index.shift(-10):index,"Lead_Lag"].isnull():
            continue
        else:
            Df2.loc[index, 'Position'] = "Y"
A sample of the data is below:
Date Close_HG Close_AUDUSD Lead_Lag
7/19/2017 12:59 2.7 0.7956
7/19/2017 13:59 2.7 0.7955
7/19/2017 14:14 2.7 0.7954
7/20/2017 3:14 2.7 0.791
7/20/2017 5:44 2.7 0.791
7/20/2017 7:44 2.71 0.7925
7/20/2017 7:59 2.7 0.7924
7/20/2017 8:44 2.7 0.7953 Short_Both
7/20/2017 10:44 2.71 0.7964 Short_Both
7/20/2017 11:14 2.71 0.7963 Short_Both
7/20/2017 11:29 2.71 0.7967 Short_Both
7/20/2017 13:14 2.71 0.796 Short_Both
7/20/2017 13:29 2.71 0.7956 Short_Both
7/20/2017 14:29 2.71 0.7957 Short_Both
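So in this example I would want the last two values of the new column Position to be 'Y', because at least one of the last 10 rows has a value in the Lead_Lag column. I want to apply this on a rolling basis: for example, the Position value for row 13 would look at rows 12-3, the Position value for row 12 would look at rows 11-2, and so on.
Instead, I get the error:
NotImplementedError: Not supported for type RangeIndex
I have tried several variations of the shift method (defining it before the loop, etc.) and can't get it to work.
EDIT: Here is the solution: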
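As a side note on the error above: Index.shift is only implemented for datetime-like indexes, so calling it on the default RangeIndex raises NotImplementedError. In addition, .isnull() returns a whole Series, which cannot be used directly as an if condition. A loop-free sketch of the intended check might look like the following. It assumes the window is the current row plus the 9 rows before it, and it uses a small made-up frame instead of the CSV from the question (df and has_value are illustrative names only).

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the sample data: 7 empty rows, then 7 filled rows.
df = pd.DataFrame({"Lead_Lag": [np.nan] * 7 + ["Short_Both"] * 7})

N = 10  # window size: current row plus the N - 1 rows before it (assumption)

# 1 where Lead_Lag holds a value, 0 where it is null.
flags = df["Lead_Lag"].notna().astype(int)

# Rolling max is 1 as soon as any row in the window holds a value.
# min_periods=1 lets the first rows use a shorter window instead of NaN.
has_value = flags.rolling(N, min_periods=1).max().astype(bool)

df["Position"] = np.where(has_value, "Y", "")
print(df)
```

If the window should exclude the current row, shifting flags by one row before rolling would be one way to express that.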
N = 10
Df2['Position'] = ''
for index,row in Df2.iterrows():
    if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"
Posted on 2017-11-04 17:22:01
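EDIT:
After the solution was posted in the question, I found that the OP needs something else: testing a window of N rows, so I added another answer.
Old solution:
Use numpy.where with a boolean mask built by chaining conditions:
m = df["Lead_Lag"].notnull() & df.index.isin(df.index[-10:])
Or select the last rows by position with iloc and fill the remaining rows with False via reindex:
m = df["Lead_Lag"].iloc[-10:].notnull().reindex(df.index, fill_value=False)
df['new'] = np.where(m, 'Y', '')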
print (df)
Date Close_HG Close_AUDUSD Lead_Lag new
0 7/19/2017 12:59 2.70 0.7956 NaN
1 7/19/2017 13:59 2.70 0.7955 NaN
2 7/19/2017 14:14 2.70 0.7954 NaN
3 7/20/2017 3:14 2.70 0.7910 NaN
4 7/20/2017 5:44 2.70 0.7910 NaN
5 7/20/2017 7:44 2.71 0.7925 NaN
6 7/20/2017 7:59 2.70 0.7924 NaN
7 7/20/2017 8:44 2.70 0.7953 Short_Both Y
8 7/20/2017 10:44 2.71 0.7964 Short_Both Y
9 7/20/2017 11:14 2.71 0.7963 Short_Both Y
10 7/20/2017 11:29 2.71 0.7967 Short_Both Y
11 7/20/2017 13:14 2.71 0.7960 Short_Both Y
12 7/20/2017 13:29 2.71 0.7956 Short_Both Y
13 7/20/2017 14:29 2.71 0.7957 Short_Both Y
Posted on 2017-11-04 21:44:39
This is what I ended up doing:
def run_HG_AUDUSD_15M_Aggregate():
    N = 10
    Df2['Position'] = ''
    for index,row in Df2.iterrows():
        if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
            Df2.loc[index, 'Position'] = "Y"
        else:
            Df2.loc[index, 'Position'] = "N"
Posted on 2019-09-01 19:31:28
Sample:
import numpy as np
import pandas as pd

np.random.seed(123)
M = 20
Df2 = pd.DataFrame({'Lead_Lag':np.random.choice([np.nan, 'N'], p=[.3,.7], size=M)})
Solution 1 - pandas:
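Explanation: first compare the column for inequality with Series.ne to get a boolean Series, then use Series.rolling with Series.any to test the values in each window, and finally set 'N' and 'Y' with numpy.where.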
N = 3
a = (Df2['Lead_Lag'].ne('N')
                    .rolling(N, min_periods=1)
                    .apply(lambda x: x.any(), raw=False))
Df2['Pos1'] = np.where(a, 'Y','N')
Another numpy solution, using strides and prepending False values so the first N - 1 rows can be tested:
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

x = np.concatenate([[False] * (N - 1), Df2['Lead_Lag'].ne('N').values])
arr = np.any(rolling_window(x, N), axis=1)
Df2['Pos2'] = np.where(arr, 'Y','N')
Compare the outputs:
print (Df2)
Lead_Lag Pos1 Pos2
0 N N N
1 nan Y Y
2 nan Y Y
3 N Y Y
4 N Y Y
5 N N N
6 N N N
7 N N N
8 N N N
9 N N N
10 N N N
11 N N N
12 N N N
13 nan Y Y
14 N Y Y
15 N Y Y
16 nan Y Y
17 nan Y Y
18 N Y Y
19 N Y Y
Details of the numpy solution:
Prepend False values so the first N - 1 rows can be tested:
print (np.concatenate([[False] * (N - 1), Df2['Lead_Lag'].ne('N').values]))
[False False False True True False False False False False False False
False False False True False False True True False False]
The strides return a 2d array of booleans:
print (rolling_window(x, N))
[[False False False]
[False False True]
[False True True]
[ True True False]
[ True False False]
[False False False]
[False False False]
[False False False]
[False False False]
[False False False]
[False False False]
[False False False]
[False False False]
[False False True]
[False True False]
[ True False False]
[False False True]
[False True True]
[ True True False]
[ True False False]]
Test for at least one True per row with numpy.any:
print (np.any(rolling_window(x, N), axis=1))
[False True True True True False False False False False False False
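False True True True True True True True]
EDIT:
If you compare against the iterrows solution, the output is different. The reason is that the iterrows solution tests a window of N + 1 rows, so to get the same output you have to add 1 to N.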
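The extra row comes from label-based slicing: .loc[a:b] includes both endpoints, unlike positional slicing with .iloc. A minimal illustration, using a throwaway Series s that is not part of the original example:

```python
import pandas as pd

# Throwaway Series with the default RangeIndex (labels 0..5).
s = pd.Series(list("abcdef"))

N = 3
index = 5

# Label-based slice: both ends are included, so this covers labels 2, 3, 4, 5.
window_loc = s.loc[index - N:index]

# Position-based slice: the stop is excluded, so this covers positions 2, 3, 4.
window_iloc = s.iloc[index - N:index]

print(len(window_loc), len(window_iloc))
```

This is why the loop version with N matches the rolling versions only when they use N + 1.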
N = 3
Df2['Position'] = ''
for index,row in Df2.iterrows():
    #for check windows
    #print (Df2.loc[index-N:index,"Lead_Lag"])
    if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"

a = (Df2['Lead_Lag'].ne('N')
                    .rolling(N + 1, min_periods=1)
                    .apply(lambda x: x.any(), raw=False) )
Df2['Pos1'] = np.where(a, 'Y','N')

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

x = np.concatenate([[False] * (N), Df2['Lead_Lag'].ne('N').values])
arr = np.any(rolling_window(x, N + 1), axis=1)
Df2['Pos2'] = np.where(arr, 'Y','N')
print (Df2)
Lead_Lag Position Pos1 Pos2
0 N N N N
1 nan Y Y Y
2 nan Y Y Y
3 N Y Y Y
4 N Y Y Y
5 N Y Y Y
6 N N N N
7 N N N N
8 N N N N
9 N N N N
10 N N N N
11 N N N N
12 N N N N
13 nan Y Y Y
14 N Y Y Y
15 N Y Y Y
16 nan Y Y Y
17 nan Y Y Y
18 N Y Y Y
19 N Y Y Y
https://stackoverflow.com/questions/47113370