I have the following toy dataset:
import pandas as pd
from io import StringIO
# read the data
df = pd.read_csv(StringIO("""
Date Return
1/28/2009 -0.825148
1/29/2009 -0.859997
1/30/2009 0.000000
2/2/2009 -0.909546
2/3/2009 0.000000
2/4/2009 -0.899110
2/5/2009 -0.866104
2/6/2009 0.000000
2/9/2009 -0.830099
2/10/2009 -0.885111
2/11/2009 -0.878320
2/12/2009 -0.881853
2/13/2009 -0.884432
2/17/2009 -0.947781
2/18/2009 -0.966414
2/19/2009 -1.016344
2/20/2009 -1.029667
2/23/2009 -1.087432
2/24/2009 -1.050808
2/25/2009 -1.089594
2/26/2009 -1.121556
2/27/2009 -1.105873
3/2/2009 -1.205019
3/3/2009 -1.191488
3/4/2009 -1.059311
3/5/2009 -1.135962
3/6/2009 -1.147031
3/9/2009 -1.117328
3/10/2009 -1.009050"""), sep=r"\s+").reset_index()
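My goals are:
a) Find the most negative value in the "Return" column.
b) Find the date on which this value occurs.
c) Then "walk" up the "Return" column to find the first occurrence of a specified value (0.000000 in this case).
d) Find the date associated with the value found in step "c".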
The result I want is:
a) -1.205019
b) 3/2/2009
c) 0.000000
d) 2/6/2009
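Step c) describes a backward scan: start at the row holding the minimum and move up one row at a time until the target value appears. A minimal sketch of that literal "walk up", assuming the rows are already sorted oldest to newest; `walk_up_to_value` is a hypothetical helper name, and only an excerpt of the toy data is used.

```python
import pandas as pd


def walk_up_to_value(returns: pd.Series, target: float = 0.0):
    """Hypothetical helper: locate the most negative value, then step upward
    (toward earlier rows) until a row equal to `target` is found.

    Returns (min_position, match_position) as 0-based row positions;
    match_position is None if no earlier row equals `target`.
    """
    values = returns.to_numpy()
    start = int(values.argmin())          # a) position of the most negative value
    for pos in range(start - 1, -1, -1):  # c) walk upward one row at a time
        if values[pos] == target:         # exact match is fine for "0.000000"
            return start, pos
    return start, None


# Excerpt of the toy data (assumed sorted by date, oldest first)
df = pd.DataFrame({
    "Date": ["1/30/2009", "2/2/2009", "2/6/2009", "2/9/2009", "3/2/2009", "3/3/2009"],
    "Return": [0.0, -0.909546, 0.0, -0.830099, -1.205019, -1.191488],
})

min_pos, zero_pos = walk_up_to_value(df["Return"])
print(df["Return"].iloc[min_pos], df["Date"].iloc[min_pos])  # a), b)
if zero_pos is not None:
    print(df["Return"].iloc[zero_pos], df["Date"].iloc[zero_pos])  # c), d)
```

Working with positions (`to_numpy()`) rather than index labels keeps the scan independent of whatever index the DataFrame happens to have.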
I can find "a" with the following code:
max_dd = df['Return'].min()
To get "b", I tried the following code:
df.loc[df['Return'] == max_dd, 'Date']
However, I get the following error:
KeyError: 'Date'
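Note: I can get "b" to work in this toy example, but the actual data throws the error above. Here is the actual code used to import the data from the CSV file: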
df = pd.read_csv(FILE_NAME, parse_dates=True).reset_index()
df.set_index('Date', inplace=True)  # <-- this is causing the problem
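The KeyError comes from `set_index('Date')`: after that call "Date" is the index rather than a column, so `df.loc[..., 'Date']` has no "Date" column to select. A minimal sketch of steps a) to d) with "Date" kept as the index, assuming the dates are sorted ascending and using only an excerpt of the data: `idxmin()` then returns the date label directly, and a label slice up to that date replaces the "walk up". The column is converted with `pd.to_datetime` because `parse_dates=True` only tries to parse the index, which at read time is still the default integer index.

```python
import pandas as pd
from io import StringIO

data = """Date Return
1/30/2009 0.000000
2/2/2009 -0.909546
2/6/2009 0.000000
2/9/2009 -0.830099
3/2/2009 -1.205019
3/3/2009 -1.191488"""

df = pd.read_csv(StringIO(data), sep=r"\s+")
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.set_index("Date")  # "Date" is now the index, no longer a column

# df.loc[..., "Date"] would raise KeyError: 'Date' at this point.
min_value = df["Return"].min()    # a) most negative return
min_date = df["Return"].idxmin()  # b) idxmin returns the index label, i.e. the date

# c)/d) rows up to and including min_date (assumes an ascending DatetimeIndex)
before = df.loc[:min_date, "Return"]
zero_dates = before[before == 0.0].index
last_zero_date = zero_dates[-1] if len(zero_dates) else None

print(min_value, min_date, last_zero_date)
```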
Posted on 2019-05-30 09:02:12
To cover all of your steps, your code can be written as:
import pandas as pd
from io import StringIO
# read the data
df = pd.read_csv(StringIO("""
Date Return
1/28/2009 -0.825148
1/29/2009 -0.859997
1/30/2009 0.000000
2/2/2009 -0.909546
2/3/2009 0.000000
2/4/2009 -0.899110
2/5/2009 -0.866104
2/6/2009 0.000000
2/9/2009 -0.830099
2/10/2009 -0.885111
2/11/2009 -0.878320
2/12/2009 -0.881853
2/13/2009 -0.884432
2/17/2009 -0.947781
2/18/2009 -0.966414
2/19/2009 -1.016344
2/20/2009 -1.029667
2/23/2009 -1.087432
2/24/2009 -1.050808
2/25/2009 -1.089594
2/26/2009 -1.121556
2/27/2009 -1.105873
3/2/2009 -1.205019
3/3/2009 -1.191488
3/4/2009 -1.059311
3/5/2009 -1.135962
3/6/2009 -1.147031
3/9/2009 -1.117328
3/10/2009 -1.009050"""), sep=r"\s+").reset_index()
# a) find the most negative value in the "Return" column
min_value = df["Return"].min()
print("The minimum value in the dataset is: {}".format(min_value))
# b) find the date on which this minimum value occurred
min_index = df["Return"].idxmin()
min_value_date = df.loc[min_index, "Date"]
print("The minimum value in the dataset occurred on: {}".format(min_value_date))
# c) find the closest earlier row (index below the minimum's index)
# whose Return equals the value searched for
found_value = 0
found_indices = df.index[df["Return"] == found_value].tolist()
found_correct_index = -1
for index in found_indices:
    if index >= min_index:
        break
    found_correct_index = index  # keep the latest match seen before the minimum
if found_correct_index >= 0:
    print("The value searched for is {0} and it is found in the index of {1}.".format(found_value, found_correct_index))
    # d) find the date associated with that value
    found_value_date = df.loc[found_correct_index, "Date"]
    print("The date associated with that found value of {0} is {1}.".format(found_value, found_value_date))
else:
    print("The value searched for of {0} was not found in the dataset.".format(found_value))
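Posted on 2019-05-30 09:07:47
Filter the DataFrame for rows that come before the row with the minimum Return and whose Return equals zero, then take the last of them: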
df.loc[(df.index < df.Return.idxmin()) & (df['Return'] == 0), "Date"].tail(1)
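Note that `.tail(1)` returns a one-element Series (or an empty one when nothing matches), not the date itself. A small sketch, on an excerpt of the toy data, of pulling out a scalar and handling the no-match case; it assumes the default RangeIndex produced by `reset_index()`, so that `df.index < min_idx` means "earlier rows".

```python
import pandas as pd

# Excerpt of the toy data with a default RangeIndex (as after reset_index())
df = pd.DataFrame({
    "Date": ["1/30/2009", "2/2/2009", "2/6/2009", "2/9/2009", "3/2/2009", "3/3/2009"],
    "Return": [0.0, -0.909546, 0.0, -0.830099, -1.205019, -1.191488],
})

min_idx = df["Return"].idxmin()
mask = (df.index < min_idx) & (df["Return"] == 0)
hits = df.loc[mask, "Date"]  # a Series, possibly empty

# Take the last match as a plain value, or None if there is no earlier zero
found_date = hits.iloc[-1] if not hits.empty else None
print(found_date)
```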
https://stackoverflow.com/questions/56370060