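I'm trying to filter data by a date range, where I have an initial start_date and the end_date is x days after the start_date. Basically, what I want is the pandas equivalent of a SQL WHERE clause that selects rows between start_date and DATE_ADD(start_date, INTERVAL x DAY), with the latter acting as end_date.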
Below is a sample of my dataframe:
+-----------+-----------+
| date | aggregate |
+-----------+-----------+
| ... | ... |
|2022-08-31 | 42 |
|2022-09-01 | 30 |
|2022-09-02 | 65 |
|2022-09-03 | 55 |
| ... | ... |
+-----------+-----------+
So I tried this in Python:
import pandas as pd
from datetime import datetime, timedelta
start_date = datetime.strptime("2022-08-31", "%Y-%m-%d")
end_date = start_date + timedelta(days=3)  # let's say I want a 3-day range
df_filtered = df[(df['date'] >= start_date) & (df['date'] < end_date)]
However, it raises "UserWarning: Boolean Series key will be reindexed to match DataFrame index." and produces a result with several dates missing.
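One situation in which pandas emits this warning is when the boolean Series used as the key has an index that differs from the index of the frame being filtered, for instance when the mask was built from a sorted or previously filtered copy. Whether that is what happened here cannot be determined from the snippet shown. The sketch below only illustrates that one possible cause, and the names df_sorted, foreign_mask and own_mask are hypothetical.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-08-31", "2022-09-01", "2022-09-02", "2022-09-03"]
    ),
    "aggregate": [42, 30, 65, 55],
})
start_date = pd.Timestamp("2022-08-31")
end_date = start_date + pd.Timedelta(days=3)

# Hypothetical scenario: a reordered copy whose index order differs from df's.
df_sorted = df.sort_values("aggregate")
foreign_mask = (df_sorted["date"] >= start_date) & (df_sorted["date"] < end_date)

# Indexing df with a mask whose index is not identical to df.index is the
# kind of usage that can trigger the reindexing UserWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    _ = df[foreign_mask]
messages = [str(w.message) for w in caught]

# Building the mask from the very frame being filtered keeps both indexes
# identical, so no realignment is needed.
own_mask = (df["date"] >= start_date) & (df["date"] < end_date)
df_filtered = df[own_mask]
print(messages)
print(df_filtered)
```

If the mask in the real code came from a different dataframe than the one being indexed, rebuilding it from the same frame, as in own_mask, would be the first thing to try.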
Posted on 2022-10-18 17:16:21
How about setting the date column as the index and then filtering:
import pandas as pd
from datetime import datetime, timedelta
df = pd.DataFrame([
['2022-08-31',42],
['2022-09-01',30],
['2022-09-02',65],
['2022-09-03',55],
],columns=['date','aggregate'])
df['date'] = pd.to_datetime(df['date'])  # parse the strings into datetime values
df = df.set_index('date')  # use the dates as the index
start_date = datetime.strptime("2022-08-31", "%Y-%m-%d")
end_date = start_date + timedelta(days=3)  # let's say I want a 3-day range
df[(df.index >= start_date) & (df.index < end_date)]  # start_date <= date < end_date
            aggregate
date
2022-08-31         42
2022-09-01         30
2022-09-02         65
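The same half-open window can also be applied without changing the index. The following is a minimal sketch, assuming the date column holds strings or datetimes that pd.to_datetime can parse. The helper name filter_date_window and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def filter_date_window(frame, start, days, column="date"):
    """Return rows where start <= frame[column] < start + days."""
    start_ts = pd.Timestamp(start)
    end_ts = start_ts + pd.Timedelta(days=days)
    # Convert on the fly so string columns work too; the frame is not modified.
    dates = pd.to_datetime(frame[column])
    mask = (dates >= start_ts) & (dates < end_ts)
    # The mask is derived from `frame` itself, so its index matches exactly.
    return frame.loc[mask]


df = pd.DataFrame({
    "date": ["2022-08-31", "2022-09-01", "2022-09-02", "2022-09-03"],
    "aggregate": [42, 30, 65, 55],
})

result = filter_date_window(df, "2022-08-31", 3)
print(result)
```

With the sample data this keeps the rows for 2022-08-31, 2022-09-01 and 2022-09-02, which mirrors start_date <= date < DATE_ADD(start_date, INTERVAL 3 DAY) in SQL.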
https://stackoverflow.com/questions/74114566