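Here is the problem I ran into with my dataset:
Examine the distribution of the number of minutes between when a "111 - Building fire" incident is logged into the Computer Aided Dispatch system and when the first unit arrives on scene. What is the third quartile of that distribution? Note: the number of minutes can be fractional (i.e., do not round).
The columns contain both the date and the time. I only want to use the time, so that I can subtract one value from the other and get the distribution. Here is my current code: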
time_of_issuance = dataset.loc[dataset['INCIDENT_TYPE_DESC'] == '111 - Building fire', 'INCIDENT_DATE_TIME']
time_of_creation = dataset.loc[dataset['INCIDENT_TYPE_DESC'] == '111 - Building fire', 'ARRIVAL_DATE_TIME']
print(time_of_issuance)
print(time_of_creation)
Here is the output of the snippet above:
91 01/01/2013 12:58:10 AM
199 01/01/2013 02:22:56 AM
440 01/01/2013 06:20:49 AM
492 01/01/2013 07:59:48 AM
569 01/01/2013 09:47:27 AM
...
1735758 11/01/2016 03:54:41 PM
1735841 11/01/2016 04:48:49 PM
1736021 11/01/2016 07:05:58 PM
1736100 11/01/2016 08:36:32 PM
1736286 11/02/2016 12:32:51 AM
Name: INCIDENT_DATE_TIME, Length: 10379, dtype: object
91 01/01/2013 01:00:50 AM
199 01/01/2013 02:25:23 AM
440 01/01/2013 06:26:13 AM
492 01/01/2013 08:03:33 AM
569 01/01/2013 09:49:25 AM
...
1735758 11/01/2016 03:56:52 PM
1735841 11/01/2016 04:54:09 PM
1736021 11/01/2016 07:10:00 PM
1736100 11/01/2016 08:38:33 PM
1736286 11/02/2016 12:37:23 AM
Name: ARRIVAL_DATE_TIME, Length: 10379, dtype: object
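I just want to subtract the times in the two columns shown in the output and get the difference in minutes. How can I do this in Python? I tried using .dt.time, but it raised an error.
The head of the data is below: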
{'IM_INCIDENT_KEY': {0: 55672688,
1: 55672692,
2: 55672693,
3: 55672695,
4: 55672697,
5: 55672698,
6: 55672699,
7: 55672700,
8: 55672703,
9: 55672705},
'FIRE_BOX': {0: 2147,
1: 818,
2: 9656,
3: 7412,
4: 4019,
5: 1328,
6: 688,
7: 9604,
8: 2897,
9: 2602},
'INCIDENT_TYPE_DESC': {0: '300 - Rescue, EMS incident, other',
1: '735A - Unwarranted alarm/defective condition of alarm system',
2: '300 - Rescue, EMS incident, other',
3: '412 - Gas leak (natural gas or LPG)',
4: '735A - Unwarranted alarm/defective condition of alarm system',
5: '735A - Unwarranted alarm/defective condition of alarm system',
6: '353 - Removal of victim(s) from stalled elevator',
7: '651 - Smoke scare, odor of smoke',
8: '331 - Lock-in (if lock out , use 511 )',
9: '710 - Malicious, mischievous false call, other'},
'INCIDENT_DATE_TIME': {0: '01/01/2013 12:00:20 AM',
1: '01/01/2013 12:00:37 AM',
2: '01/01/2013 12:01:17 AM',
3: '01/01/2013 12:02:32 AM',
4: '01/01/2013 12:01:49 AM',
5: '01/01/2013 12:02:45 AM',
6: '01/01/2013 12:03:55 AM',
7: '01/01/2013 12:04:03 AM',
8: '01/01/2013 12:04:37 AM',
9: '01/01/2013 12:05:10 AM'},
'ARRIVAL_DATE_TIME': {0: '01/01/2013 12:14:23 AM',
1: '01/01/2013 12:09:03 AM',
2: '01/01/2013 12:04:55 AM',
3: '01/01/2013 12:07:48 AM',
4: '01/01/2013 12:06:27 AM',
5: '01/01/2013 12:07:55 AM',
6: '01/01/2013 12:13:10 AM',
7: '01/01/2013 12:06:19 AM',
8: '01/01/2013 12:11:02 AM',
9: '01/01/2013 12:08:20 AM'},
'UNITS_ONSCENE': {0: 1.0,
1: 3.0,
2: 1.0,
3: 4.0,
4: 6.0,
5: 3.0,
6: 1.0,
7: 4.0,
8: 1.0,
9: 6.0},
'LAST_UNIT_CLEARED_DATE_TIME': {0: '01/01/2013 12:20:06 AM',
1: '01/01/2013 12:30:06 AM',
2: '01/01/2013 12:15:18 AM',
3: '01/01/2013 12:40:11 AM',
4: '01/01/2013 12:24:56 AM',
5: '01/01/2013 12:18:20 AM',
6: '01/01/2013 12:30:33 AM',
7: '01/01/2013 12:11:21 AM',
8: '01/01/2013 12:23:29 AM',
9: '01/01/2013 12:10:29 AM'},
'HIGHEST_LEVEL_DESC': {0: '1 - More than initial alarm, less than Signal 7-5',
1: '1 - More than initial alarm, less than Signal 7-5',
2: '1 - More than initial alarm, less than Signal 7-5',
3: '1 - More than initial alarm, less than Signal 7-5',
4: '1 - More than initial alarm, less than Signal 7-5',
5: '1 - More than initial alarm, less than Signal 7-5',
6: '1 - More than initial alarm, less than Signal 7-5',
7: '1 - More than initial alarm, less than Signal 7-5',
8: '1 - More than initial alarm, less than Signal 7-5',
9: '1 - More than initial alarm, less than Signal 7-5'},
'TOTAL_INCIDENT_DURATION': {0: 1186.0,
1: 1769.0,
2: 841.0,
3: 2259.0,
4: 1387.0,
5: 935.0,
6: 1598.0,
7: 438.0,
8: 1132.0,
9: 319.0},
'ACTION_TAKEN1_DESC': {0: '00 - Action taken, other',
1: '86 - Investigate',
2: '00 - Action taken, other',
3: '44 - Hazardous materials leak control & containment',
4: '86 - Investigate',
5: '86 - Investigate',
6: '64 - Shut down system',
7: '86 - Investigate',
8: '70 - Assistance, other',
9: '00 - Action taken, other'},
'ACTION_TAKEN2_DESC': {0: nan,
1: nan,
2: nan,
3: '64 - Shut down system',
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'ACTION_TAKEN3_DESC': {0: nan,
1: nan,
2: nan,
3: '82 - Notify other agencies.',
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'PROPERTY_USE_DESC': {0: 'UUU - Undetermined',
1: 'UUU - Undetermined',
2: 'UUU - Undetermined',
3: '429 - Multifamily dwelling',
4: 'UUU - Undetermined',
5: 'UUU - Undetermined',
6: '429 - Multifamily dwelling',
7: '960 - Street, other',
8: '429 - Multifamily dwelling',
9: 'UUU - Undetermined'},
'STREET_HIGHWAY': {0: 'E 138 ST',
1: 'W 46 ST',
2: '116 ST',
3: '43 ST',
4: 'WYCKOFF AVE',
5: 'HAMILTON AVE',
6: 'AVEOFAMERICAS',
7: '102 ST',
8: 'BOYNTON AVE',
9: '52 ST'},
'ZIP_CODE': {0: 10454.0,
1: 10036.0,
2: 11418.0,
3: 11103.0,
4: 11385.0,
5: 11215.0,
6: 10001.0,
7: 11418.0,
8: 10472.0,
9: 11219.0},
'BOROUGH_DESC': {0: '2 - Bronx',
1: '1 - Manhattan',
2: '5 - Queens',
3: '5 - Queens',
4: '5 - Queens',
5: '4 - Brooklyn',
6: '1 - Manhattan',
7: '5 - Queens',
8: '2 - Bronx',
9: '4 - Brooklyn'},
'FLOOR': {0: nan,
1: nan,
2: nan,
3: '1',
4: nan,
5: nan,
6: '18',
7: nan,
8: nan,
9: nan},
'CO_DETECTOR_PRESENT_DESC': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'FIRE_ORIGIN_BELOW_GRADE_FLAG': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'STORY_FIRE_ORIGIN_COUNT': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'FIRE_SPREAD_DESC': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'DETECTOR_PRESENCE_DESC': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'AES_PRESENCE_DESC': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan},
'STANDPIPE_SYS_PRESENT_FLAG': {0: nan,
1: nan,
2: nan,
3: nan,
4: nan,
5: nan,
6: nan,
7: nan,
8: nan,
9: nan}}
Answered 2022-10-16 08:09:10
This doesn't directly answer your question, because I don't think your approach is a good one. Try this:
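import pandas as pd

# Keep only building fires, subtract the two timestamps, and convert to minutes
delay = (
    dataset
    .query("INCIDENT_TYPE_DESC == '111 - Building fire'")
    .eval("ARRIVAL_DATE_TIME - INCIDENT_DATE_TIME")
    .divide(pd.Timedelta("1m"))
)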
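delay will be a pandas.Series holding the number of minutes between the time the incident was logged and the arrival time, i.e. the distribution you are asked about.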
For example, to find the third quartile, do the following:
delay.quantile(0.75)
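As a quick sanity check, here is a minimal sketch that runs the same steps on the first five building-fire rows printed in the question. The small DataFrame named sample is only for illustration (the timestamps are copied from the question's output), and the explicit format string is my assumption about how the dates are written.

```python
import pandas as pd

# Five '111 - Building fire' rows copied from the question's printed output.
sample = pd.DataFrame(
    {
        "INCIDENT_DATE_TIME": [
            "01/01/2013 12:58:10 AM",
            "01/01/2013 02:22:56 AM",
            "01/01/2013 06:20:49 AM",
            "01/01/2013 07:59:48 AM",
            "01/01/2013 09:47:27 AM",
        ],
        "ARRIVAL_DATE_TIME": [
            "01/01/2013 01:00:50 AM",
            "01/01/2013 02:25:23 AM",
            "01/01/2013 06:26:13 AM",
            "01/01/2013 08:03:33 AM",
            "01/01/2013 09:49:25 AM",
        ],
    },
    index=[91, 199, 440, 492, 569],
)

# Assumed format: month/day/year with a 12-hour clock and AM/PM marker.
fmt = "%m/%d/%Y %I:%M:%S %p"
incident = pd.to_datetime(sample["INCIDENT_DATE_TIME"], format=fmt)
arrival = pd.to_datetime(sample["ARRIVAL_DATE_TIME"], format=fmt)

# Timedelta / Timedelta gives a plain float: fractional minutes, no rounding.
delay_minutes = (arrival - incident) / pd.Timedelta(minutes=1)
third_quartile = delay_minutes.quantile(0.75)

print(delay_minutes)
print(third_quartile)
```

On these five rows the third quartile works out to 3.75 minutes. That is only a check on the method; the real answer has to come from all 10379 building-fire rows.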
Edit
You may need to make sure your datetime data are pandas.Timestamp values. Do this to convert them, then run the solution:
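# Parse the strings in one vectorized call; much faster than .apply(pd.to_datetime) on a large frame
fmt = "%m/%d/%Y %I:%M:%S %p"
dataset["INCIDENT_DATE_TIME"] = pd.to_datetime(dataset["INCIDENT_DATE_TIME"], format=fmt)
dataset["ARRIVAL_DATE_TIME"] = pd.to_datetime(dataset["ARRIVAL_DATE_TIME"], format=fmt)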
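The error you got from .dt.time is most likely because the columns are still strings (the printed output shows dtype: object), and the .dt accessor only works on datetime-like values. Here is a small sketch of that, assuming the same month/day/year 12-hour format; the Series named raw is hypothetical and just holds two timestamps copied from the question.

```python
import pandas as pd

# Two timestamps from the question, still stored as plain strings.
raw = pd.Series(["01/01/2013 12:58:10 AM", "01/01/2013 02:22:56 AM"])

# .dt is not available on a string column, so this raises AttributeError.
try:
    raw.dt.time
    dt_worked = True
except AttributeError as exc:
    dt_worked = False
    print("string column:", exc)

# After parsing, the column is datetime-like and .dt works.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %I:%M:%S %p")
print(parsed.dtype)
print(parsed.dt.time.tolist())
```

Even once .dt.time works, I would not subtract the times on their own: datetime.time objects cannot be subtracted from each other, and dropping the date gives wrong results for any incident logged before midnight with an arrival after it. Subtracting the full timestamps avoids both problems.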
Edit 2
For older pandas versions:
df = dataset.query("INCIDENT_TYPE_DESC == '111 - Building fire'")
delay = df["ARRIVAL_DATE_TIME"] - df["INCIDENT_DATE_TIME"]
delay = delay/pd.Timedelta("1m")
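If dividing by a Timedelta looks unfamiliar, .dt.total_seconds() / 60 should give the same fractional minutes. Below is a minimal sketch comparing the two, using the first two rows of the data head from the question; the variable names and the format string are my own assumptions.

```python
import pandas as pd

# Assumed format: month/day/year with a 12-hour clock and AM/PM marker.
fmt = "%m/%d/%Y %I:%M:%S %p"

# Rows 0 and 1 of the data head in the question.
incident = pd.to_datetime(
    pd.Series(["01/01/2013 12:00:20 AM", "01/01/2013 12:00:37 AM"]), format=fmt
)
arrival = pd.to_datetime(
    pd.Series(["01/01/2013 12:14:23 AM", "01/01/2013 12:09:03 AM"]), format=fmt
)

delta = arrival - incident  # a Series of Timedelta values

via_division = delta / pd.Timedelta(minutes=1)  # divide by a one-minute Timedelta
via_seconds = delta.dt.total_seconds() / 60     # convert seconds to minutes

print(via_division.tolist())
print(via_seconds.tolist())
```

Both give float minutes, so .quantile(0.75) can be called on either result.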
https://stackoverflow.com/questions/74085367