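Fellow WRF post-processors, have you fallen into these traps too?
ncdump -v Times wrfout_d01_xxxx shows nothing but a string of characters, and stitching them into timestamps by hand is painful; getvar(wrfnc, 'times', squeeze=False) returns numpy.datetime64 values, and feeding them straight into pd.to_datetime raises "ValueError: Could not convert object to NumPy datetime". Don't panic! **wrf-python already ships a one-call solution: wrf.extract_times**. In three minutes, you will see how to get a WRF-out time series with zero regex and zero manual parsing.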
!pip install wrf-python
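The two most common hacky approaches:

| Hacky approach | What goes wrong |
|---|---|
| Call getvar(nc, 'times') to get numpy.datetime64 values, then force them into pd.to_datetime | Parsing fails because the format is not one pandas handles by default |
| Stitch strings together in "%Y-%m-%d_%H:%M:%S" format, then call datetime.strptime | Tedious, slow, and easy to get wrong |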
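For reference, the second approach looks roughly like the sketch below. In a wrfout file, Times is stored as a character array of shape (Time, DateStrLen), one byte per character, which is why ncdump prints it character by character. The array here is a hand-made stand-in (no file is read), and the code only illustrates the manual route; it is not how extract_times works internally.

```python
from datetime import datetime

WRF_FMT = "%Y-%m-%d_%H:%M:%S"  # note the underscore between date and time


def to_char_row(stamp):
    """Split a timestamp into one-byte items, mimicking one row of a netCDF char array."""
    return [bytes([c]) for c in stamp.encode("ascii")]


# Hand-made stand-in for Times with shape (2, 19); not read from a real file
times_chars = [to_char_row("2019-08-08_18:00:00"), to_char_row("2019-08-08_19:00:00")]

# Manual route: join each row's bytes, decode to str, then parse with the WRF format
parsed = [datetime.strptime(b"".join(row).decode("ascii"), WRF_FMT) for row in times_chars]
print(parsed)
```

Every step here (joining, decoding, matching the exact format string) is a chance to slip in a bug, which is the table's point.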
In short: you need none of this! extract_times in wrf-python ≥1.3 already does all of it for you.
from netCDF4 import Dataset
from wrf import extract_times  # 👈 the only function you need
import glob
wrf_files = glob.glob("/home/mw/input/typhoon9537/*")
wrf_list = [Dataset(f) for f in wrf_files]
times = extract_times(wrf_list, timeidx=None, method='cat')  # numpy array of datetime64[ns]
print(times[5])
2019-08-09T00:00:00.000000000
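The value printed above is a numpy.datetime64 with nanosecond precision. If some downstream code insists on Python datetime.datetime objects, watch out for one numpy gotcha: calling .tolist() on a datetime64[ns] array returns plain integers (nanoseconds since 1970), because datetime.datetime cannot hold nanoseconds; casting to microsecond precision first gives real datetime objects. The array below is typed in by hand as a stand-in for what extract_times returns.

```python
import numpy as np

# Hand-typed stand-in for the datetime64[ns] array returned by extract_times
times = np.array(["2019-08-08T18:00", "2019-08-09T00:00"], dtype="datetime64[ns]")

print(times[1])  # prints with nanosecond precision, like the output above

as_ints = times.tolist()                         # datetime64[ns] -> Python ints
as_py = times.astype("datetime64[us]").tolist()  # datetime64[us] -> datetime.datetime
print(as_ints)
print(as_py)
```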
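A few keywords explained:
- timeidx=None → extract every time step (equivalent to wrf.ALL_TIMES)
- method='cat' → concatenate the times of multiple files along the time axis (method='join' would instead stack them along a new leftmost dimension)
- meta=False → the default; returns plain datetime64 values without xarray metadata

The returned times is a numpy array of numpy.datetime64[ns] values, not Python datetime.datetime objects (hence the .000000000 in the output above), and pandas accepts it directly. Note that glob.glob returns files in arbitrary order, so the time steps below are not chronological; use sorted(glob.glob(...)) if you need them in order: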
import pandas as pd
dt_index = pd.to_datetime(times)  # becomes a pandas.DatetimeIndex
print(dt_index)
DatetimeIndex(['2019-08-09 04:00:00', '2019-08-09 06:00:00',
'2019-08-08 22:00:00', '2019-08-08 18:00:00',
'2019-08-08 20:00:00', '2019-08-09 00:00:00',
'2019-08-09 05:00:00', '2019-08-08 19:00:00',
'2019-08-09 02:00:00', '2019-08-08 23:00:00',
'2019-08-09 01:00:00', '2019-08-09 03:00:00',
'2019-08-08 21:00:00'],
dtype='datetime64[ns]', freq=None)
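If the files have already been read in glob order, the times can still be put in chronological order afterwards. The sketch below computes the sorting permutation for the 13 values copied from the output above; the same permutation could then be applied, as an assumption about your workflow, to any data array stacked in the same file order.

```python
import numpy as np
import pandas as pd

# The 13 time steps printed above, in glob (file) order
dt_index = pd.DatetimeIndex([
    "2019-08-09 04:00", "2019-08-09 06:00", "2019-08-08 22:00", "2019-08-08 18:00",
    "2019-08-08 20:00", "2019-08-09 00:00", "2019-08-09 05:00", "2019-08-08 19:00",
    "2019-08-09 02:00", "2019-08-08 23:00", "2019-08-09 01:00", "2019-08-09 03:00",
    "2019-08-08 21:00",
])

order = np.argsort(dt_index.values)  # positions that put the time steps in order
sorted_index = dt_index[order]       # equivalent to dt_index.sort_values()

print(order[:3])                             # [3 7 4]
print(sorted_index.is_monotonic_increasing)  # True
# Hypothetical: a data array stacked in the same file order would be reordered as data[order]
```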
first = extract_times(wrf_list, timeidx=0)  # first time step (in file order)
last = extract_times(wrf_list, timeidx=-1)  # last time step (in file order)
subset = extract_times(wrf_list, timeidx=[3,4,5])  # any list of time indices
print(subset)
['2019-08-08T18:00:00.000000000' '2019-08-08T20:00:00.000000000'
'2019-08-09T00:00:00.000000000']
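timeidx is a position in the concatenated time axis, not a time. If you know the time you want but not where it sits (especially with unsorted files, as here), one option is to look the position up in the DatetimeIndex first and then pass the integer on. The values below are the first six from the dt_index output above, typed in by hand; the extract_times/getvar calls in the final comment are only indicative.

```python
import pandas as pd

# First six entries of dt_index from the output above, in file order (typed by hand)
dt_index = pd.DatetimeIndex([
    "2019-08-09 04:00", "2019-08-09 06:00", "2019-08-08 22:00",
    "2019-08-08 18:00", "2019-08-08 20:00", "2019-08-09 00:00",
])

# One time -> one integer position (the index values are unique)
idx = dt_index.get_loc(pd.Timestamp("2019-08-09 00:00"))
print(idx)  # 5

# Several times -> array of positions (-1 would mean "not found")
many = dt_index.get_indexer(pd.to_datetime(["2019-08-08 18:00", "2019-08-08 20:00"]))
print(many)  # [3 4]

# Then, e.g.: extract_times(wrf_list, timeidx=idx) or getvar(wrf_list, "T2", timeidx=idx)
```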
If extract_times is not available (for example, with a wrf-python version that is too old), you can compute the times by hand from SIMULATION_START_DATE and XTIME:
import glob
from datetime import datetime, timedelta
from netCDF4 import Dataset
files = sorted(glob.glob("/home/mw/input/typhoon9537/wrfout_d01_*"))
# SIMULATION_START_DATE is a global attribute such as "2019-08-08_18:00:00"
with Dataset(files[0]) as nc:
    base = datetime.strptime(nc.SIMULATION_START_DATE, "%Y-%m-%d_%H:%M:%S")
dt_list = []
for fn in files:
    with Dataset(fn) as nc:
        # XTIME = minutes since the simulation start (cumulative across files),
        # so no per-file offset is needed
        mins = nc.variables['XTIME'][:]
        dt_list.extend(base + timedelta(minutes=float(m)) for m in mins)
print(dt_list)
With XTIME read this way, the result should be the same 13 hourly time steps that extract_times returns, from 2019-08-08 18:00 to 2019-08-09 06:00, in chronological order.
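The manual route hinges on one fact: XTIME counts minutes since the simulation start given by SIMULATION_START_DATE, not minutes since the start of each file, so every time step is simply base + XTIME. The stdlib-only sketch below uses hand-typed stand-in values (one hourly output per file, XTIME = 0, 60, ..., 720) to show what happens if a running per-file offset is added on top: the elapsed time is counted twice and the gaps grow from one hour to several.

```python
from datetime import datetime, timedelta

base = datetime(2019, 8, 8, 18, 0)                # stand-in for SIMULATION_START_DATE
xtime_per_file = [[60.0 * k] for k in range(13)]  # stand-in: one XTIME value per file

# Correct: XTIME is already cumulative, so add it to the start time directly
correct = [base + timedelta(minutes=m) for mins in xtime_per_file for m in mins]

# Incorrect: a running per-file offset counts the elapsed time twice
wrong, offset = [], 0.0
for mins in xtime_per_file:
    wrong.extend(base + timedelta(minutes=offset + m) for m in mins)
    offset += mins[-1]

print(correct[-1])  # 2019-08-09 06:00:00
print(wrong[-1])    # 2019-08-12 00:00:00
```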
Trust me, though: when a single call does the job, why write a for loop?
"""
Extract the time variable from wrfout files and build a standard pandas.DatetimeIndex
"""
import glob
from netCDF4 import Dataset
from wrf import extract_times
import pandas as pd
# sorted() puts the files, and therefore the time steps, in chronological order
wrf_files = sorted(glob.glob("/home/mw/input/typhoon9537/*"))
wrf_list = [Dataset(f) for f in wrf_files]
# Get all times in one call
times = extract_times(wrf_list, timeidx=None, method='cat')
# Convert to a pandas.DatetimeIndex for time filtering, plotting, and resampling
dt_idx = pd.to_datetime(times)
print(f"{len(dt_idx)} time steps in total")
print("First 5:\n", dt_idx[:5])
13 time steps in total
First 5:
DatetimeIndex(['2019-08-08 18:00:00', '2019-08-08 19:00:00',
'2019-08-08 20:00:00', '2019-08-08 21:00:00',
'2019-08-08 22:00:00'],
dtype='datetime64[ns]', freq=None)
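With a sorted DatetimeIndex in hand, pandas' time-series tools apply directly. In the sketch below, the values 0 to 12 are placeholders rather than model output, attached to 13 hourly times covering this case's 2019-08-08 18:00 to 2019-08-09 06:00; it selects one calendar day by partial string indexing and averages into 6-hour bins.

```python
import pandas as pd

# 13 hourly time steps; the values are placeholders, not WRF output
dt_idx = pd.date_range("2019-08-08 18:00", periods=13, freq="60min")
series = pd.Series(range(13), index=dt_idx, dtype="float64")

aug9 = series.loc["2019-08-09"]                # every time step on 9 August
six_hourly = series.resample("360min").mean()  # 6-hour bins, labelled by bin start

print(len(aug9))            # 7
print(six_hourly.tolist())  # [2.5, 8.5, 12.0]
```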
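| What you need | Recommended approach |
|---|---|
| All time steps as numpy.datetime64 values | extract_times(nc, timeidx=None) |
| All time steps as a pandas.DatetimeIndex | pd.to_datetime(extract_times(...)) |
| One or several specific time steps | Pass timeidx |
| Hand-building time strings | Just don't! |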
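Next time you hit a time-parsing error, ask yourself first: did I simply call extract_times?
May your post-processing never be tormented by time again, so you can save your precious brain cells for research!
Give it a like and share it with your group, and the colleague next door will fall into one less trap!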