先将预报区域划分为60 * 60(视空间分辨率而定，空间分辨率越高，格点数值越大) 的网格，每个网格上都存在对应的降水预报值。
事后需要对预测结果进行检验，如图1，左图为14时的预报结果y_pre，右图为真实观测结果y_obs，异同明显。那如何衡量预报好坏呢？即如何衡量预报场(y_pre) 与 观测场(真值y_obs) 的异同程度呢?
主要采取二分类思想进行评价。但针对不同需求，气象上有许多预报准确度评价指标。

图1 14时的降水预测与观测值对比

2 评价指标及其python实现

2.1 二分类介绍

假设有两个类别，正和负，分别用1，0表示，如下表格。

	预测负例	预测正例
真实负例	TN(True negative)	FP(False positive)
真实正例	FN(False negative)	TP(True positive)

该表格称为混淆矩阵(confusion matrix)。

TN : 真阴性。实际为0，预测为0.
TP : 真阳性。实际为1，预测为1.
FN: 假阳性。实际为0，预测为1.
FP: 假阴性。实际为1，预测为0.
召回率(Recall)：R=TP/(TP+FN) ，指的是被预测为正例的占总的正例的比重；
精准度(precision)：P = TP/(TP+FP) ，指被分类器判定正例中的正样本的比重；
准确率(Accuracy)：A = (TP+TN)/(TP+FN+FP+TN) ，反映了分类器对整个样本的判定能力，也就是说能将正的判定为正，负的判定为负。
F1为：

在实际应用中，我们不仅希望Accuracy高，还希望模型对每个类别都有很强的分类能力，即recall 和 precision都要高。

2.2 降水评价

2.2.1 气象二分类指标

气象上的降水评价指标基本都建立在二分类基础上。

以上面的y_pre 和 y_obs 为例，共计有3600个格点，选定一个阈值rain_threshold ，格点数值 >= rain_threshold 即为正例，否则为负例。这里采取晴雨分类，即rain_threshold = 0.1

构建混淆矩阵，晴为负例，雨为正例，如下：

	预测晴	预测雨
真实晴	TN: 1968	FP: 52	2020
真实雨	FN: 458	TP: 1122	1580
	2426	1174

Recall: R=TP/(TP+FN) = 1122/(1122 + 458) = 0.71
precision: p = TP/(TP + FP) = 1122/(1122 + 52) = 0.95
Accuracy: (TP + TN)/(TN + FP + FN + TP) = 0.86

类比到气象上，概念一致，只是换了名称。

	预测晴	预测雨
真实晴	correctnegatives	falsealarms(误警)	2020
真实雨	misses(漏报)	hits(击中)	1580
	2426	1174

代码如下：

def prep_clf(obs,pre, threshold=0.1):
    '''
    func: 计算二分类结果-混淆矩阵的四个元素
    inputs:
        obs: 观测值，即真实值；
        pre: 预测值；
        threshold: 阈值，判别正负样本的阈值,默认0.1,气象上默认格点 >= 0.1才判定存在降水。

    returns:
        hits, misses, falsealarms, correctnegatives
        #aliases: TP, FN, FP, TN 
    '''
    #根据阈值分类为 0, 1
    obs = np.where(obs >= threshold, 1, 0)
    pre = np.where(pre >= threshold, 1, 0)

    # True positive (TP)
    hits = np.sum((obs == 1) & (pre == 1))

    # False negative (FN)
    misses = np.sum((obs == 1) & (pre == 0))

    # False positive (FP)
    falsealarms = np.sum((obs == 0) & (pre == 1))

    # True negative (TN)
    correctnegatives = np.sum((obs == 0) & (pre == 0))

    return hits, misses, falsealarms, correctnegatives


def precision(obs, pre, threshold=0.1):
    '''
    func: 计算精确度precision: TP / (TP + FP)
    inputs:
        obs: 观测值，即真实值；
        pre: 预测值；
        threshold: 阈值，判别正负样本的阈值,默认0.1,气象上默认格点 >= 0.1才判定存在降水。

    returns:
        dtype: float
    '''

    TP, FN, FP, TN = prep_clf(obs=obs, pre = pre, threshold=threshold)

    return TP / (TP + FP)


def recall(obs, pre, threshold=0.1):
    '''
    func: 计算召回率recall: TP / (TP + FN)
    inputs:
        obs: 观测值，即真实值；
        pre: 预测值；
        threshold: 阈值，判别正负样本的阈值,默认0.1,气象上默认格点 >= 0.1才判定存在降水。

    returns:
        dtype: float
    '''

    TP, FN, FP, TN = prep_clf(obs=obs, pre = pre, threshold=threshold)

    return TP / (TP + FN)


def ACC(obs, pre, threshold=0.1):
    '''
    func: 计算准确度Accuracy: (TP + TN) / (TP + TN + FP + FN)
    inputs:
        obs: 观测值，即真实值；
        pre: 预测值；
        threshold: 阈值，判别正负样本的阈值,默认0.1,气象上默认格点 >= 0.1才判定存在降水。

    returns:
        dtype: float
    '''

    TP, FN, FP, TN = prep_clf(obs=obs, pre = pre, threshold=threshold)

    return (TP + TN) / (TP + TN + FP + FN)

def FSC(obs, pre, threshold=0.1):
    '''
    func:计算f1 score = 2 * ((precision * recall) / (precision + recall))
    '''
    precision_socre = precision(obs, pre, threshold=threshold)
    recall_score = recall(obs, pre, threshold=threshold)

    return 2 * ((precision_socre * recall_score) / (precision_socre + recall_score))

由以上四个基本指标，引申出许多气象降水评价指标。

1 物理概念

TS：风险评分ThreatScore;
CSI: critical success index 临界成功指数;
两者的物理概念完全一致。
如下图：y_pre_1为预测的降水区( >= threshold，下同)，y_obs_1为观测的降水区，hits为两者交界区， TS = hits/(hits + falsealarms + misses) 。其中falsealarms = y_pre_1 - hits, misses = y_obs_1 - hits。

2 代码

def TS(obs, pre, threshold=0.1):

    '''
    func: 计算TS评分: TS = hits/(hits + falsealarms + misses) 
          alias: TP/(TP+FP+FN)
    inputs:
        obs: 观测值，即真实值；
        pre: 预测值；
        threshold: 阈值，判别正负样本的阈值,默认0.1,气象上默认格点 >= 0.1才判定存在降水。
    returns:
        dtype: float
    '''

    hits, misses, falsealarms, correctnegatives = prep_clf(obs=obs, pre = pre, threshold=threshold)

    return hits/(hits + falsealarms + misses)

2.2.3 公平技巧评分(ETS)

1 物理概念

公平技巧评分(Equitable Threat Score, ETS)用于衡量对流尺度集合预报的预报效果。ETS评分表示在预报区域内满足某降水阈值的降水预报结果相对于满足同样降水阈值的随机预报的预报技巧；
ETS评分是对TS评分的改进，能对空报或漏报进行惩罚，使评分相对后者更加公平.

2 代码

def ETS(obs, pre, threshold=0.1):
    '''
    ETS - Equitable Threat Score
    details in the paper:
    Winterrath, T., & Rosenow, W. (2007). A new module for the tracking of
    radar-derived precipitation with model-derived winds.
    Advances in Geosciences,10, 77–83. https://doi.org/10.5194/adgeo-10-77-2007
    Args:
        obs (numpy.ndarray): observations
        pre (numpy.ndarray): prediction
        threshold (float)  : threshold for rainfall values binaryzation
                             (rain/no rain)
    Returns:
        float: ETS value
    '''
    hits, misses, falsealarms, correctnegatives = prep_clf(obs=obs, pre = pre,
                                                           threshold=threshold)
    num = (hits + falsealarms) * (hits + misses)
    den = hits + misses + falsealarms + correctnegatives
    Dr = num / den

    ETS = (hits - Dr) / (hits + misses + falsealarms - Dr)

    return ETS

2.2.4 空报率(FAR)

1 物理概念

False Alarm Rate 。在预报降水区域中实际没有降水的区域占总预报降水区域的比重。
FAR = (y_pre_1 - hits)/y_pre_1 = falsealarms / (hits + falsealarms)

2 代码

def FAR(obs, pre, threshold=0.1):
    '''
    func: 计算误警率。falsealarms / (hits + falsealarms) 
    FAR - false alarm rate
    Args:
        obs (numpy.ndarray): observations
        pre (numpy.ndarray): prediction
        threshold (float)  : threshold for rainfall values binaryzation
                             (rain/no rain)
    Returns:
        float: FAR value
    '''
    hits, misses, falsealarms, correctnegatives = prep_clf(obs=obs, pre = pre,
                                                           threshold=threshold)

    return falsealarms / (hits + falsealarms)

2.2.5 漏报率(MAR)

1 物理概念

Missing Alarm Rate。实际降水区域中漏报的区域占据全部实际降水区域的比重。
MAR = (y_obs_1 - hits)/y_obs_1 = misses / (hits + misses)

2 代码

def MAR(obs, pre, threshold=0.1):
    '''
    func : 计算漏报率 misses / (hits + misses)
    MAR - Missing Alarm Rate
    Args:
        obs (numpy.ndarray): observations
        pre (numpy.ndarray): prediction
        threshold (float)  : threshold for rainfall values binaryzation
                             (rain/no rain)
    Returns:
        float: MAR value
    '''
    hits, misses, falsealarms, correctnegatives = prep_clf(obs=obs, pre = pre,
                                                           threshold=threshold)

    return misses / (hits + misses)

2.2.6 命中率(POD)

1 物理概念

Probability of Detection。即预测出的实际的降水区域占据全部实际降水区域的比重。
POD = hits / y_obs_1 = hits / (hits + misses) = 1- MAR

2 代码

def POD(obs, pre, threshold=0.1):
    '''
    func : 计算命中率 hits / (hits + misses)
    pod - Probability of Detection
    Args:
        obs (numpy.ndarray): observations
        pre (numpy.ndarray): prediction
        threshold (float)  : threshold for rainfall values binaryzation
                             (rain/no rain)
    Returns:
        float: PDO value
    '''
    hits, misses, falsealarms, correctnegatives = prep_clf(obs=obs, pre = pre,
                                                           threshold=threshold)

    return hits / (hits + misses)

2.2.7 偏差评分(Bias score)

1 物理概念

偏差评分(Bias score)主要用来衡量模式对某一量级降水的预报偏差, 该评分在数值上等于预报区域内满足某降水阈值的总格点数与对应实况降水总格点数的比值(Kong et al, 2008)。用来反映降水总体预报效果的检验方法。
Bias = y_pred_1/y_obs_1 = (hits + falsealarms)/(hits + misses)

2 代码

def BIAS(obs, pre, threshold = 0.1):
    '''
    func: 计算Bias评分: Bias =  (hits + falsealarms)/(hits + misses) 
          alias: (TP + FP)/(TP + FN)
    inputs:
        obs: 观测值，即真实值；
        pre: 预测值；
        threshold: 阈值，判别正负样本的阈值,默认0.1,气象上默认格点 >= 0.1才判定存在降水。
    returns:
        dtype: float
    '''    
    hits, misses, falsealarms, correctnegatives = prep_clf(obs=obs, pre = pre,
                                                           threshold=threshold)

    return (hits + falsealarms) / (hits + misses)

2.2.8 其他评分

1. HSS

HSS公式如下：

def HSS(obs, pre, threshold=0.1):
    '''
    HSS - Heidke skill score
    Args:
        obs (numpy.ndarray): observations
        pre (numpy.ndarray): pre
        threshold (float)  : threshold for rainfall values binaryzation
                             (rain/no rain)
    Returns:
        float: HSS value
    '''
    hits, misses, falsealarms, correctnegatives = prep_clf(obs=obs, pre = pre,
                                                           threshold=threshold)

    HSS_num = 2 * (hits * correctnegatives - misses * falsealarms)
    HSS_den = (misses**2 + falsealarms**2 + 2*hits*correctnegatives +
               (misses + falsealarms)*(hits + correctnegatives))

    return HSS_num / HSS_den

2. BSS

def BSS(obs, pre, threshold=0.1):
    '''
    BSS - Brier skill score
    Args:
        obs (numpy.ndarray): observations
        pre (numpy.ndarray): prediction
        threshold (float)  : threshold for rainfall values binaryzation
                             (rain/no rain)
    Returns:
        float: BSS value
    '''
    obs = np.where(obs >= threshold, 1, 0)
    pre = np.where(pre >= threshold, 1, 0)

    obs = obs.flatten()
    pre = pre.flatten()

    return np.sqrt(np.mean((obs - pre) ** 2))

3. MAE

def MAE(obs, pre):
    """
    Mean absolute error
    Args:
        obs (numpy.ndarray): observations
        pre (numpy.ndarray): prediction
    Returns:
        float: mean absolute error between observed and simulated values
    """
    obs = pre.flatten()
    pre = pre.flatten()

    return np.mean(np.abs(pre - obs))

4. RMSE

def RMSE(obs, pre):
    """
    Root mean squared error
    Args:
        obs (numpy.ndarray): observations
        pre (numpy.ndarray): prediction
    Returns:
        float: root mean squared error between observed and simulated values
    """
    obs = obs.flatten()
    pre = pre.flatten()

    return np.sqrt(np.mean((obs - pre) ** 2))

2.2.9 阈值选取

上述评分都与阈值threshold密切相关，我们关注什么类型降水的预报准确度，就使用对应的threshold。

threshold : mm	0.1	10	25	50	100
降雨类型	小雨	中雨	大雨	暴雨	特大暴雨

3 应用举例

选取上述例子，来看在不同阈值下各评分函数数值。

除Bia外，其他评分函数数值范围都在[0,1]之间；其中FAR \ MAR越低越好，其他越高越好。
实际情况：FAR和 MAR一般随着降水阈值增大而显著增加，CSI 、ETS、POD、HSS、BSS随阈值增大而减小。即该预报模式对强降水的预报能力较弱，对是否降水预测更准确。

在真实的检验中，y_obs并不是均匀网格的，而是站点分布的，依据相同思路，比较区域内的所有站点预测和站点观测值，也能得到对应评分。

References

[1] Kong et al, 2008: http://www.gyqx.ac.cn/article/2018/02/20180217.htm#bkong2008 [2] rainymotion v0.1 github: https://github.com/hydrogo/rainymotion/blob/master/rainymotion/metrics.py [3] 分类模型评估指标——准确率、精准率、召回率、F1、ROC曲线、AUC曲线: https://easyai.tech/ai-definition/accuracy-precision-recall-f1-roc-auc/

python-气象

实用编程 | 气象上常见的评分函数及其Python实现

2.1 二分类介绍

2.2 降水评价

2.2.1 气象二分类指标

2 代码

2.2.3 公平技巧评分(ETS)

1 物理概念

2 代码

2.2.4 空报率(FAR)

1 物理概念

2 代码

2.2.5 漏报率(MAR)

1 物理概念

2 代码

2.2.6 命中率(POD)

1 物理概念

2 代码

2.2.7 偏差评分(Bias score)

1 物理概念

2 代码

2.2.8 其他评分

1. HSS

2. BSS

3. MAE

4. RMSE

2.2.9 阈值选取

3 应用举例

References

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐