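I'd like to report 90%, 95%, 99%, etc. confidence intervals on my data using PyTorch. Confidence intervals seem too important to leave my implementation untested or uncriticized, so I would like feedback; it should be checked by at least some expert. Furthermore, I noticed that I get NaN values when my values are negative, which makes me think my code only works for classification (at the very least), but I also do regression. I am also surprised that using the numpy code directly actually gave me differentiable tensors, which is not something I was expecting.
Is this correct?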
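import numpy as np
import scipy.stats  # a bare `import scipy` does not guarantee that scipy.stats is loaded
import torch
from torch import Tensor

# z critical values for common two-sided confidence levels
P_CI = {0.90: 1.64,
        0.95: 1.96,
        0.98: 2.33,
        0.99: 2.58,
        }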


def mean_confidence_interval_rfs(data, confidence=0.95):
    """
    https://stackoverflow.com/a/15034143/1601580
    """
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, h


def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, m - h, m + h


def ci(a, p=0.95):
    import numpy as np, scipy.stats as st
    # return the interval; without `return` the function always gives None
    return st.t.interval(p, len(a) - 1, loc=np.mean(a), scale=st.sem(a))


# def ci(a, p=0.95):
#     import statsmodels.stats.api as sms
#
#     sms.DescrStatsW(a).tconfint_mean()


def compute_confidence_interval_classification(data: Tensor,
                                               by_pass_30_data_points: bool = False,
                                               p_confidence: float = 0.95
                                               ) -> Tensor:
    """
    Computes CI interval
        [B] -> [1]
    According to [1] the confidence interval for classification error can be calculated as follows:
        error +/- const * sqrt( (error * (1 - error)) / n)

    The values for const are provided from statistics, and common values used are:
        1.64 (90%)
        1.96 (95%)
        2.33 (98%)
        2.58 (99%)
    Assumptions:
    Use of these confidence intervals makes some assumptions that you need to ensure you can meet. They are:

    Observations in the validation data set were drawn from the domain independently (e.g. they are independent and
    identically distributed).
    At least 30 observations were used to evaluate the model.
    This is based on some statistics of sampling theory that takes calculating the error of a classifier as a binomial
    distribution, that we have sufficient observations to approximate a normal distribution for the binomial
    distribution, and that via the central limit theorem that the more observations we classify, the closer we will get
    to the true, but unknown, model skill.

    Ref:
        - computed according to: https://machinelearningmastery.com/report-classifier-performance-confidence-intervals/

    todo:
        - how does it change for other types of losses
    """
    B: int = data.size(0)
    # assert data >= 0
    # the flag is meant to skip the size check, so the condition is `or`, not `and (not ...)`
    assert B >= 30 or by_pass_30_data_points, \
        f'Not enough data for CI calc to be valid and approximate a ' \
        f'normal, you have: {B=} but needed 30.'
    const: float = P_CI[p_confidence]
    error: Tensor = data.mean()
    # error * (1 - error) is negative when error is outside [0, 1], so sqrt returns NaN there
    val = torch.sqrt((error * (1 - error)) / B)
    print(val)
    ci_interval: Tensor = const * val
    return ci_interval


def compute_confidence_interval_regression():
    """
    todo
    :return:
    """
    raise NotImplementedError


# - tests

def ci_test():
    x: Tensor = abs(torch.randn(35))
    ci_pytorch = compute_confidence_interval_classification(x)
    ci_rfs = mean_confidence_interval(x)
    print(f'{x.var()=}')
    print(f'{ci_pytorch=}')
    print(f'{ci_rfs=}')

    x: Tensor = abs(torch.randn(35, requires_grad=True))
    ci_pytorch = compute_confidence_interval_classification(x)
    ci_rfs = mean_confidence_interval(x)
    print(f'{x.var()=}')
    print(f'{ci_pytorch=}')
    print(f'{ci_rfs=}')

    x: Tensor = torch.randn(35) - 10
    ci_pytorch = compute_confidence_interval_classification(x)
    ci_rfs = mean_confidence_interval(x)
    print(f'{x.var()=}')
    print(f'{ci_pytorch=}')
    print(f'{ci_rfs=}')


if __name__ == '__main__':
    ci_test()
    print('Done, success! \a')
Output:
tensor(0.0758)
x.var()=tensor(0.3983)
ci_pytorch=tensor(0.1486)
ci_rfs=(tensor(0.8259), tensor(0.5654), tensor(1.0864))
tensor(0.0796, grad_fn=<SqrtBackward>)
x.var()=tensor(0.4391, grad_fn=<VarBackward>)
ci_pytorch=tensor(0.1559, grad_fn=<MulBackward0>)
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/brandomiranda/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/metrics/metrics.py", line 154, in <module>
ci_test()
File "/Users/brandomiranda/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/metrics/metrics.py", line 144, in ci_test
ci_pytorch = compute_confidence_interval_classification(x, by_pass_30_data_points)
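How do I fix the code above so that it works for regression, e.g. for negative values of arbitrary magnitude?
I am somewhat surprised that there is no implementation already, and especially no official PyTorch one, given how important CIs are supposed to be. Perhaps this is a deep learning bad habit? Unfortunately, I rarely see them in papers.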
Posted on 2021-12-17 17:40:06
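tl;dr
A confidence interval (CI) is reported as:
mu_n +- ci
Assumption:
n >= 30
If this assumption holds (that is, you are using the sample mean, together with its +- value, to estimate the true mean), then use the code below, which I call torch_compute_confidence_interval, for regression, classification, or anything else you want.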
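First, as far as I know, confidence intervals (CIs) in deep learning (DL) are an open research problem, so more sophisticated answers may exist. What I provide here is a practical answer that I plan to use myself (and that I see others use when reporting results in DL).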
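To compute a confidence interval, we first have to understand a little about what a CI is. It is a probability statement about a random survey (a sample of a data set): the quantity you want to report is a mean, and what you report is an interval around it. So when people say:
mean_error +- CI for p=95%
it means that if you sampled 100 data sets and computed the interval on each, you would expect the true mean to lie inside the interval for about 95 of them (but you do not know which ones, so you cannot say that any particular interval you computed contains the true mean).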
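The "95 out of 100 data sets" reading can be checked with a small simulation. The sketch below is not part of the answer's code: the helper names `z_interval` and `estimate_coverage` are made up for illustration, the data are assumed to be Gaussian with a known true mean, and it uses only the Python standard library with the normal (z) critical value.

```python
import random
import statistics


def z_interval(sample, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation CI for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    return mean, z * se


def estimate_coverage(true_mean=-10.0, n=50, trials=2000, seed=0):
    """Fraction of simulated data sets whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        mean, h = z_interval(sample)
        hits += (mean - h <= true_mean <= mean + h)
    return hits / trials


coverage = estimate_coverage()
print(f"{coverage=:.3f}")  # should land near 0.95, not exactly on it
```

The true mean is deliberately negative here, which shows that nothing in this construction depends on the data lying in [0, 1].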
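This means you can only use it to report means. The reason is that the math behind it (which is not very hard) approximates the probability that the bound (the confidence interval) holds by exploiting the fact that the distribution of the sample mean can be computed analytically: by the central limit theorem (CLT), it is approximately normal. So the CI computed here assumes that the quantity you want to estimate is a sample mean, and it uses this normal approximation to compute your +- number. That is why n>=30 data points are usually recommended for the particular data set you are using, although it can still work out well with fewer, because the CI can be computed with the t-distribution instead of the normal (denoted z in stats software).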
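To see why the t-distribution matters when n is small, the following sketch compares how often the interval covers the true mean for n=5 when the half-width uses the normal critical value versus the t critical value. It is only an illustration under assumptions: the data are simulated standard Gaussians, `coverage_small_sample` is a hypothetical helper, and 2.776 is the tabulated two-sided 95% t value for 4 degrees of freedom (hard-coded because the standard library has no t quantile function).

```python
import random
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
T_95_DF4 = 2.776  # tabulated t critical value for 95% and n - 1 = 4


def coverage_small_sample(n=5, trials=4000, seed=0):
    """Return (coverage with z, coverage with t) for samples with true mean 0."""
    rng = random.Random(seed)
    hits_z = hits_t = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # the true mean is 0, so the interval covers it when |mean| <= half-width
        hits_z += abs(mean) <= Z_95 * se
        hits_t += abs(mean) <= T_95_DF4 * se
    return hits_z / trials, hits_t / trials


cov_z, cov_t = coverage_small_sample()
print(f"{cov_z=:.3f} {cov_t=:.3f}")
```

With so few points the z-based interval is too narrow and covers the true mean noticeably less often than 95% of the time, while the t-based interval stays close to the nominal level.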
Given these assumptions, you can simply do the following:
import scipy.stats
from torch import Tensor


def torch_compute_confidence_interval(data: Tensor,
                                      confidence: float = 0.95
                                      ) -> tuple[Tensor, Tensor]:
    """
    Computes the confidence interval for a given survey of a data set.
    Returns the sample mean and the half-width ci, so the interval is mean +- ci.
    """
    n = len(data)
    mean: Tensor = data.mean()
    # se: Tensor = scipy.stats.sem(data)  # compute standard error
    # se, mean: Tensor = torch.std_mean(data, unbiased=True)  # compute standard error
    se: Tensor = data.std(unbiased=True) / (n ** 0.5)
    # two-sided t critical value with n - 1 degrees of freedom
    t_p: float = float(scipy.stats.t.ppf((1 + confidence) / 2., n - 1))
    ci = t_p * se
    return mean, ci
I tested it and compared it against the classification-specific version; the two agree to within 1e-2, so the code works. Output:
Connected to pydev debugger (build 213.5744.248)
x_bernoulli.std()=tensor(0.5040)
ci_95=0.1881992999915952
ci_95_cls=tensor(0.1850)
ci_95_anything=tensor(0.1882)
x_bernoulli.std()=tensor(0.5085, grad_fn=<StdBackward>)
ci_95_torch=tensor(0.1867, grad_fn=<MulBackward0>)
x.std()=tensor(0.9263)
ci_95=0.3458867459004733
ci_95_torch=tensor(0.3459)
x.std()=tensor(1.0181, grad_fn=<StdBackward>)
ci_95_torch=tensor(0.3802, grad_fn=<MulBackward0>)
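The agreement on 0/1 data, and the NaN on negative data reported in the question, can both be reproduced without PyTorch. The sketch below is an illustration under assumptions: `binomial_half_width` and `general_half_width` are hypothetical helper names, both use the normal critical value (the standard library has no t quantile), and the binomial helper returns NaN explicitly to mimic what torch.sqrt does on a negative input.

```python
import math
import random
import statistics

Z_95 = statistics.NormalDist().inv_cdf(0.975)  # about 1.96


def binomial_half_width(data):
    """Classification-only formula: z * sqrt(p * (1 - p) / n)."""
    p = statistics.mean(data)
    radicand = p * (1 - p) / len(data)
    # the radicand is negative whenever the mean lies outside [0, 1]
    return Z_95 * math.sqrt(radicand) if radicand >= 0 else float("nan")


def general_half_width(data):
    """Generic formula: z * s / sqrt(n), with s the sample standard deviation."""
    return Z_95 * statistics.stdev(data) / math.sqrt(len(data))


rng = random.Random(0)
errors = [float(rng.random() < 0.5) for _ in range(35)]  # 0/1 data
losses = [rng.gauss(-10.0, 1.0) for _ in range(35)]      # negative data

print(binomial_half_width(errors), general_half_width(errors))
print(binomial_half_width(losses), general_half_width(losses))
```

On 0/1 data the unbiased sample variance equals p(1 - p) * n / (n - 1), so the two half-widths differ only by a factor of sqrt(n / (n - 1)), plus the t versus z difference when the t critical value is used. On negative data the binomial formula has no meaning, while the generic one remains well defined.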
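For more details, see my ultimate-utils library, where I comment on the math in the docs: intervals.py#L1.
Comments on DL
If you report the error of one specific model, such as a neural network, you are more or less stating that the true mean error of that very specific network with those weights lies within these bounds. But as I said, this is an open research area, so there must be something more sophisticated out there, for example an approach that accounts for the fact that some layers are actually stochastic.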
https://stackoverflow.com/questions/70356922