You backtested 100 factors and 20 came out significant at p < 0.05. Great?
Wait: purely random data would already hand you about 5 false positives. How many of those factors are real?
Multiple testing is the biggest trap in quant research. Without correction, your "discoveries" may be nothing but noise.
A single test at p < 0.05 means a 5% chance of a false positive.
Run 100 independent tests, and the probability of at least one false positive is:

P(at least one false positive) = 1 - (1 - 0.05)^100 ≈ 99.4%

A false positive is all but guaranteed. Without correction, the "significant factors" you screen out are heavily contaminated with noise.
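To see how fast this blows up, here is a minimal sketch (the helper name `family_wise_error_rate` is just for illustration) that evaluates 1 - (1 - α)^n for a few test counts:

```rust
/// Probability of at least one false positive among `n` independent tests,
/// each run at significance level `alpha`.
fn family_wise_error_rate(n: u32, alpha: f64) -> f64 {
    1.0 - (1.0 - alpha).powi(n as i32)
}

fn main() {
    for n in [1u32, 10, 100] {
        // Roughly 0.050, 0.401, and 0.994 respectively.
        println!("n = {:>3}: P(at least one FP) = {:.3}", n, family_wise_error_rate(n, 0.05));
    }
}
```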
| Metric | Full name | What it means |
|---|---|---|
| FWER | Family-Wise Error Rate | Probability of at least one false positive |
| FDR | False Discovery Rate | Expected fraction of false positives among all rejections |

FWER is strict (a single false positive already counts as failure); FDR is more lenient (a controlled fraction of false positives is tolerated).
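A quick worked example to make the distinction concrete: suppose you reject 20 hypotheses and 5 of them are actually null. The realized false discovery proportion is

FDR = 5 / 20 = 25%

while FWER is already violated, because at least one false positive occurred. Controlling FDR at 5% only promises that, on average, no more than 5% of your rejected factors are noise.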
The simplest and most conservative correction: Bonferroni.

p_adjusted = min(p × n, 1)

Equivalently, the rejection threshold drops from 0.05 to 0.05/n.
fn bonferroni_correction(p_values: &[f64]) -> Vec<f64> {
    let n = p_values.len() as f64;
    p_values.iter()
        .map(|p| (p * n).min(1.0))
        .collect()
}

Drawback: too conservative. With 100 tests the threshold becomes 0.0005, and real effects are easily missed.
Holm's step-down correction strikes a better balance than Bonferroni.
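The idea, stated as the standard decision rule (not specific to this post): sort the p-values in ascending order and, stepping up from the smallest, keep rejecting hypothesis (i) as long as

p_(i) ≤ α / (n - i + 1)

stopping at the first failure. The function below returns the equivalent adjusted p-values instead, so you can keep comparing against 0.05: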
fn holm_correction(p_values: &[f64]) -> Vec<f64> {
    let n = p_values.len();
    // Sort by raw p-value, remembering each value's original index
    let mut indexed: Vec<(usize, f64)> = p_values.iter()
        .enumerate()
        .map(|(i, p)| (i, *p))
        .collect();
    indexed.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    let mut corrected = vec![0.0; n];
    let mut prev_corrected = 0.0;
    for (rank, (orig_idx, p)) in indexed.iter().enumerate() {
        // The smallest p-value gets the full factor n, the next n - 1, and so on
        let m = n - rank;
        let adj_p = (p * m as f64).min(1.0);
        // Enforce monotonicity: an adjusted p-value may not be smaller than the previous one
        let final_p = adj_p.max(prev_corrected);
        corrected[*orig_idx] = final_p;
        prev_corrected = final_p;
    }
    corrected
}

Benjamini-Hochberg controls the false discovery rate instead, which makes it the best fit for factor screening.
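The textbook step-up formulation (standard BH, not specific to this post): sort the p-values ascending and reject every hypothesis up to the largest i satisfying

p_(i) ≤ (i / n) × q

where q is the target FDR (e.g. 5%). The function below returns the equivalent adjusted p-values: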
fn benjamini_hochberg(p_values: &[f64]) -> Vec<f64> {
    let n = p_values.len();
    // Sort by raw p-value, remembering each value's original index
    let mut indexed: Vec<(usize, f64)> = p_values.iter()
        .enumerate()
        .map(|(i, p)| (i, *p))
        .collect();
    indexed.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    let mut corrected = vec![0.0; n];
    let mut prev_corrected = 1.0;
    // Walk from the largest p-value down to the smallest
    for (rank, (orig_idx, p)) in indexed.iter().enumerate().rev() {
        let i = rank + 1;
        let adj_p = (p * n as f64 / i as f64).min(1.0);
        // Enforce monotonicity: an adjusted p-value may not exceed the one ranked above it
        let final_p = adj_p.min(prev_corrected);
        corrected[*orig_idx] = final_p;
        prev_corrected = final_p;
    }
    corrected
}

How do the three methods compare on the same set of p-values?

use anyhow::Result;
use polars::prelude::*;
fn compare_corrections(p_values: &[f64]) -> Result<DataFrame> {
    let bonferroni = bonferroni_correction(p_values);
    let holm = holm_correction(p_values);
    let bh = benjamini_hochberg(p_values);
    // Count how many corrected p-values stay below 0.05
    let n_significant = |corrected: &[f64]| {
        corrected.iter().filter(|p| **p < 0.05).count()
    };
    let methods: Vec<String> = vec![
        "Raw".into(),
        "Bonferroni".into(),
        "Holm".into(),
        "BH".into(),
    ];
    let counts: Vec<i32> = vec![
        n_significant(p_values) as i32,
        n_significant(&bonferroni) as i32,
        n_significant(&holm) as i32,
        n_significant(&bh) as i32,
    ];
    Ok(df![
        "method" => methods,
        "significant_count" => counts,
    ]?)
}

Example output:
┌────────────┬───────────────────┐
│ method     │ significant_count │
╞════════════╪═══════════════════╡
│ Raw        │ 20                │
│ Bonferroni │ 3                 │
│ Holm       │ 5                 │
│ BH         │ 12                │
└────────────┴───────────────────┘

After BH correction, 12 factors remain significant: looser than Bonferroni's 3, but far more conservative than the raw 20.
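A sketch of how `compare_corrections` might be called; the p-values here are fabricated for illustration (10 small "real" ones plus 90 roughly uniform noise values) and won't reproduce the exact counts above:

```rust
// Hypothetical input: 10 genuine effects plus 90 noise-only p-values.
let mut p_values: Vec<f64> = (1..=10).map(|i| 0.0005 * i as f64).collect();
p_values.extend((1..=90).map(|i| i as f64 / 91.0));

let summary = compare_corrections(&p_values)?;
println!("{summary}");
```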
Putting it to work: a screening helper that attaches BH-corrected p-values to each factor candidate.

use anyhow::Result;
use polars::prelude::*;

struct Factor {
    name: String,
    p_value: f64,
    ic: f64, // information coefficient
}

fn factor_screening(factors: Vec<Factor>) -> Result<DataFrame> {
    let p_values: Vec<f64> = factors.iter().map(|f| f.p_value).collect();
    let corrected_p = benjamini_hochberg(&p_values);
    let names: Vec<String> = factors.iter().map(|f| f.name.clone()).collect();
    let ics: Vec<f64> = factors.iter().map(|f| f.ic).collect();
    let significant: Vec<bool> = corrected_p.iter().map(|p| *p < 0.05).collect();
    Ok(df![
        "factor" => names,
        "p_value" => p_values,
        "corrected_p" => corrected_p,
        "ic" => ics,
        "significant" => significant,
    ]?)
}

Usage example:
let factors = vec![
    Factor { name: "momentum_5d".into(), p_value: 0.001, ic: 0.05 },
    Factor { name: "momentum_20d".into(), p_value: 0.012, ic: 0.03 },
    Factor { name: "volume_ratio".into(), p_value: 0.045, ic: 0.02 },
    Factor { name: "pe_ratio".into(), p_value: 0.089, ic: 0.01 },
    // ... more factors
];
let df = factor_screening(factors)?;
// Keep only the rows that survive BH correction
println!("{}", df.lazy().filter(col("significant")).collect()?);

Correction guards against false positives. The flip side is statistical power:

Power = 1 - β = P(reject H0 | H1 is true)
Power tells you: if a factor really works, how likely you are to actually detect it.
Low power means real factors slip through as false negatives.
Four ingredients drive power:
| Ingredient | Effect on power |
|---|---|
| Sample size n | Bigger is better |
| Effect size δ | Bigger is better |
| Significance level α | Smaller α means lower power |
| Volatility σ | Smaller is better |
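Under a normal approximation these ingredients combine into a single expression (this is exactly what compute_power below evaluates; δ is Cohen's d, i.e. the effect measured in units of σ, and the small contribution from the opposite tail is ignored):

Power = Φ(δ√n - z_(1-α/2))

where Φ is the standard normal CDF and z_(1-α/2) is the two-sided critical value.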
use statrs::distribution::{ContinuousCDF, Normal as StatrsNormal};

fn compute_power(
    effect_size: f64, // Cohen's d = (μ1 - μ0) / σ
    n: usize,
    alpha: f64,
) -> f64 {
    // Two-sided critical value z_{1 - α/2}
    let z_alpha = StatrsNormal::new(0.0, 1.0).unwrap()
        .inverse_cdf(1.0 - alpha / 2.0);
    // How many standard errors the true effect sits away from zero
    let z_effect = effect_size * (n as f64).sqrt();
    let power = StatrsNormal::new(0.0, 1.0).unwrap()
        .cdf(z_effect - z_alpha);
    power
}

Turning it around: given a target power, how large a sample do you need?
fn required_sample_size(
    effect_size: f64,
    alpha: f64,
    target_power: f64,
) -> usize {
    let z_alpha = StatrsNormal::new(0.0, 1.0).unwrap()
        .inverse_cdf(1.0 - alpha / 2.0);
    let z_beta = StatrsNormal::new(0.0, 1.0).unwrap()
        .inverse_cdf(target_power);
    // Invert Power = Φ(δ√n - z_{1-α/2}) for n
    let n = ((z_alpha + z_beta) / effect_size).powi(2);
    n.ceil() as usize
}

Usage example:

// Detect an effect size of 0.3 with 80% power at 5% significance
let n = required_sample_size(0.3, 0.05, 0.8);
println!("Required sample size: {}", n);
// Output: Required sample size: 88

Wrapping both calculations into a power report for a factor:

struct PowerReport {
    n_obs: usize,
    expected_ic: f64,
    power_at_5pct: f64,
    power_at_1pct: f64,
    n_for_80pct: usize,
}

fn factor_power_analysis(
    n_obs: usize,
    expected_ic: f64,
) -> PowerReport {
    // Convert IC to an effect size (simplified rule of thumb)
    let effect_size = expected_ic * 2.0;
    let power_5pct = compute_power(effect_size, n_obs, 0.05);
    let power_1pct = compute_power(effect_size, n_obs, 0.01);
    // Work backwards: how many observations would 80% power require?
    let n_for_80pct = required_sample_size(effect_size, 0.05, 0.8);
    PowerReport {
        n_obs,
        expected_ic,
        power_at_5pct: power_5pct,
        power_at_1pct: power_1pct,
        n_for_80pct,
    }
}

Example output:
=== Factor Power Analysis ===
Current sample size: 252
Expected IC: 0.05
Power at the 5% level: 42.3%
Power at the 1% level: 23.1%
Samples needed for 80% power: 623

With 252 observations, the power to detect a factor with IC = 0.05 is only about 42%. Most genuinely useful factors will be missed.
This is why factor research needs long histories and broad cross-sections: with too few observations, power is simply too low.
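To make that concrete, a small sketch reusing required_sample_size and the same simplified IC-to-effect-size mapping (effect ≈ 2 × IC) from above; the IC values are illustrative, not from the post:

```rust
for ic in [0.02_f64, 0.05, 0.10] {
    let n = required_sample_size(ic * 2.0, 0.05, 0.8);
    // Roughly 4906, 785, and 197 observations respectively:
    // halving the IC quadruples the history you need.
    println!("IC = {:.2} -> about {} observations for 80% power", ic, n);
}
```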
Correction is not about killing your discoveries; it is about making sure they survive scrutiny.
Next stop: ANOVA, comparing many groups in one pass.