Q: confusionMatrix error: `data` and `reference` should be factors with the same levels

Stack Overflow user

Asked 2020-05-02 16:04:05

1 answer, 2.6K views, 0 followers, 1 vote

I'm using R for the first time (in RStudio), so apologies for any silly mistakes.

I'm running a machine learning model. In my script I get the error below:

Error: `data` and `reference` should be factors with the same levels. 
4. stop("`data` and `reference` should be factors with the same levels.", call. = FALSE) 
3. confusionMatrix.default(Y.pr, Y.ob)

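When I step into confusionMatrix, I get a bit confused.

The data variable (my Y.pr) is listed under the Data section, while the reference (my Y.ob) is listed under Values. When I click on the reference, it shows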

num [1:8593] 0 0 1 1 1 0 0 0 1 1 ...

When I expand the data variable, it looks like this:

Large matrix (8593 elements, 604.6 kb)
- attr(*, "dimnames")= List of 2
..$ : chr [1:8593] "34371" "34372" "34373" "34374" ...
..$ : NULL

This makes no sense to me. I'm guessing the NULL is causing the problem?

Update

Using the same data, I was able to run a fully working model in Python.

End of update

confusion-matrix

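1 Answer

Stack Overflow user

Answered 2020-05-20 15:28:19

I'll use the example from ?confusionMatrix to reproduce your error and then work through one way to recover from it.

Up front

The way to fix the problem is to assign levels to a non-factor variable. If you do not know for certain which levels the numeric values correspond to, then your clinical study is done: any results are suspect and indefensible. The rest of this answer assumes you are certain of the levels (or that you are just experimenting, with no formal study or investigation riding on this data). Even if the original data did not arrive as a factor, verifying what "1" and "2" (or whatever the numbers are) mean is a critical step.

Demonstration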

library(caret)
lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
                levels = rev(lvs))
pred <- factor(
  c(
    rep(lvs, times = c(54, 32)),
    rep(lvs, times = c(27, 231))),
  levels = rev(lvs))

head(truth)
# [1] normal normal normal normal normal normal
# Levels: abnormal normal
head(pred)
# [1] normal normal normal normal normal normal
# Levels: abnormal normal

Normal (ideal) execution:

confusionMatrix(pred, truth)
# Confusion Matrix and Statistics
#           Reference
# Prediction abnormal normal
#   abnormal      231     32
#   normal         27     54
#                                           
#                Accuracy : 0.8285          
#                  95% CI : (0.7844, 0.8668)
#     No Information Rate : 0.75            
#     P-Value [Acc > NIR] : 0.0003097       
#                                           
#                   Kappa : 0.5336          
#  Mcnemar's Test P-Value : 0.6025370       
#                                           
#             Sensitivity : 0.8953          
#             Specificity : 0.6279          
#          Pos Pred Value : 0.8783          
#          Neg Pred Value : 0.6667          
#              Prevalence : 0.7500          
#          Detection Rate : 0.6715          
#    Detection Prevalence : 0.7645          
#       Balanced Accuracy : 0.7616          
#                                           
#        'Positive' Class : abnormal

But what if the second argument is not a factor?

truth_num <- as.integer(truth)
head(truth_num)
# [1] 2 2 2 2 2 2
confusionMatrix(pred, truth_num)
# Error: `data` and `reference` should be factors with the same levels.

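Fix

What we need to do is turn truth_num back into a factor.

First, theory: if it was once a factor and was somehow converted to integer, then it is a bunch of 1s and 2s (originally indices into its levels). If it was never a factor, it could be any numbers at all, but the bottom line is: do we know which integer is which level? If you guess wrong, your test will give absolutely wrong results (with no error or warning).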
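To make the index-versus-level point concrete, here is a minimal sketch in base R. The names f, idx, restored, and wrong are illustrative and not part of the example above. It shows that as.integer() on a factor returns positions in levels(), and that mapping those integers back with the wrong label order raises no error or warning.

```r
# A small factor whose levels are deliberately in non-alphabetical order
f <- factor(c("normal", "abnormal", "normal"),
            levels = c("normal", "abnormal"))

# as.integer() returns each value's position in levels(f), not a meaningful number
idx <- as.integer(f)
idx

# Mapping back with explicit levels and labels recovers the original values
restored <- factor(idx, levels = c(1, 2), labels = levels(f))
identical(as.character(restored), as.character(f))

# Guessing the label order wrong silently swaps the classes
wrong <- factor(idx, levels = c(1, 2), labels = rev(levels(f)))
table(original = f, guessed = wrong)
```

The cross-tabulation at the end puts every observation in an off-diagonal cell, which is the silent failure described above.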

table(pred)
# pred
# abnormal   normal 
#      263       81 
table(truth_num)
# truth_num
#   1   2 
# 258  86

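Just looking at the relative proportions suggests that the levels of truth_num should be the same as c("abnormal", "normal"). (But please re-read the up-front note about chasing results: do not trust the proportions, go back to the source data and find out which is which.) So that is what we will go with. There are a few ways to go from indices to a factor; here are two: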

### one way
truth_num_fac <- factor(truth_num)
levels(truth_num_fac)
# [1] "1" "2"
head(truth_num_fac)
# [1] 2 2 2 2 2 2
# Levels: 1 2
levels(truth_num_fac) <- levels(pred)
head(truth_num_fac)
# [1] normal normal normal normal normal normal
# Levels: abnormal normal

### another way
dput(head(pred))
# structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("abnormal", "normal"
# ), class = "factor")
truth_num_fac <- structure(truth_num, .Label = levels(pred), class = "factor")
head(truth_num_fac)
# [1] normal normal normal normal normal normal
# Levels: abnormal normal

Either way, the test now works.

confusionMatrix(pred, truth_num_fac)
# Confusion Matrix and Statistics
#           Reference
# Prediction abnormal normal
#   abnormal      231     32
#   normal         27     54
#                                           
#                Accuracy : 0.8285          
#                  95% CI : (0.7844, 0.8668)
#     No Information Rate : 0.75            
#     P-Value [Acc > NIR] : 0.0003097       
#                                           
#                   Kappa : 0.5336          
#  Mcnemar's Test P-Value : 0.6025370       
#                                           
#             Sensitivity : 0.8953          
#             Specificity : 0.6279          
#          Pos Pred Value : 0.8783          
#          Neg Pred Value : 0.6667          
#              Prevalence : 0.7500          
#          Detection Rate : 0.6715          
#    Detection Prevalence : 0.7645          
#       Balanced Accuracy : 0.7616          
#                                           
#        'Positive' Class : abnormal        
#
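The same idea can be sketched for the shapes shown in the question: a numeric 0/1 reference vector and a one-column matrix of predictions. The values below are invented stand-ins, and the sketch assumes that Y.pr holds class labels (0 or 1) with the same meaning as Y.ob, which the question does not confirm.

```r
# Hypothetical stand-ins for the asker's objects (values invented for illustration)
Y.ob <- c(0, 0, 1, 1, 1, 0)                          # numeric reference vector
Y.pr <- matrix(c(0, 1, 1, 1, 0, 0), ncol = 1,
               dimnames = list(as.character(34371:34376), NULL))  # one-column matrix

# Assumption: 0 and 1 mean the same thing in both objects; verify this in your data
class_levels <- c(0, 1)

# as.vector() drops the matrix dimensions; explicit levels keep both factors aligned
Y.pr.fac <- factor(as.vector(Y.pr), levels = class_levels)
Y.ob.fac <- factor(Y.ob, levels = class_levels)

identical(levels(Y.pr.fac), levels(Y.ob.fac))

# Base-R cross-tabulation; confusionMatrix(Y.pr.fac, Y.ob.fac) should accept these too
table(Prediction = Y.pr.fac, Reference = Y.ob.fac)
```

If Y.pr actually holds predicted probabilities rather than class labels, it would need to be thresholded into classes before this conversion.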

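What if ...

... the levels are correct, but you see the following warning:

confusionMatrix(pred, truth_num_fac)
# Warning in confusionMatrix.default(pred, truth_num_fac) :
#   Levels are not in the same order for reference and data. Refactoring data to match.
# Confusion Matrix and Statistics
# ...

This means your levels are not in the same order. The fix is not difficult: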

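levels(pred)
# [1] "abnormal" "normal"
levels(truth_num_fac)
# [1] "normal"   "abnormal"     <-- should be abnormal first, per pred
truth_num_fac <- relevel(truth_num_fac, "abnormal")
confusionMatrix(pred, truth_num_fac)
# Confusion Matrix and Statistics
# ...

What if the levels of truth_num_fac are incorrect?

You will not get an error or a warning, though your test results will be completely different. This does not mean you should chase the results you hope for, but if the results are starkly wrong, it is worth a closer look:

### setup backwards data
truth_num_fac_backwards <- structure(truth_num, .Label = rev(levels(pred)), class = "factor")
truth_num_fac_backwards <- relevel(truth_num_fac_backwards, "abnormal")
head(truth_num_fac_backwards)
# [1] abnormal abnormal abnormal abnormal abnormal abnormal
# Levels: abnormal normal

confusionMatrix(pred, truth_num_fac_backwards)
# Confusion Matrix and Statistics
#           Reference
# Prediction abnormal normal
#   abnormal       32    231
#   normal         54     27
#
#                Accuracy : 0.1715
#                  95% CI : (0.1332, 0.2156)
#     No Information Rate : 0.75
#     P-Value [Acc > NIR] : 1
#
#                   Kappa : -0.3103
#  Mcnemar's Test P-Value : <2e-16
#
#             Sensitivity : 0.37209
#             Specificity : 0.10465
#          Pos Pred Value : 0.12167
#          Neg Pred Value : 0.33333
#              Prevalence : 0.25000
#          Detection Rate : 0.09302
#    Detection Prevalence : 0.76453
#       Balanced Accuracy : 0.23837
#
#        'Positive' Class : abnormal

The correct fix is to go back and verify which level is which. It may be that you had it right all along and the results are telling you that things do not match well. Any other fix is (in my opinion) chasing results: make sure you get the data right the first time, and do not change the data to match the expected results.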
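As an alternative to relevel(), the level order can be declared explicitly when building the factor, and a small pre-flight check can flag problems before confusionMatrix() is called. The helper name check_factor_pair and the vectors a, b, and b2 below are hypothetical and only illustrate the idea; this check cannot tell you whether the labels are semantically correct, only whether the two level sets and their order agree.

```r
# Hypothetical helper: stop early with a readable message before calling confusionMatrix()
check_factor_pair <- function(data, reference) {
  if (!is.factor(data) || !is.factor(reference)) {
    stop("both inputs must be factors")
  }
  if (!setequal(levels(data), levels(reference))) {
    stop("the two factors have different level sets")
  }
  # TRUE when the order also matches, FALSE when only the order differs
  identical(levels(data), levels(reference))
}

a <- factor(c("abnormal", "normal"), levels = c("abnormal", "normal"))
b <- factor(c("normal", "abnormal"), levels = c("normal", "abnormal"))

check_factor_pair(a, b)

# Re-declaring the levels in the desired order changes the ordering, not the values
b2 <- factor(b, levels = levels(a))
check_factor_pair(a, b2)
identical(as.character(b2), as.character(b))
```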

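... I tried to convert the numeric vector to a factor, but levels(...) returns NULL.

This is most likely because the non-numeric vector is not a factor but a character vector. The fix should be easy enough:

### setup fake character data
pred_chr <- as.character(pred)
head(pred_chr)
# [1] "normal" "normal" "normal" "normal" "normal" "normal"

### remedy
pred_chr_fac <- factor(pred_chr)
head(pred_chr_fac)
# [1] normal normal normal normal normal normal
# Levels: abnormal normal
levels(pred_chr_fac)
# [1] "abnormal" "normal"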

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/61562377
