
When you perform k-fold cross-validation, you are already making a prediction for each sample, just across 10 different models (assuming k = 10).
There is no need to make predictions on the complete data, because you already have a prediction for every sample from the k different models.

What you can do is the following:

<pre><code>train_control&lt;- trainControl(method="cv", number=10, savePredictions = TRUE)
</code></pre>

Then

<pre><code>model&lt;- train(resp~., data=mydat, trControl=train_control, method="rpart")
</code></pre>

If you want to see the observed and predicted values in a convenient format, simply type:

<pre><code>model$pred
</code></pre>
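
As a minimal sketch of how these saved predictions could be used, the example below scores the held-out predictions directly. It assumes the built-in <code>iris</code> data in place of <code>mydat</code>, and it uses <code>savePredictions = "final"</code> so that only the rows for the selected <code>cp</code> value are kept. Both choices are illustrative and not part of the code above.

```r
library(caret)

set.seed(42)  # folds are drawn at random, so fix the seed for repeatability

# keep only the held-out predictions made with the selected tuning parameter
train_control <- trainControl(method = "cv", number = 10,
                              savePredictions = "final")
model <- train(Species ~ ., data = iris,
               trControl = train_control, method = "rpart")

# one row per sample: each was predicted by a model that did not see it
oof <- model$pred
head(oof[order(oof$rowIndex), c("rowIndex", "obs", "pred", "Resample")])

# cross-validated confusion matrix, built from held-out predictions only
confusionMatrix(oof$pred, oof$obs)
```

With <code>savePredictions = TRUE</code>, <code>model$pred</code> holds the predictions for every candidate <code>cp</code>, so it would need to be filtered to the selected value before being scored this way.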

As for the second part of your question, caret should handle the parameter tuning for you. You can also tune the parameters manually if you wish.
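
To see what caret chose on its own, the fitted object can be inspected after training. This is only a sketch, again assuming the built-in <code>iris</code> data as a stand-in for <code>mydat</code>:

```r
library(caret)

set.seed(42)
train_control <- trainControl(method = "cv", number = 10)

# no tuneGrid given: caret builds a small default grid for cp on its own
model <- train(Species ~ ., data = iris,
               trControl = train_control, method = "rpart")

model$results     # resampled accuracy for each candidate cp
model$bestTune    # the cp value caret selected
model$finalModel  # rpart tree refit on all rows with the selected cp
```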


On the first page of the <a href="https://cran.csiro.au/web/packages/caret/vignettes/caret.html" rel="nofollow noreferrer">short introduction document</a> for the caret package, it is mentioned that the optimal model is chosen across the parameters.
As a starting point, one must understand that cross-validation is a procedure for selecting the best modeling approach rather than the model itself (see <a href="https://stats.stackexchange.com/questions/52274/how-to-choose-a-predictive-model-after-k-fold-cross-validation">CV - Final model selection</a>). caret provides a grid search option through <code>tuneGrid</code>, where you can supply a data frame of parameter values to test. Once training is done, the final model will use the optimized parameter values.
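
A grid search over <code>cp</code>, the one parameter caret tunes for <code>method = "rpart"</code>, might look like the sketch below. The candidate values and the use of the built-in <code>iris</code> data are assumptions made for illustration only.

```r
library(caret)

set.seed(42)
train_control <- trainControl(method = "cv", number = 10)

# hypothetical candidate values for rpart's complexity parameter
cp_grid <- expand.grid(cp = c(0.001, 0.005, 0.01, 0.05, 0.1))

model <- train(Species ~ ., data = iris, method = "rpart",
               trControl = train_control, tuneGrid = cp_grid)

# cross-validated performance for every candidate, then the winner
model$results[, c("cp", "Accuracy", "Kappa")]
model$bestTune
```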


An important thing to note here is not to confuse model selection with model error estimation.

You can use cross-validation to choose the model hyper-parameters (the regularization parameter, for example).

Usually this is done with 10-fold cross-validation, because it is a good choice for the bias-variance trade-off (2-fold can yield models with high bias, while leave-one-out CV can yield models with high variance/over-fitting).

After that, if you don't have an independent test set, you can estimate an empirical distribution of some performance metric using cross-validation: once you have found the best hyper-parameters, you can use them to estimate the CV error.

Note that in this step the hyper-parameters are fixed, but the model parameters may differ across the cross-validation models.
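
The error-estimation step can be sketched by hand with <code>rpart</code> alone. The example assumes the built-in <code>iris</code> data and a hypothetical, already selected value <code>cp = 0.01</code>. Each fold fits its own tree with that fixed hyper-parameter, and the per-fold accuracies form the empirical distribution.

```r
library(rpart)

set.seed(1)
k <- 10
fixed_cp <- 0.01  # hypothetical value, assumed to have been selected earlier

# assign every row to one of k folds at random
fold_id <- sample(rep(seq_len(k), length.out = nrow(iris)))

# fit on k - 1 folds, score on the held-out fold, repeat for each fold
fold_acc <- vapply(seq_len(k), function(i) {
  held_out <- fold_id == i
  fit <- rpart(Species ~ ., data = iris[!held_out, ], method = "class",
               control = rpart.control(cp = fixed_cp))
  pred <- predict(fit, newdata = iris[held_out, ], type = "class")
  mean(pred == iris$Species[held_out])
}, numeric(1))

summary(fold_acc)  # spread of the accuracy across the k held-out folds
mean(fold_acc)     # cross-validated accuracy estimate
```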

Let me start by saying that I have read many posts on cross-validation, and it seems there is much confusion out there. My understanding of it is simply this:

<ol>
<li>Perform k-fold cross-validation, e.g. with 10 folds, to understand the average error across the folds.</li>
<li>If it is acceptable, then train the model on the complete data set.</li>
</ol>

I am attempting to build a decision tree using <code>rpart</code> in R and taking advantage of the <code>caret</code> package. Below is the code I am using.

<pre><code># load libraries
library(caret)
library(rpart)

# define training control
train_control&lt;- trainControl(method="cv", number=10)

# train the model 
model&lt;- train(resp~., data=mydat, trControl=train_control, method="rpart")

# make predictions
predictions&lt;- predict(model,mydat)

# append predictions
mydat&lt;- cbind(mydat,predictions)

# summarize results
# store the result under a name that does not mask caret's confusionMatrix()
cm &lt;- confusionMatrix(mydat$predictions, mydat$resp)
</code></pre>

I have one question regarding the caret <code>train</code> function. I have read the train section of <a href="https://cran.csiro.au/web/packages/caret/vignettes/caret.html" rel="nofollow noreferrer">A Short Introduction to the caret Package</a>, which states that the "optimal parameter set" is determined during the resampling process.

In my example, have I coded it up correctly? Do I need to define the <code>rpart</code> parameters within my code, or is my code sufficient?

Applying k-fold Cross Validation model using caret package
