I noticed that the Microsoft.Ml.Legacy.LearningPipeline.Row count in the SentimentAnalysis sample project is always 10, no matter how much data is in the test or training datasets.
https://github.com/dotnet/samples/blob/master/machine-learning/tutorials/SentimentAnalysis.sln
Can anyone explain the significance of 10?
// LearningPipeline allows you to add steps in order to keep everything together
// during the learning process.
// <Snippet5>
var pipeline = new LearningPipeline();
// </Snippet5>
// The TextLoader loads a dataset with comments and corresponding positive or negative sentiment.
// When you create a loader, you specify the schema by passing a class to the loader containing
// all the column names and their types. This is used to create the model, and train it.
// <Snippet6>
pipeline.Add(new TextLoader(_dataPath).CreateFrom<SentimentData>());
// </Snippet6>
// TextFeaturizer is a transform that is used to featurize an input column.
// This is used to format and clean the data.
// <Snippet7>
pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
//</Snippet7>
// Adds a FastTreeBinaryClassifier, the decision tree learner for this project, and
// three hyperparameters to be used for tuning decision tree performance.
// <Snippet8>
pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 50, NumTrees = 50, MinDocumentsInLeafs = 20 });
// </Snippet8>
Posted on 2018-10-22 01:42:50
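The debugger only shows a preview of the data: the first 10 rows. The goal is to show a few example rows and how each transform operates on them, which makes debugging easier.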
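Reading the entire training dataset and running all the transforms on it is expensive, so that only happens when you reach .Train(). Because the preview transforms operate on just a few rows, their effect can differ from what you get on the whole dataset (for example, the text dictionary will likely be larger). Still, the hope is that showing a preview before running the full training process helps with debugging and with making sure the transforms are applied to the correct columns.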
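To make the dictionary point concrete, here is a minimal sketch that is not ML.NET code: it assumes a plain whitespace tokenizer as a stand-in for the dictionary a text featurizer would learn, and compares the vocabulary built from a 10-row preview with the one built from the full data. `PreviewDemo`, `BuildVocabulary` and `PreviewRowCount` are hypothetical names, and the 1,000-row dataset is synthetic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only: none of these names are ML.NET APIs.
public static class PreviewDemo
{
    // Mirrors the 10-row preview observed in the debugger; not read from ML.NET.
    public const int PreviewRowCount = 10;

    // Builds the set of distinct lower-cased words across the given rows,
    // standing in for the dictionary a text featurizer would learn.
    public static HashSet<string> BuildVocabulary(IEnumerable<string> rows) =>
        new HashSet<string>(
            rows.SelectMany(r => r.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                .Select(w => w.ToLowerInvariant()));

    public static void Main()
    {
        // Synthetic data: every row shares "comment" and "number" and has one unique word.
        var rows = Enumerable.Range(0, 1000)
            .Select(i => $"comment number word{i}")
            .ToList();

        // The "preview" only materializes the first few rows.
        var preview = rows.Take(PreviewRowCount).ToList();

        Console.WriteLine($"Preview rows: {preview.Count}");                               // 10
        Console.WriteLine($"Vocabulary from preview: {BuildVocabulary(preview).Count}");   // 12
        Console.WriteLine($"Vocabulary from full data: {BuildVocabulary(rows).Count}");    // 1002
    }
}
```

The preview sees only 12 distinct words, while the full pass sees 1,002. That is the kind of gap the answer warns about: the preview is useful for checking which columns a transform touches, not for judging what the transform learns from the whole dataset.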
If you have any ideas on how to make this clearer or more useful, it would be great if you could open an issue on GitHub!
https://stackoverflow.com/questions/52916374