Translating the scikit-learn Cookbook

学习sklearn (column author)
78 articles, 52849 views, 15 subscribers
Automatic cross validation
We've looked at using the cross validation iterators that scikit-learn comes with, but we can also use a helper function to perform cross validation for us automatically. This is similar to how other objects in scikit-learn are wrapped by helper functions, pipeline for instance.
到不了的都叫做远方
2019-12-10
6170
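As a rough illustration of the helper-function idea in the excerpt above, the sketch below uses `cross_val_score` from `sklearn.model_selection`. The toy data, the `LinearRegression` estimator, and the choice of 5 folds are assumptions made for this example and are not taken from the article.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy regression data (an assumption for illustration only).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# The helper splits the data, fits the model on each training fold,
# and scores it on the held-out fold, returning one score per fold.
model = LinearRegression()
scores = cross_val_score(model, X, y, cv=5)

print(scores)         # one R^2 value per fold
print(scores.mean())  # average score across the folds
```

For a regressor the default score is R^2; a different metric can be requested through the `scoring` parameter.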
Using many Decision Trees – random forests
In this recipe, we'll use random forests for classification tasks. Random forests are used because they're very robust to overfitting and perform well in a variety of situations.
到不了的都叫做远方
2019-11-29
6230
Probabilistic clustering with Gaussian Mixture Models
In KMeans, we assume that the variance of the clusters is equal. This leads to a subdivision of space that determines how the clusters are assigned; but what about a situation where the variances are not equal and each cluster point has som
到不了的都叫做远方
2019-11-25
6050
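The unequal-variance situation raised in the excerpt above can be sketched with `GaussianMixture` from `sklearn.mixture`. The two synthetic clusters, their spreads, and the variable names below are assumptions chosen for illustration; this is not the recipe's own code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Two 1-D clusters with deliberately different spreads (illustrative values).
narrow = rng.normal(loc=0.0, scale=0.5, size=(200, 1))
wide = rng.normal(loc=5.0, scale=2.0, size=(200, 1))
X = np.vstack([narrow, wide])

# Fit a two-component mixture; each component learns its own variance.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Soft assignment: one membership probability per component for each point.
probs = gmm.predict_proba(X)
print(probs[:3])
print(gmm.covariances_.ravel())  # the fitted per-component variances
```

Unlike a hard KMeans label, each row of `probs` sums to 1 and expresses how strongly a point belongs to each component.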
Finding the closest objects in the feature space
Sometimes, the easiest thing to do is to just find the distance between two objects. We just need to find some distance metric, compute the pairwise distances, and compare the outcomes to what's expected.
到不了的都叫做远方
2019-11-24
6490
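A minimal sketch of the "pick a metric, compute pairwise distances, compare" workflow described above, using `pairwise_distances` from `sklearn.metrics`. The three sample points and the Euclidean metric are assumptions for this example.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Three points in a 2-D feature space (illustrative values).
points = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]])

# Symmetric matrix of Euclidean distances between every pair of points.
dist = pairwise_distances(points, metric="euclidean")

# Ignore each point's zero distance to itself, then find its closest neighbor.
masked = dist.copy()
np.fill_diagonal(masked, np.inf)
nearest = masked.argmin(axis=1)

print(dist)
print(nearest)  # index of the closest other point, per row
```

Other metrics, such as `"cityblock"` or `"cosine"`, can be passed through the same `metric` parameter.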
Quantizing an image with KMeans clustering
Image processing is an important topic in which clustering has some application.
到不了的都叫做远方
2019-11-23
1K0
Directly applying Bayesian ridge regression
In the Using ridge regression to overcome linear regression's shortfalls recipe, we discussed the connections between the constraints imposed by ridge regression from an optimization standpoint. We also discussed the Bayesian interpretation of priors on the coefficients, which attract the mass of the density towards the prior, which often has a mean of 0.
到不了的都叫做远方
2019-11-18
1.5K0
Using sparsity to regularize models
The least absolute shrinkage and selection operator (LASSO) method is very similar to ridge regression and LARS. It's similar to ridge regression in the sense that we penalize our regression by some amount, and it's similar to LARS in that it can be used for parameter selection, and it typically leads to a sparse vector of coefficients.
到不了的都叫做远方
2019-11-14
5120
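The sparsity claim in the excerpt above can be checked with a small sketch using `Lasso` from `sklearn.linear_model`. The synthetic dataset (50 features, only 5 of them informative) and `alpha=1.0` are assumptions picked for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 50 features, of which only 5 actually drive the target (illustrative setup).
X, y = make_regression(
    n_samples=200, n_features=50, n_informative=5, noise=1.0, random_state=0
)

# The L1 penalty pushes many coefficients exactly to zero.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

n_nonzero = int(np.sum(lasso.coef_ != 0))
print(n_nonzero, "of", lasso.coef_.size, "coefficients are nonzero")
```

Raising `alpha` generally zeroes out more coefficients, and lowering it keeps more of them.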
Optimizing the ridge regression parameter
Once you start using ridge regression to make predictions or learn about relationships in the system you're modeling, you'll start thinking about the choice of alpha. For example, using OLS regression might show some relationship between two variables; however, when regularized by some alpha, the relationship is no longer significant. This can be a matter of whether a decision needs to be taken.
到不了的都叫做远方
2019-11-13
1.5K0
Using stochastic gradient descent for regression
In this recipe, we'll get our first taste of stochastic gradient descent. We'll use it for regression here, but for the next recipe, we'll use it for classification.
到不了的都叫做远方
2019-11-09
5410
Using Gaussian processes for regression
In this recipe, we'll use the Gaussian process for regression. In the linear models section, we saw how representing prior information on the coefficients was possible using Bayesian ridge regression.
到不了的都叫做远方
2019-11-07
9770
Using truncated SVD to reduce dimensionality
Truncated Singular Value Decomposition (SVD) is a matrix factorization technique that factors a matrix M into the three matrices U, Σ, and V. This is very similar to PCA, except that the factorization for SVD is done on the data matrix, whereas for PCA, the factorization is done on the covariance matrix. Typically, SVD is used under the hood to find the principal components of a matrix.
到不了的都叫做远方
2019-11-03
2.1K0
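The relationship between SVD and PCA described above can be sketched with plain NumPy. Note that this example computes a full SVD with `np.linalg.svd` and truncates it by hand, rather than using scikit-learn's `TruncatedSVD`; the random matrix and `k = 2` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 4))  # illustrative data matrix

# Factor the data matrix itself: M = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Truncate to the k largest singular values to reduce dimensionality.
k = 2
M_reduced = U[:, :k] * s[:k]  # same as projecting M onto the top-k right vectors

# PCA view: on centered data, squared singular values divided by (n - 1)
# equal the eigenvalues of the covariance matrix.
Mc = M - M.mean(axis=0)
_, s_c, _ = np.linalg.svd(Mc, full_matrices=False)
eigvals = np.linalg.eigvalsh(np.cov(Mc, rowvar=False))[::-1]  # descending order

print(M_reduced.shape)
print(s_c**2 / (len(M) - 1))
print(eigvals)
```

The last two printed arrays agree up to floating-point error, which is the sense in which SVD works "under the hood" of PCA.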
Reducing dimensionality with PCA
Now it's time to take the math up a level! Principal component analysis (PCA) is the first somewhat advanced technique discussed in this book. While everything else thus far has been simple statistics, PCA will combine statistics and linear algebra to produce a preprocessing step that can help to reduce dimensionality, which can be the enemy of a simple model.
到不了的都叫做远方
2019-10-31
7470
Using Pipelines to combine multiple data preprocessing steps
Pipelines are (at least to me) something I don't think about using often, but are useful. They can be used to tie together many steps into one object. This allows for easier tuning and better access to the configuration of the entire model, not just one of the steps.
到不了的都叫做远方
2019-10-30
1.6K0
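To make "tying many steps into one object" concrete, here is a minimal sketch that chains imputation and scaling with `Pipeline`. The step names, the small array with missing values, and the use of `SimpleImputer` (the imputer class in current scikit-learn releases, which may differ from the class used when the article was written) are assumptions for this example.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small array with missing entries (illustrative values).
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [np.nan, 40.0]])

# Two preprocessing steps wrapped in a single object.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # fill NaNs with column means
    ("scale", StandardScaler()),                 # then center and rescale
])

X_out = pipe.fit_transform(X)
print(X_out)
```

Because the steps are named, a setting of any one step can be changed from the outside, for example `pipe.set_params(impute__strategy="median")`, which is what makes tuning the whole chain easier.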
Imputing missing values through various strategies
Data imputation is critical in practice, and thankfully there are many ways to deal with it. In this recipe, we'll look at a few of the strategies. However, be aware that there might be other approaches that fit your situation better.
到不了的都叫做远方
2019-10-30
8360
Working with categorical variables
Categorical variables are a problem. On one hand, they provide valuable information; on the other hand, they are probably text, either the actual text or integers corresponding to the text, like an index in a lookup table. So, we clearly need to represent our text as integers for the model's sake, but we can't just use the id field or naively represent them. This is because we need to avoid a problem similar to the one in the Creating binary features through thresholding recipe. If we treat data as continuous, it must be interpreted as continuous.
到不了的都叫做远方
2019-10-29
8050
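One common way to avoid the naive integer representation warned about above is one-hot encoding. The sketch below uses `OneHotEncoder` from `sklearn.preprocessing`; the color labels are made up for illustration, and the article itself may use a different encoder.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A single categorical column (illustrative labels).
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# Each category gets its own 0/1 column, so no artificial ordering is implied.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(colors).toarray()

print(encoder.categories_[0])  # column order: categories sorted alphabetically
print(onehot)
```

Encoding "red", "green", "blue" as 0, 1, 2 would instead suggest to a model that blue is "twice" green, which is the kind of continuous interpretation the excerpt cautions against.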
Creating binary features through thresholding
In the last recipe, we looked at transforming our data into the standard normal distribution. Now, we'll talk about another transformation, one that is quite different.
到不了的都叫做远方
2019-10-28
4190
Scaling data to the standard normal
A preprocessing step that is almost always recommended is to scale columns to the standard normal. The standard normal is probably the most important distribution in all of statistics.
到不了的都叫做远方
2019-10-27
1.2K0
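As a small sketch of the column scaling described above, the example below standardizes two columns with `StandardScaler`. The synthetic data and its means and spreads are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two columns on very different scales (illustrative values).
X = rng.normal(loc=[5.0, -3.0], scale=[2.0, 10.0], size=(500, 2))

# Subtract each column's mean and divide by its standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

The fitted `scaler` keeps the training means and scales, so the same transformation can later be applied to new data with `scaler.transform`.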
scikit-learn Cookbook 01
I will again implore you to use some of your own data for this book, but in the event you cannot, we'll learn how we can use scikit-learn to create toy data.
到不了的都叫做远方
2019-10-26
3990
scikit-learn Cookbook 00
This chapter discusses setting data, preparing data, and premodel dimensionality reduction. These are not the
到不了的都叫做远方
2019-10-25
4140