Tencent Cloud Developer Community

Translating the scikit-learn Cookbook

Learning sklearn

Column author

78 articles

52849 views

15 subscribers

Automatic cross validation

spring, scikit-learn, machine learning, neural networks

We've looked at using the cross validation iterators that scikit-learn comes with, but we can also use a helper function to perform cross validation for us automatically. This is similar to how other objects in scikit-learn are wrapped by helper functions, pipeline for instance.

到不了的都叫做远方

2019-12-10

6170
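A minimal sketch of the helper-function approach this recipe describes, assuming `cross_val_score` from `sklearn.model_selection` is the helper in question. The synthetic data and the choice of `LinearRegression` are illustrative assumptions, not taken from the recipe.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (an assumption made for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# The helper splits the data, fits the model on each training fold,
# and returns one score (R^2 for regressors) per held-out fold.
scores = cross_val_score(LinearRegression(), X, y, cv=4)
print(scores)
```

The helper hides the loop over folds that would otherwise be written by hand with a cross validation iterator.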

Using many Decision Trees – random forests

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

In this recipe, we'll use random forests for classification tasks. Random forests are used because they're very robust to overfitting and perform well in a variety of situations.

到不了的都叫做远方

2019-11-29

6230
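As a rough illustration of the random forest recipe above, the sketch below fits a `RandomForestClassifier` on synthetic data. The dataset, the number of trees, and the train/test split are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 50 trees is trained on a bootstrap sample of the training data;
# the forest predicts by combining the trees' votes.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
accuracy = rf.score(X_test, y_test)
print(accuracy)
```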

Probabilistic clustering with Gaussian Mixture Models

scikit-learn, machine learning, http, neural networks, deep learning

In KMeans, we assume that the variance of the clusters is equal. This leads to a subdivision of space that determines how the clusters are assigned; but what about a situation where the variances are not equal and each cluster point has som

到不了的都叫做远方

2019-11-25

6050
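A small sketch of probabilistic clustering for the unequal-variance situation raised above, assuming `GaussianMixture` from `sklearn.mixture`. The two one-dimensional clusters and their spreads are made up for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two 1-D clusters with deliberately different spreads (illustrative assumption).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, 300),
                    rng.normal(8.0, 3.0, 300)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Instead of a hard label, each point gets a probability for every component.
probs = gmm.predict_proba(X)
print(gmm.means_.ravel(), gmm.covariances_.ravel())
```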

Finding the closest objects in the feature space

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

Sometimes, the easiest thing to do is to just find the distance between two objects. We just need to find some distance metric, compute the pairwise distances, and compare the outcomes to what's expected.

到不了的都叫做远方

2019-11-24

6490
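The pairwise-distance idea from the entry above can be sketched as follows, assuming Euclidean distance as the metric. The three points are hypothetical.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Three hypothetical objects in a 2-D feature space.
points = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 0.0]])

# distances[i, j] is the Euclidean distance between object i and object j.
distances = pairwise_distances(points, metric="euclidean")

# Ignore each object's zero distance to itself, then take the closest other object.
masked = distances.copy()
np.fill_diagonal(masked, np.inf)
nearest = masked.argmin(axis=1)
print(nearest)
```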

Quantizing an image with KMeans clustering

scikit-learn, machine learning, neural networks, deep learning

Image processing is an important topic in which clustering has some application.

到不了的都叫做远方

2019-11-23

1K0
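One way to picture the quantization recipe above: cluster the pixel colors and replace each pixel with its cluster center. The random 32x32 "image" and the choice of 5 colors are assumptions standing in for a real picture.

```python
import numpy as np
from sklearn.cluster import KMeans

# A random RGB array stands in for a real image (illustrative assumption).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3)).astype(float)

# Treat every pixel as a point in 3-D color space.
pixels = image.reshape(-1, 3)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)

# Replace each pixel with the center of the cluster it was assigned to.
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
```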

Directly applying Bayesian ridge regression

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

In the Using ridge regression to overcome linear regression's shortfalls recipe, we discussed the connections between the constraints imposed by ridge regression from an optimization standpoint. We also discussed the Bayesian interpretation of priors on the coefficients, which attract the mass of the density towards the prior, which often has a mean of 0.

到不了的都叫做远方

2019-11-18

1.5K0
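A hedged sketch of the Bayesian ridge recipe above using `BayesianRidge` with its default priors. The synthetic data is an assumption made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge

# Synthetic data with only two informative features (illustrative assumption).
X, y = make_regression(n_samples=200, n_features=10, n_informative=2,
                       noise=20.0, random_state=0)

br = BayesianRidge().fit(X, y)

# The Bayesian treatment yields a predictive standard deviation
# alongside the predicted mean.
mean, std = br.predict(X[:5], return_std=True)
print(br.coef_)
```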

Using sparsity to regularize models

linear regression, scikit-learn, machine learning, neural networks, deep learning

The least absolute shrinkage and selection operator (LASSO) method is very similar to ridge regression and LARS. It's similar to ridge regression in the sense that we penalize our regression by some amount, and it's similar to LARS in that it can be used for parameter selection, and it typically leads to a sparse vector of coefficients.

到不了的都叫做远方

2019-11-14

5120
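To make the sparsity claim above concrete, this sketch fits `Lasso` on synthetic data where most features are uninformative and counts the coefficients driven exactly to zero. The data and `alpha=5.0` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 50 features, only 5 of which actually influence y (illustrative assumption).
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=1.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X, y)

# The L1 penalty sets many coefficients exactly to zero.
n_zero = int(np.sum(lasso.coef_ == 0))
print(n_zero, "of", lasso.coef_.size, "coefficients are zero")
```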

Optimizing the ridge regression parameter

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

Once you start using ridge regression to make predictions or learn about relationships in the system you're modeling, you'll start thinking about the choice of alpha. For example, using OLS regression might show some relationship between two variables; however, when regularized by some alpha, the relationship is no longer significant. This can be a matter of whether a decision needs to be taken.

到不了的都叫做远方

2019-11-13

1.5K0
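One possible way to choose alpha, sketched with `RidgeCV`, which evaluates each candidate by cross validation. The candidate grid and the synthetic data are assumptions, not values from the recipe.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Candidate regularization strengths (hypothetical grid).
alphas = np.array([0.01, 0.1, 1.0, 10.0])

X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)

# RidgeCV scores every alpha with cross validation and keeps the best one.
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
best_alpha = ridge_cv.alpha_
print(best_alpha)
```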

Using stochastic gradient descent for regression

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

In this recipe, we'll get our first taste of stochastic gradient descent. We'll use it for regression here, but for the next recipe, we'll use it for classification.

到不了的都叫做远方

2019-11-09

5410
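A minimal sketch of stochastic gradient descent for regression, assuming `SGDRegressor` with its default squared-error loss. Scaling the features first is a design choice made here because SGD is sensitive to feature scale; the data is synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative assumption).
X, y = make_regression(n_samples=1000, n_features=10, noise=1.0, random_state=0)

# Standardize the features, then fit a linear model by SGD.
model = make_pipeline(StandardScaler(),
                      SGDRegressor(max_iter=1000, tol=1e-3, random_state=0))
model.fit(X, y)
r2 = model.score(X, y)
print(r2)
```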

Using Gaussian processes for regression

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

In this recipe, we'll use the Gaussian process for regression. In the linear models section, we saw how representing prior information on the coefficients was possible using Bayesian Ridge Regression.

到不了的都叫做远方

2019-11-07

9770
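A rough sketch of Gaussian process regression with the current `GaussianProcessRegressor` API. The noisy sine data and the `RBF() + WhiteKernel()` kernel are assumptions chosen for illustration; the recipe may use a different setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Noisy samples of a sine curve (illustrative assumption).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 40)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

# RBF models the smooth signal; WhiteKernel models the observation noise.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), random_state=0)
gp.fit(X, y)

# Predictions come with a standard deviation expressing uncertainty.
mean, std = gp.predict(np.array([[2.5], [20.0]]), return_std=True)
print(mean, std)
```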

Using truncated SVD to reduce dimensionality

data analysis, scikit-learn, machine learning, neural networks

Truncated Singular Value Decomposition (SVD) is a matrix factorization technique that factors a matrix M into the three matrices U, Σ, and V. This is very similar to PCA, except that the factorization for SVD is done on the data matrix, whereas for PCA, the factorization is done on the covariance matrix. Typically, SVD is used under the hood to find the principal components of a matrix.

到不了的都叫做远方

2019-11-03

2.1K0
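The truncated SVD described above can be sketched with `TruncatedSVD`. Using the iris dataset and keeping two components are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import TruncatedSVD

X = load_iris().data  # 150 samples, 4 features

# Keep only the 2 largest singular values/vectors of the data matrix.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape, svd.singular_values_)
```

Unlike `PCA`, `TruncatedSVD` does not center the data before factorizing it.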

Reducing dimensionality with PCA

data analysis, scikit-learn, machine learning, neural networks, deep learning

Now it's time to take the math up a level! Principal component analysis (PCA) is the first somewhat advanced technique discussed in this book. While everything else thus far has been simple statistics, PCA will combine statistics and linear algebra to produce a preprocessing step that can help to reduce dimensionality, which can be the enemy of a simple model.

到不了的都叫做远方

2019-10-31

7470
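As a small illustration of PCA as a preprocessing step, the sketch below projects the iris data onto two principal components. The dataset and component count are assumptions for the example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Project onto the 2 directions of greatest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance retained by each kept component.
ratio = pca.explained_variance_ratio_
print(ratio, ratio.sum())
```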

Using Pipelines to combine multiple preprocessing steps

css, scikit-learn, machine learning, neural networks, deep learning

Pipelines are (at least to me) something I don't think about using often, but they are useful. They can be used to tie together many steps into one object. This allows for easier tuning and better access to the configuration of the entire model, not just one of the steps.

到不了的都叫做远方

2019-10-30

1.6K0
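A minimal sketch of tying preprocessing steps into one object, as the entry above describes. The step names `impute` and `scale` and the tiny array are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small array with one missing value (illustrative assumption).
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0]])

# Step names are hypothetical; each step's output feeds the next step.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])
X_clean = pipe.fit_transform(X)

# Parameters of every step are reachable as "<step>__<param>".
print(pipe.get_params()["impute__strategy"])
```

The `<step>__<param>` naming is what makes tuning the whole pipeline in one place possible.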

Imputing missing values through various strategies

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

Data imputation is critical in practice, and thankfully there are many ways to deal with it. In this recipe, we'll look at a few of the strategies. However, be aware that there might be other approaches that fit your situation better.

到不了的都叫做远方

2019-10-30

8360
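Two of the possible strategies can be compared with `SimpleImputer`, the imputation class in current scikit-learn releases. The four-value column is a made-up example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# One column with a missing entry; the observed values are 1, 2 and 9.
X = np.array([[1.0], [2.0], [np.nan], [9.0]])

# The mean is pulled up by the outlier 9; the median is not.
filled_mean = SimpleImputer(strategy="mean").fit_transform(X)
filled_median = SimpleImputer(strategy="median").fit_transform(X)
print(filled_mean[2, 0], filled_median[2, 0])
```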

Working with categorical variables

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

Categorical variables are a problem. On one hand they provide valuable information; on the other hand, it's probably text (either the actual text or integers corresponding to the text, like an index in a lookup table). So, we clearly need to represent our text as integers for the model's sake, but we can't just use the id field or naively represent them. This is because we need to avoid a similar problem to the Creating binary features through thresholding recipe. If we treat data that is continuous, it must be interpreted as continuous.

到不了的都叫做远方

2019-10-29

8050
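One common way to avoid treating category ids as continuous numbers is one-hot encoding, sketched here with `OneHotEncoder`. The color values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A single categorical column (hypothetical values).
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

enc = OneHotEncoder()
# The encoder returns a sparse matrix; convert it to a dense array for display.
onehot = enc.fit_transform(colors).toarray()

# One column per category, in sorted order, with a single 1 per row.
print(enc.categories_[0])
print(onehot)
```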

Creating binary features through thresholding

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

In the last recipe, we looked at transforming our data into the standard normal distribution. Now, we'll talk about another transformation, one that is quite different.

到不了的都叫做远方

2019-10-28

4190
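The thresholding transformation named in the title above can be sketched with `Binarizer`. The values and the threshold of 2.5 are assumptions for illustration.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

values = np.array([[1.0], [2.5], [4.0], [7.0]])

# Values strictly greater than the threshold become 1, all others become 0.
binary = Binarizer(threshold=2.5).fit_transform(values)
print(binary.ravel())
```

Note that a value equal to the threshold maps to 0.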

Scaling data to the standard normal

c++, object-oriented programming, scikit-learn, machine learning, neural networks

A preprocessing step that is almost always recommended is to scale columns to the standard normal. The standard normal is probably the most important distribution in all of statistics.

到不了的都叫做远方

2019-10-27

1.2K0
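A minimal sketch of the scaling step described above using `StandardScaler`. The random data with mean 50 and standard deviation 10 is an assumption for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns far from zero mean and unit variance (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

# Subtract each column's mean and divide by its standard deviation.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```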

scikit-learn Cookbook 01

scikit-learn, numpy, machine learning, neural networks, deep learning

I will again implore you to use some of your own data for this book, but in the event you cannot, we'll learn how we can use scikit-learn to create toy data.

到不了的都叫做远方

2019-10-26

3990
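Toy data of the kind mentioned above can be generated with the `make_*` helpers in `sklearn.datasets`. The sizes below are arbitrary choices for the sketch.

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Regression: continuous target built from a random linear model.
X_reg, y_reg = make_regression(n_samples=100, n_features=3, random_state=0)

# Classification: two classes by default.
X_clf, y_clf = make_classification(n_samples=100, n_features=20, random_state=0)

# Clustering: isotropic Gaussian blobs, with the blob index as the label.
X_blob, y_blob = make_blobs(n_samples=100, centers=3, random_state=0)

print(X_reg.shape, X_clf.shape, X_blob.shape)
```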

scikit-learn Cookbook 00

scikit-learn, machine learning, neural networks, deep learning, artificial intelligence

This chapter discusses setting data, preparing data, and premodel dimensionality reduction. These are not the

到不了的都叫做远方

2019-10-25

4140
