Machine Learning in Practice | Chapter 2: Linear Regression Models

Linear Regression

This class implements ordinary least squares regression. It is the most basic linear regression class.
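To show what "ordinary least squares" means in practice, here is a minimal sketch that fits LinearRegression and then solves the same problem directly with NumPy's least-squares solver. A column of ones is appended to X so that the last solution entry plays the role of the intercept. The synthetic data, the seed, and the use of `np.random.default_rng` are assumptions made for illustration. The sketch assumes a recent scikit-learn (1.x).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative assumption): y = X @ true_w + 4 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(scale=0.1, size=50)

# Fit with scikit-learn
model = LinearRegression().fit(X, y)

# Solve the same least-squares problem directly:
# append a column of ones for the intercept and minimize ||X_aug @ beta - y||^2
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print(model.coef_, model.intercept_)
print(beta[:3], beta[3])
```

Both approaches minimize the same squared error, so they should agree to within floating-point rounding.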

class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

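Parameters:
fit_intercept : bool, optional, default True. Whether to compute the intercept of the model. If set to False, no intercept is used in the calculation, and the data is expected to be already centered.
normalize : bool, optional, default False. If True, X is normalized before regression by subtracting the mean and dividing by the l2-norm. This parameter is ignored when fit_intercept is set to False.
copy_X : bool, optional, default True. If True, X is copied. Otherwise, X may be overwritten.
n_jobs : int, optional, default 1. The number of jobs to use for the computation. If set to -1, all CPUs are used. This only provides a speedup when there is more than one target and the problem is sufficiently large.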
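The effect of fit_intercept is easiest to see on data that has a nonzero offset. The following is a small sketch built on a hypothetical noise-free line y = 3 + 2x. With fit_intercept=False, the model is forced through the origin: intercept_ is 0.0, and the slope is distorted to absorb the offset. The signature above reflects the 2017 scikit-learn release. The `normalize` parameter was deprecated in scikit-learn 1.0 and removed in 1.2, so this sketch does not use it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data on the line y = 3 + 2x
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 + 2.0 * x.ravel()

with_icpt = LinearRegression(fit_intercept=True).fit(x, y)
no_icpt = LinearRegression(fit_intercept=False).fit(x, y)

# With an intercept, the true line is recovered
print(with_icpt.intercept_, with_icpt.coef_)
# Without one, intercept_ is fixed at 0.0 and the slope absorbs the offset
print(no_icpt.intercept_, no_icpt.coef_)
```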

Attributes

coef_ : array, shape (n_features,) or (n_targets, n_features). The coefficients of the linear model. The 2-D shape is used when y has more than one target (see the theory notes for the reasoning).
residues_ : array, shape (n_targets,) or (1,) or empty. Sum of residuals: the squared Euclidean 2-norm for each target passed during the fit. If the linear regression problem is under-determined (the number of linearly independent rows of the training matrix is less than its number of linearly independent columns), this is an empty array. If the target vector passed during the fit is 1-dimensional, this is an array of shape (1,). New in version 0.18.
intercept_ : float or array, shape (n_targets,). The intercept (independent term) of the linear model.
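The two possible shapes of coef_ depend only on whether y is 1-D or 2-D. This sketch fits random data (an illustrative assumption) once with a single target and once with three targets, then inspects the shapes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))          # n_samples=20, n_features=4
y_single = rng.normal(size=20)        # one target, shape (20,)
Y_multi = rng.normal(size=(20, 3))    # three targets, shape (20, 3)

single = LinearRegression().fit(X, y_single)
multi = LinearRegression().fit(X, Y_multi)

print(single.coef_.shape)    # (4,)
print(multi.coef_.shape)     # (3, 4)
print(multi.intercept_.shape)  # (3,)
```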

Methods

fit(X, y, sample_weight=None)

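Fit the linear model. A method with this name appears in many of the other estimator classes covered later.
Parameters:
X : numpy array or sparse matrix, shape [n_samples, n_features]. The training data.
y : numpy array, shape [n_samples] or [n_samples, n_targets]. The target values.
sample_weight : numpy array, shape [n_samples]. Individual weights for each sample.
Returns:
self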
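One way to read sample_weight is that, for least squares, giving a sample weight k has the same effect as repeating it k times. The sketch below checks this on small random data. The data and the choice of weight 3 are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 2))
y = rng.normal(size=8)

# Give the first sample weight 3 and every other sample weight 1
w = np.ones(8)
w[0] = 3.0
weighted = LinearRegression().fit(X, y, sample_weight=w)

# Equivalent unweighted fit: append two extra copies of the first sample
X_rep = np.vstack([X, X[[0, 0]]])
y_rep = np.concatenate([y, y[[0, 0]]])
repeated = LinearRegression().fit(X_rep, y_rep)

print(weighted.coef_, repeated.coef_)
print(weighted.intercept_, repeated.intercept_)
```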

get_params(deep=True)

Get the parameters of this estimator.
Parameters:
deep : bool, optional. If True, also return the parameters of contained subobjects that are estimators.
Returns:
params : mapping of string to any. Parameter names mapped to their values.
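The following is a short sketch of get_params on a plain estimator and on a nested one. A Pipeline with a StandardScaler step is used here as an illustrative container. The step names "scale" and "ols" are hypothetical labels chosen for the example. With deep=True, the nested step parameters are included under "step__param" names.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

model = LinearRegression(fit_intercept=False)
params = model.get_params()
print(params["fit_intercept"])  # False

# Hypothetical pipeline with step names "scale" and "ols"
pipe = Pipeline([("scale", StandardScaler()), ("ols", LinearRegression())])
deep = pipe.get_params(deep=True)
shallow = pipe.get_params(deep=False)
print("ols__fit_intercept" in deep)     # True: nested parameters are listed
print("ols__fit_intercept" in shallow)  # False: only the pipeline's own parameters
```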

predict(X)

Predict using the fitted linear model.
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features). The samples to predict.
Returns:
C : array, shape (n_samples,). The predicted values.
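For a linear model, predict is just a matrix-vector product plus the intercept. This sketch checks that equivalence on synthetic data, which is assumed here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X_train = rng.normal(size=(30, 3))
y_train = X_train @ np.array([0.5, 1.0, -1.0]) + 2.0
X_test = rng.normal(size=(5, 3))

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# The same predictions computed by hand: X @ coef_ + intercept_
manual = X_test @ model.coef_ + model.intercept_
print(y_pred.shape)  # (5,)
```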

score(X, y, sample_weight=None)

Return the coefficient of determination R^2 of the prediction.
R^2 is defined as 1 - u/v, where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0. The score can be negative, because the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input features, gets an R^2 score of 0.0.
Parameters:
X : array-like, shape (n_samples, n_features). Test samples.
y : array-like, shape (n_samples,) or (n_samples, n_outputs). True values for X.
sample_weight : array-like, shape [n_samples], optional. Sample weights.
Returns:
score : float. R^2 of self.predict(X) with respect to y.
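The R^2 definition can be checked directly. This sketch computes u and v by hand on synthetic data (an assumption for illustration). It then compares the result with `score` and with `sklearn.metrics.r2_score`. It also shows that predicting the constant mean of y gives 0.0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.5, size=40)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

u = ((y - y_pred) ** 2).sum()     # residual sum of squares
v = ((y - y.mean()) ** 2).sum()   # total sum of squares
r2_manual = 1 - u / v

print(r2_manual, model.score(X, y), r2_score(y, y_pred))

# A constant model that predicts the mean of y scores 0.0 (up to rounding)
r2_constant = r2_score(y, np.full_like(y, y.mean()))
print(r2_constant)
```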

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). Nested objects have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.
Returns:
self
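This is a minimal sketch of the <component>__<parameter> convention. The pipeline and its step names ("scale" and "ols") are hypothetical, chosen only for illustration. The double underscore separates the step name from the parameter name.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("ols", LinearRegression())])

# "ols__fit_intercept" targets the fit_intercept parameter of the "ols" step
returned = pipe.set_params(ols__fit_intercept=False, scale__with_mean=False)

print(pipe.named_steps["ols"].fit_intercept)  # False
print(pipe.named_steps["scale"].with_mean)    # False
print(returned is pipe)                       # True: set_params returns self
```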

Ridge Regression (Ridge)

class sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver='auto', random_state=None)

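Ridge regression is a form of linear regression whose loss function is the linear least-squares function plus an L2 regularization term. In other words, it minimizes ||y - Xw||_2^2 + alpha * ||w||_2^2.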
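Without an intercept, the objective ||y - Xw||_2^2 + alpha * ||w||_2^2 has the closed-form minimizer w = (X^T X + alpha * I)^(-1) X^T y. This sketch solves that linear system with NumPy and compares the result with Ridge(fit_intercept=False). The random data and alpha=2.0 are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
X = rng.normal(size=(25, 4))
y = rng.normal(size=25)
alpha = 2.0

# Closed-form ridge solution without intercept: (X^T X + alpha*I) w = X^T y
n_features = X.shape[1]
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(w_closed)
print(model.coef_)
```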

Parameters:
alpha : {float, array-like}, shape (n_targets). Regularization strength; must be a positive float. Larger values specify stronger regularization. If an array is passed, each penalty is applied to the corresponding target.
copy_X : bool, optional, default True. If True, X is copied. Otherwise, X may be overwritten.
fit_intercept : bool, optional. Whether to compute the intercept of the model. If set to False, no intercept is used in the calculation, and the data is expected to be already centered.
max_iter : int, optional. Maximum number of iterations for the conjugate gradient solver. For 'sparse_cg' and 'lsqr', the default value is determined by scipy.sparse.linalg. For 'sag', the default value is 1000.
normalize : bool, optional, default False. If True, X is normalized before regression by subtracting the mean and dividing by the l2-norm. This parameter is ignored when fit_intercept is set to False.
solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag'}. The solver used in the computation.
  'auto' chooses the solver automatically based on the type of data.
  'svd' uses a Singular Value Decomposition of X to compute the Ridge coefficients. It is more stable for singular matrices than 'cholesky'.
  'cholesky' uses the standard scipy.linalg.solve function to obtain a closed-form solution.
  'sparse_cg' uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than 'cholesky' for large-scale data, and it allows you to set tol and max_iter.
  'lsqr' uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest, but it may not be available in old scipy versions. It also uses an iterative procedure.
  'sag' uses Stochastic Average Gradient descent. It also uses an iterative procedure, and it is often faster than the other solvers when both n_samples and n_features are large. Fast convergence of 'sag' is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
  All of the last four solvers support both dense and sparse data. However, only 'sag' supports sparse input when fit_intercept is True.
  New in version 0.17: Stochastic Average Gradient descent solver.
tol : float. Precision of the solution.
random_state : int seed, RandomState instance, or None (default). The seed of the pseudo-random number generator used when shuffling the data. Used only by the 'sag' solver.
  New in version 0.17: random_state to support Stochastic Average Gradient.
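This sketch illustrates two claims from the parameter list on synthetic data (an illustrative assumption). First, a larger alpha gives stronger regularization, which shrinks the norm of coef_. Second, two direct solvers, 'svd' and 'cholesky', should reach the same coefficients on a well-conditioned dense problem. The grid of alpha values is arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.5]) + rng.normal(size=60)

# Arbitrary grid of regularization strengths
alphas = [0.01, 1.0, 10.0, 100.0, 1000.0]
norms = []
for a in alphas:
    coef = Ridge(alpha=a).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))

# Larger alpha => stronger L2 penalty => smaller coefficient norm
for a, n in zip(alphas, norms):
    print(f"alpha={a:>7}: ||coef|| = {n:.4f}")

# Two direct solvers applied to the same problem
coef_svd = Ridge(alpha=1.0, solver="svd").fit(X, y).coef_
coef_chol = Ridge(alpha=1.0, solver="cholesky").fit(X, y).coef_
print(coef_svd)
print(coef_chol)
```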

Attributes

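coef_ : array, shape (n_features,) or (n_targets, n_features). The coefficients of the linear model. The 2-D shape is used when y has more than one target (see the theory notes for the reasoning).
intercept_ : float or array, shape (n_targets,). The intercept (independent term) of the linear model.
n_iter_ : array or None, shape (n_targets,). The actual number of iterations for each target. Available only for the 'sag' and 'lsqr' solvers; the other solvers return None.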
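This is a quick check of the n_iter_ behavior described above. The sketch compares the iterative 'lsqr' solver with the direct 'cholesky' solver on synthetic data (an illustrative assumption). The exact iteration count that lsqr reports may vary between scikit-learn and SciPy versions, so no specific value is shown.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=100)

iterative = Ridge(alpha=1.0, solver="lsqr").fit(X, y)
direct = Ridge(alpha=1.0, solver="cholesky").fit(X, y)

print(iterative.n_iter_)  # iteration count(s) reported by lsqr
print(direct.n_iter_)     # None: cholesky is a direct solver
```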

Methods

fit(X, y, sample_weight=None)

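Fit the ridge regression model.
Parameters:
X : numpy array or sparse matrix, shape [n_samples, n_features]. The training data.
y : numpy array, shape [n_samples] or [n_samples, n_targets]. The target values.
sample_weight : numpy array, shape [n_samples]. Individual weights for each sample.
Returns:
self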
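When y has several targets, the alpha parameter can be an array with one penalty per target. Because the L2 penalty does not couple the targets, this should be equivalent to fitting each target separately with its own alpha. The sketch below checks that equivalence. The data and the two alpha values are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(8)
X = rng.normal(size=(30, 4))
Y = rng.normal(size=(30, 2))  # two targets

# One alpha per target (values chosen arbitrarily)
multi = Ridge(alpha=np.array([0.1, 50.0])).fit(X, Y)

# Fit each target on its own with the matching alpha
first = Ridge(alpha=0.1).fit(X, Y[:, 0])
second = Ridge(alpha=50.0).fit(X, Y[:, 1])

print(multi.coef_.shape)  # (2, 4)
print(multi.coef_[0], first.coef_)
print(multi.coef_[1], second.coef_)
```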

get_params(deep=True)

Get the parameters of this estimator.
Parameters:
deep : bool, optional. If True, also return the parameters of contained subobjects that are estimators.
Returns:
params : mapping of string to any. Parameter names mapped to their values.

predict(X)

Predict using the linear model.
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features). Samples.
Returns:
C : array, shape (n_samples,). The predicted values.

score(X, y, sample_weight=None)

Return the coefficient of determination R^2 of the prediction.
R^2 is defined as 1 - u/v, where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0. The score can be negative, because the model can be arbitrarily worse. A constant model that always predicts the expected value of y, disregarding the input features, gets an R^2 score of 0.0.
Parameters:
X : array-like, shape (n_samples, n_features). Test samples.
y : array-like, shape (n_samples,) or (n_samples, n_outputs). True values for X.
sample_weight : array-like, shape [n_samples], optional. Sample weights.
Returns:
score : float. R^2 of self.predict(X) with respect to y.

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). Nested objects have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.
Returns:
self

Example:

from sklearn.linear_model import Ridge
import numpy as np

# 10 random samples with 5 features each, and a random target
n_samples, n_features = 10, 5
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)

# Fit a ridge model with regularization strength alpha=1.0
clf = Ridge(alpha=1.0)
clf.fit(X, y)

Originally published on the WeChat official account 人工智能LeadAI (atleadai).

Original publication date: 2017-09-04
