# Using ridge regression to overcome linear regression's shortfalls


In this recipe, we'll learn about ridge regression. It differs from vanilla linear regression in that it introduces a regularization parameter to "shrink" the coefficients. This is useful when the dataset has collinear factors.

Let's load a dataset that has a low effective rank and compare ridge regression with linear regression by way of the coefficients. If you're not familiar with rank, it's the smaller of the number of linearly independent columns and the number of linearly independent rows. One of the assumptions of linear regression is that the data matrix is of "full rank".
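To make the idea of rank concrete, here is a minimal sketch using NumPy's `np.linalg.matrix_rank`. The small matrix `X` is a made-up example (it is not part of the recipe's dataset) whose third column is the sum of the first two, so it has three columns but only two linearly independent ones.

```python
import numpy as np

# Hypothetical 4x3 matrix: the third column equals column 0 + column 1.
X = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [2.0, 1.0, 3.0],
    [1.0, 3.0, 4.0],
])

rank = np.linalg.matrix_rank(X)
print(rank)  # fewer than 3, so X is not of full column rank

# X^T X inherits the rank deficiency, so it cannot be inverted and
# ordinary least squares has no unique solution for this matrix.
gram_rank = np.linalg.matrix_rank(X.T @ X)
print(gram_rank)
```

When columns are only nearly collinear, the rank is technically full but the inverse of X^T X becomes unstable, which is the situation ridge regression is meant to help with.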

How to do it...

First, use make_regression to create a simple dataset with three predictors, but an effective rank of 2.

Effective rank means that while the matrix is technically of full rank, many of the columns have a high degree of collinearity:

```python
from sklearn.datasets import make_regression

reg_data, reg_target = make_regression(n_samples=2000, n_features=3,
                                       effective_rank=2, noise=10)
```
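One way to inspect the effective rank of the generated matrix is to look at its singular values. The sketch below is an illustration rather than part of the original recipe: it regenerates the data with `random_state=0` (an assumption added here for reproducibility) and prints the singular values, the numerical rank, and the ratio of the largest to the smallest singular value.

```python
import numpy as np
from sklearn.datasets import make_regression

# random_state=0 is an assumption added so the sketch is reproducible.
X, y = make_regression(n_samples=2000, n_features=3,
                       effective_rank=2, noise=10, random_state=0)

# Singular values are returned in descending order.
singular_values = np.linalg.svd(X, compute_uv=False)
print(singular_values)

# The numerical rank is still full, as the text says.
numerical_rank = np.linalg.matrix_rank(X)
print(numerical_rank)

# Ratio of largest to smallest singular value (the condition number of X).
print(singular_values[0] / singular_values[-1])
```

How quickly the singular values decay depends on the `effective_rank` and `tail_strength` arguments of make_regression, so it is worth printing them before drawing conclusions about how collinear a particular dataset is.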

First, let's take a look at regular linear regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
n_bootstraps = 1000
len_data = len(reg_data)
# np.int was removed from NumPy; the built-in int does the same job
subsample_size = int(0.75 * len_data)

# draw a bootstrap sample of row indices (with replacement)
subsample = lambda: np.random.choice(np.arange(0, len_data), size=subsample_size)

coefs = np.ones((n_bootstraps, 3))
for i in range(n_bootstraps):
    subsample_idx = subsample()
    subsample_X = reg_data[subsample_idx]
    subsample_y = reg_target[subsample_idx]
    lr.fit(subsample_X, subsample_y)
    coefs[i][0] = lr.coef_[0]
    coefs[i][1] = lr.coef_[1]
    coefs[i][2] = lr.coef_[2]
```

The following is the output that gets generated:

Follow the same procedure with Ridge, and have a look at the output:

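```python
from sklearn.linear_model import Ridge

r = Ridge()
n_bootstraps = 1000
len_data = len(reg_data)
subsample_size = int(0.75 * len_data)
subsample = lambda: np.random.choice(np.arange(0, len_data), size=subsample_size)
coefs_r = np.ones((n_bootstraps, 3))

# carry out the same procedure as above, fitting Ridge instead
for i in range(n_bootstraps):
    subsample_idx = subsample()
    r.fit(reg_data[subsample_idx], reg_target[subsample_idx])
    coefs_r[i] = r.coef_
```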

The following is the output that gets generated:

Don't let the similar width of the plots fool you; the coefficients for ridge regression are much closer to 0. Let's look at the average difference between the coefficients:

```python
>>> np.mean(coefs - coefs_r, axis=0)  # coefs_r stores the ridge regression coefficients
array([13.24098749, 18.28340271, 61.73626459])
```

So, on average, the coefficients for linear regression are much larger than the ridge regression coefficients. This difference is the bias in the coefficients (forgetting, for a second, the potential bias of the linear regression coefficients). So then, what is the advantage of ridge regression? Well, let's look at the variance of our coefficients:

```python
>>> np.var(coefs, axis=0)
array([255.01858444, 182.01195126, 218.14725252])
>>> np.var(coefs_r, axis=0)
array([19.87551666, 22.97529897, 20.99950272])
```

The variance has been dramatically reduced. This is the bias-variance trade-off that is so often discussed in machine learning. The next recipe will introduce how to tune the regularization parameter in ridge regression, which is at the heart of this trade-off.
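The whole comparison can be condensed into one self-contained sketch. The helper `bootstrap_coefs` and its parameters (`n_bootstraps`, `frac`, `seed`) are hypothetical names introduced here, and the `random_state=0` and 200 bootstraps are assumptions chosen to keep the run short and reproducible; the exact numbers will differ from those printed above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge


def bootstrap_coefs(model, X, y, n_bootstraps=200, frac=0.75, seed=0):
    """Refit `model` on bootstrap samples; return one row of coefficients per fit."""
    rng = np.random.default_rng(seed)
    n = len(X)
    size = int(frac * n)
    coefs = np.empty((n_bootstraps, X.shape[1]))
    for i in range(n_bootstraps):
        idx = rng.choice(n, size=size)  # row indices sampled with replacement
        coefs[i] = model.fit(X[idx], y[idx]).coef_
    return coefs


X, y = make_regression(n_samples=2000, n_features=3,
                       effective_rank=2, noise=10, random_state=0)

# Using the same seed gives both models identical bootstrap samples.
coefs_lr = bootstrap_coefs(LinearRegression(), X, y)
coefs_ridge = bootstrap_coefs(Ridge(alpha=1.0), X, y)

var_lr = coefs_lr.var(axis=0)
var_ridge = coefs_ridge.var(axis=0)
print("OLS variance:  ", var_lr)
print("Ridge variance:", var_ridge)
print("Mean difference:", (coefs_lr - coefs_ridge).mean(axis=0))
```

Passing the estimator in as an argument keeps the resampling logic in one place, so the two models differ only in the estimator itself.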

How it works...

Speaking of the regularization parameter, let's go through how ridge regression differs from linear regression. As was already shown, linear regression finds the vector of betas that minimizes $\lVert y - X\beta \rVert^2$.

Ridge regression finds the vector of betas that minimizes $\lVert y - X\beta \rVert^2 + \lVert \Gamma\beta \rVert^2$.

$\Gamma$ is typically $\alpha I$, that is, some scalar times the identity matrix. We actually used the default alpha when initializing ridge regression. Note that scikit-learn's alpha multiplies $\lVert \beta \rVert^2$ directly, so it corresponds to the square of that scalar.
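The two objectives can be written out directly and evaluated at both fitted coefficient vectors. This is a sketch under a few assumptions: `rss` and `ridge_objective` are hypothetical helper names, `fit_intercept=False` is set so that the fitted models match the formulas above exactly, and the penalty is written as `alpha * ||beta||^2`, which is the form scikit-learn documents for Ridge.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# random_state=0 is an assumption added for reproducibility.
X, y = make_regression(n_samples=2000, n_features=3,
                       effective_rank=2, noise=10, random_state=0)
alpha = 1.0


def rss(beta):
    """Residual sum of squares: ||y - X beta||^2."""
    return np.sum((y - X @ beta) ** 2)


def ridge_objective(beta, alpha):
    """Residual sum of squares plus the penalty alpha * ||beta||^2."""
    return rss(beta) + alpha * np.sum(beta ** 2)


# No intercept, so both fits minimize exactly the objectives defined above.
beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
beta_ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

print("RSS:            ", rss(beta_ols), rss(beta_ridge))
print("Ridge objective:", ridge_objective(beta_ols, alpha),
      ridge_objective(beta_ridge, alpha))
print("Coefficient norm:", np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

Each estimator wins on its own objective: ordinary least squares has the smaller residual sum of squares, while ridge accepts a slightly worse fit in exchange for a smaller coefficient vector.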


Now that we have created the object, we can look at its attributes:

```python
>>> r  # notice the alpha parameter
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
```

This minimization has the following solution:

$$\hat{\beta} = (X^T X + \Gamma^T \Gamma)^{-1} X^T y$$

The previous solution is the same as linear regression, except for the $\Gamma^T \Gamma$ term. For a matrix $A$, $A^T A$ is symmetric and positive semidefinite. So, thinking about the translation of matrix algebra from scalar algebra, we effectively divide by a larger number. Multiplication by an inverse is analogous to division. So, this is what squeezes the coefficients towards 0. This is a bit of a crude explanation; for a deeper understanding, you should look at the connections between SVD and ridge regression.
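As a rough check of the "dividing by a larger number" intuition, the sketch below computes the closed-form ridge solution $(X^T X + \alpha I)^{-1} X^T y$ with NumPy and compares it with scikit-learn's Ridge. It then rewrites the same solution through the SVD of $X$, where each direction is scaled by $d_i^2 / (d_i^2 + \alpha)$. The assumptions are `fit_intercept=False` (so the model matches the formula exactly), `random_state=0`, and $\Gamma^T \Gamma = \alpha I$, which is how scikit-learn's alpha enters its documented objective.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=2000, n_features=3,
                       effective_rank=2, noise=10, random_state=0)
alpha = 1.0

# Closed form: solve (X^T X + alpha * I) beta = X^T y.
beta_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge without an intercept should agree with the closed form.
beta_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_
print(np.allclose(beta_closed, beta_sklearn))

# SVD view: X = U diag(d) V^T.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
projections = U.T @ y

# OLS divides each projection by d; ridge additionally multiplies it by a
# shrinkage factor d^2 / (d^2 + alpha), which is always below 1 for alpha > 0.
shrink = d ** 2 / (d ** 2 + alpha)
beta_ols = Vt.T @ (projections / d)
beta_svd = Vt.T @ (shrink * projections / d)

print("Shrinkage factors:", shrink)
print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_svd)
```

The shrinkage factors are smallest for the directions with the smallest singular values, so ridge damps exactly the directions in which the least squares estimate is least stable.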
