# An Introduction to Statistical Learning, Chapter 3: Linear Regression

Book: An Introduction to Statistical Learning with Applications in R http://www-bcf.usc.edu/~gareth/ISL/

## 3.1 Simple Linear Regression

Simple linear regression is a useful approach for predicting a response on the basis of a single predictor variable, that is, an analysis involving a single variable.

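For example, we can regress sales onto TV by fitting the model

$$\text{sales} \approx \beta_0 + \beta_1 \times \text{TV}$$

where the intercept $\beta_0$ and the slope $\beta_1$ are unknown constants, the model coefficients (parameters).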

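### 3.1.1 Estimating the Coefficients

How do we estimate these model parameters? We have $n$ training observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, each a measurement of X paired with a measurement of Y.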
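The least squares approach chooses $\hat\beta_0$ and $\hat\beta_1$ to minimize the residual sum of squares $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$. The minimizers have a closed form, and this chapter refers to it as (3.4): $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The sketch below simply evaluates these two formulas. The function name `fit_simple_ols` and the toy data are illustrative assumptions and do not come from the book.

```python
from statistics import fmean

def fit_simple_ols(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    x_bar, y_bar = fmean(x), fmean(y)
    # Numerator: sum of cross-deviations; denominator: sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data lying exactly on y = 1 + 2x, so the fit recovers it.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)  # 1.0 2.0
```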

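### 3.1.2 Assessing the Accuracy of the Coefficient Estimates

How accurate are our coefficient estimates? Assume that the true relationship between X and Y takes the form

$$Y = \beta_0 + \beta_1 X + \epsilon$$

where $\epsilon$ is a mean-zero random error term.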

Notice that different data sets generated from the same true model result in slightly different least squares lines, but the unobserved population regression line does not change.
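The following sketch makes this concrete. It draws several data sets from one assumed true model, $Y = 2 + 3X + \epsilon$ with Gaussian noise, and fits a least squares line to each. The coefficients, noise level, sample size, seed and helper names are all illustrative choices, not values from the book. The population line is identical every time, yet each fitted line comes out slightly different.

```python
import random
from statistics import fmean

def fit_simple_ols(x, y):
    """Least squares (intercept, slope) for simple linear regression."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def simulate(rng, n=100, beta0=2.0, beta1=3.0, sigma=2.0):
    """Draw one data set from the assumed true model Y = beta0 + beta1*X + eps."""
    x = [rng.uniform(-2, 2) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    return x, y

rng = random.Random(0)
# Same population line every time, but each sample gives its own least squares line.
fits = [fit_simple_ols(*simulate(rng)) for _ in range(5)]
for b0, b1 in fits:
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```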

The sample mean and the population mean are different, but in general the sample mean provides a good estimate of the population mean. Similarly, the unknown coefficients $\beta_0$ and $\beta_1$ in linear regression define the population regression line. We estimate these coefficients using (3.4), and the resulting estimates define the least squares line.

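Linear regression and estimating the mean of a random variable both involve the concept of bias. If we use the sample mean $\hat\mu$ to estimate $\mu$, this estimate is unbiased, in the sense that on average we expect $\hat\mu$ to equal $\mu$. What exactly does this mean? On one particular set of observations $\hat\mu$ might overestimate $\mu$, and on another set it might underestimate $\mu$. However, if we could average a huge number of estimates of $\mu$, each obtained from a different set of observations, that average would exactly equal $\mu$. An unbiased estimator therefore does not systematically over- or underestimate the true parameter. The least squares coefficient estimates given by (3.4) are unbiased in the same way. If we estimate $\beta_0$ and $\beta_1$ from one particular data set, the estimates will not be exactly equal to $\beta_0$ and $\beta_1$. But if we could average the estimates obtained over a huge number of data sets, the average would equal the true parameter values. Note that this is a statement about averaging over many data sets. It does not say that a single very large data set yields exactly the true values.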
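Unbiasedness can be illustrated by simulation. In the sketch below, each small data set (20 observations, a size chosen for illustration) yields a noisy $\hat\mu$ and a noisy $\hat\beta_1$, and individual estimates scatter on both sides of the truth. Averaged over many repetitions, both land close to the true values $\mu = 5$ and $\beta_1 = 3$. With a finite number of repetitions the averages are close but not exact. All parameter values and names are assumptions made for this example.

```python
import random
from statistics import fmean

def fit_slope(x, y):
    """Least squares slope estimate for simple linear regression."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

rng = random.Random(1)
mu, beta0, beta1, sigma, n = 5.0, 2.0, 3.0, 2.0, 20

mu_hats, beta1_hats = [], []
for _ in range(2000):
    # Sample mean of one small data set: a single noisy estimate of mu.
    mu_hats.append(fmean([rng.gauss(mu, sigma) for _ in range(n)]))
    # Slope estimate from one small regression data set.
    x = [rng.uniform(-2, 2) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    beta1_hats.append(fit_slope(x, y))

print(min(mu_hats), max(mu_hats))          # single estimates fall on both sides of mu
print(fmean(mu_hats), fmean(beta1_hats))   # averages end up close to 5.0 and 3.0
```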

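Standard errors can also be used to perform hypothesis tests on the coefficients. The most common hypothesis test involves testing the null hypothesis

$H_0$: there is no relationship between X and Y

versus the alternative hypothesis

$H_a$: there is some relationship between X and Y.

Mathematically, this corresponds to testing $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$, because if $\beta_1 = 0$ the model reduces to $Y = \beta_0 + \epsilon$ and X is not associated with Y.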
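To test $H_0: \beta_1 = 0$ in practice, we measure how many standard errors $\hat\beta_1$ lies away from zero with the t-statistic $t = (\hat\beta_1 - 0) / \mathrm{SE}(\hat\beta_1)$. Assuming uncorrelated errors with common variance $\sigma^2$, $\mathrm{SE}(\hat\beta_1)^2 = \sigma^2 / \sum_i (x_i - \bar x)^2$, where the unknown $\sigma^2$ is estimated by $\mathrm{RSS}/(n-2)$. The sketch stops at the t-statistic. Turning it into a p-value requires the $t$ distribution with $n - 2$ degrees of freedom, which the standard library does not provide. The data and the name `slope_t_test` are hypothetical.

```python
import math
from statistics import fmean

def slope_t_test(x, y):
    """Return (b1, se_b1, t) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares of the fitted line.
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # sigma^2 is unknown; estimate it by RSS / (n - 2).
    sigma2_hat = rss / (n - 2)
    se_b1 = math.sqrt(sigma2_hat / sxx)
    # t-statistic: number of standard errors b1 lies away from 0.
    t = (b1 - 0.0) / se_b1
    return b1, se_b1, t

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se_b1, t = slope_t_test(x, y)
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, t = {t:.1f}")
```

A large $|t|$ (here far above 2) is strong evidence against $H_0$.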

### 3.1.3 Assessing the Accuracy of the Model

How well does the model fit the data? Here we introduce two related quantities for assessing the quality of a linear regression fit: the residual standard error (RSE) and the $R^2$ statistic.

#### Residual Standard Error

RSE is an estimate of the standard deviation of the error term $\epsilon$. Roughly speaking, it is the average amount that the response will deviate from the true regression line.
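For simple linear regression, $\mathrm{RSE} = \sqrt{\mathrm{RSS}/(n-2)}$, where $\mathrm{RSS} = \sum_i (y_i - \hat y_i)^2$. We divide by $n - 2$ because two coefficients were estimated. RSE is measured in the units of Y. Even if the true coefficients were known exactly, predictions would still be off by about this much on average. The small data set below is hypothetical.

```python
import math
from statistics import fmean

def rse_simple(x, y):
    """Fit y ~ b0 + b1*x by least squares and return the residual standard error."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Two parameters were estimated, so divide by n - 2 rather than n.
    return math.sqrt(rss / (n - 2))

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 4.0, 6.0]
print(round(rse_simple(x, y), 3))  # 0.949
```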

#### $R^2$ Statistic

$R^2$ measures the proportion of variability in Y that can be explained using X. An $R^2$ statistic that is close to 1 indicates that a large proportion of the variability in the response has been explained by the regression. A number near 0 indicates that the regression did not explain much of the variability in the response. This might occur because the linear model is wrong, because the inherent error variance $\sigma^2$ is high, or both.
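Concretely, $R^2 = (\mathrm{TSS} - \mathrm{RSS})/\mathrm{TSS} = 1 - \mathrm{RSS}/\mathrm{TSS}$, where $\mathrm{TSS} = \sum_i (y_i - \bar y)^2$ is the total variability in the response before the regression is performed. Unlike RSE, $R^2$ is unitless and always lies between 0 and 1. In the sketch, the fitted values are the least squares fit to the hypothetical points $x = 1, 2, 3, 4$ (intercept 2.5, slope 0.8).

```python
from statistics import fmean

def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS: share of the variability in y explained by the fit."""
    y_bar = fmean(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variability
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variability
    return 1.0 - rss / tss

y = [3.0, 5.0, 4.0, 6.0]
# Least squares fitted values for x = [1, 2, 3, 4] (intercept 2.5, slope 0.8).
y_hat = [3.3, 4.1, 4.9, 5.7]
print(round(r_squared(y, y_hat), 4))  # 0.64
```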

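## 3.2 Multiple Linear Regression

Simple linear regression is a useful approach for predicting a response on the basis of a single predictor variable. In practice, however, we often have more than one predictor, and multiple linear regression handles all of them in a single model.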
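With $p$ predictors, the standard multiple linear regression model gives each predictor its own slope coefficient:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon$$

Here $\beta_j$ is interpreted as the average effect on Y of a one-unit increase in $X_j$, holding all other predictors fixed.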

### 3.2.1 Estimating the Regression Coefficients

The coefficients in multiple regression are still estimated with the least squares approach. The estimates are simply written, and derived, most concisely in matrix form. When performing multiple linear regression, we are mainly interested in the following four questions:

1. Is at least one of the predictors $X_1, X_2, \ldots, X_p$ useful in predicting the response?
2. Do all the predictors help to explain Y, or is only a subset of the predictors useful?
3. How well does the model fit the data?
4. Given a set of predictor values, what response value should we predict, and how accurate is our prediction?

The rest of the discussion is organized around these four questions.
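In matrix form, with a design matrix $\mathbf{X}$ whose first column is all ones, the least squares estimates solve the normal equations $\mathbf{X}^\top \mathbf{X} \hat\beta = \mathbf{X}^\top \mathbf{y}$, so $\hat\beta = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$ when $\mathbf{X}^\top \mathbf{X}$ is invertible. The pure-Python sketch below builds and solves these equations by Gaussian elimination. The function name and data are hypothetical, and the noise-free data let us check that the known coefficients are recovered.

```python
def fit_multiple_ols(X, y):
    """Least squares for y ~ b0 + b1*x1 + ... + bp*xp via the normal equations.

    X is a list of rows; each row holds the p predictor values of one observation.
    Returns [b0, b1, ..., bp].
    """
    # Design matrix with a leading column of ones for the intercept.
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Normal equations (A^T A) b = A^T y.
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b

# Hypothetical data generated exactly by y = 1 + 2*x1 - 3*x2 (no noise).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
print([round(c, 6) for c in fit_multiple_ols(X, y)])  # [1.0, 2.0, -3.0]
```

Forming $\mathbf{X}^\top \mathbf{X}$ explicitly can be numerically ill-conditioned. In practice, a QR-based solver such as `numpy.linalg.lstsq` or R's `lm()` is the safer choice.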

## 3.3 Other Considerations in the Regression Model

### 3.3.1 Qualitative Predictors

Here the predictors are qualitative (categorical) rather than quantitative: they describe a category instead of a numerical amount.
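A qualitative predictor enters the regression through dummy (indicator) variables. A factor with $k$ levels is coded as $k - 1$ columns of 0/1 values. The remaining baseline level is absorbed into the intercept, so $\beta_0$ is the average response for the baseline and each dummy coefficient is the average difference between its level and the baseline. Using all $k$ dummies together with an intercept would make the columns perfectly collinear. The `region` values and the helper name below are hypothetical.

```python
def dummy_encode(values, baseline):
    """Encode a qualitative predictor as 0/1 dummy variables.

    A factor with k levels becomes k - 1 columns; the baseline level is the
    row of all zeros, so its effect is absorbed into the intercept.
    """
    levels = sorted(set(values) - {baseline})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical qualitative predictor with three levels.
region = ["East", "West", "South", "East", "West"]
levels, rows = dummy_encode(region, baseline="East")
print(levels)  # ['South', 'West']
print(rows)    # [[0, 0], [0, 1], [1, 0], [0, 0], [0, 1]]
```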

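### 3.3.2 Extensions of the Linear Model

The standard linear regression model rests on two assumptions: the relationship between the predictors and the response is additive and linear. In real problems these assumptions are sometimes violated, so we may need to relax them.

#### Removing the Additive Assumption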
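The additive assumption says that the effect of $X_1$ on Y does not depend on the value of $X_2$. A common way to relax it is to add an interaction term: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$. Rewriting this as $Y = \beta_0 + (\beta_1 + \beta_3 X_2) X_1 + \beta_2 X_2 + \epsilon$ shows that the slope of $X_1$ now changes with $X_2$. The model is fitted by ordinary least squares after adding the product $X_1 X_2$ as an extra column. In the sketch, the coefficient values are hypothetical and are not estimates from any data.

```python
def add_interaction(rows):
    """Append the product x1*x2 to each (x1, x2) row so that an ordinary
    least squares fit can estimate the interaction coefficient."""
    return [[x1, x2, x1 * x2] for x1, x2 in rows]

def predict_with_interaction(b, x1, x2):
    """Y-hat = b0 + b1*x1 + b2*x2 + b3*x1*x2."""
    b0, b1, b2, b3 = b
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

b = (1.0, 2.0, 0.5, 0.3)   # hypothetical coefficients
for x2 in (0.0, 10.0):
    # Change in the prediction when x1 goes from 0 to 1, holding x2 fixed.
    effect = predict_with_interaction(b, 1.0, x2) - predict_with_interaction(b, 0.0, x2)
    print(x2, effect)      # equals b1 + b3*x2, so it grows with x2
```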

#### Non-Linear Relationships
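A simple way to capture a non-linear relationship is polynomial regression: include transformed predictors such as $X^2$, as in $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \epsilon$. The model is non-linear in X but still linear in the coefficients, so it remains a linear model and can be fitted by ordinary least squares on the columns $1, X, X^2$. The sketch below uses hypothetical noise-free data so the known coefficients can be checked, and the function name is illustrative.

```python
def fit_polynomial(x, y, degree=2):
    """Least squares fit of y ~ b0 + b1*x + ... + bd*x^d.

    The model is non-linear in x but linear in the coefficients, so this is
    ordinary least squares on the derived predictors x, x^2, ..., x^d.
    """
    A = [[xi ** d for d in range(degree + 1)] for xi in x]   # columns 1, x, x^2, ...
    k = degree + 1
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b

# Hypothetical curved data: y = 5 - 2x + 0.5x^2 exactly.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [5 - 2 * xi + 0.5 * xi ** 2 for xi in x]
print([round(c, 6) for c in fit_polynomial(x, y)])  # [5.0, -2.0, 0.5]
```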

### 3.3.3 Potential Problems

Problems that may arise when fitting a linear regression model include:

1. Non-linearity of the response-predictor relationships.
2. Correlation of error terms.
3. Non-constant variance of error terms.
4. Outliers.
5. High-leverage points.
6. Collinearity.

The book analyzes these only briefly, since they are not its main focus.
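Take high-leverage points (item 5) as one concrete case. They are observations with unusual predictor values. For simple linear regression they are quantified by the leverage statistic $h_i = \frac{1}{n} + \frac{(x_i - \bar x)^2}{\sum_{j} (x_j - \bar x)^2}$. Each $h_i$ lies between $1/n$ and 1, and the leverages sum to $p + 1$ (here 2), so a value far above the average $(p+1)/n$ flags a point worth inspecting. The data below are hypothetical.

```python
from statistics import fmean

def leverage(x):
    """Leverage statistic h_i of each observation in a simple linear regression."""
    n = len(x)
    x_bar = fmean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 20.0]   # hypothetical data; the last x value is unusual
h = leverage(x)
print([round(hi, 3) for hi in h])  # [0.3, 0.264, 0.236, 0.216, 0.984]
print(sum(h))                      # p + 1 = 2, up to rounding
```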
