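The last day actually turned out to be the heaviest one: it calls for a tour of the commonly used statistical procedures in SAS. BTW, after getting used to ggplot2, I no longer see any reason to draw plots with other software... so I am automatically ignoring SAS's graphics modules (apparently plenty of SAS users also keep complaining that they are really not pleasant to use).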
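PROC MEANS already came up a few days ago, but here is a small addendum on confidence intervals. It really does accept a lot of statistic keywords, MODE and VAR (variance) among them.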
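CLM requests confidence limits for the mean, and ALPHA= sets their level: ALPHA=0.05, the default, gives a 95% interval.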
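The original code for this step is not preserved, so here is a minimal sketch of a CLM call. The data set `sashelp.class` (a sample table that ships with SAS) and the variables `Height` and `Weight` are my stand-ins, not what the post originally used.

```sas
/* Assumption: sashelp.class stands in for the post's own data. */
/* ALPHA=0.10 asks for 90% confidence limits around each mean.  */
proc means data=sashelp.class n mean std clm alpha=0.10;
   var Height Weight;   /* analysis variables */
run;
```

Dropping `alpha=0.10` falls back to the default 95% limits.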
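The output lists each requested statistic together with the lower and upper confidence limits for the mean.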
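Correlation gets criticized from all sides, but as a first step after receiving data, when you have no idea where to start, it is still a reference worth looking at. In SAS, PROC CORR provides this.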
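A first-look correlation check might be sketched as follows. Again `sashelp.class` and its variables are assumptions standing in for the post's data.

```sas
/* Assumption: sashelp.class stands in for the post's own data. */
/* PEARSON is the default; SPEARMAN adds rank correlations.     */
proc corr data=sashelp.class pearson spearman;
   var Height Weight Age;   /* every pair among these is reported */
run;
```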
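The correlation output reports simple statistics for each variable, followed by the matrix of correlation coefficients with their p-values.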
PROC REG is the counterpart of lm() in R. There is really not much to say about it: it is plain ordinary least squares.
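For comparison with `lm(Weight ~ Height + Age)` in R, a minimal PROC REG call could look like this. The data set and variable names are assumptions borrowed from the `sashelp.class` sample table.

```sas
/* Assumption: sashelp.class stands in for the post's own data. */
proc reg data=sashelp.class;
   model Weight = Height Age;   /* response = regressors; intercept added by default */
run;
quit;   /* PROC REG is interactive, so QUIT ends the procedure */
```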
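The SAS output begins with the basic statistics of the regression model (the analysis of variance table and the fit statistics). What we usually care about more, the regression coefficients, comes after that in the parameter estimates table.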
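My reaction at this point: it really feels like Stata! Note that REG has many optional parameters, and the most authoritative description of what they do is naturally the official SAS documentation (http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect007.htm). Once you are familiar with SAS's syntax and way of working, the official documentation is the most comfortable place to look up a specific model. It is commercial software after all: the documentation is professionally written, and many model selection questions become at least somewhat clearer just from reading it.
For example, the PROC REG statement accepts the following options: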
Option | Description |
---|---|
Data Set Options | |
DATA= | names a data set to use for the regression |
OUTEST= | outputs a data set that contains parameter estimates and other model fit summary statistics |
OUTSSCP= | outputs a data set that contains sums of squares and crossproducts |
COVOUT | outputs the covariance matrix for parameter estimates to the OUTEST= data set |
EDF | outputs the number of regressors, the error degrees of freedom, and the model to the OUTEST= data set |
OUTSEB | outputs standard errors of the parameter estimates to the OUTEST= data set |
OUTSTB | outputs standardized parameter estimates to the OUTEST= data set. Use only with the RIDGE= or PCOMIT= option. |
OUTVIF | outputs the variance inflation factors to the OUTEST= data set. Use only with the RIDGE= or PCOMIT= option. |
PCOMIT= | performs incomplete principal component analysis and outputs estimates to the OUTEST= data set |
PRESS | outputs the PRESS statistic to the OUTEST= data set |
RIDGE= | performs ridge regression analysis and outputs estimates to the OUTEST= data set |
RSQUARE | same effect as the EDF option |
TABLEOUT | outputs standard errors, confidence limits, and associated test statistics of the parameter estimates to the OUTEST= data set |
ODS Graphics Options | |
PLOTS= | produces ODS graphical displays |
Traditional Graphics Options | |
ANNOTATE= | specifies an annotation data set |
GOUT= | specifies the graphics catalog in which graphics output is saved |
Display Options | |
CORR | displays correlation matrix for variables listed in MODEL and VAR statements |
SIMPLE | displays simple statistics for each variable listed in MODEL and VAR statements |
USSCP | displays uncorrected sums of squares and crossproducts matrix |
ALL | displays all statistics (CORR, SIMPLE, and USSCP) |
NOPRINT | suppresses output |
LINEPRINTER | creates plots requested as line printer plot |
Other Options | |
ALPHA= | sets significance value for confidence and prediction intervals and tests |
SINGULAR= | sets criterion for checking for singularity |
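To make a few of these options concrete, the sketch below combines OUTEST=, TABLEOUT, ALPHA= and NOPRINT from the table above. The data set `sashelp.class`, the model, and the output data set name `est` are illustrative assumptions.

```sas
/* Assumptions: sashelp.class as sample data; "est" is an arbitrary name. */
proc reg data=sashelp.class outest=est tableout alpha=0.05 noprint;
   model Weight = Height;   /* NOPRINT suppresses the usual listing */
run;
quit;

/* Inspect what was written: estimates plus the extra TABLEOUT rows,  */
/* with the _TYPE_ column identifying what each row contains.         */
proc print data=est;
run;
```

Writing estimates to a data set like this is convenient when the coefficients feed into a later DATA step rather than being read off the listing.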
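I won't belabor analysis of variance either. My impression is that it is less widely used than regression... and to some extent the two are the same thing (an ANOVA can be written as a regression on dummy variables for the groups), depending on how you look at it.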
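The original ANOVA code is not preserved. A one-way analysis with Scheffe comparisons could be sketched as below, assuming PROC ANOVA was the procedure used. The sample table `sashelp.cars` and the variables `Origin` and `MPG_City` are my stand-ins.

```sas
/* Assumption: sashelp.cars stands in for the post's own data. */
proc anova data=sashelp.cars;
   class Origin;               /* grouping variable (Asia, Europe, USA) */
   model MPG_City = Origin;    /* one-way ANOVA                          */
   means Origin / scheffe;     /* pairwise comparisons, Scheffe's method */
run;
quit;
```

PROC ANOVA is intended for balanced designs, with one-way layouts as an allowed exception. For anything more general, PROC GLM takes the same CLASS, MODEL and MEANS statements.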
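The SAS output comes in three parts. First is the basic information on the classification variable, then the basic statistics of the model, and finally the results for the individual groups: pairwise comparisons, because the SCHEFFE option was specified.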
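The simplest model for a discrete dependent variable is logit, and SAS has a dedicated PROC LOGISTIC for it. The official documentation is here: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#logistic_toc.htm
The syntax is, as always, simple.
It returns: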
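Judging from the output in this post (data set WORK.INGOTS, events `r`, trials `n`, effects Heat, Soak and Heat*Soak), the call follows the ingots example in the SAS/STAT documentation. The DATA step below is reproduced from that example as best I can recall it, so treat the individual values as an assumption. Its totals (19 rows, 387 trials, 12 events) do agree with the output shown in this post.

```sas
/* Assumption: data values recalled from the SAS/STAT "ingots" example. */
data ingots;
   input Heat Soak r n @@;   /* r = events out of n trials */
   datalines;
7 1.0 0 10  14 1.0 0 31  27 1.0 1 56  51 1.0 3 13
7 1.7 0 17  14 1.7 0 43  27 1.7 4 44  51 1.7 0  1
7 2.2 0  7  14 2.2 2 33  27 2.2 0 21  51 2.2 0  1
7 2.8 0 12  14 2.8 0 31  27 2.8 1 22  51 4.0 0  1
7 4.0 0  9  14 4.0 0 19  27 4.0 1 16
;
run;

/* Events/trials syntax: r successes out of n, with an interaction term. */
proc logistic data=ingots;
   model r/n = Heat Soak Heat*Soak;
run;
```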
The LOGISTIC Procedure
Model Information | |
---|---|
Data Set | WORK.INGOTS |
Response Variable (Events) | r |
Response Variable (Trials) | n |
Model | binary logit |
Optimization Technique | Fisher's scoring |
Number of Observations Read | 19 |
---|---|
Number of Observations Used | 19 |
Sum of Frequencies Read | 387 |
Sum of Frequencies Used | 387 |
First, naturally, comes the information about the model. Then a summary of the data:
Response Profile | ||
---|---|---|
Ordered Value | Binary Outcome | Total Frequency |
1 | Event | 12 |
2 | Nonevent | 375 |
Model Convergence Status |
---|
Convergence criterion (GCONV=1E-8) satisfied |
Then the model fit statistics and the hypothesis tests:
Model Fit Statistics | ||
---|---|---|
Criterion | Intercept Only | Intercept and Covariates |
AIC | 108.988 | 103.222 |
SC | 112.947 | 119.056 |
-2 Log L | 106.988 | 95.222 |
Testing Global Null Hypothesis: BETA=0 | |||
---|---|---|---|
Test | Chi-Square | DF | Pr > ChiSq |
Likelihood Ratio | 11.7663 | 3 | 0.0082 |
Score | 16.5417 | 3 | 0.0009 |
Wald | 13.4588 | 3 | 0.0037 |
Finally, the parameter estimates:
Analysis of Maximum Likelihood Estimates | |||||
---|---|---|---|---|---|
Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq |
Intercept | 1 | -5.9901 | 1.6666 | 12.9182 | 0.0003 |
Heat | 1 | 0.0963 | 0.0471 | 4.1895 | 0.0407 |
Soak | 1 | 0.2996 | 0.7551 | 0.1574 | 0.6916 |
Heat*Soak | 1 | -0.00884 | 0.0253 | 0.1219 | 0.7270 |
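For a Poisson model you need PROC GENMOD instead. Listing these models one by one is already beyond the scope of these notes... so I will simply switch to briefly translating the main models covered by each PROC. As I said, learning the models is not the main goal here, since models should not be learned through software anyway... although the SAS user guide really is a fairly good statistics textbook.
Beyond the PROCs above, SAS of course has many more powerful modules. I will just click through them one by one to see what they can do...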
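For completeness, a Poisson regression in PROC GENMOD might be sketched like this. The data set `counts` and its values are invented toy data purely for illustration, not anything from the post or the SAS documentation.

```sas
/* Hypothetical toy data: an exposure level and an observed event count. */
data counts;
   input exposure count @@;
   datalines;
1 0  2 1  3 1  4 3  5 2  6 4  7 6  8 5
;
run;

/* DIST=POISSON with a log link gives the standard Poisson regression. */
proc genmod data=counts;
   model count = exposure / dist=poisson link=log;
run;
```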