To address the outlier problem, instead of plotting the residuals, we can plot the studentized residuals, computed by dividing each residual e_i by its estimated standard error. Observations whose studentized residuals are greater than 3 in absolute value are possible outliers.
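As a minimal sketch of this rule for a simple linear regression, the code below divides each residual by RSE * sqrt(1 - h_i), the usual estimate of its standard error (the "internally studentized" form). The function name and the toy data, including the point shifted by 10, are made up for illustration.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for the simple regression y ~ x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Least squares slope and intercept.
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    rse = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    # Leverage of each point; the standard error of e_i is RSE * sqrt(1 - h_i).
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (rse * math.sqrt(1 - h)) for e, h in zip(resid, lev)]

# Toy data (hypothetical): a line with small alternating noise.
x = list(range(1, 21))
y = [2.0 * xi + 1.0 + (0.5 if xi % 2 else -0.5) for xi in x]
y[9] += 10.0  # push one observation far off the line

t = studentized_residuals(x, y)
flagged = [i for i, ti in enumerate(t) if abs(ti) > 3]
print(flagged)
```

Only the shifted observation (index 9) exceeds the threshold of 3 here; the other points stay well below 1 in absolute value.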
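After removing the outliers, check whether the model's RSE and R^2 improve significantly.
On the other hand, we can also look at the leverage of an outlier to judge how strongly it affects the least squares line (the red line): the higher the leverage, the greater its influence on the fit. For simple linear regression, the leverage statistic is: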
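One way to make this comparison concrete is to fit the line twice, with and without the suspected outlier, and compare the two summaries. This is only a sketch: `fit_summary` is a hypothetical helper, and the data and the index of the outlier are invented.

```python
import math

def fit_summary(x, y):
    """Return (RSE, R^2) of the least squares line y ~ x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return math.sqrt(rss / (n - 2)), 1 - rss / tss

# Toy data (hypothetical) with one outlier at index 9.
x = list(range(1, 21))
y = [2.0 * xi + 1.0 + (0.5 if xi % 2 else -0.5) for xi in x]
y[9] += 10.0

rse_all, r2_all = fit_summary(x, y)
keep = [i for i in range(len(x)) if i != 9]
rse_trim, r2_trim = fit_summary([x[i] for i in keep], [y[i] for i in keep])

print(f"with outlier:    RSE={rse_all:.3f}, R^2={r2_all:.4f}")
print(f"without outlier: RSE={rse_trim:.3f}, R^2={r2_trim:.4f}")
```

In this toy case the RSE drops sharply once the point is removed, while R^2 moves only a little, so the two measures are worth reading together.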
h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{i'=1}^{n} (x_{i'} - \bar{x})^2}, \qquad \frac{1}{n} \le h_i \le 1
The average leverage over all the observations is always equal to (p+1)/n. If the h_i of a given observation greatly exceeds (p+1)/n, we should suspect that the point has high leverage.
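The leverage formula and the (p+1)/n benchmark can be checked numerically with a short sketch. The text only says "greatly exceeds", so the cutoff of 3 times the average used below is an assumption, as are the function name and the data.

```python
def leverages(x):
    """Leverage h_i of each observation in a simple linear regression on x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Toy predictor values (hypothetical): the last one lies far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
h = leverages(x)

p = 1                         # one predictor
avg = (p + 1) / len(x)        # average leverage, (p+1)/n
# Assumed cutoff (not from the text): flag points above 3x the average.
flagged = [i for i, hi in enumerate(h) if hi > 3 * avg]

print([round(hi, 3) for hi in h])
print(flagged)
```

The leverages sum to p+1, so their mean equals (p+1)/n by construction, and only the isolated last point is flagged.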