An outlier is a point for which $y_i$ is far from the value predicted by the model. Outliers can arise for a variety of reasons, such as incorrect recording of an observation during data collection.
Because it is hard to judge how large a raw residual must be before a point counts as an outlier, we can plot the studentized residuals instead of the raw residuals. These are computed by dividing each residual $e_i$ by its estimated standard error. Observations whose studentized residuals are greater than 3 in absolute value are possible outliers.
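As a minimal sketch of this rule for simple linear regression, the snippet below divides each residual by $RSE \cdot \sqrt{1-h_i}$, where $h_i$ is the leverage statistic defined further down. This is the "internally" studentized form, which is an assumption here since the text does not say which variant is meant. The function name `studentized_residuals` and the data are made up for illustration.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (intercept + slope).
    rse = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    # Leverage of each observation (simple linear regression formula).
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Estimated standard error of e_i is rse * sqrt(1 - h_i).
    return [e / (rse * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]

# Made-up data: a line y = 1 + 2x with small alternating noise,
# plus one deliberately corrupted observation at index 10.
x = list(range(20))
y = [1 + 2 * xi + (0.5 if xi % 2 == 0 else -0.5) for xi in x]
y[10] += 15

t = studentized_residuals(x, y)
possible_outliers = [i for i, ti in enumerate(t) if abs(ti) > 3]
print(possible_outliers)
```

With this data only the corrupted observation crosses the threshold of 3; the other points have studentized residuals well below 1 in absolute value.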
After removing the outliers, check whether the model's RSE and $R^2$ improve significantly.
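The comparison can be sketched as follows: fit the model twice, once on all the data and once with the suspected point dropped, then compare RSE and $R^2$. The helper `rse_and_r2`, the data, and the choice of index 10 as the suspect are all hypothetical, and the sketch only reports the two pairs of numbers; it does not run a formal significance test.

```python
import math

def rse_and_r2(x, y):
    """Return (RSE, R^2) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return math.sqrt(rss / (n - 2)), 1 - rss / tss

# Made-up data with one corrupted observation at index 10.
x = list(range(20))
y = [1 + 2 * xi + (0.5 if xi % 2 == 0 else -0.5) for xi in x]
y[10] += 15

suspect = 10  # hypothetical index of the suspected outlier
x_clean = x[:suspect] + x[suspect + 1:]
y_clean = y[:suspect] + y[suspect + 1:]

rse_full, r2_full = rse_and_r2(x, y)
rse_clean, r2_clean = rse_and_r2(x_clean, y_clean)
print(f"with outlier:    RSE={rse_full:.3f}, R^2={r2_full:.4f}")
print(f"without outlier: RSE={rse_clean:.3f}, R^2={r2_clean:.4f}")
```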
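We can also examine the leverage of an outlier to judge how strongly it affects the least squares fit line (the red line): the higher the leverage, the larger the effect on the least squares line. For simple linear regression, the leverage statistic $h_i$ is: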
$$h_i=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_{i'=1}^{n}(x_{i'}-\bar{x})^2}, \qquad \frac{1}{n} \le h_i \le 1$$
The average leverage over all observations is always equal to $(p+1)/n$, so if an observation's $h_i$ greatly exceeds $(p+1)/n$, we should suspect that the point has high leverage.
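To make the leverage check concrete, the sketch below evaluates $h_i$ for simple linear regression ($p = 1$) and compares each value with the average $(p+1)/n$. The text does not say how large "greatly exceeds" is, so the cutoff of three times the average is an assumption chosen for illustration, as are the function name `leverages` and the data.

```python
def leverages(x):
    """Leverage statistic h_i for each observation in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Made-up predictor values: ten ordinary points and one far from the rest.
x = list(range(10)) + [30]
n, p = len(x), 1

h = leverages(x)
average_leverage = (p + 1) / n   # always equals the mean of the h_i
cutoff = 3 * average_leverage    # assumed rule of thumb, not from the text

high_leverage = [i for i, hi in enumerate(h) if hi > cutoff]
print(f"average leverage = {average_leverage:.4f}")
print(f"high-leverage observations: {high_leverage}")
```

Here only the point at $x = 30$ is flagged: its leverage is close to 0.89, compared with an average of about 0.18.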