Indice

Regressione lineare multipla (multivariata)

Regressione lineare multipla (multivariata)

Bozza

Vedi:

I modelli

Tab. 1: Esempio: dataset *mtcars*

mpg	miglia per gallone
hp	potenza
wt	peso

modello additivo

$$\hat Y = a + b_1X1 + b_2X2$$

mpg ~ wt + hp

modello con interazione

$$\hat Y = a + b_1X1 + b_2X2 + b_3X1X2$$

mpg ~ wt * hp

Test di normalità

Per il test di normalità, utilizziamo la funzione shapiro.test() (vedi test di Shapiro).

Tab. 2: Test di normalità sulle variabili

	statistic	p.value
mpg	0,9475647	0,1228814
wt	0,9432577	0,0926550
hp	0,9334193	0,0488082

Modello additivo

fit1 <- lm(mpg ~ wt + hp, data = mtcars) 
summary(fit1)

## Call:
## lm(formula = mpg ~ wt + hp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268,	Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

`

Ovvero: $\hat{mpg} = 32,23 -3,88wt -0,03hp$

Modello con interazione

fit2 <- lm(mpg ~ wt * hp, data = mtcars) 
summary(fit2)

## Call:
## lm(formula = mpg ~ wt * hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0632 -1.6491 -0.7362  1.4211  4.5513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
## wt          -8.21662    1.26971  -6.471 5.20e-07 ***
## hp          -0.12010    0.02470  -4.863 4.04e-05 ***
## wt:hp        0.02785    0.00742   3.753 0.000811 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.153 on 28 degrees of freedom
## Multiple R-squared:  0.8848,	Adjusted R-squared:  0.8724 
## F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13

Ovvero: $\hat {mpg} = 49,81 -8,21 wt -0,12 hp +0,03 wt \cdot hp$

L'R quadro corretto

Aumentando il numero delle variabili indipendenti, vi è una certa quota di varianza aggiuntiva spiegata, indipendentemente dalla significatività del contributo di ciascuna variabile. Per questa ragione, si utilizza un coefficiente di determinazione “corretto” (Adjusted R-squared):

$$\text{adj.R2} = 1 - (1- R^2)\frac{n-1} {n - p - 1}$$

# per il primo modello
1 - (1 - 0.8268) * (31 / 29)
 
# oppure
1 - (1 - summary(fit1)$r.squared) * (31 / fit1$df.residual)

## [1] 0.8148396

# per il secondo modello
1 - (1 - 0.8848) * (31 / 28)
 
# oppure
1 - (1 - summary(fit2)$r.squared) * (31 / fit2$df.residual)

## [1] 0.872417

Scomposizione dei quadrati (analisi della devianza) e test F

Vedi Modelli lineari e scomposizione della devianza

Confronto fra i modelli

Tabella delle statistiche

Con la funzione glance() del pacchetto broom (vedi la pagina Broom, e rbind().

library(broom)
 
rbind("fit1" = glance(fit1), 
      "fit2" = glance(fit2))

## # A tibble: 2 x 11
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC deviance df.residual
## *     <dbl>         <dbl> <dbl>     <dbl>    <dbl> <int>  <dbl> <dbl> <dbl>    <dbl>       <int>
## 1     0.827         0.815  2.59      69.2 9.11e-12     3  -74.3  157.  163.     195.          29
## 2     0.885         0.872  2.15      71.7 2.98e-13     4  -67.8  146.  153.     130.          28

Analisi della devianza e test F

Vedi: Analisi della devianza nel confronto di modelli

anova(fit1, fit2)

## Analysis of Variance Table
## 
## Model 1: mpg ~ wt + hp
## Model 2: mpg ~ wt * hp
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     29 195.05                                  
## 2     28 129.76  1    65.286 14.088 0.0008108 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Script di esempio

E' possibile scaricare ed eseguire lo script dell'esempio:

Es-regr-mult.R

# additivo
fit1 <- lm(mpg ~ wt + hp, data = mtcars) 
summary(fit1)
 
# con interazione
fit2 <- lm(mpg ~ wt * hp, data = mtcars) 
summary(fit2)
 
# confronto
library(broom)
 
rbind("fit1" = glance(fit1), 
      "fit2" = glance(fit2))
 
anova(fit1, fit2)

Analisi multivariata, Regressione lineare, Modelli lineari