library(ISLR)
## Warning: package 'ISLR' was built under R version 4.0.3
In Section 10.2.3, a formula for calculating PVE (proportion of variance explained) was given in Equation 10.8. We also saw that the PVE can be obtained using the sdev output of the prcomp() function.
On the USArrests data, calculate PVE in two ways:
(a) Using the sdev output of the prcomp() function, as was done in Section 10.2.3.
(b) By applying Equation 10.8 directly. That is, use the prcomp() function to compute the principal component loadings. Then, use those loadings in Equation 10.8 to obtain the PVE.
These two approaches should give the same results.
Hint: You will only obtain the same results in (a) and (b) if the same data is used in both cases. For instance, if in (a) you performed prcomp() using centered and scaled variables, then you must center and scale the variables before applying Equation 10.3 in (b).
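For reference, the formula that part (b) relies on can be written as below. This is a restatement under assumed notation, not a quotation from the exercise: $x_{ij}$ is taken to be the centered (and here also scaled) value of variable $j$ for observation $i$, $\phi_{jm}$ the loading of variable $j$ on component $m$, $n$ the number of observations, and $p$ the number of variables.

$$
\mathrm{PVE}_m = \frac{\sum_{i=1}^{n}\left(\sum_{j=1}^{p}\phi_{jm}\,x_{ij}\right)^{2}}{\sum_{j=1}^{p}\sum_{i=1}^{n}x_{ij}^{2}}
$$

The inner sum in the numerator is the score of observation $i$ on component $m$. The squared sdev values from prcomp() are these sums of squared scores divided by $n-1$, and the same factor appears in the total variance, so it cancels in the ratio. That is why the two approaches should agree.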
pr.out=prcomp(USArrests, scale=TRUE)
pr.out$sdev
## [1] 1.5748783 0.9948694 0.5971291 0.4164494
pr.var = pr.out$sdev^2
pr.var
## [1] 2.4802416 0.9897652 0.3565632 0.1734301
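These are the variances explained by each principal component. Dividing each by their total gives the PVE, which part (b) must reproduce.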
pve = pr.var / sum(pr.var)
pve
## [1] 0.62006039 0.24744129 0.08914080 0.04335752
loadings = pr.out$rotation
loadings
##                 PC1        PC2        PC3         PC4
## Murder   -0.5358995  0.4181809 -0.3412327  0.64922780
## Assault  -0.5831836  0.1879856 -0.2681484 -0.74340748
## UrbanPop -0.2781909 -0.8728062 -0.3780158  0.13387773
## Rape     -0.5434321 -0.1673186  0.8177779  0.08902432
From Prince
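# Apply Equation 10.8 directly to the centered and scaled data
pve2 = rep(NA, 4)
dmean = apply(USArrests, 2, mean)
dsdev = sqrt(apply(USArrests, 2, var))
dsc = sweep(USArrests, MARGIN=2, dmean, "-")  # center each column
dsc = sweep(dsc, MARGIN=2, dsdev, "/")        # scale each column to unit variance
for (i in 1:4) {
  proto_x = sweep(dsc, MARGIN=2, loadings[,i], "*")  # weight each variable by its loading
  pc_x = apply(proto_x, 1, sum)                      # score of each observation on component i
  pve2[i] = sum(pc_x^2)                              # numerator of Equation 10.8
}
pve2 = pve2/sum(dsc^2)  # divide by the total sum of squares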
pve2
## [1] 0.62006039 0.24744129 0.08914080 0.04335752
These values match the PVE obtained in part (a).
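The loop in part (b) can also be written without explicit iteration. The following is a minimal sketch of an alternative and not the method used above: it assumes scale() standardizes the data the same way as prcomp(scale=TRUE), and the names X, pr_out, scores, pve_direct and pve_sdev are introduced here for illustration only.

```r
# USArrests ships with base R (datasets package), so no extra library is needed.
X <- scale(USArrests)                        # center and scale each column
pr_out <- prcomp(USArrests, scale = TRUE)    # same fit as in part (a)

# Scores for all components at once: each column is one principal component.
scores <- X %*% pr_out$rotation

# Equation 10.8: sum of squared scores per component over the total sum of squares.
pve_direct <- colSums(scores^2) / sum(X^2)

# Part (a) for comparison: squared sdev over the total variance.
pve_sdev <- pr_out$sdev^2 / sum(pr_out$sdev^2)

# Compare the two approaches up to floating-point tolerance.
all.equal(unname(pve_direct), pve_sdev)
```

The matrix product replaces the sweep() and apply() calls inside the loop, and colSums() computes the numerator for every component in one step.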