Principal Components Analysis and visualization tools for exploring shape space

Antigoni Kaliontzopoulou

2021-10-13

Since version 3.1.0 of geomorph, we have introduced the function gm.prcomp, and related utility functions (summary() and plot()), for performing principal components analyses on Procrustes shape variables for a set of aligned specimens. This function now includes several different types of analytical options and, combined with other visualization tools available in geomorph, provides tools for exploring variation in shape space. This vignette illustrates the analytical, plotting, and visualization capacities of gm.prcomp and other related functions.

NOTE: The combination of the functions and procedures illustrated here are meant to substitute and extend the options previously available through the functions plotTangentSpace and plotGMPhyloMorphoSpace, which are no longer maintained and have thus been deprecated.


Example dataset

Throughout, we will be using shape data of several Plethodon species as an example, so let´s first load and superimpose those.

library(geomorph)
## Loading required package: RRPP
## Loading required package: rgl
## Loading required package: Matrix
data(plethspecies) 
Y.gpa <- gpagen(plethspecies$land, print.progress = F)

1. Traditional PCA and visualization of shape patterns

One first option is to perform a “traditional” PCA, i.e. based on OLS-centering and projection of the data, very much like what is performed in the basic R function prcomp. Note that this also corresponds to the analytical part of the old (now deprecated) geomorph function plotTangentSpace.

PCA <- gm.prcomp(Y.gpa$coords)
summary(PCA)
## 
## Ordination type: Principal Component Analysis 
## Centering by OLS mean
## Orthogonal projection of OLS residuals
## Number of observations: 9 
## Number of vectors 8 
## 
## Importance of Components:
##                               Comp1        Comp2        Comp3        Comp4
## Eigenvalues            0.0002720474 0.0001120524 0.0001084758 0.0000568924
## Proportion of Variance 0.4564029477 0.1879858086 0.1819855633 0.0954461044
## Cumulative Proportion  0.4564029477 0.6443887563 0.8263743196 0.9218204240
##                               Comp5        Comp6        Comp7        Comp8
## Eigenvalues            0.0000264508 1.260516e-05 5.959622e-06 1.584785e-06
## Proportion of Variance 0.0443754550 2.114717e-02 9.998220e-03 2.658730e-03
## Cumulative Proportion  0.9661958790 9.873431e-01 9.973413e-01 1.000000e+00
plot(PCA, main = "PCA")

Note that, by contrast to older functions, gm.prcomp provides a much higher flexibility of plotting options, by allowing to directly pass arguments to the plot() R-base function, e.g.:

plot(PCA, main = "PCA", pch = 22, bg = "green", cex = 1.5, cex.lab = 1.5, font.lab = 2)

One then has several solutions for exploring shape variation across PC space and visualizing shape patterns. First, the user may choose to manually produce deformation grids to compare the shapes corresponding to the extremes of a chosen PC axis using plotRefToTarget, i.e.:

  1. By comparing the minimum and maximum values to the global consensus:
msh <- mshape(Y.gpa$coords)
plotRefToTarget(PCA$shapes$shapes.comp1$min, msh)

plotRefToTarget(msh, PCA$shapes$shapes.comp1$max)

or b) by comparing the two extremes to each other:

plotRefToTarget(PCA$shapes$shapes.comp1$min, PCA$shapes$shapes.comp1$max, method = "vector", mag = 2)

Of course here one can use all the plotting options available in plotRefToTarget. Please see the help file of that function for details.

Alternatively, one may use the interactive function picknplot.shape to explore shape variation across the PC scatterplot produced above. This allows one to click on any point in the above scatterplot (or any scatterplot produced by geomorph) to obtain a deformation grid illustrating the shape that corresponds to the selected point, as compared to the global mean shape. The function then interactively prompts the user to export the grid as an image file, if desired, and to continue by picking another point. To explore the function, run the code below and then click at any point inside the PCA scatterplot:

picknplot.shape(plot(PCA), method = "vector")

Again, options of the function plotRefToTarget are directly available and can be passed to picknplot.shape as additional arguments.

2. Phylomorphospace

One may also want to project a phylogeny (if dealing with species-level observations), and estimated ancestral states into the ordination plot produced before, to obtain what is commonly referred to as a “phylomorphospace” plot (Klingenberg & Ekau 1996, Biol. J. Linn. Soc. 59: 143 - 177; Rohlf 2002, Geometric morphometrics and phylogeny. Pp. 175-193 in N. MacLeod, and P. L. Forey, eds. Morphology, shape and phylogeny. Francis & Taylor, London; Sidlauskas 2008, Evolution 62: 3135 - 3156). This can be easily done by providing a phylogenetic tree during the analysis (which is however NOT used in the analytical step in this case, other than for estimating ancestral states), and then indicating that the phylogeny should also be plotted during the plotting step. For the Plethodon example data, we may project the phylogeny into the previous ordination plot as such:

PCA.w.phylo <- gm.prcomp(Y.gpa$coords, phy = plethspecies$phy)
summary(PCA.w.phylo)
## 
## Ordination type: Principal Component Analysis 
## Centering by OLS mean
## Orthogonal projection of OLS residuals
## Number of observations: 9 
## Number of vectors 8 
## 
## Importance of Components:
##                               Comp1        Comp2        Comp3        Comp4
## Eigenvalues            0.0002720474 0.0001120524 0.0001084758 0.0000568924
## Proportion of Variance 0.4564029477 0.1879858086 0.1819855633 0.0954461044
## Cumulative Proportion  0.4564029477 0.6443887563 0.8263743196 0.9218204240
##                               Comp5        Comp6        Comp7        Comp8
## Eigenvalues            0.0000264508 1.260516e-05 5.959622e-06 1.584785e-06
## Proportion of Variance 0.0443754550 2.114717e-02 9.998220e-03 2.658730e-03
## Cumulative Proportion  0.9661958790 9.873431e-01 9.973413e-01 1.000000e+00
## 
## 
## Dispersion (variance) of points, after projection:
##                                        Comp1        Comp2        Comp3
## Tips Dispersion                 2.720474e-04 1.120524e-04 1.084758e-04
## Proportion Tips Dispersion      4.564029e-01 1.879858e-01 1.819856e-01
## Cumulative Tips Dispersion      4.564029e-01 6.443888e-01 8.263743e-01
## Ancestors Dispersion            3.073342e-05 2.320038e-05 2.039005e-05
## Proportion Ancestors Dispersion 3.854327e-01 2.909597e-01 2.557149e-01
## Cumulative Ancestors Dispersion 3.854327e-01 6.763923e-01 9.321072e-01
##                                        Comp4        Comp5        Comp6
## Tips Dispersion                 5.689240e-05 2.645080e-05 1.260516e-05
## Proportion Tips Dispersion      9.544610e-02 4.437545e-02 2.114717e-02
## Cumulative Tips Dispersion      9.218204e-01 9.661959e-01 9.873431e-01
## Ancestors Dispersion            3.171758e-06 1.748508e-06 3.945777e-07
## Proportion Ancestors Dispersion 3.977752e-02 2.192831e-02 4.948462e-03
## Cumulative Ancestors Dispersion 9.718847e-01 9.938130e-01 9.987615e-01
##                                        Comp7        Comp8
## Tips Dispersion                 5.959622e-06 1.584785e-06
## Proportion Tips Dispersion      9.998220e-03 2.658730e-03
## Cumulative Tips Dispersion      9.973413e-01 1.000000e+00
## Ancestors Dispersion            8.354868e-08 1.520561e-08
## Proportion Ancestors Dispersion 1.047797e-03 1.906960e-04
## Cumulative Ancestors Dispersion 9.998093e-01 1.000000e+00
plot(PCA.w.phylo, phylo = TRUE, main = "PCA.w.phylo")

Note that the summary statistics obtained for this analysis are identical to those from the previous one. This is because here the phylogeny is merely used for plotting, and is NOT considered during the analytical procedures.

Again, all plotting arguments can be directly manipulated by the user. Please see the help file of plot.gm.prcomp for details.

3. phyloPCA

Here, the phylogeny IS considered during the analytical step of the ordination, as the principal components analysis is in this case calculated based on GLS-centering and projection of the data. For details on the analytical part of this method, see Revell 2009, Evolution 63: 3258 - 3268; Polly et al 2013, Hystrix 24: 33 - 41; Collyer & Adams, submitted.

For the Plethodon example data, this analysis would be implemented and plotted as follows (fir with untransformed residual projection and second with transformed residual projection):

# Phylo PCA without projecting untransformed residuals
phylo.PCA <- gm.prcomp(Y.gpa$coords, phy = plethspecies$phy, GLS = TRUE)
summary(phylo.PCA)
## 
## Ordination type: Principal Component Analysis 
## Centering by GLS mean
## Oblique projection of GLS-centered residuals
## Number of observations: 9 
## Number of vectors 8 
## 
## Importance of Components:
##                               Comp1        Comp2        Comp3        Comp4
## Eigenvalues            2.761405e-05 9.322868e-06 7.889847e-06 6.605120e-06
## Proportion of Variance 4.855705e-01 1.639351e-01 1.387365e-01 1.161457e-01
## Cumulative Proportion  4.855705e-01 6.495056e-01 7.882421e-01 9.043878e-01
##                               Comp5        Comp6        Comp7        Comp8
## Eigenvalues            3.017886e-06 1.203779e-06 9.895110e-07 2.262231e-07
## Proportion of Variance 5.306707e-02 2.116747e-02 1.739975e-02 3.977950e-03
## Cumulative Proportion  9.574548e-01 9.786223e-01 9.960221e-01 1.000000e+00
## 
## 
## Dispersion (variance) of points, after projection:
##                                        Comp1        Comp2        Comp3
## Tips Dispersion                 0.0002498092 8.395185e-05 1.057114e-04
## Proportion Tips Dispersion      0.4190949497 1.408427e-01 1.773479e-01
## Cumulative Tips Dispersion      0.4190949497 5.599376e-01 7.372855e-01
## Ancestors Dispersion            0.0000169982 9.862682e-06 2.194379e-05
## Proportion Ancestors Dispersion 0.2131770701 1.236895e-01 2.752006e-01
## Cumulative Ancestors Dispersion 0.2131770701 3.368665e-01 6.120671e-01
##                                        Comp4        Comp5        Comp6
## Tips Dispersion                 0.0001027383 2.765308e-05 1.675548e-05
## Proportion Tips Dispersion      0.1723598707 4.639247e-02 2.811000e-02
## Cumulative Tips Dispersion      0.9096453329 9.560378e-01 9.841478e-01
## Ancestors Dispersion            0.0000272722 1.114653e-06 1.772175e-06
## Proportion Ancestors Dispersion 0.3420249711 1.397905e-02 2.222513e-02
## Cumulative Ancestors Dispersion 0.9540920945 9.680711e-01 9.902963e-01
##                                        Comp7        Comp8
## Tips Dispersion                 7.598972e-06 1.850019e-06
## Proportion Tips Dispersion      1.274849e-02 3.103703e-03
## Cumulative Tips Dispersion      9.968963e-01 1.000000e+00
## Ancestors Dispersion            7.068511e-07 6.689930e-08
## Proportion Ancestors Dispersion 8.864732e-03 8.389947e-04
## Cumulative Ancestors Dispersion 9.991610e-01 1.000000e+00
plot(phylo.PCA, phylo = TRUE, main = "phylo PCA")

# Phylo PCA without projecting transformed residuals
phylo.tPCA <- gm.prcomp(Y.gpa$coords, phy = plethspecies$phy, GLS = TRUE, transform = TRUE)
summary(phylo.tPCA)
## 
## Ordination type: Principal Component Analysis 
## Centering by GLS mean
## GLS residuals transformed for orthogonal projection
## Number of observations: 9 
## Number of vectors 8 
## 
## Importance of Components:
##                               Comp1        Comp2        Comp3        Comp4
## Eigenvalues            2.761405e-05 9.322868e-06 7.889847e-06 6.605120e-06
## Proportion of Variance 4.855705e-01 1.639351e-01 1.387365e-01 1.161457e-01
## Cumulative Proportion  4.855705e-01 6.495056e-01 7.882421e-01 9.043878e-01
##                               Comp5        Comp6        Comp7        Comp8
## Eigenvalues            3.017886e-06 1.203779e-06 9.895110e-07 2.262231e-07
## Proportion of Variance 5.306707e-02 2.116747e-02 1.739975e-02 3.977950e-03
## Cumulative Proportion  9.574548e-01 9.786223e-01 9.960221e-01 1.000000e+00
## 
## 
## Dispersion (variance) of points, after projection:
##                                        Comp1        Comp2        Comp3
## Tips Dispersion                 2.750816e-05 9.285019e-06 7.862015e-06
## Proportion Tips Dispersion      4.869073e-01 1.643492e-01 1.391614e-01
## Cumulative Tips Dispersion      4.869073e-01 6.512565e-01 7.904178e-01
## Ancestors Dispersion            1.105956e-06 5.912352e-07 1.414269e-06
## Proportion Ancestors Dispersion 2.314141e-01 1.237121e-01 2.959265e-01
## Cumulative Ancestors Dispersion 2.314141e-01 3.551261e-01 6.510526e-01
##                                        Comp4        Comp5        Comp6
## Tips Dispersion                 6.459043e-06 2.992022e-06 1.176990e-06
## Proportion Tips Dispersion      1.143281e-01 5.296020e-02 2.083327e-02
## Cumulative Tips Dispersion      9.047459e-01 9.577061e-01 9.785394e-01
## Ancestors Dispersion            1.447127e-06 6.980246e-08 1.004032e-07
## Proportion Ancestors Dispersion 3.028017e-01 1.460570e-02 2.100870e-02
## Cumulative Ancestors Dispersion 9.538543e-01 9.684600e-01 9.894687e-01
##                                        Comp7        Comp8
## Tips Dispersion                 9.865623e-07 2.258693e-07
## Proportion Tips Dispersion      1.746261e-02 3.997992e-03
## Cumulative Tips Dispersion      9.960020e-01 1.000000e+00
## Ancestors Dispersion            4.581973e-08 4.510542e-09
## Proportion Ancestors Dispersion 9.587476e-03 9.438010e-04
## Cumulative Ancestors Dispersion 9.990562e-01 1.000000e+00
plot(phylo.tPCA, phylo = TRUE, main = "phylo PCA with transformed projection")

par(mfrow = c(1, 1))

Again, one can use the geomorph functions plotRefToTarget or picknplot.shape to visualize variation on the produced ordination plot, as described for the case of the simple PCA above.

3. PaCA: Phylogenetically-aligned PCA

This recently introduced method (Collyer & Adams, submitted) provides an ordination that aligns phenotypic data with phylogenetic signal, by maximizing variation in directions that describe phylogenetic signal, while simultaneously preserving the the Euclidean distances among observations in the data space. PaCA provides a projection that shows the most phylogenetic signal in the first few components, irrespective of other signals in the data. By comparing PCA, phyloPCA and PaCA results, one may glean the relative importance of phylogenetic and other (ecological) signals in the data.

For the Plethodon example data, this analysis would be implemented and plotted as follows:

PaCA <- gm.prcomp(Y.gpa$coords, phy = plethspecies$phy, align.to.phy = TRUE)
summary(PaCA)
## 
## Ordination type: Alignment to an alternative matrix 
## Alignment matrix: phy 
## Centering by OLS mean
## OLS residuals
## Alignment to  phy means residual projection is not orthogonal.
## Number of observations: 9 
## Number of vectors 8 
## 
## Importance of Components:
##                                 Comp1        Comp2        Comp3        Comp4
## Singular Value           0.0003931629 0.0001155198 5.091258e-05 2.660786e-05
## Proportion of Covariance 0.6595936040 0.1938028696 8.541399e-02 4.463894e-02
## Cumulative Proportion    0.6595936040 0.8533964737 9.388105e-01 9.834494e-01
## RV by Component          0.0993211764 0.0291827102 1.286158e-02 6.721702e-03
## Cumulative RV            0.0993211764 0.1285038867 1.413655e-01 1.480872e-01
##                                 Comp5        Comp6        Comp7        Comp8
## Singular Value           6.598675e-06 2.351469e-06 7.108871e-07 2.042572e-07
## Proportion of Covariance 1.107033e-02 3.944966e-03 1.192627e-03 3.426742e-04
## Cumulative Proportion    9.945197e-01 9.984647e-01 9.996573e-01 1.000000e+00
## RV by Component          1.666963e-03 5.940304e-04 1.795850e-04 5.159965e-05
## Cumulative RV            1.497541e-01 1.503482e-01 1.505278e-01 1.505794e-01
## 
## 
## Dispersion (variance) of points, after projection:
##                                        Comp1        Comp2        Comp3
## Tips Dispersion                 2.301865e-04 1.124810e-04 1.133102e-04
## Proportion Tips Dispersion      3.861747e-01 1.887049e-01 1.900960e-01
## Cumulative Tips Dispersion      3.861747e-01 5.748795e-01 7.649755e-01
## Ancestors Dispersion            4.069158e-05 2.437196e-05 8.067595e-06
## Proportion Ancestors Dispersion 5.103196e-01 3.056526e-01 1.011770e-01
## Cumulative Ancestors Dispersion 5.103196e-01 8.159722e-01 9.171492e-01
##                                        Comp4        Comp5        Comp6
## Tips Dispersion                 7.526878e-05 4.046474e-05 1.571235e-05
## Proportion Tips Dispersion      1.262754e-01 6.788606e-02 2.635999e-02
## Cumulative Tips Dispersion      8.912509e-01 9.591370e-01 9.854970e-01
## Ancestors Dispersion            6.244464e-06 3.539330e-07 3.508799e-09
## Proportion Ancestors Dispersion 7.831282e-02 4.438730e-03 4.400441e-05
## Cumulative Ancestors Dispersion 9.954620e-01 9.999008e-01 9.999448e-01
##                                        Comp7        Comp8
## Tips Dispersion                 6.396433e-06 2.248360e-06
## Proportion Tips Dispersion      1.073104e-02 3.771984e-03
## Cumulative Tips Dispersion      9.962280e-01 1.000000e+00
## Ancestors Dispersion            2.594059e-09 1.810230e-09
## Proportion Ancestors Dispersion 3.253251e-05 2.270238e-05
## Cumulative Ancestors Dispersion 9.999773e-01 1.000000e+00
plot(PaCA, phylo = TRUE, main = "PaCA")

4. Three-dimensional PCA plot with a phylogeny and time on the z-axis

Finally, substituting the corresponding option previously available through plotGMPhyloMorphoSpace, plot.gm.prcomp provides the possibility of producing a 3D plot of any two PCA axes, with the phylogenetic tree connecting the observations and time on the z-axis. Again, different plotting parameters can be controlled to manipulate plot aesthetics. Note, that in this case an rgl plotting device will open for the 3D plot, but the corresponding biplot with the phylogeny projected (option 2, above) will also be produced.

plot(PCA.w.phylo, time.plot = TRUE, pch = 22, bg = c(rep("red", 5), rep("green", 4)), cex = 2, 
     phylo.par = list(edge.color = "grey60", edge.width = 1.5, tip.txt.cex = 0.75,
                      node.labels = F, anc.states = F))