Thursday, February 23, 2012

Procrustes analysis


In Greek mythology, Procrustes was a son of Poseidon with a sadistic streak. He would welcome weary travelers into his home, and after filling them with good food and cheer would invite them to sleep in his special bed. Unfortunately, if his guests didn't fit the bed exactly (and they never did!), Procrustes, in an anal-retentive fury, would find a way to make them fit, either by cutting off a bit of their legs or by flattening them out with a hammer. Procrustes was eventually done in by Theseus, who gave him a taste of his own medicine (see vase above).

As with many other mythological characters, Procrustes' name was adopted by nerds to denote a nerdy but essentially simple concept: if you want to compare the shapes of two similar objects, you first need to scale, rotate and translate them so that they are superimposed as closely as possible (in practice, so that the summed squared distances between corresponding points are as small as possible). Except that instead of hammers or axes, a few equations are used.
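To make the "few equations" concrete, here is a minimal sketch in Python with NumPy. It assumes each object is a k x d array of corresponding points, and it uses the common least-squares recipe: center, scale to unit centroid size, then find the rotation with a singular value decomposition. The function name `superimpose` and the rectangle data are made up for illustration, and scaling both objects to unit size is only one of several conventions.

```python
import numpy as np

def superimpose(reference, target):
    """Translate, scale and rotate `target` onto `reference` (both k x d arrays)."""
    # Translate: move both centroids to the origin.
    ref = reference - reference.mean(axis=0)
    tgt = target - target.mean(axis=0)
    # Scale: divide by centroid size (root of the summed squared coordinates).
    ref = ref / np.linalg.norm(ref)
    tgt = tgt / np.linalg.norm(tgt)
    # Rotate: the SVD of the cross-product matrix gives the least-squares rotation.
    u, _, vt = np.linalg.svd(tgt.T @ ref)
    # Flip one axis if needed so the result is a rotation, not a mirror image.
    u[:, -1] *= np.sign(np.linalg.det(u @ vt))
    return ref, tgt @ (u @ vt)

# Hypothetical example: a rectangle, and a rotated, enlarged, shifted copy of it.
rectangle = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])
theta = np.radians(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
moved = 3.0 * rectangle @ rot + np.array([5.0, -2.0])

ref, fitted = superimpose(rectangle, moved)
same_shape = np.allclose(ref, fitted)  # the copy lands exactly on the original
```

Because the second rectangle differs from the first only in size, orientation and position, nothing is left over once the two are superimposed. With real objects, whatever remains is the shape difference.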

Now say you want to compare the shapes of a whole set of skulls, and see how they vary (based on landmarks that you define). Once all the skulls are superimposed, you can compute the average shape of the lot, called the consensus, and see how the landmarks of each skull compare with it. The differences between each landmark and its homologous landmark on the consensus are called Procrustes residuals. These are the starting point of the statistical analysis.
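For a whole set of specimens, one common approach is to alternate between rotating every specimen onto the current consensus and recomputing the consensus as the mean. The sketch below assumes the landmarks come as an array of shape (specimens, landmarks, dimensions). The names `align` and `generalized_procrustes` are illustrative, the "skulls" are random numbers rather than real data, and the fixed number of iterations is an arbitrary choice.

```python
import numpy as np

def align(shape, reference):
    """Rotate a centered `shape` onto `reference` in the least-squares sense."""
    u, _, vt = np.linalg.svd(shape.T @ reference)
    u[:, -1] *= np.sign(np.linalg.det(u @ vt))  # avoid mirror images
    return shape @ (u @ vt)

def generalized_procrustes(shapes, n_iter=10):
    """Superimpose all specimens; return the aligned shapes and their consensus."""
    shapes = np.asarray(shapes, dtype=float)
    # Center every specimen and scale it to unit centroid size.
    shapes = shapes - shapes.mean(axis=1, keepdims=True)
    shapes = shapes / np.linalg.norm(shapes, axis=(1, 2), keepdims=True)
    consensus = shapes[0]  # arbitrary first guess
    for _ in range(n_iter):
        shapes = np.array([align(s, consensus) for s in shapes])
        consensus = shapes.mean(axis=0)  # average shape of the lot
    return shapes, consensus

# Made-up data: 100 noisy copies of 25 random 3D landmarks.
rng = np.random.default_rng(0)
base = rng.normal(size=(25, 3))
skulls = base + 0.05 * rng.normal(size=(100, 25, 3))

aligned, consensus = generalized_procrustes(skulls)
residuals = aligned - consensus  # Procrustes residuals, shape (100, 25, 3)
```

Each row of `residuals` holds one skull's landmark-by-landmark departure from the consensus, which is the table the statistics start from.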

But a classic statistical treatment that compares all of your landmark variables one by one is impractical. It's hard enough finding two-dimensional relationships. Imagine trying to compare the 25 (x,y,z) landmark coordinates of 100 skulls: that is 75 variables per skull. And how will you represent the data? The answer is to use a Principal Component (PC) Analysis. This reduces your Procrustes residuals into more manageable chunks of data by finding the combinations of landmark changes that explain the largest part of the variation. Berge and Penin (2004) explain it better:

1) After superimposition, all landmarks are interdependent (they are all used for fitting). Therefore, an isolated landmark movement is hardly interpretable. Because PCs are a composite variable, shape changes are analyzed in PCs as a movement of a set of landmarks.

2) Procrustes residuals are too numerous to be used directly in statistical tests. In other words, too many variables create an excess of degrees of freedom and a decrease in the power of statistical tests. PCs are a means to reduce the number of variables by selecting those which have the greatest eigenvalues. The selection of PCs also allows us to eliminate nuisance parameters generated by superimposition (citations removed).
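Here is a rough sketch of that reduction in Python with NumPy, assuming the residuals are stored as a (skulls, landmarks, dimensions) array. Random numbers stand in for real residuals, and keeping five PCs is an arbitrary choice for illustration, not a recommendation.

```python
import numpy as np

# Stand-in for Procrustes residuals: 100 skulls x 25 landmarks x 3 coordinates.
rng = np.random.default_rng(1)
residuals = rng.normal(size=(100, 25, 3))

# One row per skull, one column per coordinate (75 variables).
X = residuals.reshape(len(residuals), -1)
X = X - X.mean(axis=0)  # center each variable

# PCA via SVD: each row of vt is a PC, i.e. a joint movement of all landmarks.
u, s, vt = np.linalg.svd(X, full_matrices=False)
eigenvalues = s**2 / (len(X) - 1)            # variance carried by each PC
explained = eigenvalues / eigenvalues.sum()  # share of the total variation
scores = X @ vt.T                            # position of each skull on each PC

# Keep the PCs with the greatest eigenvalues (they come out sorted, largest first).
n_keep = 5
reduced = scores[:, :n_keep]

# A PC can be reshaped back into landmarks to see which ones move together.
pc1_as_landmarks = vt[0].reshape(25, 3)
```

The 75 columns of residuals shrink to `n_keep` columns of scores, and each retained PC can still be read as a movement of the whole set of landmarks rather than of one isolated point.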
