Thursday, February 23, 2012

Procrustes analysis

In Greek mythology, Procrustes was a son of Poseidon with a sadistic streak. He would welcome weary travelers into his home, and after filling them with good food and cheer would invite them to sleep in his special bed. Unfortunately, if his guests didn't fit the bed exactly (and they never did!) Procrustes, in an anal retentive fury, would find a way to make them fit, either by cutting off a bit of their legs or flattening them out with a hammer. Procrustes was eventually done in by Theseus, who gave him a taste of his own medicine (see vase above).

As with many other mythological characters, Procrustes' name was adopted by nerds to denote a nerdy but essentially simple concept: if you want to analyze the shape of two similar objects, you need to first scale, rotate and translate the objects so that they are as perfectly superimposed as possible. Except instead of using hammers or axes, a few equations are used.
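For the curious, those "few equations" can be sketched in a handful of lines of Python. This is a minimal sketch that assumes NumPy; the function name `superimpose` is made up for illustration and is not taken from any particular morphometrics package. It scales both objects to unit centroid size (the partial Procrustes convention) and uses the standard SVD solution for the best-fitting rotation.

```python
import numpy as np

def superimpose(reference, target):
    """Translate, scale and rotate `target` onto `reference`.

    Both inputs are (landmarks x dimensions) arrays whose rows are
    homologous landmarks. Returns both shapes after superimposition.
    """
    ref = np.asarray(reference, dtype=float)
    tgt = np.asarray(target, dtype=float)

    # Translate: move both centroids to the origin.
    ref_c = ref - ref.mean(axis=0)
    tgt_c = tgt - tgt.mean(axis=0)

    # Scale: divide each shape by its centroid size.
    ref_c = ref_c / np.linalg.norm(ref_c)
    tgt_c = tgt_c / np.linalg.norm(tgt_c)

    # Rotate: the SVD of the cross-product matrix gives the
    # least-squares rotation of the target onto the reference.
    u, _, vt = np.linalg.svd(tgt_c.T @ ref_c)
    if np.linalg.det(u @ vt) < 0:
        # Flip one axis so we get a rotation, not a mirror image.
        u[:, -1] *= -1
    return ref_c, tgt_c @ (u @ vt)
```

If the second object is just a bigger, rotated, shifted copy of the first, the two returned shapes coincide; whatever mismatch remains after this step is a difference in shape.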

Now say you want to compare the shapes of a whole set of skulls, and see how they vary (based on landmarks that you define). You can compute the average shape of the lot, called the consensus, and see how the landmarks of each skull compare with it. The differences between each skull's landmarks and the homologous landmarks of the consensus are called Procrustes residuals. These are the starting point of the statistical analysis.
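Here is a rough sketch of how a consensus and its residuals could be computed, again assuming NumPy. The function names are hypothetical, and the fixed number of iterations stands in for the convergence check a real program would use: every skull is centered and scaled, rotated onto the current consensus, and the consensus is then recomputed as the mean.

```python
import numpy as np

def _center_and_scale(shape):
    # Centroid at the origin, centroid size equal to 1.
    centered = shape - shape.mean(axis=0)
    return centered / np.linalg.norm(centered)

def _rotate_onto(reference, shape):
    # Least-squares rotation of `shape` onto `reference` via SVD.
    u, _, vt = np.linalg.svd(shape.T @ reference)
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] *= -1  # avoid mirror images
    return shape @ (u @ vt)

def generalized_procrustes(shapes, n_iter=10):
    """shapes: (specimens x landmarks x dimensions) array.

    Returns the consensus and the Procrustes residuals.
    """
    shapes = np.asarray(shapes, dtype=float)
    aligned = np.array([_center_and_scale(s) for s in shapes])
    reference = aligned[0]  # start from the first specimen
    for _ in range(n_iter):
        aligned = np.array([_rotate_onto(reference, s) for s in aligned])
        reference = _center_and_scale(aligned.mean(axis=0))
    consensus = aligned.mean(axis=0)
    residuals = aligned - consensus  # one residual per landmark coordinate
    return consensus, residuals
```

The residuals come back with the same layout as the input, so each skull keeps one (x,y,z) offset per landmark.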

But a classic statistical treatment that compares all of your landmark variables one by one is useless. It's hard enough finding two-dimensional relationships. Imagine trying to compare the 25 (x,y,z) landmark coordinates of 100 skulls: that's 75 variables per skull. And how will you represent the data? The answer is to use a Principal Component (PC) Analysis. This reduces your Procrustes residuals to more manageable chunks of data by finding which ensembles of landmark changes explain the largest part of the variation. Berge and Penin (2004) explain it better:

1) After superimposition, all landmarks are interdependent (they are all used for fitting). Therefore, an isolated landmark movement is hardly interpretable. Because PCs are a composite variable, shape changes are analyzed in PCs as a movement of a set of landmarks.

2) Procrustes residuals are too numerous to be used directly in statistical tests. In other words, too many variables create an excess of degrees of freedom and a decrease in the power of statistical tests. PCs are a means to reduce the number of variables by selecting those which have the greatest eigenvalues. The selection of PCs also allows us to eliminate nuisance parameters generated by superimposition (citations removed).
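To make the reduction concrete, here is a minimal sketch of a PCA on Procrustes residuals, assuming NumPy. The function name and the data are made up: the fake residuals are random numbers with one built-in trend, not measurements from real skulls. The residuals are flattened to one row per skull, and a singular value decomposition gives the PC axes, the scores and the share of variance each PC explains.

```python
import numpy as np

def pca_of_residuals(residuals):
    """residuals: (specimens x landmarks x dimensions) array."""
    residuals = np.asarray(residuals, dtype=float)
    n_specimens = residuals.shape[0]

    # One row per specimen, one column per landmark coordinate.
    table = residuals.reshape(n_specimens, -1)
    table = table - table.mean(axis=0)

    # Rows of `axes` are the PCs; squared singular values give eigenvalues.
    u, singular_values, axes = np.linalg.svd(table, full_matrices=False)
    eigenvalues = singular_values**2 / (n_specimens - 1)
    explained = eigenvalues / eigenvalues.sum()
    scores = u * singular_values  # position of each specimen on each PC
    return scores, axes, explained

# Made-up example: 100 "skulls", 25 landmarks in (x,y,z) = 75 variables.
rng = np.random.default_rng(0)
trend = rng.normal(size=(25, 3))
trend /= np.linalg.norm(trend)
weights = rng.normal(size=(100, 1, 1))
fake_residuals = weights * trend + 0.02 * rng.normal(size=(100, 25, 3))

scores, axes, explained = pca_of_residuals(fake_residuals)
print(scores.shape)            # (100, 75)
print(explained[:3].round(3))  # share of variance on PC1, PC2, PC3
```

Because the fake data contain a single strong trend, nearly all of the variance lands on PC1, and the remaining PCs could be dropped with little loss.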

Thursday, February 16, 2012

Morpho J

Using my landmark data I made a simple wireframe in Morpho J, and applied a Principal Component Analysis (PCA). The idea is to reduce a lot of data into a smaller number of variables and identify which of these "components" explain most of the variation. I read that it's similar to image compression, and indeed related math (keeping only the components that carry most of the information) is used to create, say, a .JPG from a raw image file.

Principal Component 1 is by definition the component that accounts for the most variation, and in my case represents almost 80% of it. As you can see, all the Homos are on the plus side and the Pongos and Pans are grouped on the negative side of the mean. Homo is closer to 0 because there were many more Homos in the sample. PC1 clearly delimits Homo from Pan and Pongo, and even shows a difference between Pongo and Pan.

PC2, on the other hand, does not distinguish Pan and Pongo from Homo, but shows a difference between chimps and orangs.

Great, but what the hell are these "components" really? They show you there's a difference and quantify it, but they don't tell you what it is! That's where playing with the wireframe comes in handy.

The light blue wireframe represents the "consensus" individual - the mean. As you play with the PC1 weight, you can see how the dark blue shape changes from one extreme to the other. Here I slid it over to between the Pan and Pongo clusters.
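As far as I can tell, the slider boils down to simple arithmetic: the displayed shape is the consensus plus the chosen weight times the PC's coefficients. The sketch below assumes exactly that (it is my reading, not Morpho J's documented internals), and the function name, the triangle and the PC are all invented for illustration.

```python
import numpy as np

def shape_at(consensus, pc_axis, weight):
    """Shape obtained by sliding `weight` units along one PC.

    consensus: (landmarks x dimensions) array, the mean shape.
    pc_axis:   flat array with one coefficient per landmark coordinate.
    """
    consensus = np.asarray(consensus, dtype=float)
    shift = np.asarray(pc_axis, dtype=float).reshape(consensus.shape)
    return consensus + weight * shift

# Invented example: a triangle whose top landmark moves along the PC.
consensus = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
pc_axis = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

print(shape_at(consensus, pc_axis, 0.0))   # weight 0: the consensus itself
print(shape_at(consensus, pc_axis, 0.2))   # positive end: taller triangle
print(shape_at(consensus, pc_axis, -0.2))  # negative end: flatter triangle
```

A weight of 0 gives back the light blue consensus, and opposite weights give shapes that are mirror deviations on either side of it.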

You can also play with the PC2 weight. Notice how it makes much less of a difference.

I still feel pretty in the dark as to how the Principal Components are actually calculated, and thus what they really mean. I'll keep you posted when I learn more about eigenvalues.
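In the meantime, a toy example small enough to check by hand gives a first taste. It assumes NumPy and an invented 2x2 covariance matrix for two variables: the eigenvectors of the covariance matrix are the PC directions, and each eigenvalue is the variance along its PC.

```python
import numpy as np

# Invented covariance matrix: two variables, each with variance 2,
# and covariance 1 between them.
covariance = np.array([[2.0, 1.0],
                       [1.0, 2.0]])

# eigh is meant for symmetric matrices; it returns eigenvalues in
# ascending order, so reorder them from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]  # columns are PC1, PC2

explained = eigenvalues / eigenvalues.sum()
print(eigenvalues)  # 3 and 1
print(explained)    # 0.75 and 0.25
```

Here PC1 points along the diagonal where both variables increase together (up to sign) and carries 3 of the 4 units of total variance, i.e. 75%; PC2 is perpendicular to it and carries the remaining 25%.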


Wrote a log shape ratio script.
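The script itself isn't shown here. For readers who haven't met the term, a minimal sketch of the usual definition (each measurement divided by the geometric mean of all the measurements on that specimen, then logged) might look like this, assuming NumPy; the function name and the numbers are made up.

```python
import numpy as np

def log_shape_ratios(measurements):
    """measurements: (specimens x measurements) array of positive lengths."""
    logs = np.log(np.asarray(measurements, dtype=float))
    # The mean of the logs is the log of the geometric mean ("size").
    log_size = logs.mean(axis=1, keepdims=True)
    return logs - log_size

# Made-up example: the second specimen is the first one scaled up by 2,
# so both should get identical log shape ratios.
lengths = np.array([[10.0, 20.0, 40.0],
                    [20.0, 40.0, 80.0]])
print(log_shape_ratios(lengths))
```

Each row of the result sums to zero, and uniform scaling drops out, which is the point: what is left describes proportions rather than size.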


Everyone hates me because I lobbied for including the vertex in the analysis. Ugh, now we have to put the skulls in the Frankfort Horizontal they whined! Well, we couldn't rightly go from the glabella to the inion could we?! Bunch of flatheads.

Thursday, February 2, 2012

