Thursday, February 16, 2012

Morpho J

Using my landmark data, I made a simple wireframe in Morpho J and ran a Principal Component Analysis (PCA). The idea is to reduce a lot of data to a smaller number of variables and identify which of these "components" explain most of the variation. I read that it's similar to image compression, and indeed related math is used to create, say, a .JPG from a raw image file.
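To make the idea a bit more concrete, here is a rough sketch of the generic textbook recipe for PCA in Python with NumPy. The data are made up (40 imaginary specimens, 6 variables), and I'm not claiming this is what Morpho J does under the hood. It only shows the usual steps: center the data, build a covariance matrix, and pull out its eigenvectors (the components) and eigenvalues (how much variation each one carries).

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 40 specimens, 6 variables (e.g. x/y coordinates of 3 landmarks).
# Most of the spread is deliberately placed along one hidden direction.
hidden = rng.normal(size=(40, 1))
direction = np.array([[2.0, -1.0, 0.5, 1.5, -0.5, 1.0]])
data = hidden * direction + 0.2 * rng.normal(size=(40, 6))

# 1. Center each variable on its mean.
centered = data - data.mean(axis=0)

# 2. Covariance matrix of the variables (6 x 6).
cov = np.cov(centered, rowvar=False)

# 3. Eigen-decomposition. eigh is meant for symmetric matrices and returns
#    eigenvalues in ascending order, so reorder to put PC1 first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 4. Share of the total variance carried by each component.
explained = eigenvalues / eigenvalues.sum()

# 5. PC scores: each specimen's position along each component.
scores = centered @ eigenvectors

print(np.round(explained, 3))
```

Because the fake data were built around a single direction, the first entry of `explained` dominates, which is the same kind of situation as a PC1 that soaks up most of the variation.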



Principal Component 1 is by definition the component that accounts for the most variation, and in my case it represents almost 80% of the total. As you can see, all the Homos are on the plus side and the Pongos and Pans are grouped on the negative side of the mean. Homo is closer to 0 because there were many more Homos in the sample, which pulls the mean toward them. PC1 clearly separates Homo from Pan and Pongo, and even shows a difference between Pongo and Pan.
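The "closer to 0" effect is just arithmetic, and a toy example shows it. The group sizes and positions below are invented for illustration (they are not my actual sample): 30 specimens sitting at +1 and 5 sitting at -1 along some axis. Since PC scores are measured from the overall mean, the bigger group drags the zero point toward itself.

```python
import numpy as np

# Hypothetical raw positions along one axis (numbers invented for illustration):
# a large group of 30 specimens at +1 and a small group of 5 specimens at -1.
group_a = np.full(30, 1.0)
group_b = np.full(5, -1.0)
values = np.concatenate([group_a, group_b])

# The overall mean is pulled toward the larger group: (30*1 + 5*(-1)) / 35.
grand_mean = float(values.mean())

# Scores are measured relative to that mean.
centered_a = group_a - grand_mean   # large group lands close to 0
centered_b = group_b - grand_mean   # small group lands far from 0

print(round(grand_mean, 3), round(float(centered_a[0]), 3), round(float(centered_b[0]), 3))
```

The mean works out to 25/35, so the large group ends up at about +0.29 and the small group at about -1.71, even though both started the same distance from the midpoint.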

PC2, on the other hand, does not distinguish Pan and Pongo from Homo, but shows a difference between chimps and orangs.



Great, but what the hell are these "components" really? They show you there's a difference and quantify it, but they don't tell you what it is! That's where playing with the wireframe comes in handy.



The light blue wireframe represents the "consensus" individual - the mean. As you play with the PC1 weight, you can see how the dark blue shape changes from one extreme to the other. Here I slid it over to between the Pan and Pongo clusters.
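My guess at what the slider is doing, sketched in Python: take the mean shape and add the PC1 direction multiplied by the chosen weight. That is the standard way to visualize a component, but I'm assuming Morpho J works this way. The landmark coordinates and the PC1 vector below are invented, and `shape_at` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical consensus (mean) shape: 4 landmarks in 2D, invented coordinates.
mean_shape = np.array([[0.0, 0.0],
                       [1.0, 0.0],
                       [1.0, 1.0],
                       [0.0, 1.0]])

# Hypothetical PC1 direction: one (dx, dy) shift per landmark, scaled to unit length.
pc1 = np.array([[0.0, 0.0],
                [0.5, 0.0],
                [0.5, 0.5],
                [0.0, 0.5]])
pc1 = pc1 / np.linalg.norm(pc1)

def shape_at(weight):
    """Return landmark coordinates for a given PC1 weight (the slider position)."""
    return mean_shape + weight * pc1

# Weight 0 gives back the mean shape; opposite weights deform it in opposite ways.
for weight in (-0.2, 0.0, 0.2):
    print(weight)
    print(np.round(shape_at(weight), 3))
```

A weight of 0 reproduces the light blue consensus shape, and sliding the weight in either direction moves every landmark along its own little arrow.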



You can also play with the PC2 weight. Notice how it makes much less of a difference.





I still feel pretty in the dark as to how the Principal Components are actually calculated, and thus what they really mean. I'll keep you posted when I learn more about eigenvalues.
