Sunday, April 22, 2012

My daily trip to class

Phase One : The car





Phase Two : The Train





Phase Three : The Subway





There are only two subway lines in Marseille, and I've only ever had use for Line One. All the cars are 70s-style brown and orange.

Destination : Crash Test Lab


Saturday, April 21, 2012

Well I've been to one world fair, a picnic and a rodeo and that is the stupidest thing I've ever heard come over a set of ear-phones!

- Major T.J. "King" Kong

Tuesday, April 17, 2012

Two thoughts

I still often feel that when I read something I'm getting a privileged peek into some better world, a Great Gatsby-like dinner party filled with bon mots and subtle references. It's usually by reading something that the possibility of some higher good overcoming the injustice in the world brings a smile to my face. Reading can be transcendent like that. But then the writer might stumble and snap me into realizing the terrestrial baseness of what they're selling.

As I was driving in dense rush-hour traffic today, I let my eyes linger on the middle-aged faces of all the lawyers and doctors and what-have-yous slowly inching past me. A question screamed out: what are you people doing?

Monday, March 12, 2012

Randomosity

I've taken at least 3 different statistics courses, and I feel like I'm only now internalizing the core concepts. The course I took a month ago was key, because the prof started from the very beginning, and was very, very thorough. He explained a few things that I'm sure I learned many times before, but that are often glossed over (I guess because they're deemed too simple to dwell on). Yet if these key ideas fly out of your head for even a moment, the whole enterprise starts to seem spooky and incomprehensible. One of these ideas is what I'm going to call randomosity. How random is randomosity? Well, not at all.

Statistics is simply math applied to the real world. But the real world is incredibly complicated, so complicated that it often seems like chaos - completely random. But what seems random isn't always really random! The key is discerning the random from the non-random.

This works because pure chance can be calculated exactly (a funny thought, eh?). Natural phenomena with non-random causes and effects, on the other hand, have so many variables and are so complex that even our most powerful models are only approximations. Chance is known. The chance of flipping heads with a fair coin is always the same - exactly 50%. With a fair die you have exactly a 1 in 6 chance of rolling a given number. Not sort of. Exactly. (Sorry lotto players.)

So in a statistical test, we compare what would happen under purely random circumstances to what we have actually observed, and then see to what extent they differ. This is the guiding principle of many statistical tests.
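To make the "chance versus observation" idea concrete, here's a minimal Python sketch of a chi-square goodness-of-fit test on die rolls. The loaded die, the seed and the number of rolls are all made up for illustration; the only borrowed number is 11.07, the usual 5% critical value for 5 degrees of freedom.

```python
import random

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def roll_counts(n_rolls, weights, rng):
    """Roll a six-sided die n_rolls times and count how often each face shows."""
    faces = rng.choices(range(6), weights=weights, k=n_rolls)
    return [faces.count(face) for face in range(6)]

rng = random.Random(42)  # fixed seed so the run is repeatable
n_rolls = 6000
expected = [n_rolls / 6] * 6  # what pure chance predicts for a fair die

fair_counts = roll_counts(n_rolls, [1, 1, 1, 1, 1, 1], rng)
loaded_counts = roll_counts(n_rolls, [1, 1, 1, 1, 1, 3], rng)  # hypothetical loaded die

fair_stat = chi_square_statistic(fair_counts, expected)
loaded_stat = chi_square_statistic(loaded_counts, expected)

# Standard 5% critical value for chi-square with 5 degrees of freedom
CRITICAL_5_PERCENT = 11.07
print("fair die:  ", round(fair_stat, 2), "suspicious?", fair_stat > CRITICAL_5_PERCENT)
print("loaded die:", round(loaded_stat, 2), "suspicious?", loaded_stat > CRITICAL_5_PERCENT)
```

The bigger the statistic, the further the observed counts are from what chance alone would give. A fair die will still cross the 5% line about one run in twenty, which is exactly what "5%" means.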

This site is pretty easy to understand: http://www.socialresearchmethods.net/

Thursday, February 23, 2012

Procrustes analysis


In Greek mythology, Procrustes was a son of Poseidon with a sadistic streak. He would welcome weary travelers into his home, and after filling them with good food and cheer, would invite them to sleep in his special bed. Unfortunately, if his guests didn't fit the bed exactly (and they never did!) Procrustes, in an anal-retentive fury, would find a way to make them fit, either by cutting off a bit of their legs or flattening them out with a hammer. Procrustes was eventually done in by Theseus, who gave him a taste of his own medicine (see vase above).

As with many other mythological characters, Procrustes' name was adopted by nerds to denote a nerdy but essentially simple concept: if you want to compare the shapes of two similar objects, you first need to scale, rotate and translate the objects so that they are superimposed as closely as possible. Except instead of using hammers or axes, a few equations are used.
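Here's a rough sketch of those "few equations" in Python, for 2D landmarks only. The landmark coordinates and function names are invented for illustration, and this is just the textbook least-squares fit (center, scale to unit centroid size, rotate), not necessarily what any particular morphometrics package does. It also ignores reflections.

```python
import math

def center_and_scale(points):
    """Translate the centroid to the origin and scale to unit centroid size."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / size, y / size) for x, y in centered]

def superimpose(reference, target):
    """Fit target onto reference; return both shapes after fitting."""
    ref = center_and_scale(reference)
    tgt = center_and_scale(target)
    # Closed-form 2D rotation angle that minimizes the summed squared distances
    a = sum(rx * tx + ry * ty for (rx, ry), (tx, ty) in zip(ref, tgt))
    b = sum(ry * tx - rx * ty for (rx, ry), (tx, ty) in zip(ref, tgt))
    theta = math.atan2(b, a)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = [(tx * cos_t - ty * sin_t, tx * sin_t + ty * cos_t) for tx, ty in tgt]
    return ref, rotated

def shape_distance(shape_a, shape_b):
    """Square root of the summed squared distances between matching landmarks."""
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(shape_a, shape_b)))

def transform(points, scale, angle_deg, shift):
    """Scale, rotate and translate a shape (used only to build test data)."""
    t = math.radians(angle_deg)
    return [(scale * (x * math.cos(t) - y * math.sin(t)) + shift[0],
             scale * (x * math.sin(t) + y * math.cos(t)) + shift[1])
            for x, y in points]

# Hypothetical landmarks: the same outline moved around, and a truly different one
outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 5.0)]
same_shape = transform(outline, 2.5, 40.0, (10.0, -7.0))
different_shape = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 8.0)]

ref, fitted = superimpose(outline, same_shape)
print("same shape, different pose:", shape_distance(ref, fitted))
ref2, fitted2 = superimpose(outline, different_shape)
print("different shape:", shape_distance(ref2, fitted2))
```

Once size, position and orientation are fitted away, the moved copy lands right on top of the original, while the genuinely different outline keeps a leftover distance. That leftover is the shape difference.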

Now say you want to compare the shapes of a whole set of skulls, and see how they vary (based on landmarks that you define). You can compute the average shape of the lot, called the consensus, and see how each skull's landmarks compare to it. The differences between each skull's landmarks and the homologous landmarks of the consensus are called Procrustes residuals. These are the starting point of the statistical analysis.
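A tiny Python sketch of the consensus and the residuals, using three made-up "skulls" with three 2D landmarks each. I'm pretending they have already been superimposed; a full generalized Procrustes analysis would re-fit every specimen to the consensus and repeat until things settle, which this skips.

```python
def consensus(shapes):
    """Mean position of each landmark across all specimens."""
    n = len(shapes)
    n_landmarks = len(shapes[0])
    return [
        (sum(s[i][0] for s in shapes) / n, sum(s[i][1] for s in shapes) / n)
        for i in range(n_landmarks)
    ]

def residuals(shape, mean_shape):
    """Difference between each landmark and its counterpart in the consensus."""
    return [(x - mx, y - my) for (x, y), (mx, my) in zip(shape, mean_shape)]

# Hypothetical, already-superimposed specimens (toy numbers, not real data)
aligned = [
    [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)],
    [(0.0, 0.0), (1.0, 0.0), (0.5, 1.5)],
    [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5)],
]

mean_shape = consensus(aligned)
all_residuals = [residuals(shape, mean_shape) for shape in aligned]

print("consensus:", mean_shape)
for index, specimen_residuals in enumerate(all_residuals):
    print("specimen", index, "residuals:", specimen_residuals)
```

Only the third landmark varies in this toy set, so it is the only one with non-zero residuals.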

But it is impractical to do a classic statistical treatment comparing all of your landmark variables one by one. It's hard enough finding two-dimensional relationships. Imagine trying to compare the 25 (x,y,z) landmark coordinates of 100 skulls. And how will you represent the data? The answer is to use a Principal Component (PC) Analysis. This reduces your Procrustes residuals into more manageable chunks of data by finding which combinations of landmark changes explain the largest part of the variation. Berge and Penin (2004) explain it better:

1) After superimposition, all landmarks are interdependent (they are all used for fitting). Therefore, an isolated landmark movement is hardly interpretable. Because PCs are a composite variable, shape changes are analyzed in PCs as a movement of a set of landmarks.

2) Procrustes residuals are too numerous to be used directly in statistical tests. In other words, too many variables create an excess of degrees of freedom and a decrease in the power of statistical tests. PCs are a means to reduce the number of variables by selecting those which have the greatest eigenvalues. The selection of PCs also allows us to eliminate nuisance parameters generated by superimposition (citations removed).
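For a feel of what "greatest eigenvalues" means, here's a bare-bones Python sketch that finds the first principal component of a toy data table with power iteration. The numbers are invented (five specimens, three variables), and real software uses more robust eigen-solvers; this is only meant to show the idea.

```python
import math

def covariance_matrix(rows):
    """Center each column, then return (covariance matrix, centered rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def multiply(matrix, vec):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def first_component(cov, iterations=200):
    """Power iteration: repeated multiplication pulls a vector toward PC1.

    Assumes the starting vector is not exactly perpendicular to PC1.
    """
    p = len(cov)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = multiply(cov, vec)
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(v * w for v, w in zip(vec, multiply(cov, vec)))
    return vec, eigenvalue

# Hypothetical table: rows are specimens, columns are variables.
# The first two columns change together; the third barely changes.
rows = [
    [2.0, 1.9, 0.1],
    [1.0, 1.1, -0.1],
    [0.0, 0.0, 0.0],
    [-1.0, -0.9, 0.1],
    [-2.0, -2.1, -0.1],
]

cov, centered = covariance_matrix(rows)
pc1, eigenvalue = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
share = eigenvalue / total_variance
scores = [sum(c * w for c, w in zip(row, pc1)) for row in centered]

print("PC1 direction:", [round(w, 3) for w in pc1])
print("share of variation explained by PC1:", round(share, 3))
print("PC1 scores:", [round(s, 3) for s in scores])
```

The eigenvalue is the amount of variation along the component, so dividing it by the total variance gives the share of variation that PC1 explains. The scores are each specimen's position along PC1, which is what ends up on the axis of a PCA plot.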

Thursday, February 16, 2012

Morpho J

Using my landmark data I made a simple wireframe in Morpho J, and applied a Principal Component Analysis (PCA). The idea is to reduce a lot of data into a smaller number of variables and identify which of these "components" explain most of the variation. I read that it's similar to image compression, and indeed similar math is used to create, say, a .JPG from a raw image file.



Principal Component 1 is by definition the component that captures the most variation, and in my case it represents almost 80% of the variation. As you can see, all the Homos are on the plus side and the Pongos and Pans are grouped on the negative side of the mean. Homo is closer to 0 because there were many more Homos in the sample. PC1 clearly delimits Homo from Pan and Pongo, and even shows a difference between Pongo and Pan.

PC2, on the other hand, does not distinguish Pan and Pongo from Homo, but shows a difference between chimps and orangs.



Great, but what the hell are these "components" really? They show you there's a difference and quantify it, but they don't tell you what it is! That's where playing with the wireframe comes in handy.



The light blue wireframe represents the "consensus" individual - the mean. As you play with the PC1 weight, you can see how the dark blue shape changes from one extreme to the other. Here I slid it over to between the Pan and Pongo clusters.



You can also play with the PC2 weight. Notice how it makes much less of a difference.





I still feel pretty in the dark as to how the Principal Components are actually calculated, and thus what they really mean. I'll keep you posted when I learn more about eigenvalues.

RStudio



Wrote a log shape ratio script.
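The script itself isn't shown here, so what follows is not it. It's a rough Python sketch of what a log shape ratio usually means, assuming the common definition: divide each of an individual's measurements by the geometric mean of all of that individual's measurements (a stand-in for overall size), then take the log. The measurements below are made up.

```python
import math

def log_shape_ratios(measurements):
    """log(measurement / geometric mean) for one individual's measurements."""
    logs = [math.log(m) for m in measurements]
    log_size = sum(logs) / len(logs)  # log of the geometric mean
    return [value - log_size for value in logs]

# Hypothetical measurements (say, in mm) for one individual,
# and for a second individual that is exactly twice as big
small = [120.0, 95.0, 60.0]
large = [240.0, 190.0, 120.0]

small_ratios = log_shape_ratios(small)
large_ratios = log_shape_ratios(large)

print("small:", [round(r, 4) for r in small_ratios])
print("large:", [round(r, 4) for r in large_ratios])
```

Both individuals come out with the same ratios, because doubling every measurement only changes size. That's the point of the transformation: what's left is shape.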