If you have been over at my reading list today, you might have come across this post, where I link to a study that seems to say that milk consumption is related to prostate cancer. This got me musing about how statistics can be – and often are – abused.
I was rather surprised by the premise of the study – a link between milk and prostate cancer – and I looked at the abstract for an explanation. There was none. In fact, what the study found was a statistically significant link between advanced prostate cancer and (teenage) milk consumption, whilst there was no statistically significant link between general prostate cancer and milk consumption. What does this mean? Maybe not much.
It all depends on whether the scientists went into the study with the hypothesis “there is a link between advanced prostate cancer and early-life milk consumption” or not. This sounds a bit funny, doesn’t it? How can the interpretation of the data depend on whether or not you went into the study with the right assumption? Well, it can – sort of – and I will explain why.
Let’s start with an analogy. Assume someone has placed a whole set of three darts in the bullseye. Extraordinary, isn’t it? Well, maybe, maybe not. It could be that he has been trying for a week straight and has only just succeeded. It could also be that he is one of 10,000 people who have attempted it, and he just got lucky. If someone only shows you a recording of the successful trial, there is no way you can know. It is different, of course, if someone invites you to see “the guy who will put three darts in the bullseye”, you go there, see one try, and it is successful.
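To put a number on the “10,000 people” scenario: if a single attempt succeeds with some small probability p, the chance that at least one person in a large crowd succeeds is 1 − (1 − p)^n, which gets large quickly. The figures below are purely illustrative assumptions, not anything measured about darts:

```python
# Illustrative sketch: assume one attempt at "three darts in the bullseye"
# succeeds with probability p = 1/10,000 (an invented number).  If 10,000
# people each try once, at least one lucky success becomes quite likely.
p = 1 / 10_000          # assumed per-attempt success probability
n = 10_000              # number of people attempting

prob_at_least_one = 1 - (1 - p) ** n
print(f"P(at least one success among {n} attempts) = {prob_at_least_one:.3f}")
# roughly 1 - 1/e, i.e. about 0.63
```

So even with minuscule odds per attempt, a recording of *someone* succeeding is almost expected rather than extraordinary.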
Now replace “tries at darts” with different “hypotheses”, and “three darts in the bullseye” with “a statistically significant hypothesis under a given dataset”. It is exactly the same: just throw enough hypotheses at any fixed amount of data, and some will “stick”, i.e. they will appear statistically significant. So when data-mining with enough hypotheses, one will always find one or more that appear statistically significant. What one would really need to do is the meta-analysis, asking: “given the statistics of the dataset, and the number of hypotheses that I am testing, what number of ‘apparently statistically significant’ hypotheses will I find?” Or – even more pertinently – “is the fact that I have found [1, 2, 3, 4…] ‘apparently-statistically-significant’ hypotheses itself statistically significant?” Evidently, the more hypotheses have been thrown at the data, the more you expect to “stick”, and the less likely it is that the finding ‘one hypothesis was statistically significant’ is itself statistically significant.
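This “sticking” effect is easy to demonstrate. A well-calibrated significance test produces p-values that are uniform on [0, 1] when there is no real effect, so at the usual 5% threshold about one in twenty null hypotheses comes out “significant” by chance. A minimal sketch (the 100 hypotheses and the seed are arbitrary choices of mine):

```python
import random

random.seed(42)

# Under the null hypothesis, a calibrated test's p-value is uniform on
# [0, 1].  Simulate testing 100 unrelated hypotheses on data that
# contains no real effect at all:
n_hypotheses = 100
alpha = 0.05
p_values = [random.random() for _ in range(n_hypotheses)]

significant = [p for p in p_values if p < alpha]
print(f"{len(significant)} of {n_hypotheses} null hypotheses 'stick' at alpha = {alpha}")
# On average alpha * n_hypotheses = 5 of them appear significant,
# purely by chance.
```

None of these hypotheses is true, yet a handful will pass the significance bar in a typical run, exactly like the lucky darts player.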
A last comment on the study in question: one would assume that if there was a link between prostate cancer and milk, then it would show at the general level as well as at the severe level (unless there is a good reason for it to be otherwise). The detailed analysis here is a bit complicated, as the hypotheses are not independent, but roughly speaking, the fact that a statistically significant association could only be found at the severe level makes it more likely that this finding was indeed spurious.
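The meta-question from above can be made concrete, at least for the simplified case of independent tests (which, as noted, the study’s hypotheses are not): under the global null, the number of tests falling below alpha is Binomial(n, alpha), so one can ask how surprising it is to see at least k “significant” results. The numbers fed in below are hypothetical, not taken from the study:

```python
from math import comb

def prob_at_least_k_significant(n: int, k: int, alpha: float = 0.05) -> float:
    """P(at least k of n independent null tests produce p < alpha).

    Under the global null the count of false positives is Binomial(n, alpha),
    so this is just the binomial upper tail.
    """
    return sum(comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
               for i in range(k, n + 1))

# Hypothetical example: 20 hypotheses tested at the 5% level.
print(prob_at_least_k_significant(20, 1))  # about 0.64: one "hit" is unremarkable
print(prob_at_least_k_significant(20, 4))  # about 0.016: four hits would be surprising
```

On these toy numbers, a single significant association among twenty tested hypotheses is almost to be expected under pure chance, which is the formal version of the suspicion that the severe-level-only finding is spurious.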