This is amazing: a man was able to predict the 21-17 final score accurately. Scientists say that the probability of someone getting the final score right is smaller than 1:1000, so this guy is a genius, scientifically speaking with a “confidence of 99.9%”. To understand how amazing this is, it is important to know that most medical and nutritional studies are done at a confidence level of 95% or, more rarely, 99%. This may not sound like a big difference, but in fact it is: the important number is the residual (0.1%, 1%, 5%), so the finding that the man is a genius is 10-50x “stronger” than that of the average medical or nutritional study.
Alright, I have a confession to make, and you probably already knew this: this man is not necessarily a genius. Thousands if not millions of people forecast the result of the Super Bowl, and some are bound to get it right. Quite a few, actually. They are not geniuses, they are just lucky (or rather, they might be geniuses, but looking at it after the fact does not help us tell).
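To put rough numbers on "quite a few", here is a minimal sketch. It assumes every forecaster guesses independently and has exactly a 1:1000 chance of hitting the score (the text above only says "smaller than", so treat this as an upper bound). The function name `lucky_forecasters` is made up for this illustration.

```python
def lucky_forecasters(n_forecasters: int, p_correct: float = 1 / 1000):
    """Return (expected number of correct guesses, chance that at least one is correct).

    Assumes independent forecasters with no skill, each correct with
    probability p_correct (an assumed value, not a measured one).
    """
    expected = n_forecasters * p_correct
    p_at_least_one = 1 - (1 - p_correct) ** n_forecasters
    return expected, p_at_least_one


for n in (1_000, 100_000, 1_000_000):
    expected, p_any = lucky_forecasters(n)
    print(f"{n:>9,} forecasters: about {expected:,.0f} lucky ones, "
          f"P(at least one) = {p_any:.4f}")
```

Under these assumptions, a million forecasters would produce on the order of a thousand "geniuses" by luck alone.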
Why am I writing about this? I thought it would provide a good explanation of data mining studies: throw enough false (!) hypotheses (e.g., Hypothesis N: Mr/Ms X(N) is a football genius, N=1…1,000,000) at a given dataset (the result of the actual game) and some of them will inevitably be accepted as true (say, Mr 1,234, Ms 2,347, Ms 2,778, …). Everyone understands that this is bogus when it comes to Super Bowl predictions (with the possible exception of those who made the correct predictions), but many data mining studies somehow still make it into the journals, and their results are taken as gospel until they are finally disproved.
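The same effect can be seen in a toy simulation. This is only a sketch under stated assumptions: every hypothesis is false by construction (nobody has any skill), each forecaster still hits the score with an assumed probability of 1:1000, and `count_false_geniuses` is a hypothetical name chosen for this example.

```python
import random


def count_false_geniuses(n_forecasters: int,
                         p_correct: float = 1 / 1000,
                         seed: int = 0) -> int:
    """Count skill-free forecasters who get the score right by chance.

    Every "Mr/Ms X(N) is a football genius" hypothesis is false here,
    yet each lucky hit would pass a test at the 99.9% confidence level.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.random() < p_correct for _ in range(n_forecasters))


hits = count_false_geniuses(1_000_000)
print(f"False hypotheses accepted as true: {hits:,} out of 1,000,000")
```

None of the accepted "geniuses" has any skill in this setup. The count depends only on how many hypotheses were thrown at the data and on the chosen confidence level.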
And of course this can take a long time. Remember spinach and how it is healthy because it contains so much iron? Or Popeye? Well, it isn't: the first person who measured it just got it wrong (…those pesky decimals…) and it took forever until people had the courage to stand up to it. Or is it even more complicated? I learnt something new today!