Exercise “Science” and Data Mining – a follow-up

I recently wrote a post on how data-mining studies (and there are plenty of those in exercise science) need to be looked at very carefully. The conclusion holds, now more than ever, but I was overcomplicating things. It is really very simple:

Scientific hypotheses are tested using confidence levels: testing at 95% confidence means that, when there is in fact no real effect, the test will still wrongly report one in only about 5 out of 100 cases. So if you test a single hypothesis where nothing is going on, your chances are 95% that the test correctly finds nothing. Of course, if you throw 100 such hypotheses at the data, then you expect about five false hypotheses to be erroneously accepted as true.
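To put numbers on this, here is a minimal sketch in Python. It assumes that the 100 tests are independent and that none of the hypotheses is actually true, so each test has a 5% chance of a false positive. The function names are my own, chosen for illustration.

```python
import random

ALPHA = 0.05        # false-positive rate of a single test at 95% confidence
N_HYPOTHESES = 100  # number of hypotheses thrown at the data


def expected_false_positives(n, alpha=ALPHA):
    """Expected number of false positives among n tests with no real effect."""
    return n * alpha


def prob_at_least_one_false_positive(n, alpha=ALPHA):
    """Chance that at least one of n independent tests is a false positive."""
    return 1 - (1 - alpha) ** n


def simulate_false_positives(n, alpha=ALPHA, rng=None):
    """Simulate n independent tests of false hypotheses.

    Each test comes out "significant" with probability alpha.
    Returns how many did so in this one simulated study.
    """
    rng = rng or random.Random()
    return sum(rng.random() < alpha for _ in range(n))


if __name__ == "__main__":
    print("expected false positives:", expected_false_positives(N_HYPOTHESES))
    print("chance of at least one:  ",
          round(prob_at_least_one_false_positive(N_HYPOTHESES), 3))
    print("one simulated study:     ", simulate_false_positives(N_HYPOTHESES))
```

Under these assumptions the chance of getting at least one false positive out of 100 tests is above 99%, so a data-mining study that reports a handful of "significant" findings has shown very little by that fact alone.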

That’s all it is really: if you do data mining, which is really just a fancy name for throwing as many hypotheses at a problem as you can, then you will inevitably get false positives. You can either make sure you weed those out, or you can publish a paper (and later a correction) for each of them. What do you think the academic incentive system (also called “publish-or-perish”) is geared towards?

 
