I recently wrote a post on how data-mining studies (and there are plenty of those in the exercise sciences) need to be looked at very carefully. The conclusion holds, now more than ever, but I was overcomplicating things – it is really quite simple:
Scientific hypotheses are tested at confidence levels – a 95% confidence level means that in only 5 out of 100 cases would a false hypothesis be accepted as true. So if you test a single hypothesis, there is only a 5% chance of erroneously accepting it. But if you throw 100 hypotheses at the data, you should expect about five false hypotheses to be accepted as true by chance alone.
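You can see this happen in a few lines of simulation. The sketch below (my own illustration, not from any particular study) uses the fact that when a null hypothesis is actually true, its p-value is uniformly distributed between 0 and 1, so each test has exactly a 5% chance of dipping below the 0.05 threshold purely by luck:

```python
import random

random.seed(1)

ALPHA = 0.05   # 95% confidence level, i.e. 5% false-positive rate per test
N_TESTS = 100  # hypotheses thrown at the data, all of them false

# Under a true null hypothesis the p-value is uniform on [0, 1],
# so drawing a uniform random number stands in for running one test.
p_values = [random.random() for _ in range(N_TESTS)]

# Count how many "significant" results we get despite every
# hypothesis being false.
false_positives = sum(p < ALPHA for p in p_values)

print(f"{false_positives} of {N_TESTS} false hypotheses "
      f"came out 'significant' by chance")
```

Run it a few times with different seeds and the count hovers around five – exactly the five spurious "findings" per hundred hypotheses described above.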
That’s all there is to it: data mining is really just a fancy name for throwing as many hypotheses at a problem as you can, so you will inevitably get false positives. You can either make sure you weed those out, or you can publish a paper (and later a correction) for each of them. What do you think the academic incentive system (also called “publish-or-perish”) is geared towards?
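For completeness, "weeding those out" is a solved problem statistically. The simplest tool is the Bonferroni correction: divide the significance threshold by the number of tests, so the chance of even one false positive across the whole batch stays near 5%. A minimal sketch (again my own illustration):

```python
import random

random.seed(1)

ALPHA = 0.05
N_TESTS = 100

# Simulate 100 tests of hypotheses that are all false:
# each p-value is uniform on [0, 1].
p_values = [random.random() for _ in range(N_TESTS)]

# Naive: reject anything below ALPHA -> roughly 5 false positives expected.
naive_hits = sum(p < ALPHA for p in p_values)

# Bonferroni: require p < ALPHA / N_TESTS, which caps the probability
# of even a single false positive across all 100 tests at about 5%.
corrected_hits = sum(p < ALPHA / N_TESTS for p in p_values)

print(f"naive: {naive_hits} rejections, Bonferroni: {corrected_hits}")
```

The correction costs statistical power, which is exactly why it is unpopular with anyone whose incentive is to report as many "significant" findings as possible.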