This is part three in a four-part series on the challenges in using study results to shape our training as runners. Part one discussed why studies are important, and part two explained the numerous issues that occur in the reporting of study results. This part demonstrates that even the supposed “experts” often get things wrong, and part four will share thoughts on how we can use critical thinking to extract whatever value is possible from studies.
While many of the flaws in what we read about study results stem from misinterpretation by the media and other "distributors" of such information, the reality is that even scientists often get things wrong or, at the least, fail to consider or reveal the full range of limitations in their research. There are several reasons for this:
- Lack of qualifications of the researcher – sometimes research is performed by graduate students as part of their degree requirements, and they may not be as fully qualified (or as well supervised) as we may be led to believe. Full rigor in designing and executing studies is not easy, and the difficulty is compounded by the time pressure to meet graduation or publication requirements.
- Conflict of interest – one must pay special attention to the source of funding that a researcher or study cites. A corporation is often looking for a specific result from a study, and a scientist may well allow a desire to please the funding source (and increase the potential for future funding) to influence results, even if just subconsciously. Every study involves some degree of judgment, and the assumptions inherent in the study can easily be influenced by outside stimuli.
Identifying study flaws can be difficult, all the more so because we often see only the abstracts, with full papers hidden behind the paywalls of scientific publications. Still, there are some key problem areas that we as consumers can look for when reviewing study results.
Uncontrolled Variables

For larger studies, scientists often rely on historical data to feed their statistical tools. One challenge in such situations is that not all of the potential "input variables" that can affect the "output variables" are controlled. In fact, some may not even be known, as the records used to build the results may not contain key data. This often leads to the "correlation versus causation" dilemma: if two variables move together, does one cause the other, is it the other way around, or is there a third variable that drives both? Methods of eliminating this issue can create problems of their own, such as controlling away potentially important factors. One example is the recent study on the impact of "extreme" endurance training on mortality rates. The authors controlled for weight and blood pressure in their analysis (in which they concluded that the optimal amount of running for health purposes is far lower than a typical marathoner would like to hear). Alex Hutchinson points out that this effectively eliminates two of the primary health benefits of running before concluding that running isn't that healthy.
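To make the confounding problem concrete, here is a toy simulation (all numbers and variable names are invented for illustration): a hidden "fitness" factor drives both weekly mileage and resting heart rate, producing a strong correlation between the two even though neither one causes the other.

```python
import random
import statistics

random.seed(42)

# Hypothetical example: a hidden "fitness" factor drives BOTH variables.
n = 1000
fitness = [random.gauss(0, 1) for _ in range(n)]
# Fitter runners log more miles...
mileage = [f + random.gauss(0, 0.5) for f in fitness]
# ...and also have lower resting heart rates.
resting_hr = [-f + random.gauss(0, 0.5) for f in fitness]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Strongly negative, even though mileage does not cause a lower heart rate.
r = pearson(mileage, resting_hr)
print(f"mileage vs. resting HR correlation: {r:.2f}")
```

A naive reading of records like these would conclude that running more miles lowers your heart rate; in this model, removing the hidden factor would erase the relationship entirely.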
Small Sample Sizes
While proper statistical methods address one issue with small samples (statistical significance takes sample size into account), small samples remain more vulnerable to uncontrolled variables skewing the results. You will see some examples below.
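A quick simulation (the 5K finish times are invented) shows why small samples are so noisy: the average of a seven-runner sample bounces around far more from study to study than the average of a 200-runner sample drawn from the same population.

```python
import random
import statistics

random.seed(0)

def sample_means(sample_size, trials=2000):
    """Repeatedly sample 5K finish times (invented population:
    mean 25 min, sd 3) and record each sample's average."""
    return [statistics.fmean(random.gauss(25, 3) for _ in range(sample_size))
            for _ in range(trials)]

# How much does the measured "average 5K time" wobble between studies?
spread_small = statistics.stdev(sample_means(7))    # seven subjects
spread_large = statistics.stdev(sample_means(200))  # two hundred subjects
print(f"spread with n=7:   {spread_small:.2f} min")
print(f"spread with n=200: {spread_large:.2f} min")
```

With only seven subjects, a single unusually fast or slow runner (or any uncontrolled variable affecting one runner) can noticeably move the result.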
Selection Bias

Even with larger samples, it can be very hard to avoid bias in the selection of test subjects, largely because finding willing subjects is time-consuming and expensive. For example, using the time-honored human version of the lab rat – the desperate-for-money-or-course-credit college student – introduces age bias into any study. In many cases, one cannot assume that what is true for a 20-year-old is necessarily true for a 50-year-old. This is also blatantly obvious with any study that focuses on elite runners, who may as well be an entirely different species compared to us workaday types. In a specific example (one that I'm admittedly guilty of citing, though with appropriate caveats), a study on the fat-burning effects of working out before breakfast included only young male subjects. There is no guarantee that the results would be the same for women or older individuals.
Survivorship Bias

In long-term studies, one significant risk of error is that the only subjects for whom results are available are those who "survived" the study. This is especially a problem for topics where the drop-out rate would be expected to be high, like, say, endurance sports. Injuries are often the culprit here, and it may be the survival rate itself that is the more significant conclusion to draw from a study. This issue also implicitly creates the "correlation versus causation" problem. As an example (admittedly less scientific in nature), several writers cite a link between higher running mileage and reduced injury rates, implying that those who put in the mileage build resistance. It would probably be more logical to reach the opposite conclusion: that those less likely to be injured are better able to sustain high mileage. It would be difficult to design a good study on this topic, as the effort to get the subjects to high mileage would by its very nature "weed out" the injury-prone.
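Here is a toy sketch of that survivorship effect (the injury-proneness scores are entirely made up): if injury-prone runners tend to drop out before reaching high mileage, the high-mileage "survivors" look less injury-prone than the starting population, even though mileage protects no one in this model.

```python
import random
import statistics

random.seed(1)

# Each runner gets an invented "injury-proneness" score between 0 and 1.
runners = [random.random() for _ in range(10_000)]

# Runners must stay healthy to reach high mileage; the more injury-prone
# a runner is, the more likely they drop out before getting there.
survivors = [p for p in runners if random.random() > p]

pop_rate = statistics.fmean(runners)     # average proneness, everyone
surv_rate = statistics.fmean(survivors)  # average proneness, high-mileage group
print(f"all runners:          {pop_rate:.2f}")
print(f"high-mileage runners: {surv_rate:.2f}")
```

Nothing in the model makes mileage protective, yet the high-mileage group comes out looking sturdier, which is exactly the backwards inference described above.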
The Placebo Effect

The placebo effect occurs when the subjects of a study who are receiving the targeted treatment (for example, taking a pill) believe that they should expect results, which makes it more likely for such results to appear. This can be designed out of many studies; in medicine, for example, those not receiving "the treatment" might be given a similar but ineffectual treatment instead, like a "sugar pill". But it can be hard to design out of some studies, like one focused on massage (it's hard to fake an effective massage) or training programs. The 30-20-10 workout hyped recently in Runner's World may provide an example: the seven subjects (also a small sample, and thus more subject to the influence of other variables) had reason to expect they'd do better in their 5K, so perhaps they went out harder or were more confident than those who continued their same-old training approach.
Focus on Averages
This isn't so much a flaw in the design of many studies as it is a recognition that for certain subjects (especially those involving complex organisms or systems like human beings), the variability in response to stimuli can overwhelm the "average effect" that researchers focus on. This is why a cancer treatment may work for one individual but not for another with seemingly similar conditions. Or, hopefully more relevant for readers, why a certain training approach or diet yields results for your training partner but not for you. A well-designed and well-executed study may reveal general trends, but it can be frustrating when those conclusions don't yield the predicted results. Once again, Alex Hutchinson points out an example in a weight-loss study, concluding that it may be more interesting to understand the drivers of variability in the results than the average result itself.
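A small simulation (with invented numbers) shows how individual variability can swamp an average effect: even when a treatment helps "on average", a large share of individuals can still end up worse off.

```python
import random

random.seed(3)

# Invented individual responses to a diet: a small average benefit
# (+0.5 units) buried under large person-to-person variability (sd 2.0).
responses = [random.gauss(0.5, 2.0) for _ in range(10_000)]

avg = sum(responses) / len(responses)
frac_worse = sum(r < 0 for r in responses) / len(responses)
print(f"average improvement: {avg:.2f}")
print(f"share who got worse: {frac_worse:.0%}")
```

A headline would report only the positive average, yet a substantial fraction of the simulated subjects respond negatively, which is why the same diet can work for your training partner and fail for you.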
As you can see, it's usually not intentional deception or unqualified researchers that lead to shoddy studies (though both happen more often than we'd like to believe), but simply the complex nature of human subjects and the tedium of statistical rigor. That is why we need to review scientific papers critically before applying their results to our training.