Note: This is the second in a four-part series about the misuse of scientific studies in the development of training approaches for runners. Part one discussed why studies matter, and future posts will cover flaws in study design and how you can be smarter about using such results in planning your training.
Even the best-designed studies with the best possible interpretation of results are subject to misreporting by writers. This trend is growing as those less qualified to report on studies (including bloggers such as myself!) play a notable role in spreading news about such results. While some of these situations are obvious and driven by a desire to sensationalize the news to build an audience (as can be seen in this type of post summarizing a highly publicized study regarding the risks from running long distances), others are more insidious or unintentional.
In many cases, those who report on these studies are ill-qualified to fully understand them. Often, they read just the abstract and not the full study. Additionally, they need to distill the technical details into language that grabs and holds readers’ attention, and one way to do so is to be brief. Thus, critical details are often misinterpreted, misstated, or ignored. Even worse is when a study is cited as a supporting point for an article but is not the central theme – often the result is mentioned in passing as evidence, without any real review or interpretation added by the writer.
Here are some of the typical ways in which scientific studies are misrepresented, even when the writer’s intentions are good. You can read more about the misuse of statistics by writers in this concise (if flagged as potentially biased) Wikipedia article.
- Overstating confidence in the results – a typical scientific study is designed to reject a null hypothesis (more on that in a second) at a 95% confidence level (i.e., with a p-value < 0.05). I don’t want to get into a full review of statistical principles here, but suffice it to say that it is not enough to simply show that the average values of two groups are different. Statistical significance (i.e., a result that is unlikely to have occurred by chance) depends on the means, the variability, and the sample sizes together. Thus, two groups may have different “mean” results, yet a small sample size or high variability in individual results may mean that, statistically speaking, you don’t have enough confidence to state that the underlying populations differ. Sometimes, a writer citing a study takes a shortcut and simply reports that the average results for one group were better than for another, which can be misleading (the short simulation after the example below illustrates why).
Example: This summary of studies on the effect of music on fitness (besides generalizing single-workout studies to the broad concept of “fitness”) includes a reference to a study on the impact of music on the speed and exertion of trained and untrained runners. It refers to average losses of speed with the use of music, but gives no indication of whether those results were statistically significant. And, surprisingly, this isn’t entirely the author’s fault, as the study itself is vague and poorly written (describing t-tests for significance in the abstract but not sharing the results of those tests in the write-up).
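To make the point concrete, here is a minimal sketch in Python (using numpy and scipy). The group sizes, times, and spreads are invented for illustration – none of these numbers come from the studies above. Two small groups drawn from the same distribution will almost always show different average times, yet a t-test will usually, and correctly, find nothing significant:

```python
# Illustrative only: the numbers below are made up, not taken from any study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two small groups of "5K times" (in minutes) drawn from the SAME distribution.
group_a = rng.normal(loc=25.0, scale=3.0, size=8)
group_b = rng.normal(loc=25.0, scale=3.0, size=8)

print(f"mean A = {group_a.mean():.2f}, mean B = {group_b.mean():.2f}")

# The sample means will almost always differ, but a two-sample t-test weighs
# that difference against the variability and the sample sizes.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Because both groups come from the same distribution, p will land above 0.05
# about 95% of the time – reporting only "group A was faster on average"
# would overstate the finding.
```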
- Misunderstanding the concept of the null hypothesis – a well-designed study seeks to disprove a null hypothesis, which typically states that there is no difference between two populations. For example, a null hypothesis might be that increasing training mileage has no impact on marathon results. The goal in a study is to reject (note – not disprove, as that can never be done with 100% confidence) the null hypothesis in a statistically significant manner (typically, at the 95% confidence level mentioned above). However, failing to reject the null hypothesis is not the same as proving it. In other words, if the study just described didn’t find a significant difference in marathon results, that doesn’t prove no such relationship exists. It only means that this particular study didn’t find one. This sometimes gets misstated in the press – look for statements such as “proved there is no correlation between x and y.” That is an overstatement (the simulation after the examples below shows how easily a real effect can go undetected).
Examples: Sometimes even the researchers get it wrong. In a study looking at the relationship between physical fitness and posture, the authors claim in the closing sentence of the abstract that “there is no correlation” between the two. This would be better stated as “no correlation was demonstrated.” In a more typical example of casual reporting of study results, note the sixth point in this list of fitness fads being debunked. While the thought may be correct (it’s about calories in versus calories out), it is incorrect to state, “studies have shown that there is no correlation,” as studies really can’t be designed to do that.
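Here is a minimal sketch in Python of why a non-significant result is not proof of “no effect.” The effect size and group sizes are invented for illustration; the point is simply that an underpowered study will usually fail to reject the null even when a real difference exists by construction:

```python
# Illustrative only: a real 1-minute difference is built into the data, yet
# small, noisy samples usually fail to detect it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials = 2000
rejections = 0

for _ in range(n_trials):
    control = rng.normal(loc=25.0, scale=3.0, size=6)  # 6 runners per group
    treated = rng.normal(loc=24.0, scale=3.0, size=6)  # genuinely 1 min faster
    _, p = stats.ttest_ind(control, treated)
    if p < 0.05:
        rejections += 1

print(f"Rejected the null in {rejections / n_trials:.0%} of simulated studies")
# Typically well under 20% – most of these studies would report "no significant
# difference" despite the real effect. Not rejecting the null proved nothing.
```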
- Omitting limitations or assumptions – this may be the biggest issue of all in reporting on the results of experiments, as the list of limitations and assumptions in a given study is often lengthy. A good example is bias in the sampling approach (for example, a study involving only young men, where that fact is mentioned but not emphasized as potentially important). Often the way the data were treated is important but overlooked.
Example: In the risk-of-death-from-extreme-exercise study mentioned at the opening of this post, Alex Hutchinson points out that the data are controlled for the subjects’ weight and blood pressure. Thus, one benefit of a hearty amount of exercise – weight loss – is effectively stripped out of the analysis, yet most summaries neglect that critical fact.
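Here is a minimal sketch in Python of how that kind of adjustment can hide a real benefit. The data are simulated with made-up coefficients (they have nothing to do with the actual study): exercise lowers weight, and weight drives risk, so adjusting for weight erases the benefit that exercise delivers through weight loss:

```python
# Illustrative only: simulated data where exercise helps *via* weight loss.
import numpy as np

rng = np.random.default_rng(1)
n = 5000

exercise = rng.normal(size=n)                   # weekly exercise (standardized)
weight = -0.6 * exercise + rng.normal(size=n)   # more exercise -> lower weight
risk = 0.8 * weight + rng.normal(size=n)        # higher weight -> higher risk

# Unadjusted: regress risk on exercise alone.
X1 = np.column_stack([np.ones(n), exercise])
b_unadj = np.linalg.lstsq(X1, risk, rcond=None)[0]

# Adjusted: regress risk on exercise AND weight, mirroring a "controlled" analysis.
X2 = np.column_stack([np.ones(n), exercise, weight])
b_adj = np.linalg.lstsq(X2, risk, rcond=None)[0]

print(f"exercise effect, unadjusted:          {b_unadj[1]:+.2f}")  # roughly -0.48
print(f"exercise effect, adjusted for weight: {b_adj[1]:+.2f}")    # roughly  0.00
```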
- Oversimplification and overgeneralization of the results – some articles cite multiple studies to support a central theme. Such pieces are especially rife with misstatements (or overstatements) of the studies’ conclusions, as if the writer simply skimmed the Google summary of each abstract when deciding which studies to include. One common issue is applying results from a study of a specific set of subjects (young men, heavy smokers, etc.) to the general population. And in these cases, the writers at least made some attempt at research. Even more egregious are the articles with vague references to “studies show…” or “no study has ever shown…”, where the reader has no opportunity to vet the quality of the studies mentioned.
Example: A few years ago, an article in Runner’s World promoting the idea of running less mileage to boost performance quoted a trainer (or physiologist, I can’t recall), who said something to the effect of “No study has ever shown an improvement in performance with increased miles for anyone but elite runners.” I remember thinking “really?” It’s not hard to find such studies – this article refers to at least two of them. So perhaps you disagree with the studies, but they do exist and should be acknowledged. Unfortunately, despite a significant amount of time spent searching for the article, I’m unable to provide a link.
- Correlation doesn’t prove causation – this fact has been stated so often that it’s become a cliché. Yet it still seems to be one of the most neglected principles in the analysis and reporting of scientific studies. The premise is this – just because two variables (say, fitness and intelligence) have a relationship, this doesn’t mean that one causes the other (for example, that being fit makes you score better on tests). It could be that the direction is the opposite of what is reported (being more intelligent makes you more likely to want to stay fit). Or there could be a third variable influencing both (a higher family income increases your tendency towards fitness and higher test scores). These mistakes are most prevalent when studies use historical data in which variables are recorded but not controlled; variables that may influence the results are often omitted (and sometimes they are impossible to capture). The simulation after the examples below shows a “third variable” at work.
Examples: These situations are easy to find. A commonly cited study discusses the relationship between intelligence and cardiovascular fitness among Swedish males aged 15-18. While the study analyzed siblings to rule out environmental influences and twins to rule out genetic factors, the authors still admit that the direction of causality is difficult to determine. However, as you can see from the discussion mentioned, that nuance doesn’t always survive in how the study is reported.
In another example I’ve seen discussed several times, some studies have shown a correlation between higher mileage and reduced injury risk (once you get past a peak in the curve, where injuries increase for a time). The implication is that running higher mileage makes you more resistant to injuries. However, it’s just as reasonable to conclude that runners who are less prone to injuries can therefore sustain higher mileage (in fact, I’d argue that this conclusion is more logical).
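Here is a minimal sketch in Python of the “third variable” case: a made-up “family income” variable nudges both fitness and test scores upward, producing a clear correlation between the two even though neither causes the other. The variables and coefficients are invented to mirror the hypothetical in the text, not the Swedish study:

```python
# Illustrative only: neither fitness nor test_score appears in the other's
# formula, yet they end up correlated through the shared "income" variable.
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

income = rng.normal(size=n)                     # the hidden third variable
fitness = 0.5 * income + rng.normal(size=n)     # income nudges fitness up
test_score = 0.5 * income + rng.normal(size=n)  # income nudges scores up

r = np.corrcoef(fitness, test_score)[0, 1]
print(f"correlation(fitness, test_score) = {r:.2f}")  # about 0.2, with no causal link
```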
It is important to maintain a critical eye when reading accounts of scientific studies from mainstream or other media sources, as it is all too convenient to disregard basic statistical principles when using a study to make a point. And the problem is compounded when the study itself is flawed, as I’ll cover in the next post in this series.