Unhealthful News 167 - "Don't worry about it" is not sufficient advice

The health information and advice business (which some would prefer to call the health scare business) is about 90% warnings and simplistic advice and 9% calls to dismiss the former because it is overstated or otherwise flawed.  I am trying to build up the remaining 1%. 

Actually, that is probably ridiculously optimistic – I doubt that the niche that I am trying to fill is nearly 1% of the discourse.  That niche is trying to offer ways to understand the claims without just choosing to believe either the "worry" or the "don't bother about it" faction.  You might think that the "don't worry" advocates would help with that, but they usually get it wrong.

For example, this AP article was an attempt to reduce the worry about two recent health scares (HT to @garyschwitzer for the reference).  In her attempt to reassure people about the mobile phone and cancer scare, which I wrote a bit about in UN158, the reporter wrote that IARC
...said there is a possibility cellphones raise the risk of brain tumors.  "The operative word is 'possibility,'" said  [the American Cancer Society's deputy chief medical officer, Len] Lichtenfeld
Both the reporter and Lichtenfeld got that wrong.  As I explained in UN15, the rating that IARC put on mobile phones was basically "code yellow" or "3 on a scale of 5".  They label it "possible carcinogen", but that is really just an arbitrary phrase and does not have its natural language meaning.  It is based on an unspecified combination of apparent level of risk and certainty of the estimate, both of which turn out to be low in this case, mingled with no small amount of worldly politics.  But in the news story, the reporter and the ACS guy both managed to compound the confusion of that phrase by misrepresenting it as "it is a possibility", which clearly is a natural language statement that is very much not what the IARC report means.  Their reassurances that we should not worry too much about this are valid (though they tend to overshoot and suggest "no one should worry at all, or even investigate further").  But they do not seem to understand enough to offer the most useful possible observations about what IARC said.

The same article talks about the recently reported cancer risk from styrene, formaldehyde, and a few other chemicals, but offers reassurance from Linda Birnbaum, head of the National Toxicology Program of the National Institute of Environmental Health Sciences, which issued the report of the risk.  Of course this is not terribly reassuring given that this is the same unit that, as I pointed out yesterday, put out its current report about carcinogens with a section about smokeless tobacco that was roughly 2.5 decades out of date.  But the message from Birnbaum and the reporter was that these warnings were based on occupational exposures (though the story does not actually use that standard term), which are much higher than consumer exposures.  Fine.  But then they go on to declare that consumer exposures therefore pose no risk.  But we obviously do not know that.  It would be fine to say that we have not detected a risk at consumer levels of exposure, but that is different.  One of the advantages of occupational studies is that they let us look, at high levels, at exposures that are very common at low levels, and that allows some guess as to whether they might be causing some problem at the low levels.  If a problem is observed at the high levels, the guess is elevated to "it's a possibility" (the real meaning of that term).

Saying "these chemicals are absolutely harmless" is a message that more commonly comes from pro-industry groups like ACSH, but what they wrote about today was what I wrote about two days ago, the new "study" about television watching and risk of diabetes.  (They get really annoyed when people point out they are pro-industry, but it is pretty clear from a lot of what they write that they do not read my blog anyway.)  They correctly point out that the study was of little value, but note only that the results cannot be distinguished from the effects of just being sedentary, regardless of the television.  They suggest that this is the only limitation, substantially understating the limits of the research.  I will not expand on that, since I already did (recall: snacking is up there with sitting still; the study method added no information to what we already had; etc.).

Finally (and the math phobic might want to just quit reading here), the Freakonomics blog takes on the recent suggestion that more driving is causing the increases in the obesity rate.  The author points out that the supposed evidence is that both have been trending up, basically linearly, over time.  He offers the clever counter that his age, which obviously trends up linearly over time, is just as good a predictor of obesity over time.  He goes on to explain that in general, for a variable that follows a simple time trend, almost any other time trending variable will fit it.  He goes on to note that the original authors concede that correlation does not equal causation, but argues that this is an understatement in this case.

He does not complete the explanation, however, and observe that there is not a correlation in a meaningful sense here.  That is, this is not a case of a correlation with some other explanation, but it strains the term to claim there is a correlation.  To try to explain:

You could observe that the height of the Empire State Building is a great predictor of the height of the Chrysler Building, always 62 meters taller, every time you measure them.  But it should be obvious that it makes no sense to declare that they are correlated because they are both constants – each can be described by only a single number, so the variable does not vary.  It takes a bit more thinking to see it, but it is also the case for linear trends, which you might say are a constant of sorts.

Any two series that can be described by only two values also cannot be described as correlated with each other in any meaningful way.  Or to put it another way, they will always be perfectly related by a simple function, which means they are perfectly correlated, so suggesting that their correlation means anything is nonsense.  When something must be true, there is no information in discovering it is true.  As an example, consider the two series x={1,3} and y={100,114}; they are perfectly correlated, in that the first always perfectly predicts the second with the simple rule "if x=1 then y=100 and if x=3 then y=114" or if you prefer a linear equation, "y=93+x*7".  But the same is true when the two values that represent each series are not just two observations, but a linear trend (a line can be described with two values in terms of either two points on it, or one point and the slope).
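For anyone who wants to check the arithmetic, here is a minimal sketch in Python.  The two-point series are the ones from the example above; the "driving", "obesity", and "age" trends are numbers I made up purely for illustration (they are not real data), and the helper names pearson and line_through are just my labels.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def line_through(p, q):
    """Intercept and slope of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1, slope

# The two-point example from the text: x={1,3}, y={100,114}.
x = [1, 3]
y = [100, 114]
intercept, slope = line_through((x[0], y[0]), (x[1], y[1]))
r_two_points = pearson(x, y)

# Hypothetical linear trends over 31 "years" (invented numbers, not data).
steps = range(31)
driving = [2.0 + 0.05 * t for t in steps]
obesity = [15.0 + 0.6 * t for t in steps]
age = [20 + t for t in steps]

r_driving_obesity = pearson(driving, obesity)
r_age_obesity = pearson(age, obesity)

print(intercept, slope)        # the rule "y = 93 + x*7"
print(r_two_points)            # two points: correlation is forced to be 1
print(r_driving_obesity, r_age_obesity)  # any two rising lines: also 1, up to rounding
```

Change the made-up slopes and intercepts to anything you like (as long as both lines slope upward) and the correlations stay at 1, which is the point: the result was guaranteed before looking at any data.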

The point is that the driving-obesity result is silly at a much more fundamental level than is implied by "correlation is not causation", one that is not difficult to understand.  The "correlation is not causation" phrase is used to describe situations where there is meaningful correlation between two variables that calls for an explanation (so, for example, it is not explained by "neither changes, so of course the difference is constant"), but the explanation might be some common cause that relates them (confounding) or perhaps causation in the other direction.  But two curves that have the same basic shape will always be correlated if you choose the right scale, so there is no correlation that needs to be explained.

Failure to recognize this is what gives a lot of useful information a bad name.  People see aggregate trend data and recognize it is not convincing but do not understand why.  So they figure that all aggregate trends are uninformative (creating the fallacy that there is an "ecological fallacy").  This removes the ability to observe, for example, that mobile phones must not be causing many brain cancers because brain cancer rates are not trending up, or perhaps that television cannot explain much about diabetes because television exploded in popularity many decades ago and stayed up, while diabetes rates have changed dramatically over that time.  The difference:  several of those variables have some distinctive patterns, rather than just being straight lines over the whole range, so the bends in the curves should have matched, but did not.
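To make the "bends" point concrete, here is a rough sketch, again in Python and again with entirely invented numbers: a "television" curve that shoots up early and then plateaus, and a "diabetes" curve that stays flat for a long time and then climbs.  Comparing period-to-period changes is just one crude way to ask whether the bends line up; it is my illustration, not a method anyone in these stories used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation, or None when either series does not vary."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # a constant has nothing to correlate
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(vx * vy)

def changes(series):
    """Period-to-period changes; a straight line has constant changes."""
    return [b - a for a, b in zip(series, series[1:])]

steps = range(60)

# Two straight lines (made up): their changes are constants, so there
# are no bends to compare and nothing to learn from matching them.
line_a = [5 + 2 * t for t in steps]
line_b = [100 + 7 * t for t in steps]
r_lines = pearson(changes(line_a), changes(line_b))

# Two curves with bends (made up): one rises early and then plateaus,
# the other is flat for a long time and then rises.
television = [min(90, 9 * t) for t in steps]
diabetes = [20 + 3 * max(0, t - 40) for t in steps]
r_bends = pearson(changes(television), changes(diabetes))

print(r_lines)   # None: straight lines offer no pattern to match
print(r_bends)   # negative: the bends happen at different times
```

With straight lines there is simply nothing to test.  With curves that bend, the timing of the bends is real information, and here it fails to match.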