Wonky reasoning on public health
In keeping with my current fixation on bad logic from professional thinkers, I was delighted to find Robin Hanson of George Mason University on Cato Unbound running some excruciatingly poor arguments about public health spending. He’s another associate professor, like Jonathan Haidt, which makes me wonder if there is a certain pattern developing here. Hanson is a respected economist, and one would be tempted to make a stinging joke at the expense of economists, except that David M Cutler, another academic health economist, has already responded to Hanson’s article and done an excellent job of addressing the simplistic remedy suggested by Hanson, as has another professor of economics, Dana Goldman. But there’s more to be said, for Hanson has indulged in an extraordinarily blatant example of seeing what he wants to see.
Before we get to that, let me just say that Hanson’s basic premise is correct: the US spends an extravagant amount of money on health, and for all that spending, health outcomes in the US are no better and in many cases significantly worse than in similar Western countries that spend much less per capita. But that does not absolve him of his statistical shenanigans.
Regions that paid more to have patients stay in intensive care rooms for one more day during their last six months of life were estimated, at a 2% significance level, to make patients live roughly forty fewer days, even after controlling for: individual age, gender, and race; zipcode urbanity, education, poverty, income, disability, and marital and employment status; and hospital-area illness rates.
Now, I don’t know how to analyse this figure as the paper itself won’t be available to me until I get onto the UQ campus next week. But I can tell you this: there is something deeply fishy about that figure. Hanson has worded this the wrong way round. Intensive care specialists don’t hang onto patients just because they’re given funding. What happens is they discharge patients when they are stable enough to be looked after in normal medical wards. Except that in real life, there is always pressure on intensive care beds as new patients are admitted to hospital, and in most intensive care units there is a constant process of triaging to decide which patients can be discharged to the wards despite them not being as stable as we’d like. Now, according to what Hanson has reported, if you give extra funding to hospitals and this funding allows patients to stay in intensive care for an extra day, this leads to an average loss of forty days of life.
Now, I’m sorry I don’t have the paper at hand to say why this happens, but it has to be a statistical anomaly. And, in fact, the clue is in Hanson’s choice of data. This figure applied to those in the last six months of their lives. Clearly, this is not something that could have been known at the time that these people were in intensive care. This is a statistical blip produced by analysing the data in hindsight. Why, for instance, did Hanson not quote the numbers for people who had been in intensive care and then lived more than six months? If you exclude success stories from your analysis, clearly you won’t find any benefits. I can’t give more details until I see the paper, but Hanson has quoted a dubious figure without caveats.
It gets worse.
Among their many specific findings, the most significant was at the 0.1% level: people with free eyeglasses could see better. But it has long been obvious that eyeglasses help people see, and eyeglasses are basically physics, not “medicine.”
Hanson has created a category — medicine based on physics — for the specific purpose of excluding a successful and cost-effective medical intervention. If physics doesn’t count as medicine, then medical insurers will be delighted to know that they can henceforth exclude optometry from their benefits, along with cataract lens replacement, pacemakers, defibrillators, X-rays, CT scans, MRI scans, nuclear medicine and ultrasounds — the entire field of medical imaging — not to mention lithotripsy, dialysis, electrophoresis, electrophysiological studies, radioablation, and laser surgery. Assuming Hanson includes mechanics within the field of physics, insurers can also exclude casts and splints, slings, stents, shunts, joint replacements… It’s certainly one way of cutting medical costs. Redefine them as engineering costs.
It gets even worse.
At a 7% significance level they found that poor people in the top 80% of initial health ended up with a 3% lower general health index under free medicine than under full-priced medicine.
Why exactly did Hanson exclude the bottom 20% of people? The sicker people in the sample were the ones who were going to benefit from access to free medicine. If you take healthy people and spend more on their health, of course the benefits will be marginal.
Now you’ll have to excuse me a moment. I’m going to talk statistics. It won’t be hard and it will be worth it, I promise. When Hanson talks about a “7% significance level,” what he means is that, if there were no actual process taking place, there would still be a 7% chance of getting a finding at least this strong as a fluke. Loosely speaking, that is 93% confidence that the finding is not down to chance alone. This is not the terminology that scientists use. A scientist would say that the p value was 0.07, but it means the same thing. Now 93% confidence might sound good, but in scientific circles, findings are not considered positive unless they reach at least the 95% confidence level, and in the case of extremely controversial hypotheses and certain study designs, 99% or even 99.9% confidence is the accepted standard. This may seem harsh, but there is a very good reason for it. If you accept a 95% confidence level, that means that even when no real effect exists there is still a 5% chance of getting a positive finding. To put it another way, one in twenty tests of a non-existent effect will come up positive at 95% confidence through chance alone. This is why, by the way, scientists are never in a rush to accept “exciting new hypotheses” that only have one or two published papers of marginal significance in their favour.
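If you would rather see the 5% figure than take my word for it, it can be checked by simulation. What follows is only a minimal sketch in Python, not anything taken from the studies Hanson cites. It runs a couple of thousand imaginary studies in which the treatment does nothing at all, and counts how often a test at the 95% confidence level comes up positive anyway. The function names, the group size of 100, and the simple z-test are all my own assumptions for illustration.

```python
import random
import statistics
from math import erf, sqrt


def two_sided_p_value(sample_a, sample_b):
    """Approximate two-sided p value for a difference in means (z-test)."""
    diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    std_err = sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    z = abs(diff) / std_err
    # Standard normal CDF written with erf, then doubled for two tails.
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))


def false_positive_rate(n_studies=2000, n_per_group=100, alpha=0.05, seed=1):
    """Fraction of no-effect studies that still look 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Both groups come from the same distribution: no real effect exists.
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p_value(treated, control) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(f"False positive rate at alpha=0.05: {false_positive_rate():.3f}")
```

The printed rate should land somewhere near 0.05, give or take simulation noise. Every one of those "positive" studies is a fluke by construction, which is why a lone marginal result deserves suspicion.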
Now Hanson is aware of this. He says:
The third most significant specific finding, and strongest unexpected one, was that people with free medicine had lower blood pressure, at a 3% significance level. But a study that looks at thirty measures in total should, just by chance, find one unexpected finding that seems significant at the 3% level. So taking data mining (i.e., searching for results) into account, this blood pressure result should be set aside.
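Hanson's arithmetic on this point is easy to check. Here is a minimal sketch in Python, assuming the thirty measures are independent of one another (they almost certainly aren't, so treat the numbers as rough). The function names are mine. The last function applies a Bonferroni correction, which is one common way of adjusting for multiple comparisons and not necessarily what the study's authors did. I am also assuming, without being able to confirm it, that the 0.1%, 3% and 7% findings all belong to the same family of thirty tests.

```python
def expected_false_positives(n_measures, alpha):
    """Expected number of chance findings when no real effects exist."""
    return n_measures * alpha


def prob_at_least_one(n_measures, alpha):
    """Chance of at least one fluke, assuming independent measures."""
    return 1 - (1 - alpha) ** n_measures


def survives_bonferroni(p_value, n_measures, family_alpha=0.05):
    """True if a p value still counts after a Bonferroni correction."""
    return p_value < family_alpha / n_measures


if __name__ == "__main__":
    n = 30  # number of measures, as quoted by Hanson
    print(f"Expected flukes at the 3% level: {expected_false_positives(n, 0.03):.2f}")
    print(f"Chance of at least one fluke:    {prob_at_least_one(n, 0.03):.2f}")
    # Findings quoted in this post, as p values.
    for label, p in [("eyeglasses", 0.001), ("blood pressure", 0.03),
                     ("general health index", 0.07)]:
        print(f"{label}: survives correction = {survives_bonferroni(p, n)}")
```

Under these assumptions you would expect roughly 0.9 chance findings at the 3% level, and about a 60% chance of at least one. Note what the correction does to the three results: the eyeglasses finding survives, while the blood pressure finding and the 7% general health finding both fall. If one of the latter two is to be set aside, so is the other.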
Knowing that statistical significance can be misleading, Hanson insists that a finding with 97% confidence should be set aside even though it meets the minimum scientific standard for a positive finding. But did you notice that he wanted you to accept without question a finding from the same study that carried 93% confidence, less than the minimum scientific standard and less than half as statistically impressive as the finding he wants you to reject? This is not the only example.
…spending $1,000 more overall in the last six months of life gave local patients somewhere between a gain of five days of life and a loss of twenty days of life (95% confidence interval).
Again, it sounds impressive, but when the 95% confidence interval includes both improvement and worsening, what this means scientifically is that no effect was demonstrated. And yet again Hanson is busy looking at the patients who had six months to live. Again, this could only have been known in hindsight. Essentially what Hanson is saying is that if you look at people who died, any money spent in the last few months of their life was unlikely to make a difference. Is anyone surprised? By his choice of data cutoffs, he has excluded by definition those who would have benefited most from the extra $1000.
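The confidence interval point is simple enough to write down as a rule. A minimal sketch in Python, using my own function name and the convention that positive numbers mean days of life gained:

```python
def interpret_interval(lower, upper):
    """Classify a confidence interval for an effect (positive = benefit)."""
    if lower > 0:
        return "benefit demonstrated"
    if upper < 0:
        return "harm demonstrated"
    # The interval straddles zero, so the data are compatible with no effect.
    return "no effect demonstrated"


if __name__ == "__main__":
    # Hanson's quoted interval per extra $1,000: from a loss of twenty days
    # of life to a gain of five days.
    print(interpret_interval(-20, 5))
```

For the interval Hanson quotes, this prints "no effect demonstrated". That is not the same as showing the spending is useless; it only means this particular analysis cannot tell the difference.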
Hanson ends by telling us that he has a tentative explanation for the oddity of America’s excessive spending on health care.
Briefly, the idea is that our ancestors showed loyalty by taking care of sick allies, and that, for such signals, how much one spends matters more than how effective is the care, and commonly-observed clues of quality matter more than private clues.
Americans have different ancestors to the rest of us, I guess. Just what the world needs now. Another academic dabbling in evolutionary psychology who doesn’t know how to test a hypothesis.
5 People have left comments on this post
You misunderstand these studies. I think you’ll find that the other health economists replying to me there at Cato Unbound accept my reading of the studies; the disagreement is about how to come to terms with those results.
Chris, you are on a foam-flecked roll. I love it.
But as I am now married to an associate professor, I feel compelled to rise to her defense. Not *all* of them are hopeless dabblers. :-)
Robin, thank you for taking the time to respond. I didn’t expect it, especially as you must be in the middle of a pile of correspondence right now. But with all due respect, I think you misunderstand me.
I agree with your premise: the US spends a fortune on health and has very little to show for it compared to other OECD nations. David Cutler and Dana Goldman also agree. In fact, I think one would be hard-pressed to find anyone in health economics or public health who disagrees, although I’m sure there are plenty of people ready to be groomed for a Bjorn Lomborg role in health economics should there be sufficient threat to the medical industry.
But I disagree that the other economists accept your reading of the findings. Both Cutler and Goldman discuss their disagreements with your interpretation of the RAND study. These are not major disagreements, to be sure, and they are a great deal more measured in their responses than I was. But then, their interest is in health economics, whereas my interest here is the use of statistics in medicine. Actually, I do have some disagreements with the economic side of your argument, but these are not in my area of knowledge and they did not interest me as much as the statistics so I left them out entirely. It may be that the other health economists have set aside objections they had about the stats in order to address the economics.
Of course there is always room for disagreement about complex issues like the distribution of health spending. But in the use of statistics, to be frank, your article was wrong. There is no other way to put it. You excluded the best result by reclassifying it as “physics” rather than medicine; you repeatedly referred to findings in the subgroups of the study populations that were least likely to benefit while not reporting data for the entire group; and you rejected statistically significant findings that didn’t suit your argument while enthusiastically accepting findings that were not statistically significant but did suit your argument. These expeditions go well beyond the boundary of differing opinion and extend well into the land of frank statistical error. Here Be Dragons.
This does not mean that your conclusions are wrong. It is quite possible to use faulty statistical reasoning and still arrive at a correct conclusion; this happens all the time in medical research, much to my perpetual irritation. I would not suggest that the statistical errors in your article are reason to dismiss your premise. I try to encourage medical students not to dismiss findings out of hand just because the study behind them is flawed. Every study is flawed to some degree.
But even when I agree with an opinion, I get rather agitated when I see it supported with bad statistical arguments, regardless of whether they are rhetorically useful. Statistics is a widely abused science, and as a result it is easy for people to dismiss results they don’t like with Disraeli’s “lies, damned lies and statistics.” I don’t expect to change the statistical contortionism of anti-rationalists like creationists and global warming deniers, but there is a large pool of intelligent, interested readers who deserve to be given the best possible information, and in medical research that necessarily means statistics. Cruddy statistical arguments, even in support of good causes, poison the well for all statistics.
It would be clearer to say I wanted to set aside the eyeglasses result because it was very expected ex ante; it does not give news about anything else. I only reported on the subgroups that the study itself reported on; they didn’t report on the whole group. I didn’t enthusiastically accept any non-null results.
But, Robin, you can’t exclude a positive finding on the basis that you expected it to be positive beforehand. As for the reports from the study, as I said, I won’t have access to the actual paper for another week or so, but even as reported, I think a strong disclaimer should accompany any discussion of its findings if they analyse their results in that way.
Again, I stress that none of this goes against your central thesis about relative spending. I think that the optometry finding actually supports what you are trying to argue, but not for the reasons you gave. I think we can say that the strongest evidence of benefit was for one of the cheapest and least charismatic parts of medical care: basic optometry. Following this logically, we can say that this is the sort of medical intervention for which there is good evidence of benefit from spending. Instead of excluding it from your analysis, you should be including it as the sort of medicine that should be getting more funding instead of the miracle-cure stories that get all the media attention even though ninety percent never make it to market.