In keeping with my current fixation on bad logic from professional thinkers, I was delighted to find Robin Hanson of George Mason University on Cato Unbound running some excruciatingly poor arguments about public health spending. He’s another associate professor, like Jonathan Haidt, which makes me wonder if there is a certain pattern developing here. Hanson is a respected economist, and one would be tempted to make a stinging joke at the expense of economists, except that David M. Cutler, another academic health economist, has already responded to Hanson’s article and done an excellent job of addressing the simplistic remedy suggested by Hanson, as has another professor of economics, Dana Goldman, here. But there’s more to be said, for Hanson has indulged in an extraordinarily blatant example of seeing what he wants to see.
Before we get to that, let me just say that Hanson’s basic premise is correct: the US spends an extravagant amount of money on health, and for all that spending, health outcomes in the US are no better and in many cases significantly worse than similar Western countries that spend much less per capita. But that does not absolve him of his statistical shenanigans.
Regions that paid more to have patients stay in intensive care rooms for one more day during their last six months of life were estimated, at a 2% significance level, to make patients live roughly forty fewer days, even after controlling for: individual age, gender, and race; zipcode urbanity, education, poverty, income, disability, and marital and employment status; and hospital-area illness rates.
Now, I don’t know how to analyse this figure as the paper itself won’t be available to me until I get onto the UQ campus next week. But I can tell you this: there is something deeply fishy about that figure. Hanson has worded this the wrong way round. Intensive care specialists don’t hang onto patients just because they’re given funding. What happens is they discharge patients when they are stable enough to be looked after in normal medical wards. Except that in real life, there is always pressure on intensive care beds as new patients are admitted to hospital, and in most intensive care units there is a constant process of triaging to decide which patients can be discharged to the wards despite them not being as stable as we’d like. Now, according to what Hanson has reported, if you give extra funding to hospitals and this funding allows patients to stay in intensive care for an extra day, this leads to an average loss of forty days of life.
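To see how such a figure can arise without intensive care doing any harm, here is a toy simulation of the triage effect described above. Every number in it is invented for illustration: severity drives both a longer stay in intensive care and a shorter life, while the stay itself has zero causal effect, yet a naive regression still finds that each extra day in intensive care “costs” days of life.

```python
# Toy simulation (all numbers invented): sicker patients both stay in
# intensive care longer (triage) and die sooner, so ICU days correlate
# with shorter survival even though the stay has no causal effect here.
import random

random.seed(0)
n = 10_000
icu_days, days_lived = [], []
for _ in range(n):
    severity = random.random()                      # 0 = well, 1 = very sick
    icu_days.append(1 + 5 * severity + random.gauss(0, 0.5))
    days_lived.append(200 - 150 * severity + random.gauss(0, 20))

# Ordinary least-squares slope of days lived on ICU days
mx = sum(icu_days) / n
my = sum(days_lived) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(icu_days, days_lived))
         / sum((x - mx) ** 2 for x in icu_days))
print(round(slope, 1))   # clearly negative, despite zero causal harm
```

The point of the sketch is only that confounding by severity is enough to produce a negative association of this kind; it says nothing about the actual magnitudes in the paper Hanson cites.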
Now, I’m sorry I don’t have the paper at hand to say why this happens, but it has to be a statistical anomaly. And, in fact, the clue is in Hanson’s choice of data. This figure applied to those in the last six months of their lives. Clearly, this is not something that could have been known at the time that these people were in intensive care. This is a statistical blip produced by analysing the data in hindsight. Why, for instance, did Hanson not quote the numbers for people who had been in intensive care and then lived more than six months? If you exclude success stories from your analysis, clearly you won’t find any benefits. I can’t give more details until I see the paper, but Hanson has quoted a dubious figure without caveats.
It gets worse.
Among their many specific findings, the most significant was at the 0.1% level: people with free eyeglasses could see better. But it has long been obvious that eyeglasses help people see, and eyeglasses are basically physics, not “medicine.”
Hanson has created a category — medicine based on physics — for the specific purpose of excluding a successful and cost-effective medical intervention. If physics doesn’t count as medicine, then medical insurers will be delighted to know that they can henceforth exclude optometry from their benefits, along with cataract lens replacement, pacemakers, defibrillators, X-rays, CT scans, MRI scans, nuclear medicine and ultrasounds — the entire field of medical imaging — not to mention lithotripsy, dialysis, electrophoresis, electrophysiological studies, radioablation, and laser surgery. Assuming Hanson includes mechanics within the field of physics, insurers can also exclude casts and splints, slings, stents, shunts, joint replacements… It’s certainly one way of cutting medical costs. Redefine them as engineering costs.
It gets even worse.
At a 7% significance level they found that poor people in the top 80% of initial health ended up with a 3% lower general health index under free medicine than under full-priced medicine.
Why exactly did Hanson exclude the bottom 20% of people? The sicker people in the sample were the ones who were going to benefit from access to free medicine. If you take healthy people and spend more on their health, of course the benefits will be marginal.
Now you’ll have to excuse me a moment. I’m going to talk statistics. It won’t be hard and it will be worth it, I promise. When Hanson talks about a “7% significance level,” what he means is that even if there were no real effect at all, there would still be a 7% chance of seeing a result like this from chance alone. This is not the terminology that scientists use. A scientist would say that the p value was 0.07, or, speaking loosely, that the finding carries 93% confidence. Now 93% confidence might sound good, but in scientific circles, findings are not considered positive unless they reach at least the 95% confidence level, and in the case of extremely controversial hypotheses and certain study designs, 99% or even 99.9% confidence is the accepted standard. This may seem harsh, but there is a very good reason for it. If you accept a 95% confidence level, then even when there is no real effect, one test in twenty will come up positive by pure chance. This is why, by the way, scientists are never in a rush to accept “exciting new hypotheses” that only have one or two published papers of marginal significance in their favour.
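That one-in-twenty figure is easy to check by simulation. A minimal sketch (the test statistic, trial count, and seed are invented for illustration): test a true null hypothesis many times over at the 95% confidence level and count how often it comes up “significant.”

```python
# If you repeatedly test a hypothesis that is actually false (no real
# effect), about 1 in 20 runs will still reach p < 0.05 by chance.
import math
import random

random.seed(1)

def p_value(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

trials = 100_000
false_positives = sum(p_value(random.gauss(0, 1)) < 0.05
                      for _ in range(trials))
print(round(false_positives / trials, 3))   # hovers around 0.05
```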
Now Hanson is aware of this. He says:
The third most significant specific finding, and strongest unexpected one, was that people with free medicine had lower blood pressure, at a 3% significance level. But a study that looks at thirty measures in total should, just by chance, find one unexpected finding that seems significant at the 3% level. So taking data mining (i.e., searching for results) into account, this blood pressure result should be set aside.
Knowing that statistical significance can be misleading, Hanson insists that a finding with 97% confidence should be set aside even though it meets the minimum scientific standard for a positive finding. But did you notice that he wanted you to accept without question a finding from the same study that carried 93% confidence, less than the minimum scientific standard and less than half as statistically impressive as the finding he wants you to reject? This is not the only example.
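Hanson’s data-mining arithmetic, at least, is sound. With thirty independent measures each tested at the 3% level, the chance of at least one spurious “significant” result is about 60%, and the expected number of spurious hits is about one:

```python
# Chance that at least one of thirty independent tests of true nulls
# comes out "significant" at the 3% level, plus the expected count.
n_tests, alpha = 30, 0.03
p_at_least_one = 1 - (1 - alpha) ** n_tests
expected_hits = n_tests * alpha
print(round(p_at_least_one, 2), round(expected_hits, 2))   # 0.6 0.9
```

(The independence of the thirty measures is an assumption for the arithmetic; correlated measures would change the exact figure but not the conclusion.)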
…spending $1,000 more overall in the last six months of life gave local patients somewhere between a gain of five days of life and a loss of twenty days of life (95% confidence interval).
Again, it sounds impressive, but when the 95% confidence interval includes both improvement and worsening, what this means scientifically is that no effect was demonstrated. And yet again Hanson is busy looking at the patients who had six months to live. Again, this could only have been known in hindsight. Essentially what Hanson is saying is that if you look at people who died, any money spent in the last few months of their life was unlikely to make a difference. Is anyone surprised? By his choice of data cutoffs, he has excluded by definition those who would have benefited most from the extra $1,000.
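The quoted interval can be unpacked a little further. Assuming it is a symmetric, normal-based 95% interval (a back-of-the-envelope assumption, since I don’t have the paper), the implied point estimate and p value fall well short of significance:

```python
# Back out the implied estimate and p-value from the quoted 95% CI of
# a 5-day gain to a 20-day loss per extra $1,000 spent.
import math

lo, hi = -20.0, 5.0                  # CI endpoints, in days of life
estimate = (lo + hi) / 2             # implied point estimate
se = (hi - lo) / (2 * 1.96)          # standard error implied by the CI
z = estimate / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(round(estimate, 1), round(p, 2))   # -7.5 days, p about 0.24
```

A p value of roughly 0.24 is nowhere near the thresholds Hanson himself applies elsewhere in the article.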
Hanson ends by telling us that he has a tentative explanation for the oddity of America’s excessive spending on health care.
Briefly, the idea is that our ancestors showed loyalty by taking care of sick allies, and that, for such signals, how much one spends matters more than how effective is the care, and commonly-observed clues of quality matter more than private clues.
Americans have different ancestors to the rest of us, I guess. Just what the world needs now. Another academic dabbling in evolutionary psychology who doesn’t know how to test a hypothesis.