Monday, 7 March 2011

In Defence of Impossible Precision

John Allen Paulos has quite a good column on innumeracy. He first asks readers to assess the following headlines:
1. After the Packers' Super Bowl victory, an exuberant Aaron Rogers Shook Hands with Everyone in the Stadium.

2. Experts Fear Total US Housing Costs (Rents plus Mortgage Payments) Will Top $2 Billion in 2011.

3. Only by Completely Eliminating Foreign Aid Can We Eliminate the Deficit.

What is wrong with them? Well, I don't expect anyone reading this not to have noticed that shaking 40,000 hands would take at least 10 hours, that $2 billion comes to less than $10 per person, or that foreign aid is so small relative to the US deficit that even eliminating it entirely wouldn't make a big dent. (This doesn't, of course, mean that it shouldn't be eliminated, only that if you're obsessed with the deficit, you have bigger fish to fry.)
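For anyone who wants to check the first two, here is a rough back-of-envelope sketch. The one-second handshake and the population of roughly 310 million are my own assumptions, not figures from Paulos's column; the crowd size and the $2 billion are the ones quoted above.

```python
# Back-of-envelope checks for headlines 1 and 2.
HANDS = 40_000                 # crowd size used in the post
SECONDS_PER_HANDSHAKE = 1.0    # assumption: one second per handshake, no breaks

hours_shaking = HANDS * SECONDS_PER_HANDSHAKE / 3600

HOUSING_TOTAL_USD = 2e9        # the headline's "$2 billion"
US_POPULATION = 310e6          # assumption: roughly 310 million people in 2011

usd_per_person = HOUSING_TOTAL_USD / US_POPULATION

print(f"Handshakes: about {hours_shaking:.1f} hours")
print(f"Housing: about ${usd_per_person:.2f} per person per year")
```

With these inputs the handshakes take a little over 11 hours, and the housing figure works out to a few dollars per person per year, which is nowhere near anyone's rent.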

He then moves on to the following headline:
4. Number of Americans with Alzheimer's Believed to Be 5,451,213.
The supposed problem?
4. The problem here is that the number is ridiculously precise. Definitions of Alzheimer's vary and it's difficult to determine whether a single individual is suffering from it, much less whether five million plus are. Such impossible precision is common.
Well, yes, the number is ridiculously precise. No, no-one does think that we can measure the number of Americans with Alzheimer's to that degree of accuracy, but so what? If you do a survey of Americans, do some calculations, and your best estimate of the number of Americans with Alzheimer's comes out as 5,451,213, what number, exactly, does Paulos want you to report?

Assuming that you've done your sums correctly, 5,451,213 is an unbiased estimate of the number of Americans with Alzheimer's. Rounding it to 5.5 million adds rounding error on top of the estimation error, so on average it does worse than just reporting the figure that came out of your calculations. What exactly is the rationale behind it?
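A small simulation makes the point about rounding concrete. This is only a sketch under made-up assumptions: I pretend the true figure really is 5,451,213, that the survey estimate is unbiased with normally distributed error, and that the standard error is 200,000. None of those numbers comes from any actual study.

```python
import random

TRUE_COUNT = 5_451_213   # assumption: pretend this is the real figure
NOISE_SD = 200_000       # assumption: standard error of the survey estimate
ROUND_TO = 100_000       # reporting "5.5 million" means rounding to 0.1 million
TRIALS = 200_000

def mean_squared_errors(seed=0):
    """Return (MSE of raw estimates, MSE of rounded estimates)."""
    rng = random.Random(seed)
    raw_total = 0.0
    rounded_total = 0.0
    for _ in range(TRIALS):
        estimate = rng.gauss(TRUE_COUNT, NOISE_SD)   # unbiased, noisy estimate
        rounded = round(estimate / ROUND_TO) * ROUND_TO
        raw_total += (estimate - TRUE_COUNT) ** 2
        rounded_total += (rounded - TRUE_COUNT) ** 2
    return raw_total / TRIALS, rounded_total / TRIALS

raw_mse, rounded_mse = mean_squared_errors()
print(f"raw RMSE:     {raw_mse ** 0.5:,.0f}")
print(f"rounded RMSE: {rounded_mse ** 0.5:,.0f}")
```

In this setup the rounded figures should come out slightly worse than the raw ones. The penalty is small next to the survey error itself, but it is a penalty: rounding throws information away and gets nothing back in accuracy.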

Yes, numbers like this should probably be reported along with some estimate of variance, and maybe it's a convention that we assume the number of significant figures in a number to be a proxy for the size of its error bars, but it doesn't have to be that way: I look forward to a day when numbers like "5.5 million" get scoffed at by popular mathematics writers for being "overly round" or "not accurate enough".

Thursday, 3 March 2011

Why aren't all journals open access?

Here is the way the current system of academic publishing works, as far as I can tell: universities employ researchers who do original research, and produce journal papers; universities employ researchers who do peer-review, and make sure journal papers are up to standard; journals employ editors, who put the content together, and organise the referees; universities pay large amounts of money to journals in order to be allowed to read the articles.

Now, as you can see, all of the money in the system comes from the universities. Universities pay the wages of the researchers and the reviewers directly, and they pay the wages of the editors indirectly (through journal subscriptions). So, here's an idea: why don't the universities club together to buy the journals, employ the editors directly, and publish all the content for free?

Note that buying the journals doesn't cost the universities (as a group) anything in the long run, as the entire current value of the journal companies comes from the amount of money they expect to be paid in journal subscriptions by universities in the future. And there's no need for the journals to charge "submission fees", as those were all being paid by the universities in the first place: they can just come out of the communal pot.
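The accounting behind that claim can be sketched with a toy perpetuity calculation. Every number here is hypothetical, and the sketch assumes that the publishers are priced at the present value of their future profits and that buyers and sellers share a single discount rate.

```python
# All figures are hypothetical, chosen only to illustrate the accounting.
ANNUAL_SUBSCRIPTIONS = 100.0   # what universities pay the journals each year
ANNUAL_RUNNING_COSTS = 60.0    # editors' wages and other costs of publishing
DISCOUNT_RATE = 0.05           # assumption: one rate shared by buyers and sellers

def perpetuity(annual_amount, rate):
    """Present value of paying `annual_amount` every year forever."""
    return annual_amount / rate

# Status quo: keep paying subscriptions indefinitely.
status_quo_cost = perpetuity(ANNUAL_SUBSCRIPTIONS, DISCOUNT_RATE)

# Buyout: pay the publishers the present value of their future profits,
# then fund the running costs directly out of the communal pot.
purchase_price = perpetuity(ANNUAL_SUBSCRIPTIONS - ANNUAL_RUNNING_COSTS, DISCOUNT_RATE)
buyout_cost = purchase_price + perpetuity(ANNUAL_RUNNING_COSTS, DISCOUNT_RATE)

print(f"status quo: {status_quo_cost:,.0f}")
print(f"buyout:     {buyout_cost:,.0f}")
```

Under these assumptions the two totals match: the purchase price is exactly the stream of profits the universities would otherwise have handed over year by year. If the sellers demanded a premium, or the two sides discounted the future differently, the sums would no longer line up so neatly.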

So far as I can see, there is literally no downside to this - assuming coordination can be achieved, you have the same universities paying the same amount of money to the same people to produce the same articles, but the articles are now all available open-access. I admit that "assuming coordination can be achieved" is a fairly hefty assumption, but given the massive upsides, why isn't anyone at least suggesting this sort of approach?

There seems to be a general trend towards open-access publishing anyway, which is a Good Thing, but I don't understand why this model isn't a strict Pareto improvement on the current system.