One part of the problem of anomaly is this. If a well-established scientific theory seems to predict something contrary to what we observe, we tend to stick to the theory, with barely a change in credence, while being dubious of the auxiliary hypotheses. What, if anything, justifies this procedure?

Here’s my setup. We have a well-established scientific theory *T* and (conjoined) auxiliary hypotheses *A*, and *T* together with *A* uncontroversially entails the denial of some piece of observational evidence *E* which we uncontroversially have (“the anomaly”). The auxiliary hypotheses will typically include claims about the experimental setup, the calibration of equipment, the lack of further causal influences, mathematical claims about the derivation of not-*E* from *T* and the above, and maybe some final catch-all thesis like the material conditional that if *T* and all the other auxiliary hypotheses obtain, then *E* does not obtain.

For simplicity I will suppose that *A* and *T* are independent, though of course that simplifying assumption is rarely true.

I suspect that often this happens: *T* is much better confirmed than *A*. For *T* tends to be a unified theoretical body that has been confirmed as a whole by a multitude of different kinds of observations, while *A* is a conjunction of a large number of claims that have been individually confirmed. Suppose, say, that *P*(*T*)=0.999 while *P*(*A*)=0.9, where all my probabilities are implicitly conditional on some background *K*. Given the observation *E*, and the fact that *T* and *A* entail its negation, we now know that the conjunction of *T* and *A* is false. But we don’t know where the falsehood lies. Here’s a quick and intuitive thought. There is a region of probability space where the conjunction of *T* and *A* is false. That area is divided into three sub-regions:

1. *T* is true and *A* is false
2. *T* is false and *A* is true
3. both are false.

The initial probabilities of the three regions are, respectively, 0.0999, 0.0009 and 0.0001. We know we are in one of these three regions, and that’s all we now know. Most likely we are in the first one, and the probability that we are in that one given that we are in one of the three is around 0.99. So our credence in *T* has gone down from three nines (0.999) to two nines (0.99), but it’s still high, so we get to hold on to *T*.
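The informal three-region calculation can be checked with a few lines of Python. This is only a sketch under the post's own assumptions (independence of *T* and *A*, *P*(*T*)=0.999, *P*(*A*)=0.9); the function and variable names are mine and purely illustrative.

```python
# Priors from the text. Independence of T and A is the post's
# simplifying assumption, not something established here.
p_t = 0.999
p_a = 0.9


def region_probabilities(p_t, p_a):
    """Prior probability of each region where (T and A) is false."""
    return {
        "T and not-A": p_t * (1 - p_a),
        "not-T and A": (1 - p_t) * p_a,
        "not-T and not-A": (1 - p_t) * (1 - p_a),
    }


regions = region_probabilities(p_t, p_a)
total = sum(regions.values())

# Probability of being in the first region, given that we are in one of the three.
p_t_given_conjunction_false = regions["T and not-A"] / total

for name, prob in regions.items():
    print(f"{name}: {prob:.4f}")
print(f"P(T | not (T and A)) = {p_t_given_conjunction_false:.4f}")
```

Note that this conditions only on the conjunction being false. It does not yet use any likelihoods for *E*.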

Still, this answer isn’t optimistic. A move from 0.999 to 0.99 is actually an enormous decrease in confidence: the probability that *T* is false has gone up tenfold, from 0.001 to 0.01.

But there is a much more optimistic thought. Note that the above wasn’t a real Bayesian calculation, just a rough informal intuition. The tip-off is that I said nothing about the conditional probabilities of *E* on the relevant hypotheses, i.e., the “likelihoods”.

Now, the setup ensures:

*P*(*E*|*A* ∧ *T*)=0.

What can we say about the other relevant likelihoods? Well, if some auxiliary hypothesis is false, then *E* is up for grabs. So, conservatively:

*P*(*E*|∼*A* ∧ *T*)=0.5
*P*(*E*|∼*A* ∧ ∼*T*)=0.5

But here is something that I think is really, really interesting. I think that in typical cases where *T* is a well-established scientific theory and *A* ∧ *T* entails the negation of *E*, the probability *P*(*E*|*A* ∧ ∼*T*) is still low.

The reason is that all the evidence that we have gathered for *T* even better confirms the hypothesis that *T* holds to a high degree of approximation in most cases. Thus, even if *T* is false, the typical predictions of *T*, assuming they have conservative error bounds, are likely to still be true. Newtonian physics is false, but even conditionally on its being false we take individual predictions of Newtonian physics to have a high probability. Thus, conservatively:

*P*(*E*|*A* ∧ ∼*T*)=0.1

Very well, let’s put all our assumptions together, including the ones about *A* and *T* being independent and the values of *P*(*A*) and *P*(*T*). Here’s what we get:

*P*(*E*|*T*) = *P*(*E*|*A* ∧ *T*) · *P*(*A*|*T*) + *P*(*E*|∼*A* ∧ *T*) · *P*(∼*A*|*T*) = 0 · 0.9 + 0.5 · 0.1 = 0.05
*P*(*E*|∼*T*) = *P*(*E*|*A* ∧ ∼*T*) · *P*(*A*|∼*T*) + *P*(*E*|∼*A* ∧ ∼*T*) · *P*(∼*A*|∼*T*) = 0.1 · 0.9 + 0.5 · 0.1 = 0.14.

Plugging *this* into Bayes’ theorem, we get *P*(*T*|*E*)=0.997. So our credence has crept down, but only a little: from 0.999 to 0.997. This is much more optimistic (and conservative) than the big move from 0.999 to 0.99 that the intuitive calculation predicted.
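For readers who want to check the arithmetic, here is a minimal sketch of the full Bayesian update in Python. It assumes exactly what the post assumes: independent *T* and *A*, the stated priors, and the four stipulated likelihoods. The helper name `posterior_t` and the dictionary layout are my own illustrative choices.

```python
def posterior_t(p_t, p_a, likelihood):
    """Return (P(E|T), P(E|~T), P(T|E)).

    `likelihood` maps (a_is_true, t_is_true) to P(E | that combination).
    Assumes T and A are independent, so P(A|T) = P(A|~T) = P(A).
    """
    # Law of total probability over A, separately for T and for not-T.
    p_e_given_t = (likelihood[(True, True)] * p_a
                   + likelihood[(False, True)] * (1 - p_a))
    p_e_given_not_t = (likelihood[(True, False)] * p_a
                       + likelihood[(False, False)] * (1 - p_a))
    # Bayes' theorem.
    p_e = p_e_given_t * p_t + p_e_given_not_t * (1 - p_t)
    return p_e_given_t, p_e_given_not_t, p_e_given_t * p_t / p_e


# Likelihoods stipulated in the text.
likelihood = {
    (True, True): 0.0,    # P(E | A and T): the setup entails not-E
    (False, True): 0.5,   # P(E | not-A and T)
    (False, False): 0.5,  # P(E | not-A and not-T)
    (True, False): 0.1,   # P(E | A and not-T): T still approximately right
}

p_e_t, p_e_not_t, post = posterior_t(0.999, 0.9, likelihood)
print(f"P(E|T) = {p_e_t:.2f}, P(E|~T) = {p_e_not_t:.2f}, P(T|E) = {post:.4f}")
```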

So, if I am right, at least one of the reasons why anomalies don’t do much damage to scientific theories is that when the scientific theory *T* is well-confirmed, the anomaly is not only surprising on the theory, but it is surprising on the denial of the theory—because the background includes the data that makes *T* “well-confirmed” and would make *E* surprising even if we knew that *T* was false.

Note that this argument works less well if the anomalous case is significantly different from the cases that went into the confirmation of *T*. In such a case, there might be much less reason to think *E* won’t occur if *T* is false. And that means that anomalies are more powerful as evidence against a theory the more distant they are from the situations we explored before when we were confirming *T*. This, I think, matches our intuitions. We would put almost no weight on someone’s finding an anomaly in the course of an undergraduate physics lab. That is not just because an undergraduate student is likely doing it (it could be the professor testing the equipment, though), but because this is well-trodden ground, where we expect the theory’s predictions to hold even if the theory is false. But if new observations of the center of our galaxy don’t fit our theory, that is much more compelling: in a regime so different from many of our previous observations, we might well expect that things *would* be different if our theory were false.
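To see how much the damage depends on *P*(*E*|*A* ∧ ∼*T*), one can sweep that likelihood while holding everything else fixed. The sketch below does this in Python. The particular sweep values (0.3, 0.5, 0.9) are hypothetical choices of mine, standing in for anomalies found progressively further from the regime where *T* was confirmed. Only 0.1 comes from the post.

```python
def posterior_t(p_t, p_a, p_e_a_not_t, p_e_not_a=0.5):
    """P(T | E), assuming independent T and A and P(E | A and T) = 0.

    p_e_a_not_t is P(E | A and not-T); p_e_not_a is the likelihood of E
    whenever A is false (0.5 in the text, whether or not T holds).
    """
    p_e_given_t = p_e_not_a * (1 - p_a)
    p_e_given_not_t = p_e_a_not_t * p_a + p_e_not_a * (1 - p_a)
    p_e = p_e_given_t * p_t + p_e_given_not_t * (1 - p_t)
    return p_e_given_t * p_t / p_e


# 0.1 is the post's value; the larger values are illustrative assumptions.
sweep = {x: posterior_t(0.999, 0.9, x) for x in (0.1, 0.3, 0.5, 0.9)}
for x, post in sweep.items():
    print(f"P(E | A and not-T) = {x:.1f}  ->  P(T | E) = {post:.4f}")
```

The posterior falls as the likelihood rises. At 0.5 it coincides with the informal three-region answer of about 0.99, because *E* is then equally likely in each of the three regions.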

And this helps with the second half of the problem of anomaly: How do we keep from holding on to *T* too long in the light of contrary evidence? How do we allow anomalies their rightful place in undermining theories? The answer is: To undermine a theory effectively, we need anomalies that occur in situations significantly different from those that have already been explored.

Note that this post weakens, but does not destroy, the central arguments of this paper.