Proposed method for blunder management

In the past, I've taken a "binary" approach in deciding which observations are blunders and should be rejected from an orbit solution. (Or from an astrometric plate reduction or fitting magnitude parameters; the problem is a general one.) I've considered observations to be either perfectly okay or perfectly bad: if an observation is so-and-so many sigmas off from prediction, it's rejected; otherwise, it is accepted at full weight. To my knowledge, this is the standard method used by all orbit computers. But it's occurred to me that this is an excessively black-and-white viewpoint.

First, a distinction: I consider "outliers" and "blunders" as separate issues. If you have normally-distributed data that was perfectly gathered -- no mistakes such as identifying wrong objects or typing things in badly and so forth -- you'd still get one observation that was four sigmas off prediction out of every 15000 or so. Much more often than that, though, there will be blunders: observations that are flat-out Wrong and shouldn't be included in fitting an orbit.

So far, I've found three possible solutions to the blunder problem and created a fourth. Numerical Recipes has an interesting chapter on "robust estimation" methods, wherein one assumes the real error distribution consists of Gaussian errors plus blunders. This looks mathematically interesting, but (except for simple cases) computationally daunting. You have to switch from a least-squares solution to a downhill simplex or similar method. I've implemented downhill simplex in Find_Orb. It's a very interesting method, but tends to be quite slow for perturbed orbits.

The second possible solution is use of Peirce's criterion. This was pointed out to me by Dave Tholen, and is described a bit at these links:

   arXiv:1106.0564 [astro-ph]:
   Gould, B. A.,  AJ 4:81 (1855):  On Peirce's Criterion for the Rejection of
   Doubtful Observations, with tables for facilitating its application 

The source code for Peirce's criterion that I wrote has some comments that may also be helpful.

This method looked very promising at first. The problem is that it's more of an "outlier" method than a "blunder" method. That is, it's good for coming up with a more robust solution for normally or Rayleigh distributed data that has a few outlying points whose rejection would result in a better answer. It's not intended for data which is normally distributed, but with occasional blunders.

I learned of a third solution from Edwin Goffin: kurtosis rejection. Gaussian-distributed data should have a kurtosis γ2 = 0. If it's greater than that, you reject the observation with the largest weighted residual and recompute, repeating until γ2 < 0. (It seems to me that you would really want to minimize the absolute value of γ2; so after rejecting the last observation, you might find that the value of the kurtosis has dropped below zero, but the absolute value was actually less with the preceding observation included. In which case, you'd want to "backtrack" by one rejection.)

This is a really easy solution both to understand and to implement. With a little work, you could generalize it to handle Rayleigh-distributed astrometric residuals, as well as Gaussian distributed data. But you do still have a binary solution: observations are either "in" or "out", with no recognition of the fact that real-world data is sometimes, instead, "probably good" or "probably wrong".

I've come up with a fourth possible solution, outlined below. It seems to work well. It does lacks mathematical rigor, and I've had to resort to "hand-waving" arguments in places. With those warnings :

In reality, if an observation is twenty standard deviations from prediction, its likelihood of being a blunder is near-certain; you'd expect that to happen in a truly normal distribution about once or twice in the lifetime of the universe. If an observation is within a standard deviation of prediction, it's probably not a blunder. In between, there is a murky area where it's harder to say, and depends, in part, on how often blunders occur.

Suppose we assume, for each observation, a probability that it is a blunder. Maybe, after looking at the data coming from a particular observatory, we decide that 2% of its observations are simply wrong. And for the moment, assume the observational error is normally distributed. (The idea can be generalized to other distributions easily, including the Rayleigh distribution of astrometric data.)

Suppose furthermore that we get an observation that is 2.57 sigmas from prediction. 99% of the normal distribution is between -2.57 and +2.57 sigmas; the likelihood of an observation with this level of departure from the norm is only 1%, in the absence of blunders.

However, if 2% of the observations are blunders, I'd think that the probability that this is a "good" observation is .01 / (.02 + .01), or one in three. I would therefore accept it in the least-squares fit, but with only a third of its default weight.

To me, this reflects the sort of thinking I'd do anyway on seeing an observation 2.57 sigmas from prediction. Yes, such observations do happen, and you see them fairly often. But it's just at the point where you aren't sure if it's real data or a blunder; you don't know (in the binary model) if you should toss the observation or keep it. In the scheme I'm proposing, you do something in between.

If the observation was 0.67 sigmas off prediction, the the relevant area of the normal distribution would be about 50%; half of observations would, in a perfect world with perfectly Gaussian errors, fall within this range. The observation would have its weight adjusted by .50 / (.02 + .50), or a factor of about .96. In other words, I'd give this about a 96% chance of being a good observation.

At the other extreme, if the observation were off by 3.1 sigmas, 99.8% of the normal distribution would be within range; only one observation in 500 is expected to be so far out. So I'd re-weight the observation by .002/(.02 + .002), or .09. This isn't total rejection, but reflects a strong lack of faith that the observation is real. (Go out another sigma or two, and the observation gets effectively excluded very quickly.)

Note further that it could happen that you have a source that you think Never Makes Misteaks: the probability of a blunder is zero. In that case, the "re-weighting" is always by a factor of one. In other words, you go back to the usual type of least-squares fitting, in which if something is off by twenty sigmas, you act just as if it were perfectly okay.

A few final problems:

• What's the probability of a blunder? In Find_Orb, I now have this set to be the same for all observations (default is 2%). In practice, I think this will be about right. If an observer submits enough observations, we might be able to determine the blunder rate from the data. (Though we might not be able to tell that for the first part of the night, things went smoothly with a 1% blunder rate, while on the second part, things fell apart and the blunder rate was 10%.) Marco Micheli has proposed using Peirce's criterion to determine the blunder rate: if that method rejects everything outside +/-2.57 sigmas, for example, then the blunder rate is 1% (since 99% of the normal distribution lies within those limits).

Personally, I'd be a little skeptical of any attempt to analyze the blunder rate of an observatory unless it generated truly massive amounts of data and did so in a very consistent way (methods/equipment/personnel didn't change over the time span analyzed.) But I can understand the desire to do something better than just say, "I think astrometrists blunder 2% of the time."

• Note that the blunder rate isn't the same thing as the estimated uncertainty. Some observers might have gear that limits their data to be good to within two arcseconds, but they're very careful people who never make mistakes. Others might have wonderful gear that is usually reliable to 0.1 arcseconds, but they're a little sloppy and sometimes make really silly mistakes.

• At some point, I expect to have Find_Orb include uncertainties in time and magnitude as well as in RA/dec. Again, the rate at which people blunder in their measurements of time and magnitude will probably not match the rate at which they blunder in RA/dec measurement.