'Blunder management' in least-squares fitting

The page on astrometric error handling covers some related issues, and may also be of interest.

Almost all orbit determination software takes a "binary" approach in deciding which observations are blunders and should be rejected from an orbit solution. (Or from an astrometric plate reduction or fitting magnitude parameters; the problem is a general one in any sort of least-squares fit.) Observations are considered to be either perfectly okay or perfectly bad: if an observation is so-and-so many sigmas off from prediction, it's rejected; otherwise, it is accepted at full weight. To my knowledge, this is the standard method used by all other orbit computers, and was the method used by me in Find_Orb until 2012.

My objection to this "traditional" scheme is that such a rejection/acceptance model is excessively black-and-white. In reality, we have observations that fit our model well enough that we've no reason to think they're blunders; some that are many sigmas off and are clearly blunders; and some that are maybe a few sigmas off, in a region where we really aren't sure if they're "bad" or not.

First, a distinction: I consider "outliers" and "blunders" as separate issues. If you have normally-distributed data that was perfectly gathered -- no mistakes such as identifying wrong objects or typing things in badly and so forth -- you'd still get one observation that was four sigmas off prediction out of every 15000 or so. Much more often than that, though, there will be blunders: observations that are flat-out Wrong and shouldn't be included in fitting an orbit.

So far, I've found four possible solutions to the blunder problem and created (and implemented and currently use in Find_Orb) a fifth. Solutions are :

The first solution, which appears to be in wide use, is that described above: any observation with residuals greater than a certain number of sigmas, or a certain number of arcseconds, is "wrong" and is excluded. You do a least-squares fit on the remaining observations, which may change the residuals so that previously excluded observations are now included, or vice versa. Eventually, though, convergence should occur. (In practice, I've never seen it fail to settle on a stable point, but I don't have a mathematical proof to offer.)

This is quite easy to do, and I think it's what Minor Planet Center, AstDyS/NEODyS, and JPL routinely use. (And, as described above, it's what I used before 2012.)

Numerical Recipes has a second solution, in a chapter on "robust estimation" methods, wherein one assumes the real error distribution consists of Gaussian errors plus blunders. This looks mathematically interesting, but (except for simple cases) computationally daunting. You have to switch from a least-squares solution to a downhill simplex or similar method. I've implemented downhill simplex in Find_Orb. It's a very interesting method, but tends to be quite slow for perturbed orbits.

The third possible solution is use of Peirce's criterion. This was pointed out to me by Dave Tholen, and is described a bit at these links:

   arXiv:1106.0564 [astro-ph]: http://arxiv.org/abs/1106.0564
   Gould, B. A.,  AJ 4:81 (1855):  On Peirce's Criterion for the Rejection of
   Doubtful Observations, with tables for facilitating its application 

The source code for Peirce's criterion that I wrote has some comments that may also be helpful.

This method looked very promising at first. The problem is that it's more of an "outlier" method than a "blunder" method. That is, it's good for coming up with a more robust solution for normally or Rayleigh distributed data that has a few outlying points whose rejection would result in a better answer. It's not intended for data which is normally distributed, but with occasional blunders.

I learned of a fourth solution from Edwin Goffin: kurtosis rejection. Gaussian-distributed data should have a kurtosis γ2 = 0. If it's greater than that, you reject the observation with the largest weighted residual and recompute, repeating until γ2 < 0. (It seems to me that you would really want to minimize the absolute value of γ2; so after rejecting the last observation, you might find that the value of the kurtosis has dropped below zero, but the absolute value was actually less with the preceding observation included. In which case, you'd want to "backtrack" by one rejection.)

This is a really easy solution both to understand and to implement. You can click here to see an implementation of both kurtosis rejection and blunder management (the method described below). With a little work, you could generalize kurtosis rejection to handle Rayleigh-distributed astrometric residuals, as well as Gaussian. But, as with the previous methods, you still have a binary solution: observations are either "in" or "out", with no recognition of the fact that real-world data is sometimes, instead, "probably good" or "probably wrong".

I've come up with a fifth possible solution, which I'm calling blunder management. It seems to work well, better in some ways than the others listed above. It does lack mathematical rigor (as do the above methods to one degree or another) and I've had to resort to "hand-waving" arguments in places. With those warnings :

In reality, if an observation is twenty standard deviations from prediction, its likelihood of being a blunder is near-certain; you'd expect that to happen in a truly normal distribution about once or twice in the lifetime of the universe. If an observation is within a standard deviation of prediction, it's probably not a blunder. In between, there is a murky area where it's harder to say, and depends, in part, on how often blunders occur.

Suppose we have, for each observation, a probability that it is a blunder. Maybe, after looking at the data coming from a particular observatory, we decide that 2% of its observations are simply wrong. And for the moment, assume the observational error is normally distributed. (The idea can be generalized to other distributions easily, including the Rayleigh distribution of astrometric data.)

Suppose furthermore that we get an observation that is 2.57 sigmas from prediction. 99% of the normal distribution is between -2.57 and +2.57 sigmas; the likelihood of an observation with this level of departure from the norm is only 1%, in the absence of blunders.

However, if 2% of the observations are blunders, I'd think that the probability that this is a "good" observation is .01 / (.02 + .01), or one in three. I would therefore accept it in the least-squares fit, but with only a third of its default weight.

To me, this reflects the sort of thinking I'd do anyway on seeing an observation 2.57 sigmas from prediction. Yes, such observations do happen, and you see them fairly often. But it's just at the point where you aren't sure if it's real data or a blunder; you don't know (in the binary model) if you should toss the observation or keep it. In the scheme I'm proposing, you do something in between.

If the observation was 0.67 sigmas off prediction, the the relevant area of the normal distribution would be about 50%; half of observations would, in a perfect world with perfectly Gaussian errors, fall within this range. The observation would have its weight adjusted by .50 / (.02 + .50), or a factor of about .96. In other words, I'd give this about a 96% chance of being a good observation.

At the other extreme, if the observation were off by 3.1 sigmas, 99.8% of the normal distribution would be within range; only one observation in 500 is expected to be so far out. So I'd re-weight the observation by .002/(.02 + .002), or .09. This isn't total rejection, but reflects a strong lack of faith that the observation is real. (Go out another sigma or two, and the observation gets effectively excluded very quickly.)

Note further that it could happen that you have a source that you think Never Makes Misteaks: the probability of a blunder is zero. In that case, the "re-weighting" is always by a factor of one. In other words, you go back to the usual type of least-squares fitting, in which if something is off by twenty sigmas, you act just as if it were perfectly okay.

A few final problems:

• What's the probability of a blunder? In Find_Orb, I now have this set to be the same for all observations (default is 2%). In practice, I think this will be about right. If an observer submits enough observations, we might be able to determine the blunder rate from the data. (Though we might not be able to tell that for the first part of the night, things went smoothly with a 1% blunder rate, while on the second part, things fell apart and the blunder rate was 10%.) Marco Micheli has proposed using Peirce's criterion to determine the blunder rate: if that method rejects everything outside +/-2.57 sigmas, for example, then the blunder rate is 1% (since 99% of the normal distribution lies within those limits). You could use kurtosis rejection to find the blunder rate, or even blunder management itself; i.e., if you have the blunder probability set to 2% and find that this limit causes 10% of observations to be considered to be blunders, you'd adjust the assumed blunder rate iteratively until it matched the observed rate.

Personally, I'd be a little skeptical of any attempt to analyze the blunder rate of an observatory unless it generated truly massive amounts of data and did so in a very consistent way (methods/equipment/personnel didn't change over the time span analyzed.) But I can understand the desire to do something better than just say, "I think astrometrists blunder 2% of the time", and many observations come from big surveys that really are amenable to this sort of of analysis.

• Note that the blunder rate isn't the same thing as the estimated uncertainty. Some observers might have gear that limits their data to be good to within two arcseconds, but they're very careful people who never make mistakes; if they tell you the object is ten arcseconds from prediction, you should believe them, even though that's five sigmas off. Others might have wonderful gear that is usually reliable to 0.1 arcseconds, but they're a little sloppy and have a high rate of blunders, such that anything 0.3 arcseconds off is probably bad data.

• Find_Orb includes uncertainties in time and magnitude as well as in RA/dec. At present, these are not subjected to any sort of adjustments for blunders. If they were, we'd want to consider the fact that the rate at which people blunder in their measurements of time and magnitude will probably not match the rate at which they blunder in RA/dec measurement.