Assorted thoughts on astrometric error handling

Find_Orb currently uses several techniques to model astrometric errors. As far as I know, it's the only software to implement "blunder management" and to handle random timing errors in a mathematically reasonable way. It has some decent algorithms for handling over-observing issues. However, some recent conversations (late June 2017), both private and in reply to a bug report I posted on the Minor Planet Mailing List, caused me to think about some ways in which I (and others) could do a better job.

I have not seen, nor did I really expect, a great improvement in actual results (accuracy of the calculations) from these changes. The earlier, somewhat naïve method of handling over-observing implemented in Find_Orb in 2012 (and implemented in simplified form in FCCT15, and in better form in VFCC17) had already made a big improvement; the new methods described here couldn't really add much. The real benefit of these new methods is that the error model is more mathematically defensible than it was, with fewer ad hoc elements.

The most important thing I could do to improve error handling in Find_Orb would probably be to implement the estimated sigmas described in VFCC17. This paper shows that the sigmas for survey astrometry vary depending on date (as the surveys changed techniques and equipment), magnitude, and rates of motion in ways that make some observations a lot more precise than others.

Some parts of my error handling scheme before September 2017 were embarrassingly ad hoc, with little or no mathematical justification. (Though in my defense, so was everybody else's; none of us had exactly covered ourselves in glory here.) Following are some thoughts on a more robust system.

Over-observing issues:

Until 2012, Find_Orb (and everyone else, as far as I'm aware) gave observations a "weight" in the orbit solution that depended solely on the estimated astrometric uncertainty. In other words, it assumed that observations really just had random Gaussian errors that were uncorrelated with each other, and that there were no such things as systematic errors in astrometric data.

In reality, a set of observations from a particular observatory made on a particular night usually are at least somewhat correlated. They probably share a common catalog error and a common timing error. To make matters worse, in 2012, I didn't have a way to "debias" astrometry to remove at least most of the catalog error. (We can now use FCCT15 to estimate and subtract most of the systematic catalog errors.)

The problem got really bad if one observatory submitted a large fraction of the observations for an object, especially if they had a noticeable systematic error. The orbit would then reflect that data; any errors made by that observatory would overwhelm the good efforts of the other observers.

To picture this, imagine that three people weigh a rock. The first two make three or four accurate measurements and call it good. The third takes a hundred measurements... but doesn't notice that the scale is incorrectly zeroed, so all the measurements are systematically wrong by the same amount. If you simply averaged all hundred-plus measurements, the third person's measurements would dominate the answer and "drown out" the good data.

Previous over-observing solutions:

FCCT15 applied a rather simple solution: find the number Nobs of observations made from that observatory on that night, and give each of those observations a weight of 1/sqrt(Nobs). This is equivalent to saying: "All observations made on a given night are perfectly correlated. There are no random errors, only systematic ones. Once you've gotten one observation on a given night, you might as well stop; further data will be of no value."

Phrased that way, the FCCT15 solution doesn't sound very good. But it was still better than the default method of applying no re-weighting of data. The default method corresponds to the opposite extreme: "All observations on a night are completely uncorrelated. All errors are random. Take Nobs observations on a night, and you've reduced your errors by a factor of sqrt( Nobs)."
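
To make the two extremes concrete, here's a minimal sketch in C (not Find_Orb code; the half-arcsecond per-observation sigma is simply an assumed value) of what each assumption says about the combined uncertainty from a night of Nobs observations:

/* Combined per-night uncertainty under the two extreme assumptions
   described above.  Illustrative only;  the 0.5-arcsecond sigma is
   simply an assumed per-observation uncertainty.                   */
#include <stdio.h>
#include <math.h>

int main( void)
{
   const double sigma = 0.5;        /* per-observation uncertainty, arcsec */
   int nobs;

   for( nobs = 1; nobs <= 16; nobs *= 2)
      printf( "Nobs=%2d  uncorrelated: %.3f\"   perfectly correlated: %.3f\"\n",
               nobs,
               sigma / sqrt( (double)nobs),  /* random errors beat down as sqrt(Nobs) */
               sigma);                       /* more observations gain you nothing    */
   return( 0);
}

Under the "all random" assumption, a hundred such observations would claim a 0.05-arcsecond result; under the FCCT15 assumption, you'd still be stuck at half an arcsecond no matter how many you took.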

The FCCT15 method, as implemented by NEODyS/AstDyS in their .rwo format, had some additional problems. The RMS errors given in the .rwo data came already multiplied by sqrt(Nobs). In late June 2017, I realized that this led to a serious bug in Find_Orb: I was taking those RMS values, treating them as "estimated astrometric uncertainties", and multiplying them by an additional factor to correct for over-observing. In other words, the data were being adjusted twice; the more data you got, the less weight it had in the orbit! It also corrupted blunder management and outlier detection, which thought they were dealing with "real" uncertainties, not re-weighted ones. Fortunately, I was able to compute Nobs, divide the RMS values by sqrt(Nobs), and recover the original "astrometric uncertainties". It's a strange thing to be doing, but no actual information was lost in the process, and the resulting uncertainties are "reasonable".

I strongly disagree with the decision to "pre-multiply" RMS values by sqrt(Nobs) or any other value, both for the above reasons and because it causes trouble if more observations are added. I've worked around these issues in Find_Orb, and I don't think anyone will come to grief as a result of them. But it was, in my opinion, a really bad idea.

I gather that some orbit computers have actually gone so far as to use at most four observations from a given observatory on a given night. Get more than that, and the extra observations are simply ignored. This does mean you get some benefit from additional observations, but only up to a point. Unfortunately, it also means that you can't use more data to beat down the random component of the error.

When I got around to addressing the issue, none of these schemes appealed to me, but I didn't have a good grasp on how to handle it. After some thought and some discussions with Marco Micheli and Alan Harris, I decided that if Nobs <= Nmax, no re-weighting would be done. If Nobs > Nmax, observations would be given weights of sqrt( Nmax/Nobs). Somewhat arbitrarily, I set Nmax=5 by default, with the ability to reset it. (Nmax=1 corresponds closely to the FCCT15 scheme.) This method was arrived at (independently, I think) in VFCC17 with Nmax fixed at 4.

Proposed new over-observing solution:

As stated above, I chose Nmax=5 without a solid theoretical basis. Vereš et al. similarly give no mathematical rationale; as I did, they simply reasoned that four observations ought to get somewhat greater weight than two, but that beyond a certain point, added observations were presumably not helping anymore.

A discussion with Dave Tholen got me to thinking about the role of Nmax. Suppose you're observing an object that's at the faint end of what you can do; there are random errors due to noise of about half an arcsecond. But you're using Gaia-DR2 and have checked your timing against GPS, and have otherwise managed to reduce systematic error to (say) the level of 50 milliarcseconds.

In such a case, you'd need a hundred observations just to bring your random error down to the same level as your systematic error. It's pretty clear that in this situation, Nmax ought to be much greater than four.

To reverse this, suppose someone got high-quality, low-noise data with random errors at the 50 milliarcsecond level. But they did so in 1992, and had only GSC-1.1 to reduce against, resulting in systematic errors of about half an arcsecond. With almost all your error being systematic, getting more than one or two observations on a night would be almost pointless, and Nmax should be set to one (as in FCCT15) or maybe two.

Let's assume that there is a systematic error sigma_s and a random error sigma_r. In that case, the total error sigma_t after getting Nobs observations would be

(1) sigma_t^2 = sigma_s^2 + sigma_r^2 / Nobs

For a single observation, the total error is simply the random and systematic errors added in quadrature. As you add observations, the combined result gradually approaches that of a single observation with only systematic error.

In the VFCC17 and Find_Orb methods, you get asymptotically similar behavior: as you add observations, the combined result gradually approaches that of a single observation with its error scaled by 1/sqrt(Nmax). To get matching behavior, one would set (these three expressions are equivalent)

(2) 1/sqrt(Nmax) = sigma_s / sqrt( sigma_s^2 + sigma_r^2)
(3) Nmax = (sigma_s^2 + sigma_r^2) / sigma_s^2 = sigma_r^2 / sigma_s^2 + 1
(4) sigma_r^2 = (Nmax - 1) sigma_s^2

My Nmax=5, therefore, basically boils down to assuming that the random errors have twice the magnitude of the systematic errors. VFCC17's value of Nmax=4 assumes random errors have sqrt(3) = 1.732 times the magnitude of systematic errors. (As stated previously, FCCT15's value of Nmax=1 assumes no random error. The naïve method of not accounting for over-observing at all corresponds to Nmax=∞ and sigma_s = 0. At least, roughly speaking... all I've done here is to ensure that the asymptotic behavior is correct relative to the Nobs=1 case.)

You'll notice that my first hypothetical instance (random errors of half an arcsecond = 500 mas, but systematic errors down at the 50-milliarcsecond level) would correspond to 500^2 = (Nmax - 1)(50^2), i.e., Nmax = 101. If the second "terrible systematic errors, but careful measurement limiting random errors" case reversed this, so that sigma_r = 50 milliarcsec and sigma_s = 500 milliarcsec, one would have Nmax=1.01.
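
Here's a quick numerical check of (1) through (4) for those two hypothetical cases, as a C sketch (the sigma values, in milliarcseconds, are just the ones assumed above):

/* Check of equations (1)-(4):  Nmax = 1 + (sigma_r/sigma_s)^2,  plus
   the total error sigma_t as a function of Nobs,  for the two
   hypothetical cases described in the text (values in milliarcsec). */
#include <stdio.h>
#include <math.h>

static double sigma_t( const double sigma_s, const double sigma_r, const int nobs)
{
   return( sqrt( sigma_s * sigma_s + sigma_r * sigma_r / (double)nobs));   /* eqn (1) */
}

int main( void)
{
   const double cases[2][2] = { { 50., 500.},    /* sigma_s, sigma_r : faint object, good catalog & timing */
                                { 500., 50.} };  /* careful measurements, GSC-1.1-level systematic errors  */
   int i;

   for( i = 0; i < 2; i++)
      {
      const double sigma_s = cases[i][0], sigma_r = cases[i][1];
      const double nmax = 1. + (sigma_r / sigma_s) * (sigma_r / sigma_s);  /* eqn (3) */

      printf( "sigma_s=%4.0f  sigma_r=%4.0f  Nmax=%6.2f   ", sigma_s, sigma_r, nmax);
      printf( "sigma_t(1)=%5.1f  sigma_t(10)=%5.1f  sigma_t(100)=%5.1f\n",
               sigma_t( sigma_s, sigma_r, 1),
               sigma_t( sigma_s, sigma_r, 10),
               sigma_t( sigma_s, sigma_r, 100));
      }
   return( 0);
}

The first case gives Nmax=101 and the second Nmax=1.01, matching the values quoted above.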

Finally, it's convenient to note that you can take (1) and (4) to get

      sigma_t(Nobs)^2   1 + (Nmax - 1) / Nobs
(5)   --------------- = ---------------------
       sigma_t(1)^2            Nmax

This suggests a very straightforward modification of the 're-weighting' factor R(Nobs). In FCCT15, VFCC17, and (before September 2017) Find_Orb,

R(Nobs) = 1                  if Nobs <= Nmax
R(Nobs) = sqrt(Nmax/Nobs)    if Nobs > Nmax

(Nmax=1 for FCCT15, Nmax=4 for VFCC17, Nmax=adjustable in Find_Orb, Nmax = ∞ if no reweighting is done and all errors are assumed to be random/non-systematic.) Instead, we should be using (and Find_Orb does use, as of September 2017)

(6) R(Nobs) = sqrt(Nmax / ( Nobs + Nmax - 1)) 

(this is arrived at by noting that R(Nobs)^2 is simply 1/Nobs divided by (5); that way, Nobs re-weighted observations combine to give the total error that (5) says they should.) You'll note that the "new" reweighting factor is exactly equivalent to the "old" factor for Nmax=1 or ∞, and has the same asymptotic behavior for large Nobs. However, since it more accurately reflects the idea that our errors result from a mix of systematic and random errors, and does so in a less ad hoc way, I'd say it's to be preferred.
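
To see what (6) changes in practice, here's a small comparison sketch (again, not the actual Find_Orb source) tabulating the old piecewise factor and the new factor for the default Nmax=5:

/* Comparison of the pre-September-2017 piecewise re-weighting factor
   with the new factor from equation (6),  for Nmax = 5 (the Find_Orb
   default).  A sketch only,  not actual Find_Orb source.             */
#include <stdio.h>
#include <math.h>

static double reweight_old( const int nobs, const int nmax)
{
   return( nobs <= nmax ? 1. : sqrt( (double)nmax / (double)nobs));
}

static double reweight_new( const int nobs, const int nmax)    /* eqn (6) */
{
   return( sqrt( (double)nmax / (double)( nobs + nmax - 1)));
}

int main( void)
{
   const int nmax = 5;
   int nobs;

   printf( " Nobs   R_old   R_new\n");
   for( nobs = 1; nobs <= 64; nobs *= 2)
      printf( "%5d   %.3f   %.3f\n", nobs,
               reweight_old( nobs, nmax), reweight_new( nobs, nmax));
   return( 0);
}

The two agree at Nobs=1 and converge for large Nobs; the difference is in the middle range, where (6) starts down-weighting gently right away rather than waiting until Nobs exceeds Nmax.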

Computing Nobs:

The assumption in all of this has been that observations made on a given "night" are perfectly correlated; those made the next or previous night are completely uncorrelated. This corresponds to taking Nobs for observation 'i' to be a sum over all 'n' observations,

Nobs = sum( j = 1...n) F(j)

   where

(a)
F(j) = 1 if observations i and j were on the same night from the same observatory
F(j) = 0 otherwise

My first effort at handling over-observing in Find_Orb, in November 2012, was a modified version of this scheme. It looked for the observation time to be within an amount 't_max', defaulting to half a day:

(b)
F(j) = 1 if |obstime(i) - obstime(j)| < t_max and both observations i and j
               were made from the same observatory
F(j) = 0 otherwise

I did this because it was simpler, seemed reasonable, and allowed me to try out different ideas as to the time scales over which observations are correlated. (a) and (b) result in similar behavior, but my (b) approach assumes that observations made at 6:00 and 18:00 local time (spanning the daytime) are just as correlated as observations made at 18:00 and at 6:00 the next morning (the same night). I have a hunch that FCCT15's (a) assumption, namely that "same night" observations twelve hours apart are better correlated than a morning observation and one the following evening, is a little closer to reality. In fact, a bit later, I modified my code to use the "same night" criterion.
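
For concreteness, here's how criteria (a) and (b) might be implemented (a sketch only: the observation structure, the crude "same UT calendar date" night test, and the sample data are all invented for illustration and don't match the actual Find_Orb code):

/* Counting Nobs for observation i under criteria (a) ("same night,
   same observatory") and (b) ("within t_max days,  same observatory").
   The observation structure and data are invented for this sketch. */
#include <stdio.h>
#include <string.h>
#include <math.h>

typedef struct
{
   double jd;              /* observation time,  as a Julian Date */
   char obscode[4];        /* MPC observatory code,  e.g., "703"  */
} obs_t;

/* (a): count observations from the same observatory on the same
   "night",  crudely taken here to be the same UT calendar date;  a
   real implementation would group by local night at the observatory. */
static int nobs_same_night( const obs_t *obs, const int n, const int i)
{
   int j, count = 0;

   for( j = 0; j < n; j++)
      if( !strcmp( obs[j].obscode, obs[i].obscode)
                  && floor( obs[j].jd + 0.5) == floor( obs[i].jd + 0.5))
         count++;
   return( count);
}

/* (b): count observations from the same observatory within t_max days */
static int nobs_within_tmax( const obs_t *obs, const int n, const int i,
                                       const double t_max)
{
   int j, count = 0;

   for( j = 0; j < n; j++)
      if( !strcmp( obs[j].obscode, obs[i].obscode)
                  && fabs( obs[j].jd - obs[i].jd) < t_max)
         count++;
   return( count);
}

int main( void)
{
   const obs_t obs[4] = { { 2458000.62, "703"}, { 2458000.65, "703"},
                          { 2458001.35, "703"}, { 2458000.64, "G96"} };

   printf( "(a) Nobs = %d   (b) Nobs = %d\n",
            nobs_same_night( obs, 4, 0),
            nobs_within_tmax( obs, 4, 0, 0.5));
   return( 0);
}

With the sample data above, criterion (a) counts three observations for the first one, while (b), with t_max of half a day, counts two.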

However, I later decided that both methods are flawed. If nothing else, the assumption that F(j) is either zero or one, i.e., observations are perfectly correlated up to a certain point and then perfectly uncorrelated, seems rather silly. And observations made within minutes might have greater correlation than those made several hours apart.

A better assumption would be that F(j) = 1 when the difference in time dt is nearly zero (observations taken one after another ought to have nearly perfectly correlated systematic errors), and that F(j) gradually decreases with increasing |dt|, going toward zero as |dt| approaches infinity. This sounds a lot like a Gaussian. As of September 2017, I've therefore set

 (c)  F(j) = exp( -(dt / t_max)^2 / 2)  [if obs i & j were made at the same observatory]
         dt = obstime(i) - obstime(j) 

This is, I would submit, more defensible than either scheme (a) or (b). One would expect that there would be at least some night-to-night correlation. One would also expect that the correlation between observations made minutes apart should be high, and that between observations made at the beginning and end of the night ought to be not so high, and that the correlation between observations made a week or more apart ought to be pretty low. (c) reflects those assumptions.
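
And here's a corresponding sketch of scheme (c) (the structure and sample data are again invented, not taken from Find_Orb). Note that the resulting "effective" Nobs is no longer an integer; that's fine, since it only gets used inside the re-weighting factor (6):

/* "Effective" Nobs for observation i under scheme (c):  a sum of
   Gaussian correlation factors F(j) over all observations made at
   the same observatory.                                           */
#include <stdio.h>
#include <string.h>
#include <math.h>

typedef struct
{
   double jd;           /* observation time,  as a Julian Date */
   char obscode[4];     /* MPC observatory code                */
} obs_t;

static double effective_nobs( const obs_t *obs, const int n, const int i,
                                    const double t_max)
{
   double rval = 0.;
   int j;

   for( j = 0; j < n; j++)
      if( !strcmp( obs[j].obscode, obs[i].obscode))
         {
         const double dt = (obs[j].jd - obs[i].jd) / t_max;

         rval += exp( -dt * dt / 2.);     /* F(j) from scheme (c) */
         }
   return( rval);       /* always >= 1,  since F(i) = 1 */
}

int main( void)
{
   const obs_t obs[4] = { { 2458000.62, "703"}, { 2458000.65, "703"},
                          { 2458001.35, "703"}, { 2458000.64, "G96"} };
   const double t_max = 0.5;        /* correlation time scale,  in days */

   printf( "Effective Nobs = %.2f\n", effective_nobs( obs, 4, 0, t_max));
   return( 0);
}

With this data, the effective Nobs for the first observation comes out to about 2.3: the nearly simultaneous second observation counts fully, while the one 0.73 days later contributes only about a third of an observation.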

(Note the warning below that correlation may be a function of where we are in the sky, more than a function of how much time has elapsed between observations. Taking this into account would require some sort of modification of F(j), though I've only vague ideas as to what that would be.)

Note that Jim Baer has kindly pointed out that a similar method has already been in use, though not as simplified as what I'm describing here: an actual correlation matrix linking nearby observations is used. Jim suggests that that method is not suitable for real-time calculations. The above scheme (c) does work quite well in real time, though. Despite the shortcomings I've described, I'd argue that (c) is a much more defensible method than either (a) or (b), and requires only slightly more code.

Problems with proposed new over-observing solution:

• The above is reasonable mathematically. But how do we decide what the systematic and random errors might be? (I should note that observers can specify both types of errors in the new ADES format. But even if you're the observer, with full details on the images and observing setup, figuring out how large either error source is can be daunting.)

If there are many observations on a given night, we can actually compute the random error from the standard deviation of the residuals. (That would probably be the safest thing to do; if somebody got, say, six observations, it's better to compute the random error than to make assumptions that may not reflect what the sky, scope, object, and camera were doing that night. A sketch of this computation appears after this list.)

You could then take the values of random errors computed for objects where you had lots of data, and apply them to objects that got fewer observations, but were otherwise similar.

Figuring out the systematic catalog error is more difficult. FCCT15 computed these biases and provided a way to subtract them, resulting in theoretically "unbiased" data. But we don't know how much systematic catalog bias remains. We can set it to be effectively zero for Gaia-DR2. But I don't see how we can say much about it for other catalogs.

• We've somewhat arbitrarily decided that errors are correlated over a time scale of one night. For timing errors, there's probably some truth to that. For systematic catalog errors, the correlation is really in the area of the sky, not in time. If the object is moving quickly across the sky, the catalog bias will not be especially systematic. If it's a TNO spending years in one part of the sky, t_max may be closer to "years" than to "one night". (Especially because timing errors are much less significant for a TNO; systematic catalog errors will dominate.)

• What about photometry? Presumably, photometric errors should undergo a similar analysis for random vs. systematic errors, the latter being strongly catalog-dependent. At present (to my knowledge, at least), no over-observing adjustment is done by anybody in this area: go out and get lots of bad photometry, and it can overwhelm the good stuff.

• Davide Farnocchia points out that treating the "RMS errors" in FCCT15 as "astrometric sigmas" multiplied by sqrt(Nobs) is not quite right. Those "RMS errors" were arrived at under the assumption that they would get multiplied by that factor. Use the Find_Orb/VFCC17 correction, multiplying them by a different factor, and you'll get different results. That's a bit of a head-scratcher. Either we'd have to include an additional "fudge factor" that brings the adjusted sigmas to a comparable level, or the RMS errors would always have to be arrived at in combination with a compatible over-observing scheme. (I think the "fudge factor" ought to be possible.)
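
Returning to the first point above, here's a minimal sketch of estimating the random error from the scatter of one night's residuals (the residuals are made up; in practice they'd be observed-minus-computed values, in arcseconds, from a preliminary orbit). Taking the scatter about the night's mean conveniently absorbs any constant offset for that night, so what's left estimates the random component:

/* Estimating the random (per-observation) error from the scatter of
   a single night's residuals.  The residuals here are invented;  in
   practice they would come from a preliminary orbit fit.            */
#include <stdio.h>
#include <math.h>

static double sample_std_dev( const double *x, const int n)
{
   double mean = 0., sum2 = 0.;
   int i;

   for( i = 0; i < n; i++)
      mean += x[i];
   mean /= (double)n;         /* the mean absorbs any constant (systematic) offset */
   for( i = 0; i < n; i++)
      sum2 += (x[i] - mean) * (x[i] - mean);
   return( sqrt( sum2 / (double)( n - 1)));
}

int main( void)
{
   /* six dec residuals (arcsec) from one hypothetical night: */
   const double resid[6] = { 0.41, 0.22, 0.35, -0.08, 0.55, 0.12};

   printf( "Estimated random error: %.2f arcsec\n",
                  sample_std_dev( resid, 6));
   return( 0);
}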

VFCC17 makes some interesting points about the astrometric errors in surveys, and provides some general rules for assigning sigmas to astrometry that could be quite useful here. I'm going to take a guess, though, that their sigmas are slightly higher than they ought to be for the "better" surveys. The problem I see is that the sigmas are based on residuals, "observed minus computed", with the implicit assumption that you're trying to match the computed values. The better observers are probably getting results that are about as accurate as the computed orbits; if they disagree by 0.05", you don't know if the observer is wrong or if the orbit is wrong. The better your astrometry, the more it gets penalized by this.

Notes about precision vs. uncertainty issues

Occasionally, one will encounter data whose stated uncertainty is smaller than the precision to which the data are given. For example, if I tell you that I've measured the weights of three people as 51 +/- 0.1 kg, 55 +/- 0.1 kg, and 60 +/- 0.1 kg, you might wonder why I'm claiming the values are good to tenths of kilograms when they've been rounded to the nearest kilogram. You might wonder if the values were just totally wrong.

MPC's punched-card format normally gives RAs to 0.01 second (0.15 arcsec at the celestial equator) and decs to 0.1 arcsecond. If you tell Find_Orb that an observation is actually good to 0.14 arcsecond, you'll get a warning message of this sort:

At least one observation,  made at 30 Jul 2022 14:56:21,  has an
uncertainty that is less than its precision.  This doesn't make
sense, and may indicate something's wrong with the data. See
https://www.projectpluto.com/errors.htm#precision for further
discussion.

The solution is either to say that the uncertainty is at least as large as the precision (0.15 arcsec, in this example), or to provide extra digits to get the correct precision (the MPC format will allow for one more digit in both RA and dec, and ADES astrometry allows arbitrary precision).

Similar constraints apply to the observation time (which in the 80-column format can be to 10^-5 day = 0.864 second precision, or to a "microday" = 86.4 millisecond precision) and to the photometry (which can be given to whole magnitudes, 0.1 mag, or 0.01 mag precision). For each of those quantities, if the stated uncertainty is less than the precision, you get a warning.

In theory, Find_Orb need not be quite so strict. Rounding a quantity to the nearest x introduces an error with root-mean-square value (sqrt(3) / 6)x, or about 0.29x. So rounding a declination to, say, the nearest 0.1 arcsecond introduces an error with an RMS of roughly 0.029 arcsecond, and rounding the above weights to the nearest kilogram would allow them to be quoted as 51 +/- 0.3 kg, etc. But even in that case, one would wonder if a blunder had been made. I think a warning is needed if the uncertainty is less than the precision.
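
As a small sketch of that rounding rule, here are the RMS rounding errors for the 80-column precisions quoted above (the table of precisions is simply the set of values mentioned in this section):

/* RMS error introduced by rounding to the nearest 'precision',  i.e.,
   (sqrt(3) / 6) * precision = precision / sqrt(12),  for the usual
   80-column-format precisions mentioned above.  Sketch only.         */
#include <stdio.h>
#include <math.h>

int main( void)
{
   const struct
      {
      const char *quantity;
      double precision;
      const char *units;
      } p[4] = {
         { "RA",          0.15,  "arcsec"},   /* 0.01 second of RA,  at the equator */
         { "declination", 0.10,  "arcsec"},
         { "time",        0.864, "seconds"},  /* 1e-5 day */
         { "magnitude",   0.1,   "mag"} };
   const double round_to_rms = sqrt( 3.) / 6.;     /* = 1/sqrt(12) = 0.2887 */
   const int n_quantities = (int)( sizeof( p) / sizeof( p[0]));
   int i;

   for( i = 0; i < n_quantities; i++)
      printf( "%-12s precision %.3f %s -> rounding RMS %.4f %s\n",
               p[i].quantity, p[i].precision, p[i].units,
               p[i].precision * round_to_rms, p[i].units);
   return( 0);
}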

References

Vereš, Farnocchia, Chesley, Chamberlin 2017, "Statistical analysis of astrometric errors for the most productive asteroid surveys", Icarus (VFCC17)

Farnocchia, Chesley, Chamberlin, Tholen 2015, "Star catalog position and proper motion corrections in asteroid astrometry", Icarus (FCCT15)