Method of Herget: mathematical details

Some mathematical details of the method of Herget

The method of Herget is really a pretty simple one; that is the main reason I decided to use it to determine initial orbits in my Find_Orb orbit determination software. If someone asked me to stand up at a chalkboard and explain how it works, or to sit down at a computer and write a program for it, I could do so without cracking open a book or having to wave my hands when I got to the hard parts... this is not true of the methods of Laplace, Olbers, or Gauss. Also, it can make use of many observations and can account for perturbed orbits; the others can only use three and assume unperturbed motion.

The ability to handle perturbations is especially important for "odd" cases. It's the reason Find_Orb has found orbits for cases such as Cassini's flyby of Earth, some Earth-orbiting satellites, Shoemaker-Levy 9 (comet that crashed into Jupiter) and all five of the outer satellites of Uranus. (And since then, every other irregular satellite.)

The C/C++ source code I use for Find_Orb is posted on this site; click here for details. If you're a decent C programmer, this might be a reasonable way to learn about the method of Herget (and about least-squares corrections to orbits and several other basic concepts). However, the following mathematical description is probably of more interest, even to C programmers.

Almost everything described below was extracted from J M A Danby's excellent book, Fundamentals of Celestial Mechanics , published by Willmann-Bell; or puzzled out on my own (though almost certainly not truly "original".)

The method works as follows. Suppose you have a set of observations for an object, in right ascension and declination. You start out by assuming that you know the distance to the object for two of those observations... these are the R1 and R2 that Find_Orb asks for. Find_Orb also chooses the first and last observation, which seems to help the method avoid strange divergences.

Anyway, since you know the distance and RA/dec for those two points in time, you also know exactly where the object was on those two dates. Call those positions P1 = (x1, y1, z1) and P2 = (x2, y2, z2). Given that, there is one and only one orbit that could take the object from P1 to P2 in the time delta_t = (t2 - t1). Solving this problem has been the focus of an immense amount of effort over the last few centuries, and Danby supplies several ways of computing what that orbit would be. I found most of them a little confusing, so I came up with my own very simple method. (I'm sure this isn't at all original, though. I just haven't found a reference to it yet.)

(1) The first and simplest approximation is to start out with a "guessed" initial velocity V1 = (vx1, vy1, vz1),

   vx1 = (x2 - x1) / delta_t
   vy1 = (y2 - y1) / delta_t
   vz1 = (z2 - z1) / delta_t

If there were no gravity from the Sun and planets, and the object just moved in a straight line, this velocity would be exactly correct. (As a matter of fact, Dave Tholen's TWOOBS code does essentially just that. If delta_t is small enough, say, a few days, it's a "good enough" approximation; the object doesn't fall all that far toward the sun over the intervening time. You might want to try this first, then add the following refinements later.)

(1b) (Completely optional) The next improvement over the above, "just go in a straight line" estimate is to include gravity to first order. You do that by computing the object's acceleration at the approximate midpoint of the orbit: (x1+x2)/2, (y1+y2)/2, (z1+z2)/2. If you assume this acceleration (ax, ay, az) to be close to the constant acceleration between t1 and t2, you can get a much better initial V1:

   vx1 = (x2 - x1) / delta_t - ax1 * delta_t / 2
   vy1 = (y2 - y1) / delta_t - ay1 * delta_t / 2
   vz1 = (z2 - z1) / delta_t - az1 * delta_t / 2

This step is by no means necessary. Physically, you can think of it as resembling a rifleman taking a long shot and thinking, "The bullet will drop ten centimeters due to gravity over that far a distance; I'll aim ten centimeters high in order to compensate." In this case, the rifleman is ignoring the fact that gravity will (for this special sort of "bullet") change somewhat in intensity and direction while the bullet is in the air; but the changes are small enough that this provides a good estimate, and one that is a lot better than the estimate that ignored gravity completely. This is a useful analogy; it'll crop up again later on.

If you do include this rough compensation for gravity, you'll probably be all set for delta_t of up to a few weeks. Also, as you'll see, steps 2-4 are iterative. Using a better guess of the actual trajectory may save you an iteration or two.

(1c) (Also optional) A slightly smarter rifleman might notice that the acceleration isn't going to be constant; if it's a long enough shot, you might say that it's changing in a mostly linear manner between t1 and t2. We know exactly where the object is at t1 and t2 (in the analogy, the end of the gun barrel and the target location, respectively), and therefore know the gravitational acceleration at those points exactly. As it turns out, the required adjustment in the above formula is rather simple :

   vx1 = (x2 - x1) / delta_t - (ax1 * 2 + ax2) * delta_t / 6
   vy1 = (y2 - y1) / delta_t - (ay1 * 2 + ay2) * delta_t / 6
   vz1 = (z2 - z1) / delta_t - (az1 * 2 + az2) * delta_t / 6

You'll notice that for short arcs where the acceleration at the start and end are pretty much the same, this results in pretty much the same result as the "midpoint" method. For slightly longer arcs, though, this reflects the reality that acceleration early in the arc will have more time to build up its effects than acceleration late in the arc.

(2) Integrate the orbit numerically from the time t1 to the "end time" t2. (If you were only including the sun, then you could use an analytic solution. But you need to be able to numerically integrate orbits anyway to handle perturbations. Find_Orb uses Fehlberg's modified version of the Runge-Kutta method, as described in Fundamentals of Celestial Mechanics, p. 297-300; but the algorithm you use to integrate the orbit is somewhat irrelevant, as long as it gives accurate results.)

(3) Your end result will, of course, not be (x2, y2, z2), because the actual acceleration due to gravity changes continuously between t1 and t2. (Note, though, that if t2 - t1 is only a week or two, and the object is an AU or more from the sun and not heavily perturbed, the acceleration due to gravity won't change all that much over the time in question. So you'll be really close to (x2, y2, z2).)

Anyway. Suppose you are off by an amount (delta_x, delta_y, delta_z). Then you recompute a new "guess" for the velocity,

   vx1 = vx1 - delta_x / delta_t
   vy1 = vy1 - delta_y / delta_t
   vz1 = vz1 - delta_z / delta_t

(4) You have to repeat steps (2) and (3) until those delta values become suitably small.

This method has the advantage of being simple and physically obvious. Step (3) corresponds to saying, "My first attempt was .13 AU 'high' and .08 AU to the 'left'; so my next attempt will be 'lower' by .13 AU and 'to the right' by .08 AU", again in much the way a rifleman might take one shot, then use the result to correct his aim on future shots. Also, the numerical integration can include the perturbing effects of planets; the more sophisticated methods described by Danby do not do this. I suppose you could say that this is a "physics oriented" approach, while the others are more "mathematically oriented".

The method has the disadvantage that, if more than about 1/8 of an orbit is involved, the delta_xyz values do not converge, and there is no solution! But it's entirely possible to get around this. At this point, the math starts to get slightly hairier and the connection to the real physics, somewhat less obvious. The reason is that, over longer periods of time, the relationship between changes in the initial velocity and where the object ends up is not so simple. In the above, we assumed that if we pushed the object faster, it would end up past the target point; aim above the target, and it would end up above the target; and so on. That ceases to be true for longer orbits. You could aim above and end up to the left, below and end up to the right, and so on.

Getting around this is ugly and usually unnecessary; you can work by starting out with a sub-chunk of the data covering less than about an eighth of an orbit, and get a good enough "introductory" orbit to use in a full six-parameter least-squares adjustment. Alternatively, you can note that the idea of using the method of Herget with the first and last observations as the "fixed" ones is by no means graven in stone; you can switch this to be two observations about an eighth of an orbit apart, near the middle of the arc.

Doing this becomes both ugly and necessary if you're dealing with heavily-perturbed objects or problems such as "a probe must leave Earth on date X and reach Mars on date Y, after completing half an orbit". (Most probes to Mars do exactly this.) If you don't expect to do this (and few people ever do), you can skip over the next few paragraphs. Otherwise, read on:

We'll suppose that your velocity guess, (vx1, vy1, vz1), leads you to be off by a vector 'delta' at your destination at time t2. Suppose that we adjust this velocity guess by a small amount to result in (vx1 + h, vy1, vz); this results in a new offset, delta + h * P. P is essentially a (vector) linear derivative, showing how your error at the far end of the trip varies with small changes in the x-component of the initial velocity. That is, "adjust the starting x-component of the velocity by a certain amount, and it changes your position at the far end of the orbit, time t2, by P times that amount." We're getting that derivative numerically.

Now we try again with velocity guesses (vx, vy + h, vz), resulting in an offset delta + h * Q, and (vx, vy, vz + h), resulting in an offset delta + h * R. We can now generalize by saying that, for a velocity (vx + eps_x, vy + eps_y, vz + eps_z), we'd end up at the offset (delta + eps_x * P + eps_y * Q + eps_z * R) at time t2.

Of course, what we really want to get is an offset of zero (that is, we want to hit the target location). Therefore,

offset =  (delta + eps_x * P + eps_y * Q + eps_z * R) = 0

This is a not-too-tough exercise in linear algebra. Breaking it up into components in x, y, and z, we get

delta_x + eps_x * Px + eps_y * Qx + eps_z * Rx = 0
delta_y + eps_x * Py + eps_y * Qy + eps_z * Ry = 0
delta_z + eps_x * Pz + eps_y * Qz + eps_z * Rz = 0

...that is, we've got three unknowns (the 'epsilon' values) and three linear equations, and the solution becomes easy. Going back up a paragraph or two, we now know that, given velocity (vx + eps_x, vy + eps_y, vz + eps_z) at time t1, we'll be at a zero offset at time t2 (that is, we'll hit the target location.)

One nice aspect of this is the fact that our model for computing the offsets at time t2 can include perturbations if we wish. That means that we suddenly have a way to compute the "transfer orbit" that works anywhere in the solar system. I've been able to use it to handle irregular satellites of Saturn, Uranus, Neptune and Jupiter, and some artificial satellites in complicated orbits around the Earth and Moon.

One caveat here: because the problem is not entirely linear, you'll find that if you numerically integrate with the new velocity, you'll be much closer to, but not exactly dead on, the target location. You just have to repeat the procedure until you're close enough to be happy. This usually doesn't take very many iterations.

In any case... whether you used a simple but modest-accuracy or a more complex but highly accurate method, you have now determined an orbit for the object, starting at time t1, position (x1, y1, z1), velocity (vx1, vy1, vz1), that matches your observations at times t1 and t2. In between, there are errors in RA and dec, for observations 1, 2, 3, ... n.

Let's call these errors

   delta_ra(i)  = observed_ra(i)  - computed_ra(i)
   delta_dec(i) = observed_dec(i) - computed_dec(i),  i = 1, 2,  ... n.

Observation 1 matches t1, and observation n matches time t2, where the errors are zero. So delta_ra(1) = delta_dec(1) = delta_ra(n) = delta_dec(n) = 0. But for all other i, we can assume that the observed position and the computed position will differ.

These 'delta_ra' and 'delta_dec' values are also known as 'residual errors' or 'O-C' (observed minus computed) values.

At this point, we've got a way to measure the error for each object, and we can compute a "total error" for our guess at the object's orbit:

   total_error = sum( delta_ra(i) ^ 2 + delta_dec(i) ^ 2),  i = 1 to n.

This is usually shown by Find_Orb as a "root-mean-square", or RMS, error:

   rms_error = sqrt( total_error / n)

Our goal is to minimize the total error (or, equivalently, the RMS error). Then we can say we have the orbit that gives a "best fit" to the observations.

The usual method for doing this is the method of least squares. In most cases, the method of least squares is quite horrifying for a programmer to contemplate... which is really too bad, because it is an enormously helpful mathematical tool. (I've "encapsulated" the method into a set of software tools that are used in my astrometry software, in orbit computation, and in analyzing errors in telescope mounts. All this is in lsquare.cpp in the Find_Orb source. You'll still need a good math textbook to figure out what I did.)

In this particular case, though, we only need solve for two variables, R1 and R2. That simplifies matters a little, though not as much as one would ideally like. I'll leave the full description to Danby.

Dave Tholen and Rob McNaught have each come up with two decent ways of getting R1 and R2. Dave Tholen's TWOOBS program (meaning "two observations", but pronounced as the single-syllable "twoobs") works by trying a wide range of R1 and R2 values. PCs are brisk enough that this is not necessarily a big deal. TWOOBS can immediately discard absurdly hyperbolic solutions and (slightly more risky) retrograde solutions. (It's since been replaced by KNOBS, a program that can consider "N observations" where N is greater than or equal to two.)

TWOOBS makes no judgments about the remaining orbits. You wind up with a slew of possible solutions, which does give you a clue as to the ephemeris uncertainty. Rob McNaught came up with a scheme he calls "Pangloss", because it comes up with "the best of all possible orbits". In this method, you again try a whole slew of R1 and R2 values, but consider how "reasonable" each orbit is. Minor planets do fall into distinct groups and families, so you can say (for example) "this looks like a normal kind of main-belt orbit of the sort 90% of asteroids have" vs. "practically no known object has that sort of semimajor axis... this is probably nowhere near the correct solution."

At present, Find_Orb just adjusts R1 and R2 to minimize the sum of the residuals. I hope to make it slightly Panglossian by having it minimize the sum of the residuals minus a "reasonableness factor". That factor would be zero for a really bizarre orbit, and high for a reasonable-looking orbit. If there were only two observations available (a case where, no matter what R1 and R2 are, the sum of the residuals will be zero), the result would be purely Panglossian. As more observations become available, it would become a mix of Pangloss and Herget. And with a long enough arc, it would basically be judging the orbit purely on the basis of the observations, and discarding its preconceived ideas about "reasonableness".

More generally, once you have "good" values of R1 and R2, and therefore have an orbit that describes your observed data pretty well, you can switch to the general method of least squares, the "full step" in Find_Orb. With this method, Find_Orb adjusts (x1, y1, z1) and (vx1, vy1, vz1), again trying to find the minimum value for the total_error. Because we are no longer insisting that the first and last observations have zero error (or, equivalently, because we can now adjust six values instead of only two, R1 and R2), we can usually get a lower total_error than we did before.

Downhill simplex and 'superplex' methods: In late 2008, I investigated use of the downhill simplex (DS) method, as described in here, in Numerical Recipes in C. The DS method allows one to find a minimum of a multi-dimensional function, which, when you come down to it, is what classical orbit determination is all about.

The advantages of the DS method are that you can use it even in the absence of derivative information, and that it's very tough to throw it off track. The methods of Herget and of least squares are entirely capable of diverging, going in a few steps from perfectly reasonable orbits to ones light-years away. But the DS method goes "downhill" only. Also, the DS method is very intuitive, as the above link should make clear. It didn't take very long to add this method.

The disadvantages are that it's not nearly as fast as other methods. I generally use it in situations where I have a really short arc, or where other methods generate bizarre, un-physical orbits.

For orbit determination, I've implemented DS in two ways. Find_Orb can treat the determination as a two-dimensional problem, looking for the values of R1 and R2 that lead to a "best" orbit (that is, minimize a particular function). Alternatively, it can recognize that the problem really is six-dimensional, and will minimize a function of the state vector. The latter approach is quite a bit slower, and I call that the "superplex" method. (Well, I had to have some way to distinguish the two methods!)

And here is where the fact that you don't need derivative information becomes very helpful: the "best" orbit isn't necessarily the one that simply minimizes the sum of the squares of the residuals! It can include other criteria. The function to be minimized is instead the sum of the squares of the residuals, plus a function of the orbital elements. So you can consider the fact that some orbits are much less likely than others. The methods of Herget and least-squares pass no judgment on the orbital elements; if a bizarre orbit light-years away has a lower root-mean-square error than a main-belter, they'll take the former.

In my DS implementation, I included a scattergram of the frequency of asteroid orbits in semimajor axis versus eccentricity, similar to this plot from the MPC site. You can see this in orb_func.cpp in the Find_Orb C/C++ source code, in the evaluate_initial_orbit function. Basically, common combinations of semimajor axis and eccentricity are favored (result in the function taking on a lower value), so the minimization is apt to find "reasonable" orbits.

(As currently implemented, this only helps for main-belt and Jupiter Trojan objects. The idea would be slightly useful if extended to TNOs, though these don't display the same degree of clustering as main-belters, except somewhat into Plutino orbits; and would probably be very useful with sungrazing (SOHO) comets, which are strongly clustered.)