Sep 17, 2017

Caught between (ground) truth and the (LS) fit

Since my last blog post, I've been trying to apply the method of least squares (LS) to the problem of aligning the local frames of 2 different phones running an ARKit-based UI (in fact, a Unity game).  Along the way, I had to understand the differences between various nonlinear LS methods: the simplest of them all is the Gauss-Newton method, which requires just a Jacobian (the matrix of partial derivatives of the observable with respect to the state elements), but can't deal with the Jacobian losing rank.  Also, for some initial state guesses, the iteration jumps around wildly, which is a bit unsettling.  Originally, I tried to write out the Jacobian terms symbolically, but because the geometry involves 5 different frames, taking the derivative of so many quaternion-laden terms fills one entire page--for just 1 term.  So I settled on computing the Jacobian numerically instead.  Given the considerable difficulty with even the first partial derivatives, I wasn't going to try my luck with the second partial derivatives that make up the Hessian--which Newton's method requires.
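
To make the Gauss-Newton idea concrete, here is a minimal sketch in Python.  It is not my actual phone-alignment code: the observation function `h`, the toy exponential model, and the forward-difference step `eps` are all assumptions made up for illustration.  The Jacobian is built numerically, one state element at a time, and each iteration solves the normal equations, which is exactly the step that breaks when the Jacobian loses rank.

```python
import numpy as np

def numerical_jacobian(h, x, eps=1e-6):
    """Forward-difference Jacobian of h at x; column j holds d h / d x[j]."""
    x = np.asarray(x, dtype=float)
    h0 = np.asarray(h(x), dtype=float)
    J = np.empty((h0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps  # perturb one state element at a time
        J[:, j] = (np.asarray(h(x + dx), dtype=float) - h0) / eps
    return J

def gauss_newton(h, z, x0, iterations=20):
    """Plain Gauss-Newton: no damping, no regularization."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        r = z - h(x)  # residual: observed minus predicted
        J = numerical_jacobian(h, x)
        # Normal equations; np.linalg.solve raises LinAlgError (or returns
        # garbage) when J.T @ J is singular, i.e. when J loses rank.
        dx = np.linalg.solve(J.T @ J, J.T @ r)
        x = x + dx
    return x

# Hypothetical toy problem: recover (a, b) in y = a * exp(b * t).
t = np.linspace(0.0, 1.0, 10)

def h(x):
    return x[0] * np.exp(x[1] * t)

x_true = np.array([2.0, -1.0])
z = h(x_true)  # noise-free synthetic observations
x_hat = gauss_newton(h, z, x0=[1.0, 0.0])
print(x_hat)
```

This toy problem is fully observable, so the iteration settles quickly.  My real problem is not so kind.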

But what really sucked about my problem is the lack of observability--because the state has both translation and rotation, whose effects on the observable are indistinguishable or only weakly distinguishable, the iterations can "wander" in the infinite space of solutions and not converge.  Upon further reading, I learned about the difference between the maximally sparse solution of LS (Matlab's "\" operator) and the minimum norm solution (using the pseudoinverse).  And to deal with the poorly conditioned case, I used Tikhonov regularization.  I have yet to come across any discussion of observability in the least squares context, so I don't know whether to be pleased or disappointed at the result I got.
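
A tiny linear example shows what "infinite space of solutions" means, and how the minimum norm and Tikhonov answers relate.  This is only a sketch: the 3x2 matrix `J`, the residual `r`, and the regularization weight `lam` are made-up values, not anything from my phone-alignment problem.  The two columns of `J` are identical, so only the sum of the two states is observable.

```python
import numpy as np

def tikhonov_step(J, r, lam):
    """Solve (J^T J + lam * I) dx = J^T r, the Tikhonov-regularized normal equations."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)

# Rank-1 Jacobian: both states affect the observable identically.
J = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
r = np.array([2.0, 4.0, 6.0])  # matched exactly by any dx with dx[0] + dx[1] == 2

# Minimum norm solution via the pseudoinverse: the shortest dx that fits.
dx_min_norm = np.linalg.pinv(J) @ r

# Tikhonov with a small weight picks (nearly) the same point, and the
# system stays solvable even though J.T @ J alone is singular.
dx_tikhonov = tikhonov_step(J, r, lam=1e-6)

print(dx_min_norm, dx_tikhonov)
```

Both results land near (1, 1), while a basic solution such as (2, 0) fits the data equally well.  The data alone cannot say which one is right, and that is the observability problem in miniature.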

The solution after 30 iterations of (Tikhonov) regularized LS, with the 6 DOF between the 2 devices' local frames as the states, and the other phone's blob location on the camera image as the observable.  (I wrote about detecting a remote phone's torch using OpenCV last year.)  On the top, the blue circles are my phone's pose within its own local frame, while the red "x" and the dotted lines are my wife's phone's pose at the observation instant.  The black "x" and attached lines are my wife's phone's pose in MY camera's frame.  Since the 2 phones are merely rotating, the black "x" should sit at a fixed Z distance of about 2 m, which is clearly not what the LS solution settled on.

I think I should be disappointed because this solution settled on a wrong distance between the 2 devices: the ground truth is approximately 2 m, but as you can see above, the LS settled on around 6 m.  But it's not the LS that I should be disappointed in: it's my fault for not thinking through the observability problem up front.  What can I say?  I learn by failing.  But I do believe the smart researchers could do a better job of warning students about these practical problems.  Maybe they just don't have the energy to write about these practical considerations after painstakingly deriving all those equations.  That's why I appreciate Topics in Astrodynamics even more: in that book, there is at least a mention of the need for a better model when the residual is unacceptably large.  (I was originally motivated to study least squares when I learned about the differential correction method, AKA the batch filter, which is well explained in Chapter 15 of that book.)