Jun 17, 2017

Understanding the UKF SLAM in Matlab

In my previous post, I worked through a 2-D EKF SLAM (Simultaneous Localization and Mapping) homework assignment for the online course by Prof. Cyrill Stachniss.  In this blog entry, I get to solve the same problem using the UKF (Unscented Kalman Filter).  Since it is the same nonlinear problem, the state formation and the equations of the forward process evolution and observation are the same as in the EKF SLAM case.  In reading the exposition below, it might be helpful to pull up my previous post next to this one.

Prediction step

In EKF, I simply fed the previous state estimate 𝛍+[k-1] (and the external input u[k]) to the nonlinear function g() to predict the next state:
𝛍-[k] = g(u[k]; 𝛍+[k-1])
But in UKF, I have to go through the UT (Unscented Transform): generate an ensemble of points (called sigma points 𝛘+[k-1] in section 3.4.1 of the Probabilistic Robotics book) around 𝛍+[k-1], and feed all those points to g().  The mean and the covariance of the transformed ensemble then become the new 𝛍-[k] and 𝚺𝛍𝛍-[k].
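In code, the whole UT pipeline is only a few lines (a minimal sketch; compute_sigma_points is defined below, while g_fun and the process noise R are placeholders for whatever g() and noise model apply):

% Minimal UT sketch: propagate the sigma point ensemble through g()
[X, wm, wc] = compute_sigma_points(mu, sigma); % ensemble around mu
Y = g_fun(u, X);                               % transform every sigma point
mu_pred = Y * wm;                              % weighted mean of the ensemble
dY = Y - mu_pred * ones(1, size(Y, 2));        % deviations from the new mean
sigma_pred = dY * diag(wc) * dY' + R;          % weighted covariance + noise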

Sigma points 𝛘

The ensemble 𝛘+[k-1] should be centered around 𝛍+[k-1], with enough "spread" based on 𝚺𝛍𝛍+[k-1].  𝛘+0[k-1] is the current estimate 𝛍+[k-1] itself.  Subsequent points lie in the + and - directions along each dimension of the uncertainty; I visualize them as pairs of points roughly 1~2 standard deviations on either side of 𝛍+[k-1].  A pair of points for each of the n state dimensions produces 1 + 2n sample points, as you can see in the left ellipsoid below, which is rotated around the principal (x-y) axes because of the non-zero cross-covariance term in 𝚺𝛍𝛍+[k-1].
Unscented transform maps the sigma points through g()
Along with 𝛘+[k-1], we have to generate weights wm and wc to use when calculating the mean and the covariance of the transformed points, respectively.  In the book, wm and wc are the same except for wc0 = wm0 + (1 - 𝛂² + 𝛃).  Although the book presents only one sigma point generation and weighting algorithm, there are apparently many variations listed, for example, in Grewal and Andrews.  The only thing I did not understand is the extra (1 - 𝛂² + 𝛃) term on wc0, because the weights are supposed to sum to 1 (normalized); I checked that wm sums to 1, which means that wc does not.  Until I can track down the Ph.D.s in my team for a satisfying answer (don't hold your breath), here is the Matlab code to generate the sigma points:

function [X, wm, wc] = compute_sigma_points(mu, sigma)
% Computes the 2n+1 sigma points according to the unscented transform,
% where n is the dimensionality of the mean vector mu.
% The sigma points form the columns of X,
% i.e. X is an n x (2n+1) matrix.

n = length(mu); % state dimension
global alpha kappa beta;
lambda = max(alpha^2 * (n + kappa) - n, 1); % floored at 1 (my tweak; the textbook lambda can go negative)
scale = lambda + n;

% Matrix square root of sigma via SVD ([V,D] = eig(sigma), sqrtm, or chol
% also work, but SVD lets me clamp the singular values below)
[U, D, V] = svd(sigma);
D = sqrt(D);
D = max(D, 0.05 * eye(n)); % lower bound explained below

% Note: sqrt(n+lambda) * sqrtm(sigma) seems to produce too big a spread
sigmasqr = U * D * V';

mureplicated = repmat(mu, 1, n);
X = [mu, mureplicated + sigmasqr, mureplicated - sigmasqr];

% Computing the weights for recovering the mean and variance.
% Note: weights are column vectors, to allow vector multiply w/ samples
wm = [lambda/scale; repmat(1/(2*scale), 2*n, 1)];
wc = wm;
wc(1) = wc(1) + 1 - alpha^2 + beta;
end
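A quick sanity check on the weights (my own snippet, not part of the assignment): feed the sigma points through the identity transform and confirm that wm recovers the mean exactly, while wc indeed fails to sum to 1.

global alpha kappa beta;
alpha = 0.9; kappa = 1; beta = 2; % assumed tuning; the assignment may differ
mu0 = [1; 2; 0.5];
sigma0 = diag([0.04, 0.04, 0.01]);
[X, wm, wc] = compute_sigma_points(mu0, sigma0);
fprintf('sum(wm) = %g, sum(wc) = %g\n', sum(wm), sum(wc)); % 1 vs. not 1
fprintf('mean recovery error = %g\n', norm(X * wm - mu0)); % ~0 by symmetry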

Prediction in SLAM

Unless some of the states are completely decoupled (the covariance between them is zero), one should feed the full 𝛍+[k-1] and 𝚺𝛍𝛍+[k-1] to compute_sigma_points.  But in SLAM the landmarks do not move (one of the fundamental assumptions), so there is no need to predict the landmark positions, and I can constrain the prediction to the robot pose.  For this 2-D planar robot, I can use Matlab's vector operations to transform all of 𝛘+[k-1] in just a few lines of code.

[X, wm, wc] = compute_sigma_points(mu(1:3), sigma(1:3,1:3));
theta = X(3, :);
ang = theta + rot1 + rot2;
pose_sample = [X(1, :) + trans * cos(theta + rot1);
               X(2, :) + trans * sin(theta + rot1);
               cos(ang); % expand out x,y separately
               sin(ang)];

% Be careful when computing the robot's orientation (sum up the sines and
% cosines and recover the 'average' angle via atan2)
mu(1:2) = pose_sample(1:2,:) * wm;

mu(3) = atan2(pose_sample(4,:) * wm, pose_sample(3,:) * wm);

I can see why wm being normalized is important for keeping 𝛍-[k] unbiased.  Prediction of the covariance is also concise in Matlab:

dSigma = X - mu(1:3) * ones(1,length(wm)); % subtract the mean
dSigma(3,:) = wrapToPi(dSigma(3,:)); % normalize angles again

sigma(1:3,1:3) = dSigma * diag(wc) * dSigma' ...
    + diag([0.01 0.01 deg2rad(0.5)]); % robot move uncertainty

Negative feedback loop between 𝚺mm and sigma point spread

I first tried using the matrix square root or Cholesky decomposition to generate the pairs of points away from the mean, but I ran into a problem: there is a "negative" feedback between the uncertainty (which can be gauged by the norm of the covariance matrix) and the sigma point spread.  Intuitively, if the sigma point spread scales linearly with the size of the hyper-ellipsoid, decreasing uncertainty leads to a smaller spread in the sigma points, which (unless the nonlinear system equation is grossly non-linear or even singular) in turn leads to a smaller covariance matrix norm.  Because 𝚺mm does not grow during the prediction step the way 𝚺xx does (see my earlier comment about the landmark stationarity assumption in SLAM), this leads to 𝚺mm asymptotically decaying to 0.  This contrasts sharply with EKF, where the landmark uncertainty is lower bounded by the robot pose uncertainty (they are completely correlated).  By lower bounding the diagonal entries of the SVD of 𝚺𝛍𝛍, I tried to keep the landmark uncertainty from disappearing to 0.  I am not sure yet if this is the best way to deal with the problem, but 𝚺mm --> 0 is a numerical problem at minimum, because 𝚺𝛍𝛍 becomes ill-conditioned.  Because I always think about "how am I going to do this on a 32-bit or 16-bit MCU", I am particularly alarmed by this inconvenient discovery, which is usually not mentioned in the Kalman filter fundamentals books.  Even with this heuristic, the landmark position uncertainty is dramatically reduced compared to EKF.
Robot pose (magenta) and landmark position (cyan) estimate at the end of 331 time steps.  Note that the estimates have drifted less than in EKF SLAM.
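To catch the collapse early, I now watch the conditioning of 𝚺𝛍𝛍 at every time step (my own instrumentation, not part of the assignment):

% Warn when sigma is collapsing or ill-conditioned; s(end) --> 0 means the
% sigma point spread (and hence the next covariance) is about to collapse too
s = svd(sigma);
if s(end) < 1e-6 || s(1)/s(end) > 1e9
    warning('sigma nearly singular: min sv %g, cond %g', s(end), s(1)/s(end));
end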

Correction step

The unscented transform is used again during the correction step--this time starting from the a-priori mean 𝛍-[k] and covariance 𝚺𝛍𝛍-[k] from the prediction step, and using the full state instead of just the robot pose portion as in the SLAM prediction step.  To correct the states from an observation, I need 2 things: the Kalman gain K[k] and the innovations from the landmark observations.  But at a given time step k, the observation consists of old landmarks (from which innovations can be formed) and new landmarks.  So I have to form Kp[k] and an ensemble of innovations (zp[k] - hp(𝛘-[k])) using just the previously observed landmarks at time step k (subscript "p" for previously known).  Note that I cannot reuse the ensemble generated during prediction, because the predicted state 𝛍-[k] is different from 𝛍+[k-1].  Forming the innovations from zp[k] and 𝛘-[k] is relatively straightforward, but Kp[k] involves more steps.  For clarity, let the ensemble of expected observations be
Zp[k] = hp(𝛘-[k])
zpavg[k] = Zp[k] wm
Then loosely speaking, Kp[k] is the cross-covariance of 𝛘-[k] with Zp[k], divided by the covariance of Zp[k]:
Sp[k] = (Zp[k] - zpavg[k]) diag(wc) (Zp[k] - zpavg[k])T + Qp[k]
Kp[k] = (𝛘-[k] - 𝛍-[k]) diag(wc) (Zp[k] - zpavg[k])T / Sp[k]
The uncertainty of measurement Qp[k] works to de-weight the innovation for the correction, just like in EKF.  The state and covariance correction are then:
𝛍+[k] = 𝛍-[k] + Kp[k] ( zp[k] - zpavg[k] )
𝚺𝛍𝛍+[k] = 𝚺𝛍𝛍-[k] - Kp[k] Sp[k] KpT[k]
Here is the Matlab implementation of the simple idea given above:

lId = z(:,1);
oz = z(ismember(lId, map), :);
if ~isempty(oz) % Have known landmark => can generate innovation
    [X, wm, wc] = compute_sigma_points(mu, sigma);
    nX = size(X,2); % number of sigma points

    % Reorder obs according to map, which is also the order in mu (state)
    [~,lIdx] = ismember(oz(:,1), map);

    nz = size(oz, 1);

    % line 7 on slide 32: Z = h(X)
    % Generate distribution of expected observation from sigma_point of
    % robot poses AND landmark positions. 
    Dx = X(2*lIdx+2,:) - ones(nz,1) * X(1,:); % nZ x nX
    Dy = X(2*lIdx+3,:) - ones(nz,1) * X(2,:); % nZ x nX
    
    Range = sqrt(Dx.^2 + Dy.^2); % predicted range and bearing distribution

    Bearing = atan2(Dy, Dx) - ones(nz,1) * X(3,:);
    % To average the angle, must go through unit circle
    Cos = cos(Bearing) * wm; Sin = sin(Bearing) * wm;

    Range_avg = Range * wm;

    % Rearrange according to the mu order
    Z_avg = zeros(2*nz, 1);
    Z_avg(1:2:end) = Range_avg;
    Z_avg(2:2:end) = atan2(Sin, Cos);

    q = zeros(2*nz,1);
    q(1:2:end) = Range_avg * 0.01;
    q(2:2:end) = deg2rad(1) * ones(nz,1);
    q = q .^ 2;
    
    % UKF correction algo step 9
    dZ = zeros(2*nz, nX);
    dZ(1:2:end, :) = Range;
    dZ(2:2:end, :) = Bearing;

    % innovation covariance matrix S, line 9 on slide 32
    dZ = dZ - Z_avg * ones(1,nX); % Subtract the mean of sample
    % Normalize the angles (every other row of dZ is a bearing)
    dZ(2:2:end, :) = wrapToPi(dZ(2:2:end, :));

    S = dZ * diag(wc) * dZ' + diag(q);
    % Compute Sigma_x_z, the Cross covariance, line 10 on slide 32
    % (which is equivalent to sigma times the Jacobian H transposed in EKF).    
    dSigma = X - mu * ones(1, nX); % subtract the mean
    dSigma(3,:) = wrapToPi(dSigma(3,:)); % normalize angles again

    Sigma_xz = dSigma * diag(wc) * dZ';

    % Compute the Kalman gain, line 11 on slide 32
    K = Sigma_xz / S;

    % Current observation (range and bearing of previously known landmarks)
    range = oz(:,2); bearing = oz(:,3);

    % Update mu and sigma, line 12 + 13 on slide 32
    innoZ = zeros(2*nz,1);
    innoZ(1:2:end) = range; % Form the innovation
    innoZ(2:2:end) = bearing;
    innoZ = innoZ - Z_avg;
    innoZ(2:2:end) = wrapToPi(innoZ(2:2:end));
    
    mu = mu + K * innoZ;
    mu(3) = wrapToPi(mu(3));
    sigma = sigma - K * S * K';
end

Accepting new landmarks

Previously unobserved landmarks are of no use for correction, but they still have to be accepted into the expanding state.  I have not yet figured out how to calculate the cross-covariance of a newly observed landmark with the existing landmarks, or even with the robot pose, but here's my code used to generate the above result:

z = z(~ismember(z(:,1), map), :); % Deal with only the new landmarks
nz = size(z,1);
for i=1:nz % For newly discovered landmarks in this timestep, sample around
    nMu = length(mu);
    lId = z(i,1); range = z(i,2); bearing = z(i,3); % raw measurement
    [X, wm, wc] = compute_sigma_points([range; bearing] ...
        , diag([0.01*range deg2rad(1)] .^ 2));
    % Transform range/bearing to Cartesian
    Range = X(1,:);
    Bearing = X(2,:);
    Dx = Range .* cos(Bearing + mu(3));
    Dy = Range .* sin(Bearing + mu(3));

    dx = Dx * wm;% Expected landmark position relative to robot position
    dy = Dy * wm;% = weighted average
    mu(nMu + (1:2)) = [mu(1) + dx; mu(2) + dy];

    dSigma = [Dx - dx; Dy - dy];
    % Stick the new weighted variance at the lower right corner of sigma
    sigma(nMu+(1:2),nMu+(1:2)) = dSigma * diag(wc) * dSigma';

    map(end+1) = lId; %add landmark to the map
end
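If I had to guess at the missing cross-covariance, I would linearize the initialization function the way EKF SLAM does.  Here is an untested sketch that would slot in right after the sigma corner assignment inside the loop above:

% Untested sketch: the new landmark position depends on the robot pose, so it
% should inherit correlation with the whole state through the Jacobian of the
% initialization function WRT the pose.
Fx = [1, 0, -range * sin(bearing + mu(3));
      0, 1,  range * cos(bearing + mu(3))]; % 2x3, d(landmark)/d(robot pose)
sigma(nMu+(1:2), 1:nMu) = Fx * sigma(1:3, 1:nMu);   % cross-covariance block
sigma(1:nMu, nMu+(1:2)) = sigma(nMu+(1:2), 1:nMu)'; % keep sigma symmetric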



Jun 6, 2017

Understanding the EKF SLAM

In April of this year, I attended the AR (augmented reality) workshop at Stanford, and learned the current issues and direction of the AR community.  I was curious whether I could just use VI (visual-inertial) SLAM (simultaneous localization and mapping) for the AR video game I want to write, and got confirmation that this would be possible after talking to Renato F. Salas-Moreno.  Since I work with inertial sensors in my current job, VI SLAM is particularly interesting.  While VI SLAM on a single device is amply demonstrated by a slew of cutting edge products like Google Tango and Microsoft HoloLens, the interactive AR video game I have in mind requires merging the maps produced from multiple devices in the same room.  I learned from Marc Pollefeys that HoloLens solves the problem by using the map of a "master" device.  I got the feeling that this is not a completely solved problem, but since I am new to the whole SLAM problem, I am going to work through the EKF (extended Kalman filter) example from the online SLAM course, which uses the notation in the Probabilistic Robotics book.  In the equations to follow, I am forced to use underlining for vectors, instead of the usual overbar notation, because of the poor formatting capability of the Blogger editor.

State formation

In SLAM, the device's pose (position and orientation) x, and the positions (but not orientations) of various landmarks (loosely defined as things that can be identified from camera images) mi--both in the world frame--are captured in the state vector 𝛍true = [xtrue; mtrue1; ...; mtrueN], or more compactly 𝛍true = [xtrue; mtrue].  Since I don't know where the device or the landmarks are initially, my initial state estimate is 𝛍[0] = [0; 0].  Most texts will annotate this vector with a hat (^) for the estimate, but again due to the formatting constraints in the Blogger editor, I will just omit the hat, and instead denote the (unknowable) truth as true, as I've done above.

Ignoring for now the management of the landmark associations (they are sometimes unobservable as the device moves around the world frame), it is clear that the initial covariance matrix 𝚺mm[0] = diag(∞, ..., ∞), because at the outset I know nothing about the landmark positions--except that the landmark positions are independent of each other, and of the device position, for that matter.  So 𝚺𝛍𝛍[0] = diag(0, 0, 0, ∞, ..., ∞).  Note that this initialization of the device pose covariance (setting it to 0) is different than what may be found in other Kalman filter textbooks (which use ∞) because of the different initialization method in the book (coming up shortly below).  In the linear (original) Kalman filter formulation, the state evolution and observable generation are linear stochastic processes, but in EKF, they are Gaussian nonlinear processes (that can nevertheless be linearized):

Kalman Filter:
  Process (state change) model: 𝛍true[k] = A 𝛍true[k-1] + B u[k] + r[k]
  Measurement model: zk[k] = H 𝛍true[k] + qk[k]
Extended Kalman Filter:
  Process (state change) model: 𝛍true[k] = g(u[k]; 𝛍true[k-1]) + r[k]
  Measurement model: zk[k] = hk(𝛍true[k]) + qk[k]
where r[k] ~ 𝓝(0, R[k]) and qk[k] ~ 𝓝(0, Qk[k]).  The subscript k in the measurement model is necessary in SLAM because landmarks come and go at every time step: to me, it means "pertaining to all observations at time step k".

The zero initial value of the states pretty much assures that the prediction of the state for the next time step is still zero--unless the system is such that zero is not an equilibrium, or there is an external input, AKA control input.  In this example of planar robot movement using the odometry model, there is apparently a non-zero initial external input (the odometry data):
u[0] = [rot1; trans; rot2] = [0.1007    0.1001    0.0002]T.
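A minimal sketch of this initialization in Matlab (assuming the number of landmarks N is known up front, and using a large finite number in place of ∞):

N = 9;                                % assumed landmark count for the dataset
mu = zeros(3 + 2*N, 1);               % [x; y; theta; m1x; m1y; ...; mNx; mNy]
sigma = zeros(3 + 2*N);               % device pose block starts at exactly 0
sigma(4:end, 4:end) = 1e6 * eye(2*N); % "infinite" landmark uncertainty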

Prediction step

When first studying the Kalman filter, I was confused about whether to perform prediction right after outputting the control command, or at the beginning of the next step, right before processing the fresh measurement.  Because of my background in real-time control, I perform non-time-critical tasks such as prediction after outputting the control command (i.e. minimize the time delay between the sensor measurement becoming available and the control command being output).  But in this case, the external input is not the control command but rather another sensor reading (the odometry reading), which is only available at the next time step.  That is why the "prediction" step uses the a-posteriori estimate from the last time step x+[k-1] = [x+[k-1]; y+[k-1]; 𝛉+[k-1]], which in this example is:
𝛍-[k] = g(u[k]; 𝛍+[k-1])
For additional clarity, I adopted the "-" and "+" superscripts for the a-priori and a-posteriori estimates.

Even though g(u[k]; 𝛍+[k-1]) can be a BHF (big hairy function) in general, stationarity assumption for the landmarks (m[k] = m[k-1]) in SLAM simplifies the state update to only the device pose update:
x-[k] = x+[k-1] + [trans * cos(𝛉+[k-1] + rot1) ; trans * sin(𝛉+[k-1] + rot1) ; rot1 + rot2], with m-[k] = m+[k-1]
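In Matlab this prediction touches only the first 3 entries of 𝛍 (a sketch consistent with the equation above; rot1, trans, and rot2 come from the odometry u[k]):

% Pose prediction from the odometry model; mu(4:end) (landmarks) untouched.
% Note: the old mu(3) must be read before it is overwritten.
mu(1) = mu(1) + trans * cos(mu(3) + rot1);
mu(2) = mu(2) + trans * sin(mu(3) + rot1);
mu(3) = wrapToPi(mu(3) + rot1 + rot2); % keep the heading in [-pi, pi]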

Observation prediction: has to wait till landmark observation in SLAM

We will see shortly the a-posteriori update (correction) driven by the sensor measurement (observation).  The prediction of the observation h(𝛍[k]) is only necessary at that time, but in the spirit of calculating everything possible outside the time critical path from sensor measurement being available to outputting the control command, I would like to pre-calculate h(𝛍[k]) now--except that in SLAM, I don't know a-priori which landmarks I will observe, so I have to wait until discovering all observations for the time step.

Covariance prediction

To update the covariance, I need the Jacobian matrix G-[k]:
G-[k] = ∂𝛍[k] / ∂𝛍[k-1] @ 𝛍+[k-1] = ∂g(u[k]; 𝛍[k-1]) / ∂𝛍[k-1] @ 𝛍+[k-1]
Note that the partial derivative is derived from the prediction (i.e. without the new sensor input), AT 𝛍+[k-1]--the last optimal state estimate.  Once again, the landmark stationarity and independence assumptions allow simplification:
G-[k] = diag(G-x[k], I2N); where G-x[k] = ∂x[k] / ∂x[k-1] @ x+[k-1]
In the prediction equation above, only 𝛉 affects the change of x, so only dx/d𝛉 and dy/d𝛉 need to be considered for the Jacobian calculation:
dx/d𝛉 = -trans * sin(𝛉 + rot1); dy/d𝛉 = trans * cos(𝛉 + rot1)
G-x[k] = I3 + [0, 0, -trans * sin(𝛉+[k-1] + rot1) ; 0, 0, trans * cos(𝛉+[k-1] + rot1) ; 0, 0, 0]
I think it is helpful to remember that the separation of the device pose and the landmark positions manifests in the covariance matrix structure as well:
𝚺𝛍𝛍 = [𝚺xx 𝚺xm ; 𝚺Txm 𝚺mm], since 𝚺mx = 𝚺Txm.
The covariance prediction step involves this huge matrix (dimension (3+2N) x (3+2N) in this example):
𝚺-𝛍𝛍[k] = G-[k] 𝚺+𝛍𝛍[k-1] G-T[k] + R[k]
In R[k] = diag(R-x[k], 0), R-x characterizes the random noise in the device pose (imagine the device being knocked around by an accident), evaluated at 𝛍-[k].  I have not seen a noise covariance modeled with such fidelity as to be sensitive to the current state estimate, but I wanted to be explicit for now.  Note that unless the physical process has no noise (i.e. R[k] is 0: impossible!), the state uncertainty always grows during the prediction step, as you can see in the following plot of the robot location estimate overlaid against the known landmarks.
Red: prior device pose estimate and uncertainty (a-posteriori estimate from the previous time step)
Magenta: predicted device pose and uncertainty, which is larger than the red ellipse
In SLAM, the first term of the huge covariance prediction above can be broken up into 3 distinct blocks (the off-diagonal blocks are transposes of each other) for simplicity:
G-[k] 𝚺+𝛍𝛍[k-1] G-T[k] = [G-x[k] 𝚺+xx[k-1] G-xT[k] , G-x[k] 𝚺+xm[k-1] ; (G-x[k] 𝚺+xm[k-1])T , 𝚺+mm[k-1]]
In fact, the independence of the landmarks allows even more factoring of G-[k] 𝚺+𝛍𝛍[k-1] G-T[k]:
[G-x[k] 𝚺+xx[k-1] G-xT[k] , G-x[k] 𝚺+x1[k-1] , ... , G-x[k] 𝚺+xN[k-1] ;
 (G-x[k] 𝚺+x1[k-1])T , 𝚺+11[k-1] , ... , 𝚺+1N[k-1] ;
 ... ;
 (G-x[k] 𝚺+xN[k-1])T , 𝚺+N1[k-1] , ... , 𝚺+NN[k-1]]
This formulation better shows the correlation between the device pose and the individual landmarks, and the inter-landmark correlations; it might be better suited to the case when landmarks rapidly come in and out of view, but is probably overkill for the simple case of fixed and unchanging landmarks.
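In Matlab, exploiting this block structure means never forming the full G (a sketch; Rx stands in for the 3x3 pose process noise, and mu(3) here is still the prior heading 𝛉+[k-1]):

% Block-wise covariance prediction: only blocks touching the pose change
Gx = eye(3);
Gx(1,3) = -trans * sin(mu(3) + rot1); % dx/dtheta
Gx(2,3) =  trans * cos(mu(3) + rot1); % dy/dtheta

sigma(1:3, 1:3)   = Gx * sigma(1:3, 1:3) * Gx' + Rx; % pose-pose block + noise
sigma(1:3, 4:end) = Gx * sigma(1:3, 4:end);          % pose-landmark blocks
sigma(4:end, 1:3) = sigma(1:3, 4:end)';              % symmetric counterpart
% sigma(4:end, 4:end) is untouched: landmarks are stationary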

Correction step

In this example problem, the range and bearing of a subset of the known landmarks are observed at each time step.  Assuming that the association problem (which landmark is this for?) is solved somehow, we can calculate the difference between the measured observation and the predicted observation, which is called innovation.  If a given landmark was previously observed, its estimated position is already in 𝛍-[k].  But a newly discovered landmark i's state m-i[k] is initialized with yet another function:
m-i[k] = initial(𝛍-[k], zi[k])
There doesn't seem to be a lot of discussion of this initialization function, but once initialized, we can use the completed 𝛍-[k] to calculate the predicted observation hp(𝛍-[k]) and the measurement Jacobian Hp[k], which is the gradient of the observables z[k] WRT 𝛍:
Hp[k] = ∂hp(𝛍[k]) / ∂𝛍[k] @ 𝛍-[k]
Just like the process Jacobian, there are many ways to estimate the measurement Jacobian (analytical, numerical, and their respective children), but I've only used analytical models so far.
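For this range-bearing model, the analytical Jacobian block for a single previously known landmark i is compact enough to write out (a sketch; dx, dy are landmark i's position minus the robot position, and the 5 columns map to the 3 robot pose slots plus landmark i's 2 slots in the full Hp[k]):

% Analytical 2x5 Jacobian block for one range-bearing observation,
% assuming landmark i occupies state slots 2i+2 and 2i+3
dx = mu(2*i+2) - mu(1);  dy = mu(2*i+3) - mu(2); % landmark i minus robot
q = dx^2 + dy^2;  sq = sqrt(q);
Hlow = [-sq*dx/q, -sq*dy/q,  0, sq*dx/q, sq*dy/q;  % d(range)/d(x,y,theta,mx,my)
            dy/q,    -dx/q, -1,   -dy/q,    dx/q]; % d(bearing)/d(...)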

State correction

Kp[k] is the optimal gain under the zero-mean Gaussian noise hypothesis--called Kalman gain.
Kp[k] = 𝚺-𝛍𝛍[k] HpT[k] / (Hp[k] 𝚺-𝛍𝛍[k] HpT[k] + Qp[k])
Note that the "/" operator is in fact matrix inversion.  I find Kalman "weight" to be a better reminder of its true nature, because that is how much weight is given to the innovation z[k] - hp(𝛍-[k]) to update the state estimate:
𝛍+[k] = 𝛍-[k] + Kp[k] ( z[k] - hp(𝛍-[k]) )
Note that the unobserved landmarks do not contribute to correcting the state estimate.

Covariance correction

The Kalman weight also contributes to the state covariance correction:
𝚺+pp[k] = (Ipp - Kp[k] Hp[k]) 𝚺-pp[k] = 𝚺-pp[k] - Kp[k] Hp[k] 𝚺-pp[k]
where 𝚺pp stacks the device pose to device pose covariance 𝚺xx, the device pose to observed (at k) landmark position cross-covariance 𝚺xp, and the observed landmark position to landmark position covariance.  Note that the covariance decreases commensurately with the Kalman weight, in reverse of the always increasing covariance during the prediction step earlier.
Red: predicted device pose estimate and uncertainty (a-priori estimate from this time step)
Magenta: corrected device pose and uncertainty, which is smaller than the red ellipse
Blue: predicted landmark position estimate and uncertainty (a-priori estimate from this time step)
Cyan: corrected landmark position and uncertainty, which is smaller than the blue ellipse
Black: range and bearing measurements at that time step, drawn from the corrected device pose (maybe I should have drawn it from the predicted device pose)
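Stitched together in Matlab, the correction step looks like this (a sketch with hypothetical stacked quantities: H is the stacked measurement Jacobian, Q the stacked measurement noise, and inno the stacked, angle-wrapped innovation for the observed landmarks):

% EKF correction sketch for the landmarks observed at step k
K = sigma * H' / (H * sigma * H' + Q);     % Kalman gain ("weight")
mu = mu + K * inno;                        % state correction
mu(3) = wrapToPi(mu(3));                   % keep the heading in [-pi, pi]
sigma = (eye(length(mu)) - K * H) * sigma; % covariance correction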
This completely textbook implementation of EKF leaves much to be desired.  I had to futz around with the sensor and process uncertainties to avoid gross estimation error--to some extent, this is one of the deficiencies of KF I had already heard about.  But I was surprised by how much the solution drifts as the device moves away from the starting position, which is in the lower left corner below.

If I reduce the sensor covariance too much, the final landmark (at 9,2) doesn't even fall within the 2𝛔 ellipse.  The drift might be due to sensor bias, and is probably uncorrectable without another sensor or a loop closure (coming later in the course).

There is much, much more to Kalman filter

While reading GNSS books about a year ago, I realized that the Kalman filter that I and many other people know is just the tip of the iceberg, and I am not even talking about the UKF.  A year ago, I could not have imagined such things as the Kalman smoother, the closed-loop Kalman filter, or the constrained Kalman filter.  I hope to share my future discoveries on such topics with you.