Feb 19, 2017

Recognizing remote phone's torch in Unity with OpenCV

So I need some image processing in my Unity game...

In my previous blog entry, I theorized about the feasibility of sound-based pseudo-range calculation between mobile devices within 5 m.  Of course, to localize a device using pseudo-ranges alone, you need at least 3 of them to solve the Euclidean distance equation in 3D space, even in the best case (devices time-synchronized, no clock drift, etc.).  Due to the limited bandwidth of smartphone speakers and mics (roughly the audible range), it would be difficult to obtain a sufficient number of pseudo-ranges with a pulse lasting only on the order of 10 ms (a longer pulse is suitable only for the stationary use case).  Plus, it is impossible for a userspace application to get the time synchronization fidelity required for precise range estimation.  So I wondered if I could estimate the angle of my phone relative to a light source--such as the LED torch available on all smartphones these days.  If a stationary point light source is within the field of view of my camera, I have more constraints in the problem geometry shown below, where device B is mobile and device A is stationary. I put a torch "t" on the stationary device, and a camera "c" on the mobile device.
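
To spell out what "solving the Euclidean distance equation" means here (a sketch in my own notation, for the best case of synchronized clocks): each measured range ri from a stationary device at a known position [xi; yi; zi] constrains the unknown mobile position [x; y; z] through

ri = sqrt{ (x - xi)² + (y - yi)² + (z - zi)² },  i = 1, 2, 3

so at least 3 such measurements are needed for the 3 position unknowns (and a 4th if the clock offset between the devices is also unknown).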

iOSTorch

To light up my iPhone's torch, I bought the iOSTorch plugin from the Unity Asset Store ($2).  It did not work right away on iOS 10, but it is open source, so I modified it to hard-code the iOS version to 10 and to turn off the flash (it was lighting up both the torch AND the flash).  The torch brightness is specifiable as a float, but the iPhone internally quantizes it to 3 levels; even at the "low" setting, it is quite blinding at close distance.  To make it dimmer, I tried cycling it off and on rapidly to gain another 1 or 2 bits of resolution, but my phone overheated for some reason, was bricked, and had to be restored through iTunes (so don't try it!).  So I decided to suck it up and deal with the saturated pixels from the overly bright torch.  With the image processing I discuss below, the result looks reasonable at distances beyond 1 m, as you can see below.
I turn on the torch in PlayerController.Start():
            switch(Application.platform) {
            case RuntimePlatform.IPhonePlayer:
                iOSTorch.Init ();
                iOSTorch.On (0.001f);
                break;
            case RuntimePlatform.OSXEditor:
            case RuntimePlatform.WindowsEditor:
                yIsUp = true;
                Debug.Log ("Using Unity Remote");
                break;
            }


In case the game is suspended, I turn off the torch in the MonoBehaviour.OnApplicationPause() callback:
        void OnApplicationPause( bool pause )
        {
            if (Application.platform == RuntimePlatform.IPhonePlayer) {
                if (pause) iOSTorch.Off ();
                else iOSTorch.On (0.001f);
            }
        }


In search of the torch photo-electrons in the image

I started my 3D game as a clone of OpenCV for Unity's Optical Flow demo, so I already had OpenCV ready to go (the plugin is only $95 on the Unity Asset Store).  Its WebCamTextureToMatHelper transforms the RGB memory into an OpenCV Mat--this is the starting point of the image processing.

using OpenCVForUnity;

void Update ()
{
...
    Mat rgbaMat = webCamTextureToMatHelper.GetMat ();
    int width = rgbaMat.width (), height = rgbaMat.height ();

Without an academic background in image processing, I approach the problem quite naively: I aim to boost the signal and reduce the noise, where the signal is the photo-electron count from the remote device's torch.  The LED torch light seemed to be white, so I wanted to kill pixels that did not have all 3 colors.  A simple element-wise product of the 3 channels seemed like a good filter for white light.  OpenCV made this task trivial:

    List<Mat> channels = new List<Mat>();
    Core.split (rgbaMat, channels);
    rxgxbMat = channels[0].mul(
        channels[1].mul(channels[2], 1.0f/256), 1.0f/256);

Scaling the successive multiplications is necessary to avoid saturating the result.  I am guessing OpenCV's Mat.mul() method internally uses a wide data type before applying the scaling and the saturation, because each color channel was originally an 8-bit unsigned count.  The result seems to confirm my intuition that the torch light is the brightest thing in the image.  So bright, in fact, that it seems to distort the pixels around it, as you can see below.
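
To convince myself the 1/256 scale factors keep the product inside 8 bits, here is the per-pixel arithmetic I believe the two mul() calls above amount to (a sketch with made-up pixel values, not OpenCV's actual internals):

    int r = 250, g = 240, b = 230;   // hypothetical bright, white-ish pixel
    int gb  = g * b / 256;           // 240 * 230 / 256 = 215, still fits in 8 bits
    int rgb = r * gb / 256;          // 250 * 215 / 256 = 209, still fits in 8 bits
    // a pixel missing any one channel, e.g. (250, 240, 0), collapses to 0,
    // which is what suppresses the non-white parts of the image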

Any image is contaminated by some noise, and one of the most effective noise reduction techniques in image processing is the median filter, which replaces a pixel value with the median of the ensemble consisting of itself and its neighboring pixels.  Lower noise helps morphological operations like opening produce a cleaner (more canonical) result.

Mat rxgxbMat, tempA, tempB, emptyMat = new Mat(),
            erode_kernel = new Mat();
Point erode_anchor = new Point(0,0);

void Update ()
{
...
    if (tempA == null)
        tempA = new Mat (height, width, channels [0].type ());
    if (tempB == null)
        tempB = new Mat (height, width, channels [0].type ());
    Imgproc.medianBlur (rxgxbMat, tempA, 7);
    Imgproc.morphologyEx (tempA, tempB, Imgproc.MORPH_OPEN,
        erode_kernel, erode_anchor, 2);
    Core.MinMaxLocResult minmax = Core.minMaxLoc (tempB);
    Debug.Log (String.Format("max {0} @ {1}", minmax.maxVal, minmax.maxLoc));

Note that I am using iteration=2 for the opening operation.  iteration=1 did not clean up the distortion enough, and iteration=3 seemed to yield another kind of distortion; iteration=2 produced the cleanest-looking blob.
The max value in my study at 2 PM today (a rainy day) was 116 (out of 255), but when I turned on the torch, the maximum shot up to 253 (pretty near saturation) and consistently tracked where the torch appeared in the camera image.

Estimating the torch position

Lacking prior knowledge of the torch's orientation relative to my camera, I simply estimate the torch's position as the center of the brightest blob.  I just learned that any part of an image with salient information is called a "feature" in image processing.  Chapter 16 of Learning OpenCV 3 is all about 2D feature detection, including the SimpleBlobDetector.  To further simplify the image fed to the simple blob detector, I use the fact that the torch seems to be the brightest thing in the scene when it is on, so I subtract 90% of the max intensity found above from the morphologically opened image.

Core.subtract (tempB,
                new Scalar(Mathf.Max((float)(0.9f * minmax.maxVal), 200.0f)),
                tempA);

MatOfKeyPoint keypoints = new MatOfKeyPoint ();
blobDetector.detect (tempA, keypoints);
Features2d.drawKeypoints (rgbaMat, keypoints, rgbaMat);
Debug.Log ("keypoints found " + keypoints.size ());
 

The detector is pre-created and initialized in the PlayerController.Start() method.  I actually wanted to create a new detector with different thresholds in each Update(), but the OpenCV for Unity API only allows initializing a feature detector from a file, which is too onerous to modify every frame.

FeatureDetector blobDetector; 
void Start ()
{
 
...
    string blobparams_yml_filepath = Utils.getFilePath ("blobparams.yml");
    blobDetector = FeatureDetector.create (FeatureDetector.SIMPLEBLOB);
    blobDetector.read (blobparams_yml_filepath);
 

So the magic of the blob detector is in the parameters.  After playing around with various parameters, this is what I settled on for now:

%YAML:1.0
thresholdStep: 10.
minThreshold: 10.
maxThreshold: 20.
filterByColor: True
blobColor: 255 # Look for light area (rather than dark)

filterByArea: True
minArea: 100.
maxArea: 5000.

filterByCircularity: True
minCircularity: 0.75
maxCircularity: 1.5

This setting works pretty well when the camera sees a roughly circular pattern, as you can see from where the keypoint marker was placed.  The image is mostly dark because I subtracted 90% of the max value above.
Note that the center of the blob was correctly estimated.  But because the torch is so bright, a lot of light reflects off the back of my iPhone, which is not in a case.  The reflections distort the observed image, and the circularity check then rejects the blob, as you can see here.



Feb 15, 2017

Understanding pulse compression

In my previous blog post, I showed how a straightforward implementation of the kinematic equation of motion in Unity, using the MEMS accelerometer and gyro on my phone, led to unbounded position error growth due to imperfect accelerometer bias estimation.  To better estimate the instantaneous accelerometer bias, I need a different sensor, because the accelerometer cannot estimate its own bias--this is one of the core ideas of indoor location systems, regardless of the sensor types used.  While reading about indoor location technologies, I stumbled on pseudo-ranging in the ultrasonic frequencies on mobile phones, like Anthony Rowe's work at Carnegie Mellon, which used pulse compression to get a finer range estimate than is possible with an uncompressed pulse.  In this post, I simulate sending and receiving a compressed sound pulse.  You can find my Matlab script at https://github.com/henrychoi/OpenCVUnity/tree/output_pulse/doc

Pulse compression

The most thorough explanation of pulse compression in radar applications I've found so far is Radar Systems Analysis and Design Using Matlab by Bassem R. Mahafza (2000 edition).  Repeating the relevant parts of Chapter 6 (I changed some notation for my own clarity), first, a rectangular pulse signal of duration 𝞽p is:

s(t) = Rect(t/𝞽p) = 1 for 0 <= t <= 𝞽p, else 0

A compressed pulse is a sinusoid of linearly increasing frequency (which means the phase increases quadratically), modulated by the Rect function above.  An EM (electro-magnetic) wave has displacements orthogonal to the direction of wave propagation (Z by convention)--and can therefore be polarized--so the uncompressed transmitted signal su(t) can be conveniently expressed as a complex exponential in the time domain:

su(t) = Rect(t/𝞽p) exp{ j 2𝞹 (f0 t + (𝞵/2) t²) }

Its real or imaginary part will be a sinusoid with increasing frequency, as shown below, for 𝞽p = 512 / 44.1 kHz, f0 = 200 Hz, and 𝞵 = 0.5 * B/𝞽p, where the bandwidth B = 10 kHz.
Note that the frequency spectrum of the signal is NOT symmetric about DC, because the signal is complex.  If this fictitious wave (one that can be polarized yet travels at the speed of sound) hits 3 targets at distances of 3.9 m, 4 m, and 10 m from the speaker/receiver combo, with sonar cross-sections (a relative scale of how well a target back-reflects the sound wave) of 1, 1.5, and 2 respectively, the sound arriving at the receiver will be a superposition of the reflections from the 3 targets.  Because the round-trip distances to the 1st and 2nd targets are close together, their reflected waves interfere and produce a non-constant magnitude at the receiver, as shown below.
The radar return power drops off as 1/R^4, so the return from the 3rd target is tiny.  Also note that the returned wave lasts as long as the transmitted rectangular pulse, so resolving the return into a precise radial distance is challenging--and essentially impossible when the target returns overlap.  But in compressed pulse processing, we cross-correlate the transmitted signal with the received signal (note the conjugation of FFTsr):
FFTsu = FFT(su[k]);  FFTsr = FFT(sr[k])
corr[k] = FFT⁻¹( FFTsu · FFTsr* ) / length(sr[k])
where su[k] and sr[k] are the sampled sequences of the transmitted and received signals.  The result is magical.
The diffuse energy of the pulse is concentrated into the correlation peak, improving the signal-to-noise ratio by roughly the compression ratio (the time-bandwidth product B𝞽p).  And the correlation peak has a width of roughly c/B, where c is the wave propagation speed.  In this fictitious example, c/B works out to about 3.3 cm, which is much smaller than the radial separation between targets 1 and 2, so the two targets appear distinct.  No wonder pulse compression is so widely used for radar and sonar.  But I am not trying to reinvent sonar; rather, I want to estimate the distance of my phone from another device using the mic and speakers available on all modern smartphones.
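
For concreteness, here is a naive time-domain version of that cross-correlation (an O(N²) stand-in for the FFT recipe above, written in C# to match the Unity code elsewhere on this blog; the linked Matlab script is the actual implementation):

    // Slide the transmitted pulse su over the received samples sr; corr[k] peaks
    // where a (possibly attenuated) copy of su begins at sample k.  With enough
    // zero padding, the FFT-based circular correlation above gives the same result.
    static float[] CrossCorrelate(float[] su, float[] sr)
    {
        var corr = new float[sr.Length];
        for (int k = 0; k < sr.Length; ++k)
        {
            float sum = 0f;
            for (int n = 0; n < su.Length && k + n < sr.Length; ++n)
                sum += su[n] * sr[k + n];
            corr[k] = sum / sr.Length;  // same normalization as the FFT version
        }
        return corr;
    }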

1 way sonar pulse compression

If I constrain myself to the existing HW on a smartphone, the sound wave bandwidth is limited to roughly B <= 20 kHz.  Since a sound wave is a pressure oscillation in the direction of wave propagation, the sound wave equation is real, which I write as a cosine:

su(t) = A_u Rect(t/𝞽p) cos{ 2𝞹 (f0 t + (𝞵/2) t²) }

where A_u is the sound amplitude.  Because the pulse cannot be too long given the fast dynamics of a mobile game player, I limit the pulse duration to < 50 ms.  This time and bandwidth limit I put on myself is a problem, as I explain below.  The transmitted signal in the time and frequency domains is shown below.  Note that the frequency spectrum is now symmetric, since the signal is purely real.
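
Here is a sketch of how such a real-valued chirp could be generated for playback at 44.1 kHz (the names are mine, the parameter values are reused from the earlier example, and the AudioSource playback plumbing is omitted):

    const float Fs   = 44100f;           // sample rate [Hz]
    const float tauP = 512f / Fs;        // pulse duration (512 samples, as in the example above)
    const float f0   = 200f;             // start frequency [Hz]
    const float B    = 10000f;           // bandwidth [Hz]
    const float mu   = 0.5f * B / tauP;  // chirp rate, as defined above

    static float[] MakeChirp()
    {
        int N = Mathf.RoundToInt(tauP * Fs);  // = 512 samples
        var s = new float[N];
        for (int k = 0; k < N; ++k)
        {
            float t = k / Fs;
            // su(t) = A_u Rect(t/tauP) cos{ 2 pi (f0 t + (mu/2) t^2) }, with A_u = 1
            s[k] = Mathf.Cos(2f * Mathf.PI * (f0 * t + 0.5f * mu * t * t));
        }
        return s;
    }
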
In an indoor gaming environment, this sound has to compete with ambient sound, such as people talking, TV noise, and the game music itself.  I found that I had to keep the transmitted pulse fairly strong relative to the ambient level to detect pulses coming from roughly 10 m away.  Even at 10x the ambient sound level, it was difficult to detect a putative source 10 m away (closer targets are no problem), as you can see in the cross-correlation magnitude below:

Conclusion: borderline practical

For distances < 5 m, the cross-correlation SNR seems strong enough for a fairly robust distance measurement, even with the signal volume turned all the way down to the ambient sound level (and therefore borderline unnoticeable), as you can see below.

5 m sounds like too small a working volume to be useful for a dynamic peer-to-peer shooting game, but it is plenty large for a more stationary scenario, like playing with Lego blocks.  I am going to try other external means of estimating attitude and position first.

Feb 6, 2017

IMU based 6DOF estimation in Unity

In my previous blog posts, I initialized the tip and tilt attitude estimate with only the Input.acceleration that Unity offers, and then integrated Input.gyro.rotationRateUnbiased to get the current attitude.  The 2nd post showed the attitude bias becoming observable after only a few seconds of game play.  In this post, I show that position update using just the MEMS accelerometer and gyro is an even bigger problem.

Specific force, gravity, and acceleration

The core idea of a strap-down IMU (where the IMU sensor moves and rotates with the body whose state is sought) based position update is to double-integrate the sensed acceleration.  Paul Groves explains the necessary equations in his GNSS book (2nd ed.) Chapter 5, but I had a tough time understanding how to take out the gravity sensed by the accelerometer.  The technical term he uses for what the accelerometer senses is specific force, first explained in Section 2.4.7 as the difference between the true acceleration (what is sought) and the gravitational acceleration the body is subject to (which CANNOT be sensed by the IMU):
f = a - 𝞬
where f is the specific force vector, a is the acceleration vector, and 𝞬 (gamma) is the gravitational acceleration vector.

I could not understand this explanation because of a few observations I made with Unity's Input.acceleration while waving my phone about:
  1. When stationary, the sensed acceleration is downward.  For example, [0; 0; -1] when the phone is lying screen-up on my desk.  Does that mean 𝞬 is UP ([0; 0; 1])?
  2. He gives the example of a passenger in an elevator accelerating upward, who feels a specific force greater than 1 g.  But since a > 0 in this example, f = a - 𝞬 says f should be smaller than in the stationary case!
The crystallization moment came while reading a book on the construction of MEMS accelerometers, which can be modeled as a proof mass constrained by a pair of springs, like this:
Whether the design is open-loop or (more likely) closed-loop (with force compensation electronics), a MEMS accelerometer is an inertial device (hence the name IMU): the proof mass doesn't (at first) move while the case moves, so the accelerometer senses the displacement (or force, in the case of the closed-loop design) in the direction opposite to the case's movement.  The stationary case can be understood by imagining the bottom spring being compressed by gravity pulling down the proof mass: 1 unit of compression of the bottom spring is what Input.acceleration reports as the "-1" acceleration!  If I put this device in an elevator accelerating upward, the bottom spring compresses even more, and I get an even more negative value from the accelerometer.

Based on this observation, I concluded that the accelerometer senses the reaction to the force on the body, rather than the force directly.  I confirmed my hypothesis by printing the Input.acceleration values while accelerating my phone in the x/y/z directions, and finally the fundamental force equation made sense:
allb = G [ qlb × accelbl𝞵 + 𝞬llb ]
where qlb is the average attitude of the local frame relative to the body (rather than the other way around), and accelbl𝞵 is the average inertial acceleration (which is equal and opposite to the specific force the body is subjected to) during the most recent sample period.  Once I rotate the specific force from the body frame to the local frame, I can add the gravitational acceleration estimate to it to get the estimated acceleration of the body in the local frame--scaled by the Earth's gravity constant G.

The above equation for the body's acceleration is incomplete: even though the local tangent frame is stationary relative to the ECEF (Earth centered Earth fixed) frame, ECEF rotates relative to the ECI (Earth centered inertial) frame, so the body experiences:
  • Centrifugal acceleration (greatest at the Equator, in the +Z direction, and zero at the Poles).  This is about 0.3% of the gravitational acceleration, or 3 mg.  Anywhere other than the Equator or the Poles, the centrifugal acceleration also has horizontal components in the local frame.  But breaking that horizontal vector into x and y requires an estimate of the local frame's True North heading as well as the current latitude.
  • Coriolis acceleration proportional to the velocity of the body in the direction toward Earth's axis of rotation.  This also requires the heading and latitude.
Since I do NOT want to care about the player's location, I decided to ignore the above 2 pseudo forces, feeling justified because:

  • 3 mg is on the order of the accelerometer bias confidence interval of off-the-shelf MEMS accelerometers--which means that when you reboot your phone, the bias might change by that much anyway, so the factory calibration can leave this much residual.
  • I expect the player to move at a speed on the order of 1 m/s.  When I multiply this by the Earth's rotation rate of 72.7 urad/s, the maximum Coriolis acceleration is on the order of 0.1E-3 m/s^2, or roughly 0.01 mg--which is BURIED under the noise floor.
A word of caution on converting Input.acceleration into specific force: Input.acceleration is reported in the RIGHT handed IMU frame, while the Unity body frame is a LEFT handed system, as shown in my previous blog post, so only the x and y axis components should flip sign.  As for the gravitational acceleration estimate, precision navigation systems go through a great deal of trouble to get an accurate gravity estimate, using the world gravity model almanac, and some even use expensive gravimeters.  I decided to just use the magnitude of the average acceleration measured during the "hold still" period as the magnitude of the gravitational acceleration.
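
A minimal sketch of that conversion (variable names are mine; gravityMag is the "hold still" average mentioned above, in units of g):

    // Input.acceleration is reported in the RIGHT handed IMU frame, in units of g.
    Vector3 accel_mu = Input.acceleration;
    // Flip x and y to move into Unity's LEFT handed body frame; z keeps its sign.
    Vector3 f_b = new Vector3 (-accel_mu.x, -accel_mu.y, accel_mu.z);
    // Scale by the gravity estimate to express the specific force in m/s^2.
    Vector3 f_b_ms2 = f_b * gravityMag * 9.80665f;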

Navigation update equations

Average attitude

In my last blog post, I showed the discrete attitude update in Unity using the Input.gyro sensor input and the attitude increment αbib.  If I assume that the angular rate is roughly constant over the sample period, I can do exactly the same calculation for the average attitude during the last sample period by forming a relative rotation quaternion from HALF of the attitude increment.  I briefly considered just using Unity's Quaternion.Slerp() function to spherically interpolate between qbl(-) and qbl(+), but decided I could do better by exploiting the fact that I only need to split the attitude increment in half, rather than iterating (as Slerp seems to).
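
Here is my reading of that in Unity terms (a sketch; w_inb_lb and q_lb follow the naming in the gyro post below, and the 1/4 factor comes from halving an increment whose first-order quaternion already carries a 1/2):

    // Quaternion for HALF the attitude increment: vector part = (alpha/2)/2.
    Vector3 halfAlpha = 0.25f * (Time.deltaTime * w_inb_lb);
    Quaternion q_halfInc = new Quaternion (halfAlpha.x, halfAlpha.y, halfAlpha.z, 1f);
    // First-order average attitude over the sample period; q_lb still holds qbl(-) here.
    Quaternion q_lb_avg = q_lb * q_halfInc;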

Velocity update

Once allb, the average acceleration during the last sample period, is obtained, the velocity estimate is just a rectangular integral of allb over the sample period Ts: vbl(+) ~ vbl(-) + allb Ts

Average velocity

To integrate the velocity over the sample period for the position change, the average velocity is estimated as a simple average of the previous and updated values: vbl(avg) = ( vbl(+) + vbl(-) ) / 2

Position update

Finally, the position is integrated from the initial position (assumed 0 for now): rbl(+) ~ rbl(-) + vbl(avg) Ts
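
Putting the three steps together in Unity terms (a sketch with my own variable names; a_l_lb stands for the average local-frame acceleration allb from the equation above):

    float Ts = Time.deltaTime;
    // Velocity update: rectangular integral of the average local-frame acceleration.
    Vector3 v_lb_new = v_lb + a_l_lb * Ts;
    // Average velocity over the sample period.
    Vector3 v_lb_avg = 0.5f * (v_lb_new + v_lb);
    // Position update from the average velocity; r_lb started at 0.
    r_lb += v_lb_avg * Ts;
    v_lb = v_lb_new;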

Unity implementation

My Unity implementation (my github repo) is mostly straightforward--except for the fact that I do NOT invert qbl (the current estimate of the body relative to the local frame) to obtain qlb.  I did at first, and found that rotating the specific acceleration by qlb using Unity's Quaternion.operator*(Quaternion, Vector3) method rotated the vector in the direction opposite to the rotation.  In other words, Quaternion.operator*(qbl, -accelbl𝞵) yielded the correct result!

Even before I tried this implementation, I knew that double-integrating the cheap MEMS accelerometer output would end up with my player "going to the moon"--as I've heard the experts describe the problem--pretty quickly.  But knowing it in your head and actually seeing it are two different things.  See for yourself how bad it is.

Now what?

Clearly, this solution is unfit for the intended purpose: playing a game lasting up to 10 minutes.  I have an idea for improving the navigation solution error by fusing the sound-echo based pseudo-range estimate with the INS solution just given.  But the equations I've derived so far are too complicated to present in this post, so let me see if I can pull it off first.

Feb 5, 2017

Attitude update in Unity using Input.gyro

In my previous blog post, I initialized the tip and tilt of the phone in Unity using Input.acceleration.  The next step is to measure the relative attitude change from that initial attitude using Unity's gyro input.  Sure, you could use Input.gyro.attitude to get the OS's (or Unity's) estimate of the device attitude, but I want to learn the inertial navigation equations, so I am going to trudge along, with only Paul Groves' GNSS book (2nd edition) to guide me.

The rotation rate in Unity player's body frame

The first order of business is to draw the gyro triad (which is reported in the RIGHT handed IMU frame) in the Unity PlayerController's (LEFT handed) body frame, as I did in the previous blog entry, whereupon you should notice that the X and Y components must change sign.  Therefore, the angular rate in the player's body frame should be expressed like this:

Vector3 w_inb_lb = //Input.gyro.rotationRate // in mu frame (RH!)
                    Input.gyro.rotationRateUnbiased
                ;

w_inb_lb.Set (-w_inb_lb.x, -w_inb_lb.y, w_inb_lb.z);

First order attitude update equation

The core idea of attitude update is to rotate the current attitude (however it is expressed--quaternion, direction cosine matrix, rotation vector and angle, etc.) by the INCREMENTAL rotation from the previous update to the current update.  This is mathematically captured in the attitude differential equation, as in GNSS 2nd edition Appendix E, equation E.33.  A first order approximate discrete time implementation of that differential equation in the Earth centered inertial (ECI) frame is
qbi(+) ~ qbi(-) × Quaternion(αbib/2, 1)

where the attitude increment αbib is approximated as a product of the average angular rate sensed in the last interval and the sampling time Ts.
αbib ~ 𝟂bib Ts

But since the local tangent frame is located at latitude L, is yawed at an angle from the longitudinal line (the one that goes from the South Pole to the North Pole), and rotates with the Earth, there is an extra term involving the quaternion from the local tangent frame to the body frame:
qbl(+) ~ qbl(-) × Quaternion(αblb/2, 1) - Quaternion(𝟂lie Ts/2, 0) × qbl(-)

where the Earth's rotation vector is resolved into the local tangent frame using the latitude L and the yaw angle 𝟁el from the longitude:

𝟂lie = |𝟂ie| [ cos(L) cos(𝟁el) ; cos(L) sin(𝟁el) ; sin(L) ]

Figuring out the phone's ROUGH lat/long is not that onerous, but when I consider the magnitude of the Earth's rotation |𝟂ie| (2𝞹 / 24 hr = 72.7 urad/s), I feel safe ignoring this term for a game that should last on the order of 10 minutes.  So I came up with the following Unity code:

Vector3 alpha = (0.5f * Time.deltaTime) * w_inb_lb;
Quaternion q_inc_lb = new Quaternion(alpha.x, alpha.y, alpha.z, 1);

q_lb = q_lb * q_inc_lb; //Unity quaternion multiplies LEFT -> RIGHT
q_Ub = Q_Ul * q_lb; //Rotate q_lb by Q_Ul to go to the Unity frame
Normalize(ref q_Ub); //Normalizing quaternion is less onerous than orthogonalizing a DCM
myRigidbody.MoveRotation(q_Ub);//Move the player to the est. attitude

You can find this source code in my github repo: https://github.com/henrychoi/OpenCVUnity

The following video shows the result of the above algorithm in the Unity Remote (which relays the phone's Input.gyro updates to the Unity editor, while the Unity editor plays the video on the Unity Remote screen).  I start the session by initializing the attitude while the phone is stationary in my desktop vise (yes, I have a desktop vise), then take it out of the vise and rotate it around for about 30 seconds before putting it back in the vise.  To help visualize the current attitude, I show a chase camera view from right behind and slightly above the player (upper left corner), and also a target ("ENEMY") located straight ahead.  When the player's forward vector (0, 0, 1) lands on the target, the target lights up red.
Clearly, the attitude does not come back to its starting value after this brief excursion, so I cannot build a playable game with this algorithm.  I suspect the problem is the high bias and noise of the MEMS gyro in my phone, but how much improvement can a higher order attitude update equation provide?

Higher order attitude increment

According to Groves GNSS 2nd ed. Appendix E, section 6.3, a more precise attitude increment quaternion than the first order Quaternion(αbib/2, 1) is
qb+b- = Quaternion( αbib sin(|αbib| / 2 ) / |αbib|, cos( |αbib| / 2 ) )
qbi(+) = qbi(-) × qb+b-

But due to the division-by-0 problem when |αbib| is tiny, this precise increment cannot be implemented directly; instead, a 4th order expression for qb+b- is suggested:
qb+b- = Quaternion( As αbib, Ac )
As = 1/2 - (|αbib| / 2)² / 12
Ac = 1 - (|αbib| / 2)² / 2 + (|αbib| / 2)⁴ / 24

Here's the Unity implementation, showing only the parts that changed from the 1st order approximation given earlier.
Vector3 alpha = // Mathf.Deg2Rad * //attitude increment
    Time.deltaTime * w_inb_lb;
float alpha_div2 = 0.5f * alpha.magnitude;
float alpha_div2_sq = alpha_div2 * alpha_div2;
float alpha_div2_qd = alpha_div2_sq * alpha_div2_sq;
float As = 0.5f - 0.083333333333333f * alpha_div2_sq;
float Ac = 1.0f - 0.5f * alpha_div2_sq + 0.041666666666667f * alpha_div2_qd;
alpha *= As;
q_inc_lb.Set (alpha.x, alpha.y, alpha.z, Ac); // 4th order approximation

When I tried this solution, the drift in the non-stationary case was still there, so it looks like my intuition was correct after all.