So I need some image processing in my Unity game...
In my previous blog entry, I theorized on the feasibility of sound-based pseudo-range calculation between mobile devices within 5 m. Of course, to localize a device using pseudo-ranges alone, you need at least 3 pseudo-ranges to solve the Euclidean distance equations in 3D space, even in the best case (devices time-synchronized, no clock drift, etc.). Due to the limited bandwidth of smartphone speakers and mics (roughly the audible range), it would be difficult to obtain a sufficient number of pseudo-ranges with pulses lasting only on the order of 10 ms (a long pulse is suitable only for the stationary use case). Plus, it is impossible for a userspace application to get the time synchronization fidelity required for precise range estimation. So I wondered if I could estimate the angle of my phone relative to a light source--such as the LED torch available on all smartphones these days. If a stationary point light source is within the field of view of my camera, I have more constraints in the problem geometry shown below, where device B is mobile and device A is stationary. I put a torch "t" on the stationary device and a camera "c" on the mobile device.
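To make the constraint concrete, here is a minimal sketch of the range equations I mean (the anchor positions and measured ranges are illustrative placeholders, not anything from the game code): each pseudo-range pins the unknown position to a sphere around the corresponding anchor, and in 3D you need at least 3 such spheres to intersect at a point.
using UnityEngine;

static class PseudoRangeSketch
{
    // Each pseudo-range r_i constrains the unknown position x to a sphere
    // of radius r_i around anchor p_i: |x - p_i| = r_i. A least-squares
    // solver would drive these residuals toward zero, which requires at
    // least 3 independent ranges in 3D.
    public static float[] Residuals (Vector3 x, Vector3[] anchors, float[] ranges)
    {
        float[] res = new float[anchors.Length];
        for (int i = 0; i < anchors.Length; i++)
            res[i] = Vector3.Distance (x, anchors[i]) - ranges[i];
        return res;
    }
}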
iOSTorch
To light up my iPhone's torch, I bought the iOSTorch plugin from the Unity Asset Store ($2). It did not work right away on iOS 10, but it was open source, so I modified it to hard-code the iOS version to 10 and to turn off the flash (it was lighting up both the torch AND the flash). The torch brightness is specifiable as a float, but the iPhone internally quantizes it to 3 levels; even at the "low" setting, it is quite blinding at close distance. To make it dimmer, I tried cycling it off and on rapidly to get another 1 or 2 bits of resolution, but my phone overheated for some reason and was bricked, and had to be restored through iTunes (so don't try it!). So I decided to suck it up and deal with the saturated pixels from the overly bright torch. With the image processing I discuss below, the result looks reasonable at distances beyond 1 m, as you can see below.
I turn on the torch in PlayerController.Start():
switch (Application.platform) {
case RuntimePlatform.IPhonePlayer:
    iOSTorch.Init ();
    iOSTorch.On (0.001f);
    break;
case RuntimePlatform.OSXEditor:
case RuntimePlatform.WindowsEditor:
    yIsUp = true;
    Debug.Log ("Using Unity Remote");
    break;
}
In case the game is suspended, I turn off the torch in the MonoBehaviour.OnApplicationPause() callback:
void OnApplicationPause( bool pause )
{
    if (Application.platform == RuntimePlatform.IPhonePlayer) {
        if (pause) iOSTorch.Off ();
        else iOSTorch.On (0.001f);
    }
}
In search of the torch photo-electrons in the image
I started my 3D game as a clone of OpenCV for Unity's Optical Flow demo, so I already had OpenCV ready to go (it's only $95 on the Unity Asset Store). OpenCV for Unity's WebCamTextureToMatHelper turns the webcam's RGB memory into an OpenCV Mat--this is the starting point of the image processing.
using OpenCVForUnity;
...
Mat rgbaMat = webCamTextureToMatHelper.GetMat ();
int width = rgbaMat.width (), height = rgbaMat.height ();
Without an academic background in image processing, I approach the problem quite naively: I aim to boost the signal and reduce the noise, where the signal is the photo-electron count from the remote device's torch. The LED torch light seemed to be white, so I wanted to kill pixels that did not have all 3 colors. A simple element-wise product of the 3 channels seemed like a good filter for the white light. OpenCV made this task trivial:
List<Mat> channels = new List<Mat>();
Core.split (rgbaMat, channels);
rxgxbMat = channels[0].mul(
    channels[1].mul(channels[2], 1.0f/256), 1.0f/256);
Scaling the successive multiplications is necessary to avoid saturating out the result. I guess OpenCV's Mat.mul() method internally uses a wide datatype before applying the scaling and saturation, because each color channel was originally an 8-bit unsigned count. The result seems to confirm my intuition that the torch light is the brightest thing in the image. So bright, in fact, that it seems to distort the pixels around it, as you can see below.
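To see why the scale factor matters, here is the per-pixel arithmetic as I understand it (a toy sketch of my reading of Mat.mul() with a scale argument, not the actual OpenCV code):
using UnityEngine;

static class WhiteFilterSketch
{
    // Scaled channel product: without the 1/256 factors, 200*200*200 would
    // overflow 8 bits by orders of magnitude; with them, each intermediate
    // stays in range: 200*200/256 = 156, then 156*200/256 = 121.
    public static byte WhiteScore (byte r, byte g, byte b)
    {
        int gb  = (int)g * b / 256;               // product of two channels, rescaled to 0..255
        int rgb = (int)r * gb / 256;              // bring in the third channel
        return (byte)Mathf.Clamp (rgb, 0, 255);   // saturate like OpenCV's saturate_cast
    }
}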
Any image is contaminated by some noise, and one of the most effective noise reduction techniques in image processing is the median filter, which replaces a pixel value with the median of an ensemble consisting of itself and its neighboring pixels. Lower noise helps morphological operations like opening produce a cleaner (more canonical) result.
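Just to make the idea concrete, here is a toy per-pixel version of a 3x3 median filter (purely illustrative; the real work is done by Imgproc.medianBlur below):
using System.Linq;

static class MedianSketch
{
    // Toy 3x3 median: replace the pixel at (x, y) with the median of its
    // 3x3 neighborhood. Border handling is ignored for brevity.
    public static byte Median3x3 (byte[,] img, int x, int y)
    {
        byte[] window = new byte[9];
        int k = 0;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++)
                window[k++] = img[y + dy, x + dx];
        return window.OrderBy (v => v).ElementAt (4); // 5th of the 9 sorted values
    }
}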
Mat rxgxbMat, tempA, tempB, emptyMat = new Mat(),
    erode_kernel = new Mat();
Point erode_anchor = new Point(0,0);

void Update ()
{
    ...
    if (tempA == null)
        tempA = new Mat (height, width, channels [0].type ());
    if (tempB == null)
        tempB = new Mat (height, width, channels [0].type ());
    Imgproc.medianBlur (rxgxbMat, tempA, 7);
    Imgproc.morphologyEx (tempA, tempB, Imgproc.MORPH_OPEN,
        erode_kernel, erode_anchor, 2);
    Core.MinMaxLocResult minmax = Core.minMaxLoc (tempB);
    Debug.Log (String.Format("max {0} @ {1}", minmax.maxVal, minmax.maxLoc));
Note that I am using iterations=2 for the opening operation. iterations=1 did not clean up the distortion enough, and iterations=3 seemed to introduce another kind of distortion; iterations=2 produced the cleanest-looking blob.
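For reference, my understanding of what iterations=2 means for MORPH_OPEN is that the image is eroded twice and then dilated twice (rather than the whole opening being repeated), roughly equivalent to:
// Rough equivalent of morphologyEx(..., MORPH_OPEN, erode_kernel, erode_anchor, 2),
// as I understand the iterations parameter: erode twice, then dilate twice.
Imgproc.erode (tempA, tempB, erode_kernel, erode_anchor, 2);
Imgproc.dilate (tempB, tempB, erode_kernel, erode_anchor, 2);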
The max value in my study at 2 PM today (a rainy day) was 116 (out of 255), but when I turned on the torch, the maximum value shot up to 253 (pretty near saturation) and consistently tracked where the torch appeared in the camera image.
Estimating the torch position
Lacking prior knowledge of the torch's orientation relative to my camera, I simply estimate the torch's position as the center of the brightest blob. I just learned that any part of the image with salient information is called a "feature" in image processing. Chapter 16 of Learning OpenCV 3 is all about 2D feature detection, including the SimpleBlobDetector. To further simplify the image fed to the simple blob detector, I use the fact that the torch seems to be the brightest thing in the scene when it is on, so I subtract 90% of the max intensity found above from the morphologically opened image.
Core.subtract (tempB,
    new Scalar(Mathf.Max((float)(0.9f * minmax.maxVal), 200.0f)),
    tempA);
MatOfKeyPoint keypoints = new MatOfKeyPoint ();
blobDetector.detect (tempA, keypoints);
Features2d.drawKeypoints (rgbaMat, keypoints, rgbaMat);
Debug.Log ("keypoints found " + keypoints.size ());
The detector is pre-created and initialized in the PlayerController.Start() method. I actually wanted to create a new detector with different thresholds in each Update(), but the OpenCV for Unity API only allows initializing the feature detector's parameters from a file, which would be onerous to rewrite every frame.
FeatureDetector blobDetector;

void Start ()
{
    ...
    string blobparams_yml_filepath = Utils.getFilePath ("blobparams.yml");
    blobDetector = FeatureDetector.create (FeatureDetector.SIMPLEBLOB);
    blobDetector.read (blobparams_yml_filepath);
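If I ever do need per-frame thresholds, one untested workaround might be to regenerate the parameter file and re-read it into the existing detector (the path handling and the newMinThreshold/newMaxThreshold values below are hypothetical):
// Untested sketch: rebuild the detector parameters as YAML text, write it to a
// writable location, and re-read it whenever the thresholds change.
string ymlText = string.Format ("%YAML:1.0\nminThreshold: {0}.\nmaxThreshold: {1}.\n",
    newMinThreshold, newMaxThreshold);
string tmpPath = System.IO.Path.Combine (Application.persistentDataPath, "blobparams_runtime.yml");
System.IO.File.WriteAllText (tmpPath, ymlText);
blobDetector.read (tmpPath);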
So the magic of the blob detector is in the parameters. After playing around with various parameters, this is what I settled on for now:
%YAML:1.0
thresholdStep: 10.
minThreshold: 10.
maxThreshold: 20.
filterByColor: True
blobColor: 255 # Look for light area (rather than dark)
filterByArea: True
minArea: 100.
maxArea: 5000.
filterByCircularity: True
minCircularity: 0.75
maxCircularity: 1.5
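For reference, my understanding is that SimpleBlobDetector computes circularity as 4*pi*area/perimeter^2, so a perfect circle scores 1.0 and elongated or ragged blobs score less; minCircularity: 0.75 therefore demands a fairly round blob.
using System;

static class CircularitySketch
{
    // Circularity as I understand SimpleBlobDetector defines it:
    // 4*pi*area / perimeter^2, which is 1.0 for a perfect circle.
    public static double Circularity (double area, double perimeter)
    {
        return 4.0 * Math.PI * area / (perimeter * perimeter);
    }
}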
This setting works pretty well if the camera sees a roughly circular pattern, as you can see by where the key point marker was placed. The image is mostly dark because I subtracted 90% of the max value above.
Note that the center of the blob was correctly estimated. But because the torch is so bright, a lot of light reflects off the back of my iPhone, which is not in a case. The reflections distort the observed image, and the circularity check rejects the blob, as you can see here.