May 13, 2017

Dual system architecture for Raspberry PI

Because I first started learning about programming in a real-time context, I never became comfortable with the idea of controlling physical (safety-critical) devices on a non-deterministic OS.  So I stayed with MCUs (microcontrollers) for a long time.  On the other hand, as I tried to build shipping products, I became painfully aware of the whole infrastructure necessary to control and communicate with the MCU, and to display the relevant information to the human operator or remote systems.  I realized that a big chunk of my paycheck was justified because I had the patience and skills to do these mundane but necessary chores for the companies I worked for.  In fact, when I was in the biotech industry, my fellow engineers and I often viewed ourselves as plumbers (although not as well paid).

But in my first foray into the world of SoC programming on the Xilinx Zynq, I came up with a solution to the problem of integrating a general-purpose OS (running the UI) with a bare-metal system: run Linux on CPU0 of the Zynq, and run the bare-metal firmware on CPU1.  I later learned that my solution is being used at CERN and Bosch.  The solution hinges on the OCM (on-chip memory) and the software interrupt available on the Zynq, as you can see below:
The interrupt controller is part of ARM's Cortex-A9 MPCore, but the OCM is Xilinx's contribution.  The 2 systems can even share data that is too big to fit in the relatively tiny OCM: I prototyped a custom image collection system in which the bare metal schedules image capture from an Aptina (now ON Semi) CCD into a shared DRAM area using the Xilinx VDMA core, which Linux can then read at its leisure, as explained in my previous blog entry.
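To make the mechanism concrete, here is a minimal sketch of the message path, assuming a one-slot mailbox placed in OCM and a software-generated interrupt (SGI) to poke CPU1.  The addresses are the ones I remember from the Zynq-7000 TRM (OCM remapped high at 0xFFFC0000, GIC distributor at 0xF8F01000); verify them against your own configuration, and note that on the Linux side you would reach the OCM and the SGI register through /dev/mem or a small kernel driver rather than a raw pointer.

    /* Sketch only: a one-slot mailbox in Zynq OCM, plus an SGI to wake CPU1.
     * Addresses per the Zynq-7000 TRM; adjust to your OCM remap and GIC setup. */
    #include <stdint.h>

    #define OCM_BASE        0xFFFC0000u      /* OCM remapped to the high address range */
    #define GIC_DIST_BASE   0xF8F01000u      /* MPCore GIC distributor                 */
    #define ICDSGIR         (*(volatile uint32_t *)(GIC_DIST_BASE + 0xF00))

    typedef struct {
        volatile uint32_t seq;               /* incremented by the producer            */
        volatile uint32_t cmd;               /* command ID                             */
        volatile uint32_t arg[6];            /* small payload; big data goes in DRAM   */
    } mailbox_t;

    #define MBOX   ((mailbox_t *)OCM_BASE)
    #define SGI_ID 15u                       /* any unused SGI in 0..15                */

    /* Producer side (e.g. Linux through a mapped OCM window). */
    static void mbox_send(uint32_t cmd, uint32_t arg0)
    {
        MBOX->cmd    = cmd;
        MBOX->arg[0] = arg0;
        __asm__ volatile("dmb sy" ::: "memory");  /* publish the payload before seq    */
        MBOX->seq++;
        /* SGI to CPU1: target-list filter = 0, CPU target list bit 1 = CPU1. */
        ICDSGIR = (1u << (16 + 1)) | SGI_ID;
    }

    /* Consumer side (bare metal on CPU1): called from the SGI handler. */
    static uint32_t last_seq;
    void mbox_poll(void)
    {
        if (MBOX->seq != last_seq) {
            last_seq = MBOX->seq;
            /* dispatch on MBOX->cmd / MBOX->arg[] here */
        }
    }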

But since leaving the biotech industry, I haven't touched an FPGA.  Instead, I have worked with ARM Cortex-M (I even got Linux to run on a Cortex-M4!), FreeRTOS and the Raspberry PI (3).  For the last year I was too busy studying DSP and computer vision, but I am beginning to itch for low-level programming again.  I was blown away by the Raspberry PI's bang-for-the-buck (it even has a programmable GPU, which you can use as a DSP), so this time I think I will ditch my $600 Zedboard in favor of the $35 RPI3 I bought more than a year ago.  The only complaint I have is the low performance of the data pipe from the CPU to the GPU, but this is a common problem in ALL computation pipelines that include a GPU (including CUDA).  I have some hope that ARM will solve the problem by integrating Mali more tightly with the Cortex-A series in the future, but let's not let that stop us.

So if the OCM was the key to the dual system in my Zynq solution, and the BCM2837 (the SoC in the RPI3) is MISSING equivalent HW, how shall I work around it?  I first thought about using the DRAM as the shared memory, but I am concerned about degrading the cache hit rate, which I want to keep high for the hard real-time code that is still running out of DRAM.  Since the message path between the real-time and the non-real-time system does NOT have to be real-time, I am going to use another piece of HW to shuttle the messages, as shown below.
RPI3 only has 1 USART, so I cannot directly connect the 2 halves.  But since Linux (and U-Boot too, I think) can use a USB-serial console out of the box (it appears as a /dev file), there should be little work needed to communicate with the bare-metal code--provided the bare metal can read/write the BCM2837's USART peripheral.  The current maximum baud rate supported by the USB-serial IC category leader, the FT232R, is 3 Mbps, which is fast enough to shuttle most messages into/out of the real-time system--except images.  So this architecture requires keeping the image processing algorithms on Linux--which is frankly just easier anyway (you can't realistically port OpenCV to bare metal).
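To give a feel for the bare-metal side of that link, here is a minimal polled-transmit sketch against the PL011 (UART0) on the BCM2837, assuming the pin muxing and baud divisors have already been set up by the firmware, U-Boot, or your own init code.  Register offsets are from the BCM2835/2837 peripherals manual and the ARM PL011 spec; also note that on the RPI3 the PL011 is routed to the on-board Bluetooth by default, so you may need a device-tree overlay (or the mini UART instead) to get it onto the header pins.  Treat this as a starting point, not a drop-in driver.

    /* Sketch only: polled byte TX/RX on the BCM2837 PL011 (UART0).
     * Assumes GPIO14/15 are already muxed to the PL011 and the baud rate is set. */
    #include <stdint.h>

    #define PERIPH_BASE  0x3F000000u         /* BCM2837 ARM physical peripheral base */
    #define UART0_BASE   (PERIPH_BASE + 0x201000u)

    #define UART0_DR     (*(volatile uint32_t *)(UART0_BASE + 0x00))  /* data register */
    #define UART0_FR     (*(volatile uint32_t *)(UART0_BASE + 0x18))  /* flag register */
    #define FR_TXFF      (1u << 5)           /* TX FIFO full  */
    #define FR_RXFE      (1u << 4)           /* RX FIFO empty */

    static void uart_putc(char c)
    {
        while (UART0_FR & FR_TXFF)           /* spin until there is room in the FIFO */
            ;
        UART0_DR = (uint32_t)c;
    }

    static int uart_getc_nonblock(void)
    {
        if (UART0_FR & FR_RXFE)
            return -1;                       /* nothing received */
        return (int)(UART0_DR & 0xFF);
    }

    void uart_puts(const char *s)
    {
        while (*s)
            uart_putc(*s++);
    }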

Another change from my previous solution is to remove the capability to start/stop the real-time subsystem from Linux--at least for now.  I heard from Bosch that a safety-critical real-time program needs to be available quickly after power-on, so the ~10 second delay of being launched from Linux would be unacceptable.  So this time, U-Boot is going to first kick off the bare-metal program on CPU3, and THEN start Linux (before exiting), as sketched below.
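I have not tried this yet, so the following U-Boot sequence is only a sketch of the plan: load the bare-metal image into the reserved top of DRAM, release core 3 (which, as I understand it, the stock Raspberry Pi firmware stub leaves parked on its core-local mailbox 3, per the BCM2836 ARM-local peripherals document), and then boot Linux as usual.  The file names and load addresses are placeholders, and the mailbox address and exact release mechanism should be double-checked against the firmware/stub you actually boot with.

    # hypothetical file names and addresses -- adjust to your layout
    fatload mmc 0:1 0x38000000 rt_core3.bin     # bare-metal image into the reserved DRAM region
    dcache flush                                # make sure core 3 sees what we just loaded
    mw.l 0x400000bc 0x38000000                  # core 3, mailbox 3 write-set: release the parked core
    fatload mmc 0:1 ${kernel_addr_r} zImage     # then boot Linux on the remaining cores as usual
    bootz ${kernel_addr_r} - ${fdt_addr}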

I lost my Linux development system when I left the biotech industry, so I just installed Parallels to begin recreating my Buildroot development environment.  Because I am busy studying a few other things (on top of my full-time job), a fully working demo is going to take some time to materialize.  If you know how to do the following, you should be able to pull it off yourself:

  1. Building and running a Buildroot distribution on RPI.
  2. Booting multiple systems (in sequence) from U-Boot.
  3. Configuring Linux to run on only a subset of the CPUs, and to use only a portion of the physical DRAM (see the command-line sketch after this list).
  4. Writing a bare-metal (or FreeRTOS) system that runs out of the reserved portion of the DRAM (with L1/L2 cache turned on).
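For item 3, the kernel command line already has the knobs needed.  Something along these lines (the values are purely illustrative, chosen to match the U-Boot sketch above) keeps Linux on CPUs 0-2 and out of the top of DRAM; for a real build you would probably also remove CPU3 from the device tree so Linux cannot online it later, and keep the PL011 out of the Linux console since it belongs to the bare-metal side:

    # illustrative values: Linux gets CPUs 0-2 and the first 896 MB of DRAM;
    # CPU3 and the memory above that are left for the real-time code
    maxcpus=3 mem=896M console=tty1 root=/dev/mmcblk0p2 rootwait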

May 8, 2017

Understanding Doppler phase

I wanted to fuse the real-time mic recording on my iPhone with the accelerometer/gyro signals for my hobby indoor-location project.  But for the reasons I discussed in my previous blog entry, I am currently stuck.  To get unstuck, I started thinking about the range-rate estimation used in radars, which is primarily Doppler based.  The Doppler frequency shift is covered in high school physics, but it's been a long time since I've had to think about Doppler, so I derived it myself.

Consider a mic M initially at rest, subjected to a sound wave of frequency F and wave speed c, as shown below.  For convenience, let's pick the beginning of the sample to coincide with phase 0 of the sine wave.  This just fixes the constant phase offset, so using an initial phase of 0 does not invalidate the ensuing derivation.
Since the mic is at rest (v_M = 0), the sound wave passes by the mic at c [m/s].  After t [s] passes, the mic will experience the phase 𝛟 = 2𝛑Ft of the sound wave, as shown below.
Now suppose the mic is moving away from the source at v [m/s].  Then the situation is exactly analogous to a race between the mic and the sound wave: relative to the mic, the sound wave is moving more slowly than when the mic was stationary, as shown below.
How much slower?  After the same t [s], the wave phase will have advanced 2𝛑Ft · v/c less than in the previous case, so that the mic is now experiencing the phase 𝛟 = 2𝛑F(1-v/c)t.  When I divide the phase difference between the 2 cases, -2𝛑Ft · v/c, by the elapsed time t, I get the angular frequency shift -2𝛑F · v/c, i.e. a Doppler frequency shift of -F · v/c.  This also makes sense because the derivative of phase is (angular) frequency.
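For reference, the whole derivation collapses to a couple of lines (same symbols as above):

    \phi_{\text{rest}}(t) = 2\pi F t, \qquad
    \phi_{\text{moving}}(t) = 2\pi F\left(1 - \tfrac{v}{c}\right) t

    \Delta\phi(t) = \phi_{\text{moving}}(t) - \phi_{\text{rest}}(t) = -2\pi F \tfrac{v}{c}\, t
    \quad\Longrightarrow\quad
    f_D = \frac{1}{2\pi}\,\frac{d\,\Delta\phi}{dt} = -F\,\frac{v}{c}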

This derivation might be helpful if you want to measure Doppler from the raw waveform rather than by measuring the received wave frequency, as in radar processing.  The problem I find with frequency-based estimation is that I cannot get the instantaneous frequency, since frequency requires several periods to estimate (the more, the more accurate).  So while Doppler from frequency-domain processing may be fine in an average sense, I have to give up some time resolution in the process.  Since radar waves are typically at microwave frequencies, requiring lots of periods is not a problem for radar applications (even after the signal is heterodyned and down-sampled).  But for sound waves sampled at the CD sample rate, requiring say 20 periods (I would not go below that due to noise concerns) of a 10 kHz wave equates to 20 / F = 2 ms, which may start to be a problem for high-dynamics applications.
Expressing the wave phase directly as a function of the range rate might also be helpful for estimating the distance change directly from the raw audio signal.  I actually just read The Doppler Equation in Range and Range Rate Measurement, NASA Technical Note X-55373, dating back to 1965 (!) and the Apollo program, where the debate seems to have been over how best to track both the range and the range rate of the rocket.  Even though the author (NASA Goddard) seems to have decided in favor of obtaining the range by integrating the estimated range rate (from Doppler), you might find the direct estimation of the range (without the random-walk-noise-inducing integration step) useful.  In the next month, I hope to find some time to implement a Kalman filter to do just that, roughly along the lines sketched below.
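Since I have not written that filter yet, here is only a sketch of the shape I have in mind: a two-state (range, range rate) constant-velocity Kalman filter fed with a pseudo-range derived from the unwrapped excess phase via Δr = -(c/(2𝛑F)) Δ𝛟, which follows from the derivation above.  The noise parameters, the r0 value, and the phase sample in main() are all placeholders.

    /* Sketch only: 2-state (range, range-rate) constant-velocity Kalman filter.
     * The measurement is a pseudo-range derived from the unwrapped excess phase:
     *   delta_r = -(c / (2*pi*F)) * delta_phi     (see the derivation above)
     * Noise parameters are placeholders; tune them for your mic/source geometry. */
    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    typedef struct {
        double x[2];      /* state: x[0] = range [m], x[1] = range rate [m/s] */
        double P[2][2];   /* state covariance                                 */
        double q;         /* process noise spectral density (acceleration^2)  */
        double r;         /* measurement (pseudo-range) variance [m^2]        */
    } kf_t;

    static void kf_init(kf_t *kf, double r0, double q, double r)
    {
        kf->x[0] = r0;      kf->x[1] = 0.0;
        kf->P[0][0] = 1.0;  kf->P[0][1] = 0.0;
        kf->P[1][0] = 0.0;  kf->P[1][1] = 1.0;
        kf->q = q;          kf->r = r;
    }

    static void kf_step(kf_t *kf, double dt, double z_range)
    {
        /* Predict: x = F x, P = F P F' + Q, with F = [[1, dt], [0, 1]]. */
        double x0  = kf->x[0] + dt * kf->x[1];
        double x1  = kf->x[1];
        double P00 = kf->P[0][0] + dt * (kf->P[1][0] + kf->P[0][1]) + dt * dt * kf->P[1][1];
        double P01 = kf->P[0][1] + dt * kf->P[1][1];
        double P10 = kf->P[1][0] + dt * kf->P[1][1];
        double P11 = kf->P[1][1];
        /* Discrete white-noise-acceleration process noise. */
        P00 += kf->q * dt * dt * dt / 3.0;
        P01 += kf->q * dt * dt / 2.0;
        P10 += kf->q * dt * dt / 2.0;
        P11 += kf->q * dt;

        /* Update with H = [1 0]: innovation, gain, then correct state and covariance. */
        double y  = z_range - x0;
        double s  = P00 + kf->r;
        double k0 = P00 / s;
        double k1 = P10 / s;
        kf->x[0] = x0 + k0 * y;
        kf->x[1] = x1 + k1 * y;
        kf->P[0][0] = (1.0 - k0) * P00;
        kf->P[0][1] = (1.0 - k0) * P01;
        kf->P[1][0] = P10 - k1 * P00;
        kf->P[1][1] = P11 - k1 * P01;
    }

    /* Usage sketch: convert each unwrapped excess-phase sample to a pseudo-range
     * and feed it in at the audio frame rate. */
    int main(void)
    {
        const double c = 343.0, F = 10e3;    /* speed of sound, carrier frequency */
        kf_t kf;
        kf_init(&kf, /*r0=*/1.0, /*q=*/0.5, /*r=*/1e-4);
        double phi_excess = -0.37;           /* placeholder unwrapped excess phase [rad] */
        double z = 1.0 - (c / (2.0 * M_PI * F)) * phi_excess;
        kf_step(&kf, /*dt=*/1e-3, z);
        printf("range %.4f m, range rate %.4f m/s\n", kf.x[0], kf.x[1]);
        return 0;
    }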