May 13, 2017

Dual system architecture for Raspberry Pi

Because I first started learning about programming in a real-time context, I never became comfortable with the idea of controlling physical (safety-critical) devices on a non-deterministic OS, so I stayed with MCUs (microcontrollers) for a long time.  On the other hand, as I tried to build shipping products, I became painfully aware of the whole infrastructure necessary to control and communicate with the MCU, and to display the relevant information to the human operator or to remote systems.  I realized that a big chunk of my paycheck was justified because I had the patience and skills to do these mundane but necessary chores for the companies I worked for.  In fact, when I was in the biotech industry, my fellow engineers and I often viewed ourselves as plumbers (although not as well paid).  But in my first foray into the world of SoC programming on the Xilinx Zynq, I came up with a solution to the problem of integrating a general OS (running the UI) with a bare-metal system: run Linux on CPU0 of the Zynq, and run the bare-metal firmware on CPU1.  I later learned that my solution is being used at CERN and Bosch.  The solution hinges on the OCM (on-chip memory) and the software-generated interrupts available on the Zynq, as you can see below:
The interrupt controller is part of the ARMv7 architecture, but the OCM is Xilinx's contribution.  The two systems can even share data that is too big to fit in the relatively tiny OCM; I even prototyped a custom image-collection system in which the bare metal schedules frame captures from an Aptina (now ON Semi) image sensor, which Linux can read later from the shared DRAM area using the Xilinx VDMA core, as explained in my previous blog entry.
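The OCM handshake boils down to a tiny mailbox protocol.  Here is a minimal sketch of what I mean--the field layout and the sequence/ack handshake are my own illustration, not the exact layout of my Zynq firmware, and the mailbox pointer would be the OCM base address on real hardware:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mailbox layout for a shared OCM region.  The base address
 * (e.g. where the OCM is mapped on a Zynq) is taken as a parameter so the
 * same logic can be exercised on a host. */
typedef struct {
    volatile uint32_t seq;       /* bumped by the producer after writing  */
    volatile uint32_t ack;       /* copied from seq by the consumer       */
    uint32_t          len;       /* payload length in bytes               */
    uint8_t           payload[56];
} mailbox_t;

/* Producer side (e.g. bare metal on CPU1).  On the real hardware this is
 * also where you would trigger a software-generated interrupt (SGI)
 * toward the other CPU via the GIC. */
static int mbox_send(mailbox_t *mb, const void *msg, uint32_t len)
{
    if (len > sizeof mb->payload) return -1;
    if (mb->seq != mb->ack)       return -2;  /* previous message unread */
    memcpy(mb->payload, msg, len);
    mb->len = len;
    __sync_synchronize();                     /* order payload before seq */
    mb->seq++;
    return 0;
}

/* Consumer side (e.g. Linux on CPU0, reading the OCM via /dev/mem or in
 * an interrupt handler). */
static int mbox_recv(mailbox_t *mb, void *out, uint32_t maxlen)
{
    if (mb->seq == mb->ack) return 0;         /* nothing pending */
    uint32_t len = mb->len < maxlen ? mb->len : maxlen;
    memcpy(out, mb->payload, len);
    __sync_synchronize();
    mb->ack = mb->seq;                        /* release the slot */
    return (int)len;
}
```

The point of the seq/ack pair is that each side only ever writes its own counter, so no lock is needed for a single-producer, single-consumer message slot.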

But since leaving the biotech industry, I haven't touched an FPGA.  Instead, I have worked with ARM Cortex-M (I even got Linux to run on a Cortex-M4!), FreeRTOS, and the Raspberry Pi (3).  Over the last year I was too busy studying DSP and computer vision, but I am beginning to itch for low-level programming again.  I was blown away by the Raspberry Pi's bang for the buck (it even has a programmable GPU, which you can use as a DSP), so this time I think I will ditch my $600 ZedBoard in favor of the $35 RPI3 I bought more than a year ago.  The only complaint I have is the low performance of the data pipe from the CPU to the GPU, but this is a common problem in ALL computation pipelines that include a GPU (including CUDA).  I have some hope that ARM will solve the problem by integrating Mali more tightly with the Cortex-A series in the future, but let's not let that stop us.

So if the OCM was the key to the dual system in my Zynq solution, and the BCM2837 (the SoC on the RPI3) is MISSING an equivalent piece of HW, how shall I work around it?  I first thought about using the DRAM as the shared memory, but I am concerned about degrading the cache hit rate, which I want to keep high for the hard real-time code still running out of DRAM.  Since the message path between the real-time and the non-real-time systems does NOT have to be real-time, I am going to use another piece of HW to shuttle the messages, as shown below.
The RPI3 only has one USART, so I cannot directly connect the two halves.  But since Linux (and U-Boot too, I think) can use a USB-serial console out of the box (it appears as a /dev file), there should be little work necessary to communicate with the bare-metal code--if the bare metal can read from and write to the BCM2837's USART peripheral.  The current maximum baud rate supported by the category-leading USB-serial IC, the FT232R, is 3 Mbps, which is fast enough to shuttle most messages into/out of the real-time system--except images.  So this architecture requires keeping the image-processing algorithms on Linux--which is frankly just easier anyway (you can't realistically port OpenCV to bare metal).
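Since the serial link is just a byte pipe, both halves need to agree on a frame format.  The sketch below is my own hypothetical wire format (sync byte, length, payload, 8-bit checksum), not anything the FT232R or BCM2837 dictates; at 3 Mbps a 64-byte frame takes roughly 0.2 ms, which is fine for control messages and hopeless for images:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical frame format for the UART link:
 * [0x7E sync][len][payload ...][8-bit checksum]. */
#define SYNC 0x7E

static uint8_t cksum(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    while (n--) s += *p++;
    return (uint8_t)(0xFF - s);       /* simple complemented sum */
}

/* Encode: returns total frame length, or 0 if it doesn't fit in out. */
static size_t frame_encode(const uint8_t *msg, uint8_t len,
                           uint8_t *out, size_t outsz)
{
    if ((size_t)len + 3 > outsz) return 0;
    out[0] = SYNC;
    out[1] = len;
    for (uint8_t i = 0; i < len; i++) out[2 + i] = msg[i];
    out[2 + len] = cksum(msg, len);
    return (size_t)len + 3;
}

/* Decode: returns payload length, or -1 on a malformed/corrupt frame. */
static int frame_decode(const uint8_t *buf, size_t n,
                        uint8_t *msg, size_t msgsz)
{
    if (n < 3 || buf[0] != SYNC) return -1;
    uint8_t len = buf[1];
    if ((size_t)len + 3 > n || len > msgsz) return -1;
    if (buf[2 + len] != cksum(buf + 2, len)) return -1;
    for (uint8_t i = 0; i < len; i++) msg[i] = buf[2 + i];
    return len;
}
```

The same encode/decode pair runs unchanged on both sides: the Linux half would read and write the frames through the USB-serial /dev file (configured for 3 Mbps via termios), while the bare-metal half feeds them through the BCM2837 UART FIFO.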

Another change from my previous solution is to remove the ability to start/stop the real-time subsystem from Linux--at least for now.  I heard from Bosch that a safety-critical real-time program needs to be available quickly after power-on, so the ~10-second delay when it is launched by Linux would be unacceptable.  So this time, U-Boot is going to first kick off the bare-metal program on CPU3, and THEN start Linux (before exiting).
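Kicking off CPU3 is possible because the stock RPI boot stub parks the secondary cores spinning on per-core release slots (a spin table), waiting for an entry-point address to be written there.  The exact slot addresses depend on the firmware/armstub in use, so the sketch below takes the table as a parameter and should be checked against your boot stub before being trusted:

```c
#include <stdint.h>

/* Release a parked secondary core: write its entry point into the spin
 * table, then publish the write and wake the spinning cores.  On real
 * hardware spin_table would point at the boot stub's release slots (an
 * address you must confirm against your armstub); taking it as a
 * parameter lets the logic be exercised on a host. */
static void release_core(volatile uint64_t *spin_table, unsigned core,
                         uint64_t entry_point)
{
    spin_table[core] = entry_point;
#if defined(__aarch64__)
    __asm__ volatile("dsb sy; sev");  /* flush the write, wake the cores */
#endif
}
```

U-Boot would do the equivalent of `release_core(table, 3, bare_metal_load_addr)` after loading the bare-metal image, and only then boot the Linux kernel on the remaining cores.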

I lost my Linux development system when I exited the biotech industry, so I just installed Parallels to begin recreating my Buildroot development environment.  Because I am busy studying a few other things (on top of my full-time job), a fully working demo is going to take some time to materialize.  If you know how to do the following, you should be able to pull it off by yourself:

  1. Building and running a Buildroot distribution on RPI.
  2. Booting multiple systems (in sequence) from U-Boot.
  3. Configuring Linux to run on only a subset of the CPUs, and to use only a portion of the physical DRAM.
  4. Writing a bare-metal (or FreeRTOS) system that runs out of the reserved portion of the DRAM (with the L1/L2 caches turned on).
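For step 3, the standard kernel parameters `maxcpus=` and `mem=` do most of the work; something like the following in the U-Boot environment (the sizes and device names here are illustrative, not tested) leaves CPU3 and the top of the 1 GB DRAM untouched by Linux:

```
# Give Linux CPUs 0-2 and only the first 768 MB of DRAM;
# CPU3 and the remaining memory stay free for the bare-metal system.
setenv bootargs "console=ttyS0,115200 root=/dev/mmcblk0p2 rootwait maxcpus=3 mem=768M"
```

The bare-metal system (step 4) then links and runs out of the region above the `mem=` cutoff, which Linux never maps.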