Nov 25, 2017

"Bare metal" control of the New Haven OLED

Many embedded devices have no display, and make do with status LEDs (think about your home router or switch).  One step up is an LCD display--like the New Haven OLED display I bought a few years ago, to study the Linux frame buffer device drivers.  Originally, I was going to experiment on my Zedboard (which packs Xilinx Zynq SoC) but since then I've embraced the Raspberry Pi project.  So in this blog entry, I create a status display GUI on my NHD-1.27-12896UGC3, which comes with its own displayer controller SSD1351.  When complete, the RPi Linux driven terminal looks like this if the soldering and connections were OK.
RPi console output on NHD-1.27-12896UGC3.  Note the crisp color and the deep black.

RPi Linux supports SSD1351 out of the box

The Raspberry Pi linux kernel (can be cloned from https://github.com/raspberrypi/linux) already has the matching kernel module for the display driver, which you can verify for yourself by running the following commands on your Raspberry Pi:

pi@raspberrypi$ sudo modprobe configs
pi@raspberrypi$ gunzip -c /proc/config.gz > ~/BAK/.config 

You can see SSD1351 support in the resulting file:

CONFIG_FB_TFT_SSD1351=m

This define pulls in fb_ssd1351, which is one of the fbtft (TFT frame buffer) devices enumerated in fbtft_device.c: the 2 SSD1351 devices enumerated there are pioled and freetronicsoled128, neither of which are the NHD 1.27 128x96 device I have.  They are however driven similarly: 20 MHz SPI in Mode 0 (of the 4 SPI modes available; in mode 0, the slave samples SDA on the rising of SCL).  One puzzle is how pioled can drive a display only 32 pixels tall, when fb_ssd1351.c hard codes the initial height to 128, but let's see whether freetronicsoled128 can handle the "height=96" mod probe argument.  It looks like the probe code (fbtft_probe_dt) handles a whole bunch of options:

pdata->display.width = fbtft_of_value(node, "width");
pdata->display.height = fbtft_of_value(node, "height");
pdata->display.regwidth = fbtft_of_value(node, "regwidth");
pdata->display.buswidth = fbtft_of_value(node, "buswidth");
pdata->display.backlight = fbtft_of_value(node, "backlight");
pdata->display.bpp = fbtft_of_value(node, "bpp");
pdata->display.debug = fbtft_of_value(node, "debug");
pdata->rotate = fbtft_of_value(node, "rotate");
pdata->bgr = of_property_read_bool(node, "bgr");
pdata->fps = fbtft_of_value(node, "fps");
pdata->txbuflen = fbtft_of_value(node, "txbuflen");
pdata->startbyte = fbtft_of_value(node, "startbyte");
of_property_read_string(node, "gamma", (const char **)&pdata->gamma);

if (of_find_property(node, "led-gpios", NULL))
pdata->display.backlight = 1;

The default module properties in the kernel code for this device bears remembering:

.name = "freetronicsoled128",
.spi = &(struct spi_board_info) {
.modalias = "fb_ssd1351",
.max_speed_hz = 20000000,
.mode = SPI_MODE_0,
.platform_data = &(struct fbtft_platform_data) {
.display = {
.buswidth = 8,
.backlight = FBTFT_ONBOARD_BACKLIGHT,
},
.bgr = true,
.gpios = (const struct fbtft_gpio []) {
{ "reset", 24 },
{ "dc", 25 },
{},
},
}
}

The reset and dc pins above are the D/C# and RES# pins defined in the SSD1351 controller interface table shown below:
Since it mentions the DC# pin explicitly (rather than being tied low as for the 3-wire interface), the device driver is expecting to use the 4-wire SPI interface above--through RPi GPIO pin 25.  The kernel config did not set CONFIG_FBTFT_ONBOARD_BACKLIGHT because the device doesn't need backlight (it's an OLED!).  NHD-1.27-12896UGC3 data sheet shows the recommended wiring for the 4-wire SPI mode as follows:
Including the 3.3 V power and ground, only 7 wires connect the display module to RPi GPIO header:
  • D/C: RPi P1.22, GPIO.25
  • SCLK: RPi P1.23 (AKA SCLK)
  • SDIN: RPi P1.19 (AKA MOSI)
  • /RES: RPi P1.18, GPIO.24
  • /CS: RPi P1.24 (AKA CE0), GPIO.8
When all soldered and wired, the connection looks like this (ignore the logic analyzer probes on the display pins).

To load the kernel module, I supply the display height (which is different than the default 128 pixels) to the module argument like this:

sudo modprobe fbtft_device name=freetronicsoled128 height=96

But according to the kernel log, the height argument was ignored:

Nov 25 17:22:38 hchoi2-RPi1B kernel: [ 1709.489760] graphics fb1: fb_ssd1351 frame buffer, 128x128, 32 KiB video memory, 4 KiB DMA buffer memory, fps=20, spi0.0 at 20 MHz

Anyhow, the module load succeeded and I now have another frame buffer device (in addition to the default HDMI out):

pi@hchoi2-RPi1B:~ $ ls /dev/fb
fb0  fb1  

I can then use the 2nd frame buffer as the console output:

pi@hchoi2-RPi1B:~ $ con2fbmap 1 1

The console can be redirected by changing the last 1 in the above command to 0.

Low level control of SSD1351 on Arduino Uno

Linux FB framework is powerful but requires a lot of code, which does not fit on most deeply embedded targets.  The vendor (New Haven Display) put out a "bare metal" example on GitHub for controlling the device from Arduino Uno.  This is an easier way to understand the low level control than wading through the many layers of the Linux FB driver code.  The following is my annotation of the example Arduino code.

Low level primitive

The supplied example shows 3 different methods of sending command to the device: 2 parallel interface and the 4-pin SPI.  I am only interested in the serial interface (cannot dedicate that many pins just for the display!) so I will ignore the parallel interface going forward.  The chip requires MSb (most-significant-bit-interface), and Arduino will bit-bang each bit on its GPIO pin while holding CS (chip select) and D/C# low for the whole duration of 8-bits.

Writing 1 B of data over the serial is exactly the same, except for holding D/C# high while writing.

Initialization

  1. Chip reset: pull down the RES# pin for 500 usec, then pulling it up again, and then waiting for at least 500 usec.  
  2. Unlock command: write 0x12 and then 0xB1 to the command lock register (0xFD)
  3. Sleep mode on (display off): write (nothing ) to 0xAE register
  4. Set clock = divisor + 1, frequency = 0xF: write 0xF1 to 0xB3.  Writing to this register requires command unlocking (step #2).
  5. Set mux ratio
  6. Set display offset and start
  7. Set color depth to 18-bit (256k color), 16-bit format 2.
  8. GPIO input disabled
  9. Enable internal Vdd regulator
  10. Choose external VSL
  11. Set contrast current for the 3 collars (slightly different than the default: 0x8A, 0x70, 0x8A)
  12. Reset output currents for all colors
  13. Enhance display performance
  14. ...
  15. Sleep mode off (display on): write (nothing) to 0xAF register.

Blank out the entire screen to black

Blanking out the screen to any color just means writing the same (whatever) color to every pixel.  It consists of setup and data stage:
  1. Set column start and end to 0 and 127, respectively
  2. Set row start and end to 0 and 95, respectively
  3. Start write to RAM: write the destination register address (0x5C)
  4. For the next 128x96 pixels, write the given pixel value (RGB) as SPI data.  For the 262k color over 8-bit serial interface, the data format is given in Table 8-8 of the SSD1351 data sheet.  If I don't check for saturation, it's convenient to keep the colors as separate bytes, and output 8-bits for each color in rapid succession.

Print a fixed font letter

If I emit a different color for a pixel than the background color, I can show a dot at a given point.  If I arrange a group of neighboring pixels in a pre-arranged way, that is a symbol that can be shown at offset (x, y) on the screen.  If I then hold a read-only bitmap representing a letter, it is possible to print one letter at a time on the screen, by testing each bit of the bitmap as the pixel position moves to the right.  Here's an example of the letter 'E' in 10-point font:

const unsigned char A10pt [] = { // 'A' (11 pixels wide)
0x0E, 0x00, //     ###    
0x0F, 0x00, //     ####   
0x1B, 0x00, //    ## ##   
0x1B, 0x00, //    ## ##   
0x13, 0x80, //    #  ###  
0x31, 0x80, //   ##   ##  
0x3F, 0xC0, //   ######## 
0x7F, 0xC0, //  ######### 
0x60, 0xC0, //  ##     ## 
0x60, 0xE0, //  ##     ###
0xE0, 0xE0, // ###     ###
};

Note that this "10 point" font is actually 11 pixels tall and 13 pixels wide.  A for-loop to print this letter at position x and y on the screen is:

   index = 0;
   for(i=0;i<11;i++)     // display custom character A
   {
        OLED_SetColumnAddress_12896RGB(x, 0x7F);
        OLED_SetRowAddress_12896RGB(y, 0x5F);
        OLED_WriteMemoryStart_12896RGB();
        for (count=0;count<8;count++)
        {
            if((A10pt[index] & mask) == mask)
                OLED_Pixel_12896RGB(textColor);
            else
                OLED_Pixel_12896RGB(backgroundColor);
            mask = mask >> 1;
        }
        index++;
        mask = 0x80;
        for (count=0;count<8;count++)
        {
            if((A10pt[index] & mask) == mask)
                OLED_Pixel_12896RGB(textColor);
            else
                OLED_Pixel_12896RGB(backgroundColor);
            mask = mask >> 1;
        }
        index++;
        mask = 0x80;
        y_pos--;
   }
   x += 13;

This implementation is intimately tied to the font representation above (each row of the font consists of the 2 B and the pixel width and height are hard coded.  But note that a few of the hard coded parameters can be parametrized: the letter position (x, y), the letter itself, and the foreground color (and possibly the background color), and can be refactored into a common function that looks up the letter in a table--like the ASCII table:

void OLED_Text_12896RGB(unsigned char x_pos, unsigned char y_pos, unsigned char letter, unsigned long textColor, unsigned long backgroundColor);

This strategy is slow but functional.  Each byte write can be grouped together into a long sequence of bytes:
  • The nRS pin can be held low the whole time (i.e. avoid the repeated function calls)
  • The SPI write can be accelerated over DMA if the background is the same. That is, instead of a letter consisting of just 1 bitmap, it can just be a long sequence of colors for the entire rectangular region the letter takes up.  This will bloat the DATA segment dedicated to the letters.
Even more optimization techniques such as keeping a frame buffer and writing out a whole screen in one shot are just the beginning in graphics programming, and I won't write these myself because I don't want to reinvent the wheel.

Porting the Arduino example to RPi Linux user space

Driving out the SPI signal from RPi is an excellent way to prototype an embedded GUI platform even before the new board is brought up.  Even after the board is brought up, writing a user space program to try out an idea is a great convenience.  The key to porting the Arduino example to RPi is to leverage someone else's work on driving the RPi's SPI interface.  The BCM2835 library is mature and performant.  Using it, configuring the GPIO and SPI can be coded concisely:

#include <bcm2835.h>

#define    DC_PIN   25
#define   RES_PIN   24

int main() {
    if (!bcm2835_init())                                                                     
        return 1;                                                                            
    if (!bcm2835_spi_begin())    {                                                           
        fprintf(stderr, "bcm2835_spi_begin failed %d. Are you running as root??\n",          
                errno);                                                                      
        return 1;                                                                            
    }                                                                                        
    bcm2835_spi_setBitOrder(BCM2835_SPI_BIT_ORDER_MSBFIRST);      // The default             
    bcm2835_spi_setDataMode(BCM2835_SPI_MODE0);                   // The default             
    bcm2835_spi_setClockDivider(BCM2835_SPI_CLOCK_DIVIDER_32);                               
    bcm2835_spi_chipSelect(BCM2835_SPI_CS0);                      // The default             
    bcm2835_spi_setChipSelectPolarity(BCM2835_SPI_CS0, LOW);      // the default             
                                                                                             
// the output pins: D/C (GPIO.3), RES (GPIO.5)                                               
    bcm2835_gpio_fsel(DC_PIN, BCM2835_GPIO_FSEL_OUTP);
    bcm2835_gpio_fsel(RES_PIN, BCM2835_GPIO_FSEL_OUTP);

Divider = 32 yields 8 MHz SPI speed.  I could try going faster, but even at 8 MHz, the signal integrity is marginal.  When the part is integrated on a PCB, I should be able to go faster.  Anyway, the smiley shows that once the image is set, the display can just refresh itself without a periodic refresh from the host, which means that a slow processor like a C2000 can just update the display asynchronously.

This is not quite bare metal in the true sense.  But still, this code should be readily transferrable to an embedded target such as C2000.

Bare metal control of the NHD panel from C2000

TODO