Nov 15, 2015

Understanding the ADI ADV7511 DRM Linux driver

When I tried Qt on the Zedboard last time, I was overwhelmed by the complexity of the Linux graphics stack involving OpenGL.  I asked for pointers on the Qt user forum, but didn't get any suggestions, so I just stuck with the Qt Widgets based program with the framebuffer backend.  This time, I found this excellent explanation of the Linux graphics stack.

High level view of the Linux DRM SW stack

While reading this gem of a discourse, I found the Linux graphics stack images (this and this) on wiki to be indispensable for understanding the complex relationships between different moving parts.
Pearls of wisdom from Mr. Kitching's explanation:
An X server is a complicated thing; a recent count of source code lines showed the xorg X11 server to have significantly more lines of code than the Linux kernel.  ...In “direct rendering”, [sic] the server returns the ID of a local device node (/dev/dri/card*) which the client then opens (or in DRI3, an already-open file handle is returned). The client then uses libdrm to perform operations on this device-node in order to allocate graphics buffers and map them into the client’s memory-space. 
First were the “framebuffer” kernel drivers. These provide a fairly simple API that is capable of setting the graphics mode on the “main” output for a graphics chip, allowing applications to explicitly set individual pixels in the output framebuffer, and do a few simple accelerated operations (bitblits and rectangular fills). Framebuffer drivers expose a file /dev/fb{n} which userspace performs reads/writes and ioctls on; all card-specific drivers provide exactly the same API so userspace apps can use any framebuffer driver.  Modern cards ... often require drivers to generate GPU “programs” to do even 2D graphics - a far from trivial process. 
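
To make the framebuffer API above concrete, here is a minimal sketch of my own (not from the quoted article) that opens /dev/fb0, queries the mode, and sets one pixel; error handling is omitted and a 32 bpp mode is assumed:

#include <fcntl.h>
#include <linux/fb.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/fb0", O_RDWR);
    struct fb_var_screeninfo var;   /* resolution, bits per pixel, ... */
    struct fb_fix_screeninfo fix;   /* line length in bytes, mapping size */

    ioctl(fd, FBIOGET_VSCREENINFO, &var);
    ioctl(fd, FBIOGET_FSCREENINFO, &fix);

    uint8_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);

    /* Paint the pixel at (x=10, y=20) white. */
    uint32_t *pixel = (uint32_t *)(fb + 20 * fix.line_length + 10 * 4);
    *pixel = 0xffffffff;

    munmap(fb, fix.smem_len);
    close(fd);
    return 0;
}

Because every framebuffer driver answers the same ioctls, the same program works against any /dev/fb{n} (only the device path changes).
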
DRM drivers were created to support the DRI (direct rendering infrastructure): user-space programs allocate memory buffers (for things like window backing buffers, vertex graphs, textures, shader programs, and configuration commands), fill them with data or GPU instructions (through the user-space library libdrm), and pass them--via the /dev/dri/* nodes--to the GEM or TTM code, which passes the data/instructions on to the graphics card.  The results are rendered into another buffer which is then submitted to the X server for composition.  DRI can theoretically be used to accelerate all sorts of graphics but in practice it is used for OpenGL.  The “DRM v2” drivers also support mode-setting (aka KMS). 
The TTM kernel module provides a common kernel-private API for managing the transfer of data between system memory and the GPU. Because TTM covered both the mem-on-card and UMA use cases, the API was quite complicated. The GEM (graphics execution manager) kernel module (really part of DRM; it provides a library of functions for managing memory buffers for GPUs) was invented shortly after, with a much simpler API and a much simpler implementation--but it only supported UMA (it was developed by Intel, who only produce integrated graphics). The TTM code was later reworked to keep the same functionality but provide the same simple API as GEM (“GEMified TTM”). 
EGL is an API for managing windows and buffers [-- and NOT for drawing operations; therefore EGL is merely a helper for OpenGL].  An OpenGL implementation is primarily responsible for generating appropriate blocks of data for a card to process, i.e. textures, tables of vertices, and streams of instructions - and all of this is card-specific but not operating system specific. However the data needs to be put into some buffer; OpenGL just requires the necessary buffers to be passed in as parameters, in order to retain its OS-independence. These buffers also need to be pushed to the card at some time. The responsibility of managing “windows” and “buffers” and “submitting” data has always been done external to OpenGL.  EGL was therefore invented as a standard API which can have a different implementation for each OS/windowing system. An app coded against the EGL and OpenGL interfaces therefore just needs to be linked to the right library at compile or runtime. 
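
The division of labor is easier to see in code.  Here is a minimal sketch of my own (all error handling omitted): EGL owns the display/surface/context plumbing, and OpenGL ES only ever draws into whatever surface is currently bound:

#include <EGL/egl.h>
#include <GLES2/gl2.h>

void render_forever(EGLNativeWindowType win)
{
    EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(dpy, NULL, NULL);

    static const EGLint cfg_attribs[] = {
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_NONE
    };
    EGLConfig cfg;
    EGLint n;
    eglChooseConfig(dpy, cfg_attribs, &cfg, 1, &n);

    static const EGLint ctx_attribs[] = { EGL_CONTEXT_CLIENT_VERSION, 2, EGL_NONE };
    EGLSurface surf = eglCreateWindowSurface(dpy, cfg, win, NULL);
    EGLContext ctx  = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, ctx_attribs);

    for (;;) {
        eglMakeCurrent(dpy, surf, surf, ctx);  /* EGL: bind the context to the surface */
        glClearColor(0.0f, 0.0f, 1.0f, 1.0f);  /* OpenGL ES: draw */
        glClear(GL_COLOR_BUFFER_BIT);
        eglSwapBuffers(dpy, surf);             /* EGL: present the back buffer */
    }
}

Where the EGLNativeWindowType comes from is exactly the OS/windowing-system specific part that EGL exists to hide.
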
The main use of DRI is to provide hardware acceleration for the Mesa implementation of OpenGL. DRI has also been adapted to provide OpenGL acceleration on a framebuffer console without a display server running.  DRI implementation is scattered through the X Server and its associated client libraries, Mesa 3D and the Direct Rendering Manager kernel subsystem.
In summary, the modern graphics stack on Linux is vastly more complicated than the simple frame buffer used in early embedded systems.  It is challenging to understand the total system consisting of GPU HW (perhaps SW simulated), the OpenGL API, and DMA operations between memory and the GPU--because the code is scattered through userspace, the kernel, and even the HW (there is code to produce the HW, as well as the FW running on the remote HW itself).  But I hope to digest all of it--a bite at a time, starting with creating a system that can run Qt Quick OpenGL applications.

Creating a rootfs capable of running a QtQuick Control application

Since I don't have a GPU on the Zedboard that implements the OpenGL calls Qt Quick will make, I will HAVE TO enable Mesa 3D (the open source implementation of OpenGL) in Buildroot, because Qt Quick requires OpenGL, as you can see in the Qt for Embedded Linux document:
Some devices require vendor specific adaptation code for EGL and OpenGL ES 2.0 support. This is not relevant for non-accelerated platforms, for example the ones using the LinuxFB plugin, however neither OpenGL nor Qt Quick 2 will be functional in such a setup. 
The directory qtbase/mkspecs/devices contains configuration and graphics adaptation code for a number of devices. For example, linux-rasp-pi2-g++ contains build settings, such as the optimal compiler and linker flags, for the Raspberry Pi 2, and either an implementation of the eglfs hooks (vendor-specific adaptation code), or a reference to a suitable eglfs device integration plugin. The device is selected through the configure tool's -device parameter. The name that follows after this argument must, at least partially, match one of the subdirectories under devices.
To wit, in my Buildroot's Qt build folder, I only see the following "devices":

henry@w540:~/band/buildroot$ ls output/build/qt5base-5.5.0/mkspecs/devices/
common                           linux-buildroot-g++              linux-rasp-pi-g++
linux-archos-gen8-g++            linux-imx53qsb-g++               linux-sh4-stmicro-ST7108-g++
linux-arm-amlogic-8726M-g++      linux-imx6-g++                   linux-sh4-stmicro-ST7540-g++
linux-arm-hisilicon-hix5hd2-g++  linux-mipsel-broadcom-97425-g++  linux-snowball-g++
linux-arm-trident-pnx8473-g++    linux-odroid-xu3-g++             linux-tegra2-g++
linux-beagleboard-g++            linux-rasp-pi2-g++

Q: Does QtQuick require Mesa DRI?
A: Yes.  The following example shows that a QtQuick app tries to load the DRI driver for the detected card, and then tries to fall back to the swrast_dri driver.

QML_IMPORT_PATH: ./Qt5.2.0/gcc_64/qml:/home/james/src/ButtleOFX_Alpha_Linux64_v2.0/ButtleOFX/QuickMamba/quickmamba/../qml
"Qt Warning - invalid keysym: dead_actute" 
view <PyQt5.QtQml.QQmlEngine object at 0x7f63c3ce7640>
view <PyQt5.QtQml.QQmlEngine object at 0x7f63c3ce7640>
libGL error: unable to load driver: nouveau_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: nouveau
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast

To my dismay, I found that the swrast DRI driver requires X11R7 in Buildroot (BR2_PACKAGE_XORG7)--but not the server.  So my Buildroot config contains the following packages:
  • Run a getty (login prompt) after boot
  • System configuration
    • Dynamic using devtmpfs + eudev <-- Qt5 eglfs_kms support seems to require it
    • supply root password <-- only to ssh into the target from the QtCreator using password
  • Target packages
    • Debugging
      • gdb <-- only to pick up gdbserver for debugging Qt application remotely
    • Graphics
      • mesa3d
        • DRI swrast driver: OpenGL implementation using the DRI infrastructure
        • OpenGL EGL: this is the API for managing windows and buffers
        • OpenGL ES: this is the OpenGL API
      • Qt5
        • Approve free license (Qt asks to approve the Qt free license during build)
        • gui module
          • OpenGL support
            • OpenGL API
              • OpenGL ES 2.0+
          • X.org XCB support
          • eglfs support (means egl full-screen): Qt Quick applications crash on startup without this, but why does Buildroot automatically select "linuxfb support" when I choose this?
          • GIF support
          • JPEG support
          • PNG support
        • qt5graphicaleffects
        • qt5quickcontrols
      • X.org X Window system
        • xorg-server (to avoid Qt Quick application saying "Could not open display")
          • Modular X.org (because KDrive/TinyX uses framebuffer instead of DRM)
        • X11 applications
          • xinit: X window system initializer
    • Networking
      • openssh <-- this is ONLY to debug Qt application remotely
  • Libraries
    • Graphics
      • Q: what about libdri2 (DRI2 extension to the X window system)?
      • libdrm
        • Install test programs <-- for sanity testing DRM drivers
I changed buildroot/package/qt5/qt5base/qt5base.mk to pick up support for KMS (and get rid of the unnecessary platforms):

QT5BASE_CONFIGURE_OPTS += \
-optimized-qmake \
-kms -no-xcb -no-directfb -no-linuxfb \
-no-cups \
-no-nis \
-no-iconv \
-system-zlib \
-system-pcre \
-no-pch \
-shared

Running 'make' at the buildroot top level starts the rootfs build process, which should take about half an hour (much faster when rebuilding with ccache enabled) on an average modern laptop.

Allow root ssh into the target for remote debugging

When I tried to ssh into the target from the development laptop, the target's sshd refused the connection.  I don't remember previous OpenSSH versions doing this, but if you run into this, just add the following line to the rootfs's /etc/ssh/sshd_config:

PermitRootLogin yes

Now I could create a new "Generic Linux" device in QtCreator, and pass the test of connecting to the target.  It would be better to change the rootfs skeleton used by Buildroot, but I'll come back to this later.

Resulting rootfs

X (and not even the X server) adds nearly 50 MB to the Linux rootfs size--in fact it is larger than all other parts of the Linux rootfs COMBINED, as you can see below, where uImage is the kernel image and rootfs.tar is the uncompressed image of the rootfs.

henry@w540:~/band/buildroot$ ls -lgh output/images/
total 205M
-rw-r--r-- 1 henry  69M Nov  1 08:35 rootfs.cpio
-rw-r--r-- 1 henry  31M Nov  1 08:35 rootfs.cpio.gz
-rw-r--r-- 1 henry  31M Nov  1 08:35 rootfs.cpio.uboot
-rw-r--r-- 1 henry  70M Nov  1 08:35 rootfs.tar
-rwxr-xr-x 1 henry 2.1M Oct 31 17:09 u-boot
-rw-r--r-- 1 henry 3.5M Oct 31 17:11 uImage
-rw-r--r-- 1 henry  11K Oct 31 17:11 zynq-zed-adv7511.dtb

ADI did not supply a DRI driver for the HDMI device--at least none that Buildroot knows about, so there is only swrast_dri:

# ls /usr/lib/dri/
swrast_dri.so

The "-kms" option in the Qt base config shown in the previous section causes Qt to emit the eglfs-kms integration library:

# ls /usr/lib/qt/plugins/egldeviceintegrations/
libqeglfs-kms-integration.so  libqeglfs-x11-integration.so

Without this dynamic library, Qt eglfs will just fall back to eglfs_x11, which I do NOT want to use (even just the X libraries required for mesa eglfs and swrast took too much code).

Following the procedures I've used in the past (this and this), I network boot Linux on my Zedboard.  On boot, the target has used about 54 MB of RAM, according to /proc/meminfo:

# cat /proc/meminfo
MemTotal:         500148 kB
MemFree:          446284 kB
MemAvailable:     475100 kB
...

I chose -Os over the speed optimizing options, and I am sure the network stack eats up a large amount of RAM, but I will continue with the convenient network development environment for now.

During bootup, the axi-hdmi device driver is probed, as you can see from dmesg:

[drm] Initialized drm 1.1.0 20060810
platform 70e00000.axi_hdmi: Driver axi-hdmi requests probe deferral
...
axi-hdmi 70e00000.axi_hdmi: fb0:  frame buffer device
axi-hdmi 70e00000.axi_hdmi: registered panic notifier
[drm] Initialized axi_hdmi_drm 1.0.0 20120930 on minor 0

The probing happens during platform driver init, because the axi_hdmi_tx device is declared in the DTS.

Driver parameters are declared to the kernel in DTS

For an ARM Linux target to boot, 3 binaries are read (at minimum):

  1. Compressed Linux kernel image
  2. DTB (device tree blob) which is compiled from DTS
  3. Root file system--unless the target mounts a non-volatile file system.

The axi_hdmi_tx driver entry is found in zynq-zed-adv7511.dtsi:

axi_hdmi@70e00000 {
	compatible = "adi,axi-hdmi-tx-1.00.a";
	reg = <0x70e00000 0x10000>;
	encoder-slave = <&adv7511>;
	dmas = <&axi_vdma_0 0>;
	dma-names = "video";
	clocks = <&hdmi_clock>;
};

The DTS tells the driver the HW register address to read/write.  It also identifies the DMA device: the axi_vdma DTS node (although how it does this is unnecessarily complicated for the common case of just 1 DMA channel: dma-names assigns a name to each entry in the "dmas" array, so the 1st element--the axi_vdma channel--is named "video", and the driver later looks up the DMA channel by that name).  Despite the wild goose chase, the driver finds the axi_vdma, which has the following DTS description:

axi_vdma_0: axivdma@43000000 {
	compatible = "xlnx,axi-vdma";
	#address-cells = <1>;
	#size-cells = <1>;
	#dma-cells = <1>;
	#dma-channels = <1>;
	reg = <0x43000000 0x1000>;
	xlnx,include-sg = <0x0>;
	xlnx,num-fstores = <0x3>;
	dma-channel@43000000 {
		compatible = "xlnx,axi-vdma-mm2s-channel";
		interrupts = <0 59 0x4>;
		xlnx,datawidth = <0x40>;
		xlnx,genlock-mode = <0x0>;
		xlnx,include-dre = <0x0>;
	};
};

Even though this investigation is about the ADI ADV7511 DRM device driver, DMA is essential for a high performance device driver, so I will look into how the DRM driver uses DMA further down below.  Just like the adv7511 encoder driver's, the vdma parameters are largely cryptic until you read the IP datasheet.  I studied the Xilinx VDMA IP when I worked on an image sensor bringup on Zynq, and remember that datawidth is the number of bytes in each DMA transfer, and num-fstores is the size of the frame ring buffer.  Note that the DMA controller exposes its own registers at another address that was decided in the HW design.  So let's take a detour through the HW land to better understand the memory mapping before diving into the device driver.

The HW controlled by the DRM driver

The HW register base address seen above matches the Vivado HW configuration shown below.
The axi_hdmi_core is Verilog code from ADI, shown in the upper right corner (where both the yellow control signals and the green data signals terminate) of the Zynq HW design diagram below.

It is this axi_hdmi_tx IP that shakes the HDMI pins routed to the ADV7511 sitting a couple of inches away from the Zynq processor on the Zedboard.  To find the trace for these pins (shown to the right of the axi_hdmi_tx IP in the above schematic), one can consult the Vivado constraint file:

set_property  -dict {PACKAGE_PIN  W18   IOSTANDARD LVCMOS33} [get_ports hdmi_out_clk]
set_property  -dict {PACKAGE_PIN  W17   IOSTANDARD LVCMOS33} [get_ports hdmi_vsync]
set_property  -dict {PACKAGE_PIN  V17   IOSTANDARD LVCMOS33} [get_ports hdmi_hsync]
set_property  -dict {PACKAGE_PIN  U16   IOSTANDARD LVCMOS33} [get_ports hdmi_data_e]
set_property  -dict {PACKAGE_PIN  Y13   IOSTANDARD LVCMOS33} [get_ports hdmi_data[0]]
...
set_property  -dict {PACKAGE_PIN  V13   IOSTANDARD LVCMOS33} [get_ports hdmi_data[15]]

Note that the axi_hdmi_tx IP does NOT control the I2C pins--which is how one reads/writes the ADV7511 registers.  That is done through the axi_iic_main IP (the one right below the axi_hdmi_dma IP in the above schematic), through the S_AXI bus--as I learned more than a year ago, when first bringing up my Zedboard HDMI display.  In the memory map screenshot above, the base address of the axi_iic_main was assigned 0x4160_0000.  In summary, we now know that there are at least 3 HW blocks to control for ADV7511 video operation:
  1. Low bandwidth ADV7511 internal register read/write through I2C: mediated by the Xilinx axi_iic_main IP
  2. High bandwidth parallel bus pixel data transmission to the ADV7511: mediated by the axi_hdmi_tx IP
  3. DMA of the frame buffer from system memory to the axi_hdmi_tx IP: mediated by the axi_hdmi_dma IP
In all cases (including sending the I2C packets), the Linux kernel interfaces with the HW through memory mapped registers.  Let's now change direction and approach the problem from the userspace application side.

Application that uses the DRM driver: QtQuick clocks demo on Zedboard

To create any Qt project in QtCreator, you need a "kit" (Tools --> Options --> Build & Run --> Kits tab), so I created a "Zed" kit, as shown below:
Tip: fix any errors flagged by QtCreator in this window. 

QtCreator comes with many examples, and the Qt team's energy is going into Qt Quick, so I chose the Clocks demo from the choices.

When creating a project in QtCreator, you have to choose the kits for the project.  I must not forget to check the "Zed" kit I just created above before clicking "Configure Project".

Controlling a Qt eglfs_kms "platform" behavior

Supported command line arguments for any Qt eglfs app:
  • -platformpluginpath
  • -platform
  • -platformtheme
  • -qwindowgeometry
  • -qwindowtitle
  • -qwindowicon
Qt eglfs supports the following environment variables:
  • QT_QPA_EGLFS_INTEGRATION: eglfs_kms
  • QT_QPA_EGLFS_DEBUG: 1
  • QT_QPA_EGLFS_CURSOR: ??
  • QT_QPA_EGLFS_KMS_CONFIG
And just to ensure that X is out of the picture, I unset the DISPLAY environment variable (":0.0" by default) in the QtCreator project's run configuration window (Projects --> Build & Run --> <your kit> --> Run), as you can see below:

Supposedly, the KMS/DRM backend also supports custom configurations via a JSON file. Set the environment variable QT_QPA_EGLFS_KMS_CONFIG to the name of the file to enable this. The file can also be embedded in the application via the Qt resource system. An example configuration (TODO: try this out):

  {
    "device": "/dev/dri/card1",
    "hwcursor": false,
    "pbuffers": true,
    "outputs": [
      {
        "name": "VGA1",
        "mode": "off"
      },
      {
        "name": "HDMI1",
        "mode": "1024x768"
      }
    ]
  }

When I hit the debug key (F5), launching the project on the target worked, and I see the 7 clocks (New York, London, etc.).  The clocks definitely do not keep up, even though the CPU is pegged.  Whittling down the clock list to just 1 is still not enough to make the clock rendering real-time.  Perhaps because the rendering is falling behind, I get the following error messages:

QEGLPlatformContext: Failed to make temporary surface current, format not updated
Could not set cursor: -6
Failed to move cursor: -14
Could not queue DRM page flip! (Invalid argument)
Could not queue DRM page flip! (Invalid argument)
...

The initial "temporary surface current" failure happens at the very first render as the OpenGL context is created, in qeglplatformcontext.cpp, as you can see in this stack trace:

0 QEGLPlatformContext::updateFormatFromGL qeglplatformcontext.cpp 327 0xb5890d38
1 QOpenGLContext::create qopenglcontext.cpp 597 0xb68cbab8
2 QSGThreadedRenderLoop::handleExposure qsgthreadedrenderloop.cpp 912 0xb6dca064
3 QSGThreadedRenderLoop::exposureChanged qsgthreadedrenderloop.cpp 854 0xb6dca4f0
4 QQuickWindow::exposeEvent qquickwindow.cpp 207 0xb6df4a98
5 QWindow::event qwindow.cpp 2028 0xb68944e8
6 QQuickWindow::event qquickwindow.cpp 1414 0xb6e01afc
7 QCoreApplicationPrivate::notify_helper qcoreapplication.cpp 1093 0xb5f9f1d0
8 QCoreApplication::notify qcoreapplication.cpp 1038 0xb5f9f254
9 QCoreApplication::notifyInternal qcoreapplication.cpp 965 0xb5f9eedc
10 sendSpontaneousEvent qcoreapplication.h 227 0xb688b78c
11 QGuiApplicationPrivate::processExposeEvent qguiapplication.cpp 2643 0xb688b78c
12 QGuiApplicationPrivate::processWindowSystemEvent qguiapplication.cpp 1644 0xb688c850
13 QWindowSystemInterface::sendWindowSystemEvents qwindowsysteminterface.cpp 608 0xb686ee48
14 QWindowSystemInterface::flushWindowSystemEvents qwindowsysteminterface.cpp 592 0xb68713c0
15 QEglFSWindow::setVisible qeglfswindow.cpp 185 0xb5887780
16 QWindow::setVisible qwindow.cpp 516 0xb6892940
17 QWindow::showFullScreen qwindow.cpp 1832 0xb6893554
18 main main.cpp 41 0x94a4

I don't know how serious the warning ("Failed to make temporary surface current") is, but it helped me set a break-point in the render path, which of course must be reached before the ADV7511 can put anything on the screen.  This stack trace indirectly demonstrates the difficulty of understanding/debugging modern multi-threaded applications: threads loosely cooperate through events--such as the "expose" event in this example.  When I step through the debugger, it appears that no slots are connected to this signal YET.  The QOpenGLContext just created (along with the QQuickAnimatorController and QSGRenderContext) is "moved" to the QSGRenderThread (the Qt Quick scene graph render thread)--that is, the OpenGL context is now owned by the Qt Quick scene graph render thread--which is started on this path.  The QSGRenderThread runs the following loop while active:

  1. if window: syncAndRender()
  2. processEvents()
  3. if not pendingUpdate or not window: processEventsAndWaitForMore()

The last "wait" is what prevents the render thread from spinning.  So at a high level, the QSG render thread will syncAndRender() when an update is pending.  I dug out the following brief description of the OpenGL context usage from the QOpenGLContext class introduction:
A context can be made current against a given surface by calling makeCurrent(). When OpenGL rendering is done, call swapBuffers() to swap the front and back buffers of the surface, so that the newly rendered content becomes visible. To be able to support certain platforms, QOpenGLContext requires that you call makeCurrent() again before starting rendering a new frame, after calling swapBuffers().
During sync, a window is rendered recursively, as you can see in the stack trace:

0 vbo_validated_drawrangeelements vbo_exec_array.c 947 0xb541761c
1 vbo_exec_DrawElements vbo_exec_array.c 1128 0xb541761c
2 glDrawElements glapi_mapi_tmp.h 1635 0xb5d071ac
3 glDrawElements qopenglfunctions.h 724 0xb6d72878
4 QSGBatchRenderer::Renderer::renderMergedBatch qsgbatchrenderer.cpp 2296 0xb6d72878
5 QSGBatchRenderer::Renderer::renderBatches qsgbatchrenderer.cpp 2486 0xb6d7309c
6 QSGBatchRenderer::Renderer::render qsgbatchrenderer.cpp 2674 0xb6d798c0
7 QSGRenderer::renderScene qsgrenderer.cpp 208 0xb6d86d3c
8 QSGRenderer::renderScene qsgrenderer.cpp 168 0xb6d87524
9 QSGRenderContext::renderNextFrame qsgcontext.cpp 558 0xb6d9c104
10 QQuickWindowPrivate::renderSceneGraph qquickwindow.cpp 383 0xb6df7e80
11 QSGRenderThread::syncAndRender qsgthreadedrenderloop.cpp 593 0xb6dc3f0c
...

In my case, I am using the SW renderer, but glDrawElements() is very near where the rubber meets the road (the GPU).  During the leaf item render, rendering operations (translate, scale, rotate, clip) are performed on the item, and the QSGBatchRenderer handles the opaque and alpha items appropriately in SW, as you can see below:

        if (m_opaqueBatches.size())
            std::sort(&m_opaqueBatches.first(), &m_opaqueBatches.last() + 1, qsg_sort_batch_decreasing_order);

        // Sort alpha batches back to front so that they render correctly.
        if (m_alphaBatches.size())
            std::sort(&m_alphaBatches.first(), &m_alphaBatches.last() + 1, qsg_sort_batch_increasing_order);

I don't know why the cursor is treated differently than any other item, but it is the last item to be painted before the buffer swap.

    if (surface->surface()->surfaceClass() == QSurface::Window) {
        QPlatformWindow *window = static_cast<QPlatformWindow *>(surface);
        if (QEGLPlatformCursor *cursor = qobject_cast<QEGLPlatformCursor *>(window->screen()->cursor()))
            cursor->paintOnScreen();
    }

    qt_egl_device_integration()->waitForVSync(surface);
    QEGLPlatformContext::swapBuffers(surface);
    qt_egl_device_integration()->presentBuffer(surface);

The cursor is treated specially because the HW may have a dedicated "plane" for the cursor (vs. what is under the cursor).  It even gets its own ioctl number: DRM_IOCTL_MODE_CURSOR (0xC01C64A3), which fails with error code -6 (ENXIO).  I will do justice to the axi_hdmi_tx driver in the next section, but for the impatient: the DRM ioctls are defined in <kernel>/include/uapi/drm/drm.h and implemented in <kernel>/drivers/gpu/drm/drm_ioctl.c, as shown in the following code snippet:

static const struct drm_ioctl_desc drm_ioctls[] = {
...
DRM_IOCTL_DEF(DRM_IOCTL_MODE_SETCRTC, drm_mode_setcrtc, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETPLANE, drm_mode_getplane, DRM_CONTROL_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_SETPLANE, drm_mode_setplane, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CURSOR, drm_mode_cursor_ioctl, DRM_MASTER|DRM_CONTROL_ALLOW|DRM_UNLOCKED),
...

The 2nd argument to DRM_IOCTL_DEF is the function that implements the ioctl.  So I checked drm_mode_cursor_ioctl(), which calls drm_mode_cursor_common(), which revealed why ENXIO was returned: the driver (<>/drivers/gpu/drm/adi_axi_hdmi) does NOT have a separate cursor plane, and it does NOT provide the cursor related functions, as you can see in axi_hdmi_crtc.c:

static struct drm_crtc_helper_funcs axi_hdmi_crtc_helper_funcs = {
 .dpms = axi_hdmi_crtc_dpms,
 .prepare = axi_hdmi_crtc_prepare,
 .commit = axi_hdmi_crtc_commit,
 .mode_fixup = axi_hdmi_crtc_mode_fixup,
 .mode_set = axi_hdmi_crtc_mode_set,
 .mode_set_base = axi_hdmi_crtc_mode_set_base,
 .load_lut = axi_hdmi_crtc_load_lut,
};
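
For reference, the DRM core check that produces the -6 and -14 seen in the Qt log above looks roughly like this (paraphrased from drm_mode_cursor_common() in drm_crtc.c; not the exact source):

	/* Paraphrased: a CRTC without cursor hooks fails the cursor ioctls. */
	if (req->flags & DRM_MODE_CURSOR_BO) {
		if (!crtc->funcs->cursor_set && !crtc->funcs->cursor_set2)
			return -ENXIO;      /* -6: "Could not set cursor" */
	}
	if (req->flags & DRM_MODE_CURSOR_MOVE) {
		if (!crtc->funcs->cursor_move)
			return -EFAULT;     /* -14: "Failed to move cursor" */
	}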

Since I do NOT want to show the cursor anyway, I ignore the cursor related warning messages and move on.  During the front and back buffer swap, the rendered image is blitted to the back buffer.  In the SW rasterization case (mine), the entire screen's worth of pixels * 4 bytes/pixel is memcpy'ed, as you can see in the following snippet from mesa's platform_drm.c:

   for (i = 0; i < height; i++) {
      memcpy(bo->map + (x + i) * internal_stride + y,
             data + i * stride, stride);
   }

After the frame buffer to be shown is ready, ioctl(DRM_IOCTL_MODE_SETCRTC) is finally called from QEglFSKmsScreen::flip():

    if (!m_output.mode_set) {
        int ret = drmModeSetCrtc(m_device->fd(),
                                 m_output.crtc_id,
                                 fb->fb,
                                 0, 0,
                                 &m_output.connector_id, 1,
                                 &m_output.modes[m_output.mode]);

        if (ret)
            qErrnoWarning("Could not set DRM mode!");
        else
            m_output.mode_set = true;
    }

    int ret = drmModePageFlip(m_device->fd(),
                              m_output.crtc_id,
                              fb->fb,
                              DRM_MODE_PAGE_FLIP_EVENT,
                              this);
    if (ret) {
        qErrnoWarning("Could not queue DRM page flip!");
        gbm_surface_release_buffer(m_gbm_surface, m_gbm_bo_next);
        m_gbm_bo_next = Q_NULLPTR;

    }

As seen in the QtCreator debug view, the rendering seems to be triggered by DRM_IOCTL_MODE_SETCRTC.  Now that we have identified at least 1 DRM ioctl that seems to work, let's dive into the driver.

axi_hdmi_tx DRM driver in action

The best tool I've found so far to understand the kernel and (statically linked) driver code is a JTAG debugger.  Following the procedure in this blog entry, I start setting hardware breakpoints in the driver.  The following is what I've been able to surmise this way--and by browsing the driver code.

According to ADI's axi_hdmi_tx device driver "documentation" (if you can call a 1-pager documentation), "the driver [axi_hdmi_drm] is implemented as a DRM KMS driver"--which means very little to anyone except the few people in the world who actually write DRM device drivers.  But this picture on the Xilinx DRM KMS driver wiki helps.
The ADV7511 is an HDMI transmitter, and not a GPU, so ADI's ADV7511 driver is at most the encoder and connector in the above block diagram.  As seen in the DTS entry in the previous section, the axi_hdmi_tx DRM driver (<kernel>/drivers/gpu/drm/adi_axi-hdmi/axi_hdmi_drv.c) is the master of the adv7511 encoder driver hanging off the axi-iic device driver (its sibling is the adau1761 device driver), declared with the following DTS node:

adv7511: adv7511@39 {
	compatible = "adi,adv7511";
	reg = <0x39>;

	adi,input-style = <0x02>;
	adi,input-id = <0x01>;
	adi,input-color-depth = <0x3>;
	adi,sync-pulse = <0x03>;
	adi,bit-justification = <0x01>;
	adi,up-conversion = <0x00>;
	adi,timing-generation-sequence = <0x00>;
	adi,vsync-polarity = <0x02>;
	adi,hsync-polarity = <0x02>;
	adi,tdms-clock-inversion;
	adi,clock-delay = <0x03>;
};

These ADI properties remain largely a mystery even when matched against their counterparts in <kernel>/drivers/gpu/drm/i2c/adv7511.h, as in the code snippet below:

/**
 * enum adv7511_input_bit_justifiction - Selects the input format bit justifiction
 * ADV7511_INPUT_BIT_JUSTIFICATION_EVENLY: Input bits are evenly distributed
 * ADV7511_INPUT_BIT_JUSTIFICATION_RIGHT: Input bit signals have right justification
 * ADV7511_INPUT_BIT_JUSTIFICATION_LEFT: Input bit signals have left justification
 **/
enum adv7511_input_bit_justifiction {
ADV7511_INPUT_BIT_JUSTIFICATION_EVENLY = 0,
ADV7511_INPUT_BIT_JUSTIFICATION_RIGHT = 1,
ADV7511_INPUT_BIT_JUSTIFICATION_LEFT = 2,
};

/**
 * enum adv7511_input_color_depth - Selects the input format color depth
 * @ADV7511_INPUT_COLOR_DEPTH_8BIT: Input format color depth is 8 bits per channel
 * @ADV7511_INPUT_COLOR_DEPTH_10BIT: Input format color dpeth is 10 bits per channel
 * @ADV7511_INPUT_COLOR_DEPTH_12BIT: Input format color depth is 12 bits per channel
 **/
enum adv7511_input_color_depth {
ADV7511_INPUT_COLOR_DEPTH_8BIT = 3,
ADV7511_INPUT_COLOR_DEPTH_10BIT = 1,
ADV7511_INPUT_COLOR_DEPTH_12BIT = 2,
};

When the curiosity gets strong enough, I download the chip datasheet and start reading.  But since this is a very complex chip (its footprint is almost as large as the Zynq processor itself!), sufficient understanding may not be possible without actually bringing it up myself.  For the purpose of understanding the Linux display stack, I think just following the currently working code path is sufficient.  Many of the parameters in the DTS will eventually be written to the ADV7511 over I2C through the regmap API, as in this example in adv7511_set_link_config():

regmap_update_bits(adv7511->regmap, ADV7511_REG_VIDEO_INPUT_CFG1, 0x7e,
  (config->input_color_depth << 4) |
  (config->input_style << 2));

There are some 70 such references to adv7511->regmap combined in adv7511_audio.c and adv7511_core.c.  This regmap is an instance of the I2C regmap.  Take a look at Linux's rich support for all the different ways to interface to an external IC:

henry@w540:~/band/adi_kernel/drivers/base/regmap$ ls *.c
regcache.c       regcache-rbtree.c  regmap-debugfs.c  regmap-mmio.c
regcache-flat.c  regmap-ac97.c      regmap-i2c.c      regmap-spi.c
regcache-lzo.c   regmap.c           regmap-irq.c      regmap-spmi.c
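
For reference, a driver obtains such a regmap during its probe; for an I2C-attached chip like the ADV7511 the pattern looks roughly like this (a sketch of the regmap API with made-up register addresses, not the actual ADV7511 code):

#include <linux/err.h>
#include <linux/i2c.h>
#include <linux/regmap.h>

/* Describe the register layout once; the regmap core then turns
 * regmap_read/write/update_bits calls into I2C transfers. */
static const struct regmap_config example_regmap_config = {
	.reg_bits = 8,                  /* 8-bit register addresses */
	.val_bits = 8,                  /* 8-bit register values */
	.max_register = 0xff,
	.cache_type = REGCACHE_RBTREE,
};

static int example_probe(struct i2c_client *i2c, const struct i2c_device_id *id)
{
	struct regmap *regmap;
	unsigned int val;

	regmap = devm_regmap_init_i2c(i2c, &example_regmap_config);
	if (IS_ERR(regmap))
		return PTR_ERR(regmap);

	regmap_read(regmap, 0x00, &val);              /* an I2C read under the hood */
	regmap_update_bits(regmap, 0x16, 0x30, 0x20); /* read-modify-write over I2C */
	return 0;
}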

Scanning the functions that use this regmap is a quick and dirty way to glean where the driver is interacting with the ADV7511 chip over I2C:

  • adv7511_set_colormap
  • adv7511_set_config
  • adv7511_set_link_config
  • adv7511_packet_enable/disable
  • adv7511_hpd
  • adv7511_is_interrupt_pending
  • adv7511_get_edid_block (EDID: extended display identification data)
  • adv7511_get_modes (drm_mode.h defines DPMS flags that are bit compatible with Xorg definition: ON, STANDBY, SUSPEND, OFF)
  • adv7511_encoder_dpms
  • adv7511_encoder_detect
  • adv7511_encoder_mode_set
  • adv7511_probe

As mentioned in the lead-up to the HW overview, the "video" axi-vdma driver is a slave of the axi_hdmi_tx driver.  During the probe, a handle to that vdma driver is obtained with this code:

private->dma = dma_request_slave_channel(&pdev->dev, "video");

The master driver (axi_hdmi_tx) does not use this DMA directly, but delegates the DMA handle to the CRTC (see the vaguely explained device driver block diagram from ADI) in axi_hdmi_load() --> axi_hdmi_crtc_create(), which uses it in crtc_prepare() and crtc_update().  prepare() just terminates the current DMA, so all the interesting action is in update(), which is called from some of the crtc methods listed when discussing the (lack of) cursor support in this device driver:

  • crtc_commit()
  • crtc_dpms()
  • crtc_mode_set_base()

Apparently, the Xilinx VDMA needs a few config parameters before it can actually start work.  So the driver commands the VDMA IP twice to commence DMA transfer from memory to the axi_hdmi_tx IP.  The "error-free" code path is:

obj = drm_fb_cma_get_gem_obj(fb, 0);

axi_hdmi_crtc->dma_config.hsize = mode->hdisplay * fb->bits_per_pixel / 8;
axi_hdmi_crtc->dma_config.vsize = mode->vdisplay;
axi_hdmi_crtc->dma_config.stride = fb->pitches[0];

dmaengine_device_control(axi_hdmi_crtc->dma, DMA_SLAVE_CONFIG,
			 (unsigned long)&axi_hdmi_crtc->dma_config);

offset = crtc->x * fb->bits_per_pixel / 8 + crtc->y * fb->pitches[0];

desc = dmaengine_prep_slave_single(axi_hdmi_crtc->dma,
				   obj->paddr + offset,
				   mode->vdisplay * fb->pitches[0],
				   DMA_MEM_TO_DEV, 0);

dmaengine_submit(desc);
dma_async_issue_pending(axi_hdmi_crtc->dma);

Note the extensive use of existing kernel support for DMA.

This DRM driver does not support page flip

The repeated page flip error seems to be caused by the device driver (axi_hdmi_drm) ioctl(DRM_IOCTL_MODE_PAGE_FLIP = 0xC01864B0, flags = DRM_MODE_PAGE_FLIP_EVENT) failing with return code -22 (EINVAL) during the OpenGL context buffer swap.  The only documentation I have found so far on page flip is in <kernel>/include/uapi/drm/drm_mode.h:
 * This ioctl will ask KMS to schedule a page flip for the specified
 * crtc.  Once any pending rendering targeting the specified fb (as of
 * ioctl time) has completed, the crtc will be reprogrammed to display
 * that fb after the next vertical refresh.  The ioctl returns
 * immediately, but subsequent rendering to the current fb will block
 * in the execbuffer ioctl until the page flip happens.  If a page
 * flip is already pending as the ioctl is called, EBUSY will be
 * returned.
 *
 * Flag DRM_MODE_PAGE_FLIP_EVENT requests that drm sends back a vblank
 * event (see drm.h: struct drm_event_vblank) when the page flip is
 * done.  The user_data field passed in with this ioctl will be
 * returned as the user_data field in the vblank event struct.
 *
 * Flag DRM_MODE_PAGE_FLIP_ASYNC requests that the flip happen
 * 'as soon as possible', meaning that it not delay waiting for vblank.
 * This may cause tearing on the screen.
The kernel drm wrapper code is set to return -EINVAL if the driver does NOT supply a page_flip() method--as is the case here.  So perhaps one can live with this, and find a way to tell the Qt eglfs_kms platform to NOT request page flips.
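
The check in question looks roughly like this (paraphrased from drm_mode_page_flip_ioctl() in drm_crtc.c; not the exact source):

	/* Paraphrased: a KMS driver that never filled in the page_flip hook
	 * makes this ioctl fail with -EINVAL, which is exactly what Qt's
	 * eglfs_kms backend keeps reporting. */
	if (crtc->funcs->page_flip == NULL)
		return -EINVAL;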

Accessing the DRM driver from userspace

The drm driver exposes a DRI interface file through the DRM infrastructure (somehow), as you can see below:
When loaded, a card-specific drm helper module calls into the drm module to register itself as a “drm driver”, and provides a set of function-pointers that the drm core module may invoke. The “drm core” then creates a file /dev/dri/card{n} on which IOCTLs can be made to talk to the driver.  [Through this file, “event” such as “vblank” or “flip complete” can be read.]  ...DRIVER_MODESET indicates that it supports kernel modesetting (KMS).
...Userspace code can perform mode-setting (but not generate graphics) through the controlD{n} file.
The driver declares the supported features like this:

static struct drm_driver axi_hdmi_driver = {
.driver_features = DRIVER_MODESET | DRIVER_GEM,

If the flags include DRIVER_RENDER, a /dev/dri/renderD{n} node (renderD128 for the first card) would also have been created (<>/drivers/gpu/drm/drm_drv.c).

Conclusion

Without a GPU, the axi_hdmi driver could not render 1920x1080 pixels (~2M pixels) even at 1 Hz, to show the second hand moving; the SW rasterization took too much time.  This hints at the crushing amount of floating point calculation necessary for a modern 3D rendered GUI, because the same HW and driver had no problem refreshing a 2D GUI with the Qt Widgets API in previous projects.  It appeared that the CPU could update at roughly a 1/3 Hz rate.  If I use a smaller screen (say 128x96 ~ 12K pixels--about 170x fewer pixels, so 1/3 Hz scales to roughly 56 Hz), I should be able to update the screen at nearly 60 Hz using exactly the same HW and drivers MINUS the ADV7511 specific portion.  Of course, I will wind up pegging the CPU all the time, which is a huge point against SW rasterization.

Multi-monitor Buildroot x64 target

I am working on application SW that displays to multiple monitors (somewhere between 2 and 4).  Eventually, I want to drive the multi-monitor display from an SoC, but working out the SW architecture with a multi-monitor GPU plugged into a PCIe slot of a modern PC is an excellent way to understand and derisk the problems.  Before diving into a multi-monitor GPU, I can experiment with a software supported multi-monitor setup in VirtualBox.

NFS booting the virtual multi-monitor x64 target from a virtual Ubuntu server

In a previous blog entry, I PXE-booted a Buildroot x64 target (Dell Optiplex 755) from an Ubuntu server.  If I run both the target and the server as Virtualbox guests, I can use the Virtualbox internal network, which is completely a software network stack.  This is convenient when studying the kernel and software on a laptop, while away from the desktop server.

Setup virtualbox target

I created a VirtualBox x64 guest with these settings:
  • General, Basic
    • Name: Target
    • Type: Linux
    • Version: Other Linux (64 bit)
  • System
    • Base memory: 512 MB
    • Boot order: Network ONLY
  • Display
    • Video memory: 64 MB
    • Monitor count: 4
    • Enable 3D acceleration
  • No storage, no audio, no serial port, no USB, no shared folder
  • Network:
    • 1st Adapter: NAT
    • 2nd adapter: internal network "intnet", which should MATCH the name of the 2nd adapter's internal network for the server.

Cross-compiling the target on the virtual Ubuntu server

WARNING: Buildroot must be extracted and then built on a filesystem that supports softlinks (so don't do this on an NTFS!).

After getting the latest stable Buildroot as shown in a previous blog entry, I configure Buildroot with the following options.
  • Target options: x86_64, core2 (corei7 selects SSE4 features, which VirtualBox does NOT support yet)
  • Build options:
    • enable ccache, but change the cache location to within the Buildroot directory (to avoid saving to the virtual HDD, and keep all work in the Vbox shared folder): $(TOPDIR)/.buildroot-ccache
    • Optimization level 3
  • Toolchain
    • glibc
    • Enable C++ support, necessary for Qt5
    • Build cross gdb for the host: does NOT build on the vbox server for some reason.  Besides, the native gdb should just work
    • Register toolchain within Eclipse Buildroot plug-in
  • System configuration
    • /dev management: eudev
  • Kernel
    • Using a custom--as opposed to in-tree--(def)config file: I need to add a couple of options for the NFS rootfs. I changed x86_64_defconfig from kernel.org to create /home/henry/x64/BR2/kernel.config
  • Target packages
    • Debugging
      • gdb: only the gdbserver
    • Graphics
      • mesa3d
        • Gallium swrast (software OpenGL) driver
        • OpenGL EGL
        • OpenGL ES
      • Qt5
        • Approve free license
        • Compile and install examples
        • gui module
          • widgets
          • OpenGL: OpenGL ES 2.0 and opengl module
          • linuxfb support
          • eglfs support
          • Default platform: linuxfb
          • GIF, JPEG, PNG support
    • Hardware handling
      • lshw (does NOT even build!)
      • pciutils (lspci)
  • Networking applications
    • openssh: necessary to connect to the target from gdb on the server

Kernel config for NFS rootfs

The x86_64_defconfig already has a few options I needed, so I just added the following:

CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_USE_KERNEL_DNS=y
CONFIG_E1000E=y

The stock x86_64_defconfig has CONFIG_E1000 but does NOT contain CONFIG_E1000E, so this is just a safety measure.

Build and deploy the binaries

After the Buildroot make runs at the top level, its output/images will contain bzImage.  Copy this to the TFTP download folder on the VirtualBox host.  On Windows, this is the C:\Users\<your uid>\.VirtualBox\TFTP folder, as shown here:

The rootfs images need to be extracted to the NFS export on the server:

~/o755/buildroot$ sudo tar -C /export/root/o755/ -xf ~/o755/buildroot/output/images/rootfs.tar

Setup the Virtualbox NFS server

Create a virtualbox x64 guest, with at least 2 CPUs, as much memory as possible, and the network with 2 adapters:
  • 1st adapter: NAT: for general Internet access (I am writing this blog on the host)
  • 2nd adapter: internal network: to communicate with the target.  Assign a static IP address 192.168.2.1 in Ubuntu Unity network settings, so that the target can refer to the NFS server with a static IP address (see below).
VirtualBox's NAT network already has a PXE enabled DHCP server.  On Windows, the folder C:\Users\<your account>\.VirtualBox\TFTP plays the role of /var/lib/tftpboot (the default) for the Ubuntu tftpd-hpa server.  So the following files should be put into that folder:
  • bzImage: compressed Linux kernel built by Buildroot, in its output/images (see above)
  • Target.pxe: I copied this file from Ubuntu desktop's /usr/lib/syslinux/pxelinux.0.  VirtualBox DHCP server has a rule for mapping the PXE image for each virtual box by its <name>.pxe.  Since my target's name is "Target", the PXE binary should be named Target.pxe.
  • menu.c32: This is the PXE menu program, copied verbatim from Ubuntu desktop's /usr/lib/syslinux/ folder.
  • pxelinux.cfg/ folder, which will contain the PXE menu entry
    • default: the catch-all menu.  PXE has lots of rules for matching the menu by the target's IP address, netmask, etc.  But default is the final fallback.  This file should point to the NFS rootfs the virtual server will host (see below).  My "default" file therefore looks like this:
DEFAULT menu.c32
PROMPT 0
MENU TITLE PXE Boot Menu
TIMEOUT 50 #This means 5 seconds
LABEL buildroot
 MENU LABEL buildroot kernel
 kernel o755Image
 append ip=192.168.2.3:192.168.2.1:192.168.2.1:255.255.255.0:o755:eth1:off root=/dev/nfs nfsroot=192.168.2.1:/export/root/o755 rw earlyprintk

The "ip" kernel parameter (it used to be called nfsaddr) has the syntax ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>.  In the append line above, the target takes 192.168.2.3, uses 192.168.2.1 as both the NFS server and the gateway, is named o755, boots over eth1, and has autoconfiguration turned off.

Setup the NFS server

The NFS server should get a static IP address.  I prefer to use the Ubuntu Unity tool, as you can see below.

Install the NFS server:

$ sudo apt-get install nfs-kernel-server

I configured the NFS server in /etc/exports, to serve any target in the "intnet":

/export         192.168.2.0/24(rw,fsid=0,insecure,no_subtree_check,async)

/export/root    192.168.2.0/24(rw,no_root_squash,no_subtree_check)


Remember to restart the NFS server after saving this file.  The root file system Buildroot generated should be expanded to the /export/root folder just mentioned, like this:

$ sudo mkdir /export/root/o755

$ sudo tar -C /export/root/o755/ -xf ~/o755/buildroot/output/images/rootfs.tar




nVidia Quadro multi-monitor video card

I bought an nVidia Quadro NVS 420 (Dell PN K722J)--I wanted an nVidia card because IMHO nVidia has the best Linux driver support--from eBay.  As you can see below, there was even a driver update earlier this year.
Now to the fine print, from the driver README file: X is required, and glibc >= 2.0 is required.  The X server requirement is a deal-breaker for me: I want to keep the embedded distribution small.  Maybe a better alternative is to pick up an open source driver from the nouveau project.  The Quadro NVS 420 is supported under the NV50 Tesla family, code name NV98 (G98), as you can see on this page.

Buildroot config for Qt5, OpenCV, and nVidia Quadro 420 GPU

Since I am an embedded SW engineer, I treat even the PCs like targets (rather than desktops).  In a previous blog, I demonstrated an NFS booted Buildroot distribution for this Intel Core2 Duo Dell PC.  Except for updating Buildroot to the latest stable release (2015.02), I'll pick up from where I left off.

$ cd buildroot
$ git checkout 2015.02
$ git pull . 2015.02

The only difference in the Buildroot config between the virtual target and this real target is the Gallium nouveau driver (which supports all nVidia cards), under Target packages --> Graphics libraries and applications --> mesa3d.

To pick up the nouveau device driver, I added CONFIG_DRM_NOUVEAU=y to the kernel defconfig file.

Also, I can connect over serial to the real target, by adding console=ttyS0,115200 to the kernel parameters in the pxelinux.cfg/default's "append" line.  During boot, the GPU is probed:

...
[    1.198462] nouveau  [  DEVICE][0000:03:00.0] BOOT0  : 0x298c00a2
[    1.204534] nouveau  [  DEVICE][0000:03:00.0] Chipset: G98 (NV98)
[    1.210605] nouveau  [  DEVICE][0000:03:00.0] Family : NV50
[    1.216169] nouveau  [   VBIOS][0000:03:00.0] checking PRAMIN for image...
[    1.233709] Console: switching to colour frame buffer device 160x64
[    1.245164] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    1.251332] i915 0000:00:02.0: registered panic notifier
[    1.307274] nouveau  [   VBIOS][0000:03:00.0] ... appears to be valid
[    1.313692] nouveau  [   VBIOS][0000:03:00.0] using image from PRAMIN
[    1.320231] nouveau  [   VBIOS][0000:03:00.0] BIT signature found
[    1.326303] nouveau  [   VBIOS][0000:03:00.0] version 62.98.6f.00.07
[    1.352876] nouveau 0000:03:00.0: irq 27 for MSI/MSI-X
[    1.352886] nouveau  [     PMC][0000:03:00.0] MSI interrupts enabled
[    1.359245] nouveau  [     PFB][0000:03:00.0] RAM type: GDDR3
[    1.364969] nouveau  [     PFB][0000:03:00.0] RAM size: 256 MiB
[    1.370868] nouveau  [     PFB][0000:03:00.0]    ZCOMP: 960 tags
[    1.378477] nouveau  [    VOLT][0000:03:00.0] GPU voltage: 1110000uv
[    1.915015] tsc: Refined TSC clocksource calibration: 2992.481 MHz
[    2.024020] nouveau  [  PTHERM][0000:03:00.0] FAN control: PWM
[    2.029844] nouveau  [  PTHERM][0000:03:00.0] fan management: automatic
[    2.036484] nouveau  [  PTHERM][0000:03:00.0] internal sensor: yes
[    2.062678] nouveau  [     CLK][0000:03:00.0] 03: core 169 MHz shader 358 MHz memory 100 MHz
[    2.071087] nouveau  [     CLK][0000:03:00.0] 0f: core 550 MHz shader 1400 MHz memory 700 MHz
[    2.079645] nouveau  [     CLK][0000:03:00.0] --: core 550 MHz shader 1400 MHz memory 702 MHz
[    2.088296] [TTM] Zone  kernel: Available graphics memory: 1001774 kiB
[    2.094802] [TTM] Initializing pool allocator
[    2.099147] [TTM] Initializing DMA pool allocator
[    2.103841] nouveau  [     DRM] VRAM: 256 MiB
[    2.108181] nouveau  [     DRM] GART: 1048576 MiB
[    2.112869] nouveau  [     DRM] TMDS table version 2.0
[    2.117987] nouveau  [     DRM] DCB version 4.0
[    2.122500] nouveau  [     DRM] DCB outp 00: 02000386 0f220010
[    2.128312] nouveau  [     DRM] DCB outp 01: 02000302 00020010
[    2.134124] nouveau  [     DRM] DCB outp 02: 040113a6 0f220010
[    2.139935] nouveau  [     DRM] DCB outp 03: 04011312 00020010
[    2.145747] nouveau  [     DRM] DCB conn 00: 00005046
[    2.150791] nouveau  [     DRM] DCB conn 01: 00006146
[    2.182206] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    2.188798] [drm] Driver supports precise vblank timestamp query.
[    2.247926] nouveau  [     DRM] MM: using M2MF for buffer copies
[    2.287545] nouveau 0000:03:00.0: No connectors reported connected with modes
[    2.294654] [drm] Cannot find any crtc or sizes - going 1024x768
[    2.302510] nouveau  [     DRM] allocated 1024x768 fb: 0x60000, bo ffff88007a126c00
[    2.310181] fbcon: nouveaufb (fb1) is primary device
[    2.310182] fbcon: Remapping primary device, fb1, to tty 1-63
[    2.453168] nouveau 0000:03:00.0: fb1: nouveaufb frame buffer device
[    2.459500] [drm] Initialized nouveau 1.2.1 20120801 for 0000:03:00.0 on minor 1

The GPU shows up as another framebuffer device /dev/fb1 (/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:00.0/0000:03:00.0/graphics/fb1).  The multiple folder levels correspond to the PCI switches built into the card (apparently), as can be seen from the lspci output:

01:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
02:00.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
02:02.0 PCI bridge: NVIDIA Corporation NF200 PCIe 2.0 switch for Quadro Plex S4 / Tesla S870 / Tesla S1070 / Tesla S2050 (rev a3)
03:00.0 VGA compatible controller: NVIDIA Corporation G98 [Quadro NVS 420] (rev a1)

I can see the Qt GUI drawing to fb1 by running Qt examples like this:

/usr/lib/qt/examples/opengl/2dpainting/2dpainting -platform linuxfb:fb=/dev/fb1

Of course, OpenGL display doesn't work against the linuxfb platform.  But there seems to be a problem with the mesa3d nouveau driver I compiled, because gbm cycles through a few Gallium drivers EXCEPT the one I actually built: nouveau_dri, which is even in the search path (/usr/lib/dri).

# ./application -platform eglfs
gbm: failed to open any driver (search paths /usr/lib/dri)
gbm: Last dlopen error: /usr/lib/dri/i915_dri.so: cannot open shared object file: No such file or directory
...
Could not initialize egl display

I realized that the eglfs platform is querying the framebuffer properties from /dev/fb0 even though I set the environment variable QT_QPA_EGLFS_FB to /dev/fb1.  So I removed the Intel GPU from the picture by commenting out the Intel GPU drivers in the kernel config.  And now I see this message:

# ./openglwindow -platform eglfs
Unable to query physical screen size, defaulting to 100 dpi.
To override, set QT_QPA_EGLFS_PHYSICAL_WIDTH and QT_QPA_EGLFS_PHYSICAL_HEIGHT (in millimeters).
EGL Error : Could not create the egl surface: error = 0x300b
Aborted

The error (EGL_BAD_NATIVE_WINDOW) is logged in src/plugins/platforms/eglfs/qeglfswindow.cpp, QEglFSWindow::resetSurface(), but thrown in eglCreateWindowSurface:

    if (!rx::IsValidEGLNativeWindowType(win))
    {
        recordError(egl::Error(EGL_BAD_NATIVE_WINDOW));
        return EGL_NO_SURFACE;
    }

This begs the question: what does Qt consider a native window?  I decided that I still do NOT know the low level graphics SW stack well enough, and will just stick to either linuxfb or directfb.
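
For the record, what an EGL-on-KMS platform such as eglfs_kms passes as the "native window" is--as far as I can tell--a gbm surface created on top of the DRM device.  A sketch of the usual gbm/EGL pattern (along the lines of the kmscube demo, not Qt's actual code):

#include <fcntl.h>
#include <gbm.h>
#include <EGL/egl.h>

/* The "native display" is a gbm_device and the "native window" is a
 * gbm_surface, both created on top of the DRM device node. */
static EGLSurface create_kms_window(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);
    struct gbm_device  *gbm = gbm_create_device(fd);
    struct gbm_surface *win = gbm_surface_create(gbm, 1920, 1080,
                                                 GBM_FORMAT_XRGB8888,
                                                 GBM_BO_USE_SCANOUT |
                                                 GBM_BO_USE_RENDERING);

    EGLDisplay dpy = eglGetDisplay((EGLNativeDisplayType)gbm);
    eglInitialize(dpy, NULL, NULL);

    static const EGLint attribs[] = { EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT, EGL_NONE };
    EGLConfig cfg;
    EGLint n;
    eglChooseConfig(dpy, attribs, &cfg, 1, &n);

    /* This is the call that threw EGL_BAD_NATIVE_WINDOW above. */
    return eglCreateWindowSurface(dpy, cfg, (EGLNativeWindowType)win, NULL);
}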

directfb: the lowest level display abstraction in userspace

DirectFB (Direct Frame Buffer) is a software library with a small memory footprint that provides graphics acceleration, input device handling and abstraction layer, and integrated windowing system with support for translucent windows and multiple display layers on top of the Linux framebuffer without requiring any kernel modifications.  DirectFB allows applications to talk directly to video hardware through a direct API, speeding up and simplifying graphic operations.
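
For reference, a minimal DirectFB program looks like this (a sketch of my own along the lines of the directfb-examples; error handling omitted):

#include <directfb.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    IDirectFB *dfb;
    IDirectFBSurface *primary;
    DFBSurfaceDescription dsc;
    int w, h;

    DirectFBInit(&argc, &argv);
    DirectFBCreate(&dfb);
    dfb->SetCooperativeLevel(dfb, DFSCL_FULLSCREEN);

    dsc.flags = DSDESC_CAPS;
    dsc.caps  = DSCAPS_PRIMARY | DSCAPS_FLIPPING;  /* double-buffered primary surface */
    dfb->CreateSurface(dfb, &dsc, &primary);
    primary->GetSize(primary, &w, &h);

    /* Clear to black, draw one white rectangle, then flip. */
    primary->SetColor(primary, 0x00, 0x00, 0x00, 0xff);
    primary->FillRectangle(primary, 0, 0, w, h);
    primary->SetColor(primary, 0xff, 0xff, 0xff, 0xff);
    primary->FillRectangle(primary, w / 4, h / 4, w / 2, h / 2);
    primary->Flip(primary, NULL, DSFLIP_WAITFORSYNC);

    sleep(5);

    primary->Release(primary);
    dfb->Release(dfb);
    return 0;
}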

But DirectFB does not seem to support the Quadro (nouveau) cards?  See directfb/gfxdrivers/nvidia/nvidia.c.  The platform independent example sources are in buildroot/output/build/directfb-examples-1.6.0/src.

Each GPU detected by DRM is referred to as a DRM device, and a device file /dev/dri/cardX (where X is a sequential number) is created to interface with it, as in this example for the NVS 420 card in a PC:

# ls /dev/dri
card0       controlD64  renderD128

Note that on the Zedboard, which lacks a GPU, Lars-Peter Clausen (of ADI)'s adv7511 DRM driver does not offer the renderD128 file:

# ls /dev/dri
card0       controlD64

User space programs that want to talk to the GPU must open the file and use ioctl calls to communicate with DRM. Different ioctls correspond to different functions of the DRM API.  A library called libdrm was created to facilitate the interface of user space programs with the DRM subsystem, as shown here:

This library is merely a wrapper that provides a function written in C for every ioctl of the DRM API, as well as constants, structures and other helper elements.  DRM consists of two parts: a generic "DRM core" and a specific one ("DRM driver") for each type of supported hardware.  The DRM driver implements the hardware-dependent part of the API, specific to the type of GPU it supports; it provides the implementation of the remaining ioctls not covered by the DRM core, but it may also extend the API, offering additional ioctls with extra functionality, for which an extra userspace library is offered.  For the nVidia card, we therefore see:

# ls /usr/lib/libdrm*
/usr/lib/libdrm.so.2.4.0   /usr/lib/libdrm_nouveau.so.2.0.0

But strangely, the Zedboard rootfs carries the libdrm helper library for a card it doesn't have (the Qualcomm Adreno)?

# ls -Lh /usr/lib/libdrm*
/usr/lib/libdrm.so.2.4.0   /usr/lib/libdrm_freedreno.so.1.0.0
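
To see what the libdrm wrapper buys you, here is a minimal client of my own (error handling omitted) that opens the card node and lists its connectors and modes; it should work against any KMS driver, the axi_hdmi one included:

#include <fcntl.h>
#include <stdio.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);
    drmModeRes *res = drmModeGetResources(fd);  /* wraps the GETRESOURCES ioctl */

    for (int i = 0; i < res->count_connectors; i++) {
        drmModeConnector *conn = drmModeGetConnector(fd, res->connectors[i]);

        printf("connector %u: %s, %d modes\n", conn->connector_id,
               conn->connection == DRM_MODE_CONNECTED ? "connected" : "disconnected",
               conn->count_modes);
        for (int m = 0; m < conn->count_modes; m++)
            printf("  %s @ %u Hz\n", conn->modes[m].name, conn->modes[m].vrefresh);

        drmModeFreeConnector(conn);
    }
    drmModeFreeResources(res);
    return 0;
}

Compile it with the flags reported by pkg-config --cflags --libs libdrm.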

GEM (graphics execution manager) manages graphics buffers.  Through GEM, a user space program can create, handle and destroy memory objects living in the GPU's video memory.  Confusingly, there are mesa3d userspace drivers that use the kernel drivers, as you can see in the AMD example below:

As the demand for better graphics increased, hardware manufacturers created a way to decrease the amount of CPU time required to fill the framebuffer. This is commonly called "graphics accelerating".  Common graphics drawing commands (many of them geometric) are sent to the graphics accelerator in their raw form. The accelerator then rasterizes the results of the command to the framebuffer.
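
To make GEM's "memory objects" concrete: the generic way to get a CPU-mappable scanout buffer out of a KMS driver is the "dumb buffer" ioctl pair, roughly like this (a sketch of my own; error handling omitted):

#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <xf86drm.h>     /* drmIoctl(), pulls in the drm.h/drm_mode.h ioctl definitions */

/* Ask the kernel (GEM) for a "dumb" buffer, map it into this process,
 * and clear it.  The returned handle could then be turned into a
 * framebuffer with DRM_IOCTL_MODE_ADDFB and scanned out with SETCRTC. */
int create_and_map_dumb_buffer(int fd, uint32_t width, uint32_t height)
{
    struct drm_mode_create_dumb create = { .width = width, .height = height, .bpp = 32 };
    drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);  /* GEM handle in create.handle */

    struct drm_mode_map_dumb map = { .handle = create.handle };
    drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &map);        /* fake mmap offset in map.offset */

    void *pixels = mmap(NULL, create.size, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, map.offset);
    memset(pixels, 0, create.size);                      /* clear to black */
    return create.handle;
}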

Debugging a full screen directfb application

The df_fire example did NOT run against the nouveau driver, so let's debug into it.  The following relevant build variables are in the Makefile:

CFLAGS = -Wall -O3 -pipe -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64  -pipe -O3  -Werror-implicit-function-declaration
CPPFLAGS = -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
DIRECTFB_CFLAGS = -D_REENTRANT -I/home/henry/work/o755/buildroot/output/host/usr/x86_64-buildroot-linux-gnu/sysroot/usr/include/directfb  
DIRECTFB_LIBS = -ldirectfb -lfusion -L/home/henry/work/o755/buildroot/output/host/usr/x86_64-buildroot-linux-gnu/sysroot/usr/lib -ldirect -lpthread  
AM_CFLAGS = -D_REENTRANT -I/home/henry/work/o755/buildroot/output/host/usr/x86_64-buildroot-linux-gnu/sysroot/usr/include/directfb   -D_GNU_SOURCE
LIBADDS = \
        -ldirectfb -lfusion -L/home/henry/work/o755/buildroot/output/host/usr/x86_64-buildroot-linux-gnu/sysroot/usr/lib -ldirect -lpthread  

AM_CPPFLAGS = \
        -DDATADIR=\"${datarootdir}/directfb-examples\" \
        -DFONT=\"$(fontsdatadir)/decker.ttf\"

Trying the Quadro NVS 420 nVidia card (model G98) on Ubuntu desktop

I could not get Ubuntu to see the DLP 3010 evaluation module, so I checked whether the nVidia proprietary drivers could do better.  The list of available drivers can be queried:

henry@o755:~$ sudo ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:00.0/0000:03:00.0 ==
model    : G98 [Quadro NVS 420]
modalias : pci:v000010DEd000006F8sv000010DEsd0000057Ebc03sc00i00
vendor   : NVIDIA Corporation
driver   : nvidia-304-updates - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
driver   : nvidia-331 - distro non-free recommended
driver   : nvidia-331-updates - distro non-free
driver   : nvidia-304 - distro non-free
driver   : nvidia-173 - distro non-free

An easy way to install the proprietary device drivers is through System Settings (unity-control-center) --> System --> Software & Updates --> Additional Drivers.  Even after installing the nVidia proprietary driver, the 2nd monitor would not enumerate; I had to run the nVidia X Server Settings tool and explicitly detect the display for the settings to update.  I think this means that the nouveau driver cannot drive the NVS 420 card in multi-monitor mode.
Indeed, the computer only sees 1 framebuffer device:

$ ls -lhg /sys/class/graphics/
total 0
lrwxrwxrwx 1 root 0 Apr 25 12:06 fb0 -> ../../devices/pci0000:00/0000:00:02.0/graphics/fb0
lrwxrwxrwx 1 root 0 Apr 25 11:50 fbcon -> ../../devices/virtual/graphics/fbcon

As this nouveau multi-monitor setup explanation shows, it is the X server that lays out the multiple monitors on the desktop.  Multiple monitors have to be organized as 1 logical screen, because an X application can only display to 1 screen (i.e., an application cannot choose to run on multiple screens, nor move from 1 screen to another dynamically).