Oct 14, 2014

Ways to study the Linux kernel and driver source code

I picked up dorking around with Linux device drivers again after 10 years of working on either high-level SW or low-level FW (no OS).  What a big change to Linux!  One of the welcome changes is the ability to browse the kernel in an IDE.

Current status (so you don't have to read to the end)

  • Eclipse kernel source browsing meets > 80% of my expectations.
  • Haven't seriously tried to cross compile the kernel for ARM within Eclipse, because I use Buildroot.
  • Can browse statically compiled source and set software breakpoints from gdb through kgdb, but cannot set hardware breakpoints on the Zedboard.
  • kgdbwait to delay kernel startup does NOT work on the Zedboard, but DOES work on an x86 target (see my other blog post dedicated to the Dell Optiplex x86 target)
  • JTAG debugging from Xilinx SDK works for some functions, but the breakpoints I really want (like the Ethernet driver interrupt) do NOT hit.

Eclipse to browse the code: I prefer this to KDevelop

For me, Eclipse CDT worked much better than KDevelop, because I am already used to the Eclipse shortcuts.  I mostly just followed this Eclipse documentation.  One missing piece of information is that the Buildroot plugin must first be installed into Eclipse through menu --> Help --> Install New Software --> Add, using the Buildroot Eclipse SDK update site for Eclipse Luna: http://buildroot.org/downloads/eclipse/luna.  What I could not do is tell Eclipse to cross compile using the CROSS_COMPILE environment variable (which for me is "arm-xilinx-linux-gnueabi-"):

henry@Zotac64:~/work/zynq/kernel$ which ${CROSS_COMPILE}gcc
/opt/Xilinx/SDK/2014.2/gnu/arm/lin/bin/arm-xilinx-linux-gnueabi-gcc

What I WANT to set up in Eclipse is:
  1. The location of my defconfig (the file passed to "make ARCH=arm <defconfig>"),
  2. A make target so that Eclipse builds uImage, and
  3. [Not necessary] My DTS file, so that Eclipse also builds the DTB
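
Outside Eclipse, that wish list boils down to three kbuild invocations. Below is a minimal sketch of what an IDE build target could call, assuming it can run a shell function from the top of the kernel tree; `kernel_make` and `kernel_build_all` are hypothetical helpers, not part of Eclipse or Buildroot. LOADADDR is passed because multiplatform ARM kernels generally refuse to build a uImage without it.

```shell
# Sketch: the kbuild commands an IDE build target could run.
# Hypothetical helpers; run from the top of the kernel source tree.
CROSS_COMPILE=${CROSS_COMPILE:-arm-xilinx-linux-gnueabi-}
MAKE=${MAKE:-make}   # override (e.g. MAKE=echo) for a dry run

# Every kbuild invocation needs the same ARCH and CROSS_COMPILE.
kernel_make() {
    $MAKE ARCH=arm CROSS_COMPILE="$CROSS_COMPILE" "$@"
}

# $1: defconfig name, $2: uImage load address, $3: optional DTB target
kernel_build_all() {
    kernel_make "$1" || return 1                 # 1. configure from the defconfig
    kernel_make uImage LOADADDR="$2" || return 1 # 2. build uImage
    if [ -n "$3" ]; then
        kernel_make "$3" || return 1             # 3. build the DTB
    fi
}
```

For example, `kernel_build_all xilinx_zynq_defconfig <load address> zynq-zed.dtb`, where the defconfig and DTB names are assumptions to replace with your own; setting `MAKE=echo` turns the whole thing into a dry run that only prints the make arguments.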

kgdb to source debug device driver

Now that I am building not only the kernel, but also the whole toolchain and the root file system in Buildroot, the ability to cross compile in Eclipse is not so necessary after all.  Instead, the ability to break at a source line would be invaluable.  This is one of the pleasant changes to find after 10 years: kgdb is now in the mainline kernel (and even documented!).

Building kgdb into the kernel

Necessary config options:
  • CONFIG_KGDB=y
  • CONFIG_DEBUG_INFO=y -- builds the kernel with symbolic debug information; I already had it on for ftrace
Optional but recommended (makes sense for me):
  • CONFIG_FRAME_POINTER=y -- saves frame information, allowing gdb to construct stack traces more accurately
  • CONFIG_KGDB_SERIAL_CONSOLE=y -- kgdb over Ethernet is not in mainline, so stick with the tried and true.  This also allows kgdbwait and early debugging.
  • # CONFIG_DEBUG_RODATA is not set -- This option marks certain regions of the kernel's memory as read-only.  It was NOT set to begin with, so I left it alone.  On a processor WITHOUT HW breakpoints it must stay off, because kgdb then has to write its software breakpoints into the kernel text.  Zynq has 5 breakpoint and 1 watchpoint registers, which I found kgdb cannot use for some reason.
  • # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set -- I have plenty of disk space but the processor is relatively slow.  Besides, I find it confusing when the compiler optimizes code away while I am debugging.
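
Before rebuilding, it is cheap to check that the .config really has these settings. Below is a sketch, assuming the .config sits in the kernel build directory (for Buildroot, presumably <BR2>/output/build/linux-custom/.config). `check_kgdb_config` is a hypothetical helper that fails only on the two necessary options and merely reports the rest.

```shell
# Sketch: check a kernel .config against the option list above.
# $1: path to the .config file
check_kgdb_config() {
    cfg=$1
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 2; }
    rc=0
    # Necessary options: a missing one makes the function fail.
    for opt in CONFIG_KGDB CONFIG_DEBUG_INFO; do
        if grep -q "^$opt=y\$" "$cfg"; then
            echo "ok: $opt=y"
        else
            echo "MISSING: $opt=y"
            rc=1
        fi
    done
    # Recommended options: only reported.
    for opt in CONFIG_FRAME_POINTER CONFIG_KGDB_SERIAL_CONSOLE; do
        grep -q "^$opt=y\$" "$cfg" || echo "suggest: $opt=y"
    done
    # Options the list above leaves unset.
    for opt in CONFIG_DEBUG_RODATA CONFIG_CC_OPTIMIZE_FOR_SIZE; do
        grep -q "^$opt=y\$" "$cfg" && echo "note: $opt=y is set"
    done
    return $rc
}
```

The `=y` anchor in the pattern keeps CONFIG_KGDB from matching CONFIG_KGDB_SERIAL_CONSOLE.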

Booting the kernel with kgdb bootarg

Since I built kgdboc statically into the kernel, I don't have to modprobe kgdboc at startup, and I can point it at the same serial port I already use for the console.  On the Zedboard, that is ttyPS0:

kgdboc=ttyPS0,115200

So at the moment, the full bootargs (much of it carried over from a previous blog entry on the NFS root file system) are set in U-Boot like this:

zynq-uboot> setenv bootargs 'console=ttyPS0 ip=192.168.1.9:192.168.1.2:192.168.1.1:255.255.255.0 root=/dev/nfs nfsroot=192.168.1.2:/export/root/zed rw earlyprintk kgdboc=ttyPS0'

zynq-uboot> saveenv

For me, the baud rate is actually optional, because my FSBL already sets up the serial port before Linux starts.  I verified that nothing broke by booting the modified kernel with the above bootargs.  And because the target has only 1 serial port (ttyPS0), which is also the system console, I cannot run kgdbcon through the same device, according to the kgdb documentation on kgdbcon.
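
To confirm which kgdb options actually reached the kernel, one can inspect /proc/cmdline on the target. Below is a minimal sketch with two hypothetical helpers. It splits on whitespace only (no quoting support), and it returns the last occurrence of a repeated `name=value`, which is an assumption about how a given option's handler behaves.

```shell
# Print the value of "name=value" from a kernel command line string,
# e.g. "$(cat /proc/cmdline)". If the name repeats, the last one wins
# (an assumption; check the option's handler). Prints "" if absent.
cmdline_param() {
    # $1: parameter name without "=", $2: command line
    val=""
    for word in $2; do
        case $word in
            "$1"=*) val=${word#*=} ;;
        esac
    done
    printf '%s\n' "$val"
}

# Succeed if a bare flag such as "kgdbwait" is present.
cmdline_has_flag() {
    # $1: flag name, $2: command line
    for word in $2; do
        [ "$word" = "$1" ] && return 0
    done
    return 1
}
```

On the target, `cmdline_param kgdboc "$(cat /proc/cmdline)"` should print ttyPS0 with the bootargs above, and `cmdline_has_flag kgdbwait "$(cat /proc/cmdline)"` tells whether kgdbwait was passed at all.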

NOT booting the kernel with kgdbwait bootarg (wait for gdb)

This would be a PITA normally, but it is a life saver when debugging early boot problems.  Insert "kgdbwait" right after the kgdboc argument in the bootargs, like this:

zynq-uboot> setenv bootargs 'kgdboc=ttyPS0 kgdbwait kgdboc=ttyPS0 kgdbcon earlyprintk console=ttyPS0 ip=192.168.1.9:192.168.1.2:192.168.1.1:255.255.255.0 root=/dev/nfs nfsroot=192.168.1.2:/export/root/zed rw'

Unfortunately for me, this "wait for gdb" feature did NOT work; the kernel just boots as if the option were not there.  After boot, the kgdboc sysfs parameter is empty, so I knew that the initial registration did not succeed:

$ cat /sys/module/kgdboc/parameters/kgdboc

The strange thing is that kgdb can use ttyPS0 just fine after startup (see below).

Need agent-proxy to use from gdb

To halt the remote kernel over the serial console from gdb, agent-proxy is recommended:

$ git clone git://git.kernel.org/pub/scm/utils/kernel/kgdb/agent-proxy.git
$ cd agent-proxy; make

Agent-proxy starts 2 TCP servers on localhost: the 1st port carries the target's serial console, and the 2nd port is the debug port.

$ ./agent-proxy 2223^2222 0 /dev/ttyACM0,115200

Agent Proxy 1.96 Started with: 2223^2222 0 /dev/ttyACM0,115200

Agent Proxy running. pid: 17078
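
The agent-proxy arguments above follow the pattern <console port>^<debug port>, then `0`, then <device>,<baud>. Below is a thin wrapper, sketched with hypothetical names, that keeps the port numbers in one place so the telnet and gdb steps can reuse them. The `0` is passed through verbatim from the command above, and AGENT_PROXY is assumed to point at the binary built in the agent-proxy checkout.

```shell
# Hypothetical wrapper around the agent-proxy command line shown above.
AGENT_PROXY=${AGENT_PROXY:-./agent-proxy}
CONSOLE_PORT=${CONSOLE_PORT:-2223}   # telnet here for the serial console
DEBUG_PORT=${DEBUG_PORT:-2222}       # gdb "target remote" connects here

# $1: serial device, $2: baud rate (defaults to 115200)
start_agent_proxy() {
    "$AGENT_PROXY" "$CONSOLE_PORT^$DEBUG_PORT" 0 "$1,${2:-115200}"
}
```

`start_agent_proxy /dev/ttyACM0` reproduces the invocation above.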



Leave agent-proxy running.  We can even connect to the target's serial console by telnetting to the 1st port:

$ telnet localhost 2223

If we have not set the kgdboc option on the kernel command line, we have to set it on the target BEFORE we connect from gdb:

$ echo ttyPS0 > /sys/module/kgdboc/parameters/kgdboc
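
The runtime registration through /sys/module/kgdboc/parameters/kgdboc can be made a little more defensive: only write the parameter when it is empty, and report what the kernel accepted. Below is a sketch meant to run on the target. `ensure_kgdboc` is a hypothetical name, and KGDBOC_PARAM is overridable only so the function can be tried on a host.

```shell
# Run on the target. KGDBOC_PARAM defaults to the sysfs file used above;
# overriding it is only meant for trying the function out on a host.
KGDBOC_PARAM=${KGDBOC_PARAM:-/sys/module/kgdboc/parameters/kgdboc}

# $1: tty to use for kgdb, e.g. ttyPS0
ensure_kgdboc() {
    if [ ! -e "$KGDBOC_PARAM" ]; then
        echo "no $KGDBOC_PARAM: is kgdboc built into this kernel?" >&2
        return 1
    fi
    current=$(cat "$KGDBOC_PARAM")
    if [ -n "$current" ]; then
        echo "kgdboc already configured: $current"
        return 0
    fi
    # A failed write means the kernel rejected the tty.
    echo "$1" > "$KGDBOC_PARAM" || return 1
    echo "kgdboc now configured: $(cat "$KGDBOC_PARAM")"
}
```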

Need cross gdb on the host

I tell Buildroot to build gdb by checking Toolchain --> Build cross gdb for the host.  It ends up in <BR2>/output/host/usr/bin/arm-linux-gdb.  gdb cannot work with uImage, which is compressed; the right file to feed to the ARM gdb is the uncompressed ELF file <BR2>/output/build/linux-custom/vmlinux:

$ output/host/usr/bin/arm-linux-gdb output/build/linux-custom/vmlinux

We can then connect to the target through agent-proxy's debug port and start debugging:

(gdb) target remote localhost:2222
Remote debugging using localhost:2222
kgdb_breakpoint () at kernel/debug/debug_core.c:1050
1050 arch_kgdb_breakpoint();
(gdb) where
#0  kgdb_breakpoint () at kernel/debug/debug_core.c:1050
#1  0xc0088e60 in kgdb_initial_breakpoint () at kernel/debug/debug_core.c:949
...

While we are stopped, we can list the functions that ARE actually in the kernel, which is a bit of guesswork in Eclipse.  We can always stop the kernel with Ctrl-C in gdb, and then resume with the "continue" command:

(gdb) c
Continuing.
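
Ctrl-C from gdb is one way in. The kgdb documentation also describes breaking into the debugger from the target side with the magic SysRq 'g' command, which helps when gdb is not attached yet. Below is a sketch, assuming CONFIG_MAGIC_SYSRQ=y and a root shell on the target. `enter_kgdb` is a hypothetical name, and SYSRQ_TRIGGER is overridable only so the function can be tried on a host.

```shell
# Run on the target: drop into kgdb via magic SysRq 'g'; then issue
# "target remote" from gdb on the host.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

enter_kgdb() {
    if [ ! -w "$SYSRQ_TRIGGER" ]; then
        echo "cannot write $SYSRQ_TRIGGER (need root and CONFIG_MAGIC_SYSRQ)" >&2
        return 1
    fi
    echo g > "$SYSRQ_TRIGGER"
}
```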

Alas, kgdb does NOT seem to be able to use HW breakpoints on this target:

(gdb) hb tty_find_polling_driver
Hardware assisted breakpoint 1 at 0xc0260280: file drivers/tty/tty_io.c, line 356.
(gdb) c
Continuing.
Warning:
Cannot insert hardware breakpoint 1.
Could not insert hardware breakpoints:
You may have requested too many hardware breakpoints/watchpoints.
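
Since `hbreak` fails here while plain software breakpoints work, one option is to keep the session setup in a gdb command file that uses `break`. Below is a sketch with hypothetical helper names. It assumes BR2 is the Buildroot top directory and that agent-proxy's debug port is 2222 as above. gdb's `-x` option runs the file after loading vmlinux.

```shell
# Hypothetical helpers: write a gdb command file for the kgdb session
# and start the Buildroot cross gdb with it. BR2 (Buildroot top
# directory) and GDB_PORT (agent-proxy's debug port) are assumptions.
BR2=${BR2:-.}
GDB_PORT=${GDB_PORT:-2222}

# $1: output file, $2...: functions to break on
write_kgdb_gdbinit() {
    out=$1
    shift
    {
        echo "target remote localhost:$GDB_PORT"
        for fn in "$@"; do
            echo "break $fn"   # software breakpoint; hbreak failed here
        done
    } > "$out"
}

start_kgdb_gdb() {
    write_kgdb_gdbinit kgdb.gdb "$@" || return 1
    "$BR2/output/host/usr/bin/arm-linux-gdb" -x kgdb.gdb \
        "$BR2/output/build/linux-custom/vmlinux"
}
```

`start_kgdb_gdb tty_find_polling_driver` connects and arms a software breakpoint; type `c` at the gdb prompt to let the kernel run until it is hit.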

Debugging kgdbwait with kgdb

Looking at configure_kgdboc(), the possible problems are:
  • the kgdboc option might be getting dropped AFTER the kernel prints the boot args (unlikely)
  • CONFIG_CONSOLE_POLL is turned off in the kernel: since kgdboc DOES work after startup, I reject this possibility
  • tty_find_polling_driver() might fail to find ttyPS0
  • ttyPS0 might not have been registered as a console driver yet
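
The last two hypotheses are about initialization order: has the ttyPS0 driver registered before kgdboc's init function runs? Built-in initcalls run in the order their entries appear in the .initcallN.init sections, and System.map lists symbols by address. So of two initcall entry symbols, the one listed first in System.map is called first. Below is a sketch of that check. `first_symbol` is a hypothetical helper, and initcall symbol names vary across kernel versions (e.g. `__initcall_init_kgdboc6` for a built-in `module_init(init_kgdboc)` in kernels of this era), so grep your own System.map for the real names first.

```shell
# Print whichever of two symbols appears first in System.map
# (sorted by address). Fails if neither symbol is present.
# $1: System.map, $2 and $3: exact symbol names
first_symbol() {
    awk -v a="$2" -v b="$3" '
        $3 == a || $3 == b { print $3; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$1"
}
```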
Strangely, opt_kgdb_wait does not seem to be set up:

(gdb) p __setup_opt_kgdb_wait
$1 = {str = 0x0 <__vectors_start>, setup_func = 0x0 <__vectors_start>, early = 0}
(gdb) p __setup_kgdboc_early_init
$3 = {str = 0x0 <__vectors_start>, setup_func = 0x0 <__vectors_start>, early = 0}
(gdb) p __setup_str_opt_kgdb_wait
$4 = "available"

To figure out why the early param was not getting handled, I put a kgdb_breakpoint() call at the beginning of parse_early_param() (called from start_kernel()), but that hangs the kernel.  This is an outstanding problem for me to solve...
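
Keep one thing in mind when reading the gdb output above. `__setup()` and `early_param()` entries, and their name strings, are placed in the kernel's init sections, which are freed once boot finishes. Printed from a running kernel, they may therefore show zeros or unrelated data rather than what the parser saw. Below is a minimal sketch for checking this statically against System.map from the kernel build directory. The helper names are hypothetical, and the sketch assumes the `__init_begin`/`__init_end` markers appear in your System.map.

```shell
# Print the address of an exact symbol name from System.map.
sym_addr() {
    awk -v s="$2" '$3 == s { print $1; exit }' "$1"
}

# Succeed (0) if symbol $2 lies in [__init_begin, __init_end) of System.map $1,
# fail (1) if it lies outside, return 2 if a symbol is missing.
in_init_region() {
    a=$(sym_addr "$1" "$2"); b=$(sym_addr "$1" __init_begin); e=$(sym_addr "$1" __init_end)
    if [ -z "$a" ] || [ -z "$b" ] || [ -z "$e" ]; then
        echo "symbol not found in $1" >&2
        return 2
    fi
    [ $((0x$a)) -ge $((0x$b)) ] && [ $((0x$a)) -lt $((0x$e)) ]
}
```

For example, `in_init_region output/build/linux-custom/System.map __setup_opt_kgdb_wait && echo "freed after boot"`.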

gdb debug kernel over JTAG

Since kgdbwait is not working for me on the Zedboard, I gave the JTAG route a shot.  So my kernel config contains:
  • CONFIG_KGDB=y
  • CONFIG_KGDB_SERIAL_CONSOLE=y
  • CONFIG_DEBUG_INFO=y -- I already had it on for ftrace, to turn on symbolic data
  • # CONFIG_DEBUG_RODATA is not set
  • CONFIG_FRAME_POINTER=y
  • # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
I also deleted the kgdb-related bootargs.  Now fire up the Xilinx SDK (see my xsdk on Linux recipe in firefox/Chrome) and switch to the Debug perspective (the "+" icon in the XSDK window's upper right corner lets you add Eclipse perspectives such as Debug).

In menu --> Run --> Debug Configurations, select Xilinx C/C++ application (System Debugger).
It's a bit confusing, because System Debugger IS based on the TCF (Eclipse Target Communication Framework; see this explanation).  Click the "+" icon in the upper left corner to create a new debug configuration.  I named the configuration "Kernel".  It's probably a good idea to specify the symbol file and the source lookup paths while creating the debug configuration as well.

To load the kernel symbols into the debugger, right click on Core #0 --> Symbol Files --> Add --> browse to the vmlinux file (the uncompressed kernel image, which sits in the <kernel> directory after a successful build).
I am not sure what the benefit of specifying the kernel start address is (see my earlier blog entry for why the kernel start address is 0x0300000), other than "I don't know how else the debugger would figure out what address to set the breakpoint at", but it doesn't seem to hurt.  I also checked "Instructions read", to force the use of HW breakpoints when possible.

On Linux, I don't have to add a source lookup path, so I don't have to wonder about the difference between two equally unexplained concepts: Compilation Path vs. Path Mapping.

Click "Debug"; this is where the SDK magic happens: it connects to the target's debug HW over JTAG (via localhost:3121) and auto-discovers the CPUs.
Because I set maxcpus=1 in my bootargs (I am working toward running Linux on Core #0 and bare metal, hard real-time state machines on Core #1), Linux only runs on Core #0.  When the debugger stopped the target, I could see that the kernel was in its idle loop, under start_kernel() --> rest_init().  I can set breakpoints in source files such as init/main.c, or on a function by clicking the small inverted triangle in the upper right corner of the Breakpoints tab.
Unfortunately, neither such a breakpoint nor even one in my Ethernet MAC driver ISR gets hit, rendering this effort rather futile.  With kgdb, at least the breakpoints I set AFTER the kernel starts hit reliably.

Another problem: I don't know how to restart the program from XSDK (I posted the question to the Xilinx EDK forum).  Until I figure this out, the only way to debug the statically compiled __initcalls is to get a root console and type "reboot", which means that I can only debug a kernel that actually boots and gives me a console.

KDevelop: did not work out well for me

This is mostly a note to myself in case I have to set up KDevelop on another Ubuntu desktop, based on http://www.gnurou.org/code/kdevelop-kernel.  Start from a downloaded kernel (in this case the ADI Zynq kernel, located in a folder we will call <kernel>):

  1. The apt-get package is kdevelop
  2. Turn off the background parser: menu --> Settings --> Configure KDevelop --> Background Parser group --> uncheck.
  3. Import: menu --> Project --> Open/Import Project --> Browse INTO (don't just select the folder) <kernel> --> Next, then:
    1. I named the project adi_kernel
    2. Generic Project Manager
    3. Finish --> KDevelop goes to work for a few minutes, and then yields a project in the Project explorer window.
  4. Right click this new project --> Open Configuration, and use the "Add" button to add the following folders to the INCLUDE list; this becomes trial-and-error, but I added drivers only on an as-needed basis:
    1. /include/*
    2. /arch/arm/*
    3. /lib/*
    4. /ipc/*
    5. /init/*
    6. /mm/*
    7. /kernel/*
    8. /drivers/amba/*, /drivers/i2c/*, /drivers/gpu/drm/*, /drivers/usb/*
  5. Enable the background parser.  It will start chewing through the source.
  6. To add additional include paths, pull up any source and hover over unresolved header files at the top of the file --> Solve --> Add custom include path.  In the "Setup Custom Include Paths" window:
    1. Storage directory: <kernel>
    2. Specify the following relative paths:
      1. include
      2. arch/arm/include
      3. arch/arm/mach-versatile/include
TODO: consider building the kernel in the IDE

6 comments:

  1. Hello.

    I understand this is a three-year-old post, so you may not even be interested :). Nevertheless, I came to this blog entry because I also had a problem with kgdboc/kgdbwait not working on the Zedboard. I think I have solved the problem, and I will put the solution here in the hope that it is useful.

    The problem is that kgdboc does not see proper tty driver when it initializes.
    This is because kgdboc init runs *before* xilinx_uart init. The init order seems to be defined by compilation/link order.

    Thus, just going to drivers/tty/serial/Makefile and moving the kgdboc.o entry to the end of the list solves the problem :).

    ```
    diff --git a/drivers/tty/serial/Makefile b/drivers/tty/serial/Makefile
    index 1278d37..fba1aca 100644
    --- a/drivers/tty/serial/Makefile
    +++ b/drivers/tty/serial/Makefile
    @@ -68,7 +68,6 @@ obj-$(CONFIG_SERIAL_OMAP) += omap-serial.o
     obj-$(CONFIG_SERIAL_ALTERA_UART) += altera_uart.o
     obj-$(CONFIG_SERIAL_ST_ASC) += st-asc.o
     obj-$(CONFIG_SERIAL_TILEGX) += tilegx.o
    -obj-$(CONFIG_KGDB_SERIAL_CONSOLE) += kgdboc.o
     obj-$(CONFIG_SERIAL_QE) += ucc_uart.o
     obj-$(CONFIG_SERIAL_TIMBERDALE) += timbuart.o
     obj-$(CONFIG_SERIAL_GRLIB_GAISLER_APBUART) += apbuart.o
    @@ -93,6 +92,7 @@ obj-$(CONFIG_SERIAL_STM32) += stm32-usart.o
     obj-$(CONFIG_SERIAL_MVEBU_UART) += mvebu-uart.o
     obj-$(CONFIG_SERIAL_PIC32) += pic32_uart.o
     obj-$(CONFIG_SERIAL_MPS2_UART) += mps2-uart.o
    +obj-$(CONFIG_KGDB_SERIAL_CONSOLE) += kgdboc.o
     
     # GPIOLIB helpers for modem control lines
     obj-$(CONFIG_SERIAL_MCTRL_GPIO) += serial_mctrl_gpio.o
    ```

    Replies
    1. Hi Dimitry, I just tried your workaround on Raspberry Pi 3, but the kernel is not stopping. How did you debug the problem in the first place and arrive at this conclusion (what tipped you off)?

    2. I just compiled the kernel and ran it as a standalone embedded app with JTAG attached. At the risk of being boring: before you jump into the debugger, make sure you really recompiled the entire thing.

    3. This is the part that I never dug into as deeply as I should have 3 years ago: with pr_devel(), I figured out that the Raspberry Pi ttyAMA0 console gets registered long after configure_kgdboc() gets called. So compiling the console into the kernel (which I've already done) doesn't change anything. Now chasing down how to register the console before kgdboc is configured. Also found out that there is an "ekgdboc" kernel param for the early part of the kernel init. Digging into this also.

    4. I spent all day yesterday trying to get the console to be created before kgdboc init runs (level 3). Even after moving the pl011 init from device (level 6) to level 3 to create the ttyAMA0 device, the console for ttyAMA0 would not get created until printk late init ran, at the very end of the init process. I found that kgdbwait with kgdboc or even ekgdboc works great on x86, but there are so many problems on Raspberry Pi. Maybe Zynq is the odd one out, in that kgdbwait worked for you? Although I learned a lot, I think I should just use a JTAG debugger (I have a question out to SEGGER about one).

  2. Thank you Dimitry, I am picking up another dual-system project (this time on Raspberry Pi 3), so I'll put your tip to the test on it.
