Nov 16, 2014

Understanding the Linux serial device drivers on Xilinx Zynq Zedboard

Serial communication is used extensively to interface with simple ICs on a board.  In a simple uC (micro-controller) project, there are dedicated registers for serial communication.  But on a modern uP (micro-processor) running Linux, we have to jump through a lot of hoops to accomplish the same task.  I've been studying ARM Linux on the Avnet Zedboard sporting a Zynq processor, and I'll stay on that platform for this serial communication study as well.

Available HW

Zynq TRM (Xilinx document UG585 Zynq-7000 All Programmable SoC Technical Reference Manual) Appendix B lists all peripherals available ON the chip itself (we are NOT talking about HW that can be programmed in FPGA fabric surrounding the processor).  The sheer number of advanced HW peripherals addressable by direct register access will shock you; for example:
  • 2 CAN (Controller Area Network--must have for automotive) on 0xE0008000 and 0xE0009000
  • Cortex A9 performance monitoring unit for each of the dual CPUs (0xF8891000 and 0xF8893000)
  • 2 GEM (gigabit ethernet) on 0xE000B000 and 0xE000C000
  • GPIO on 0xE000A000
  • 2 I2C on 0xE0004000 and 0xE0005000
  • LQSPI (quad-SPI flash controller) on 0xE000D000
  • 2 SD2.0/SDIO2.0/MMC3.31 on 0xE0100000 and 0xE0101000
  • 2 SPI on 0xE0006000 and 0xE0007000
  • 2 UART on 0xE0000000 and 0xE0001000
  • 2 USB on 0xE0002000 and 0xE0003000
These peripherals are mapped to the PS MIO in a configurable manner, as shown in this screenshot, where the green rectangles in the bottom row show currently available MIO pins.

The HW instances enabled in the Zedboard HDMI reference design (even the processor is configurable) are highlighted.  Despite the wonderful serial peripherals addressable from the CPU, many of them are disabled by default in a typical Zynq HW design, mostly because the board does not expose the pins (could be a chicken-and-egg problem).  For example, the reference Zedboard HW design makes the following choices:
  • The SD peripheral is routed to the SD card slot on the board.  I should try plugging in a {802.11 | bluetooth} SDIO card into this slot once I put the boot loader SW on QSPI (I showed how to do this in the hobby project link above), which is routed to a Spansion flash.
  • This HW design instead controls the I2S sound chip and I2C HDMI driver through custom FPGA logic.  To drive I2C signals out from the processor directly, GPIO pins left on Bank500 (Zedboard rev c schematic page 10) would have to be sacrificed.
To recap, the reference HW design cannot drive a generic I2C or SPI device.  There may be additional constraints on the remaining MIO (there are rules about which peripherals can be connected to a given MIO pin) that prevent a design that incorporates the on-chip I2C or SPI peripherals.  If I avoid touching the serial boot devices--which show up as block devices in Linux--for now, my choices for dorking around among the above peripherals are the UART, USB, and the gigabit Ethernet.

SPI

A good thing about SPI is its simplicity: when commanding a device from the processor acting as the master, you just need a chip select (often), SCLK (clock) and MOSI (master out, slave in) wires; reading back from the slave requires another wire (MISO).  A bad thing about SPI is also its simplicity: there is no way for the master to realize that the wire is cut, and more modern protocols require just 2 wires (like I2C) or even 1 wire for bidirectional communication, while allowing multiple slaves (or even multiple masters!).

For a concrete example, I salvaged a Honeywell pressure sensor HSCDAND030PGSA3 from e-Waste.
This sensor responds over SPI, so no need to worry about addressing or calibration, as the sensor does not even have a pin for data in, as you can see below.
The way to read out the pressure (and the optional temperature) data from the sensor is to pull down the chip select (SS), and shake the clock lines.  The master side SPI peripheral should latch the slave data at the rising edge of the clock, as you can see in the timing diagram below, copied from the Honeywell SPI communication document:
The master may stop the data reception at any point by pulling up nSS, but reading all 4 bytes yields the temperature data as well, as you can see in this screenshot of the same SPI communication document:

Modify the Zedboard HW design to drive SPI

When picking the port assignment for the SPI pins, I first have to find all available MIO for a given HW peripheral.  Scanning the green rectangles in the above map is an easy way to do that.  Among these, I then have to find the pins that are easily accessible on a given board.  According to the Zedboard rev c schematic page 10, all the PS MIO pins are on the Zynq IO bank 500 and 501 shown below:
As you can see, most of the MIO are already claimed for Ethernet, USB (OTG), SD, UART, and QSPI chips, and PB1 and PB2 are already routed to optional MIO buttons on the Zedboard.  This leaves only the MIO10/11 or MIO14/15 if I want to use I2C0 peripheral.  These 2 pairs of MIO pins are exposed through the MIO PMOD connector JE on Zedboard, which has a voltage regulating zener diode in parallel, and a 200 ohm protection resistor, shown below:
For small current, the voltage drop across the 200 ohm resistor should be negligible, and MIO can drive the SPI device connected to the connector JE above.

Normally, I would just use the first peripheral, but SPI0 MIO pins are already claimed, so that I have to map SPI1 on MIO 10...15 (pins JE2-MOSI, JE3-MISO, JE4-CLK, JE1-SS0, JE9-SS1, JE10-SS2 on the Zedboard rev c), as you can see in the screenshot of the Vivado Zynq7 PS peripheral I/O pin configuration screen (you can use the MIO configuration screen; please read my old Google doc entry on using Vivado to generate a Zedboard design capable of supporting a Buildroot Linux):
Note that Zynq lets you use up to 3 separate slave selects, so that you can have up to 3 SPI devices driven by the same SPI peripheral (the IP is from Cadence).  Optionally, let's attach 2 sensors to see how a multi-sensor device driver might work.  (So I am going to enable ss[1], but you don't have to, because these sensors are NOT cheap, at >$40 each!)  Changing this configuration "dirties" the system.bd in Vivado.  Normally, when you add new IP into a design, the I/O ports change, so the system wrapper should be regenerated (in Vivado --> Block Design --> Sources --> right click on the system.bd --> Create HDL wrapper).  BUT I did NOT change the HW; I merely reconfigured the PS (Zynq CPU), affecting the assignment of on-chip peripherals to the EXISTING MIO pins, which appear as "inout[53:0] FIXED_IO_mio" in the auto-generated system_wrapper.v.  Therefore, I can continue to use the same wrapper HDL, and just regenerate the bitstream.

But there is a known Zynq HW bug that prevents SS0 from being used and requires SS[0] to be pulled up to VCC.  The simplest way to do that is to short the SS[0] pin to VCC, as I've done here with a blue wire to a connector plugged into the PMOD connector:
To connect the Honeywell sensor, 

Boot this HW

As instructed in my old Google doc entry, the modified HW should be re-exported to the SW by : menu → File → Export → Export Hardware → check “Include bitstream” and keep the default export location (<local to Project>) → OK.  Export itself is quick.  Now we can open the SDK: menu → File → Launch SDK → no need to change the exported location and the workspace for this project → OK.  As expected, the address map for the exported HW (which is now called system_top_hw_platform_1, since there is already the previously exported HW platform) will now contain ps7_spi_1 at 0xE0007000 through 0xE0007FFF.

If you follow the standard Xilinx recipe, you should generate the BSP for the HW design, and FSBL, etc.  If you delete the now orphaned original HW design (the one without SPI) from the SDK workspace, the BSP (and its dependent, the FSBL project) will not build any more, so you are forced to regenerate the BSP.

As a shortcut (living dangerously!), I can correct this by changing the BSP's project reference to the new HW design, as you can see in this snapshot of the BSP project properties window:
FSBL looks for ps7_init.c in the old HW folder, so I pointed it to the new HW design in FSBL project properties --> Resource --> Linked Resources, and in C/C++ General --> Paths and Symbols.

Alternatively, since I already have a working FSBL and SSBL (U-Boot), and Linux does NOT use the BSP anyway, I cannot think of a reason why I should regenerate the FSBL--given that I will NOT address the SPI device except in Linux.  So I could just take a chance and change only the bitstream in the BOOT.bin.

Regardless of how you manage the transition of the project references to the new HW, you should arrive at a point where you have at least the BSP and FSBL rebuilt.  From that point, to create a new BOOT.bin: right click on any project (I chose FSBL) --> Create Boot Image (almost all the way at the bottom of the context menu).  The default output BIF file path is within the FSBL project; I changed this to the root of the project (~/Zynq/ZedSPI/projects/adv7511/zed/boot) to be less vulnerable to the FSBL project getting deleted.  In addition to the FSBL.elf that is pulled into the boot image, I add the system_top.bit, located in the exported HW folder under the Vivado project root/adv7511_zed.runs/impl1/, as a data file.  The SSBL--the U-boot.elf already built before--should also be added to the boot image partition as a data file, kind of like this:

The kernel can boot with this HW and FSBL, and populates the /sys/bus/spi tree.  But let's check whether I have all the right kernel configs to drive Cadence SPI recommended by the Xilinx SPI driver wiki.

Kernel config to drive Cadence SPI

Xilinx SPI wiki recommended the following:
  • CONFIG_SPI=y
  • CONFIG_SPI_MASTER=y
  • CONFIG_SPI_CADENCE=y
CONFIG_SPI_MASTER does NOT mean that the Linux kernel will be the SPI master; that is unnecessary to say, since "Linux has no slave side programming interface ... between controller drivers and protocol drivers"--according to <kernel>/Documentation/spi/spi-summary.txt.  Rather it seems to pick up some device drivers that I don't even care about, like AD5446.

So my kernel config has the following SPI related items:

CONFIG_SPI=y
CONFIG_SPI_CADENCE=y
CONFIG_SPI_XCOMM=y
CONFIG_SPI_AD9250FMC=y
CONFIG_SPI_XILINX=y
CONFIG_SPI_ZYNQ_QSPI=y
CONFIG_SPI_SPIDEV=y
# CONFIG_SND_SPI is not set
CONFIG_AD7606_IFACE_SPI=y
CONFIG_AD5624R_SPI=y

board file (arch/arm/mach-omap2/board-omap4panda.c)

There are already many SPI device drivers included in this kernel, as you can see here:

$ ls /sys/bus/spi/drivers
ad5064       ad5686       ad7303       ad8366       adf4350      adis16204    at25
ad5360       ad5755       ad7476       ad9122       adis16060_r  adis16209    m25p80
ad5380       ad5764       ad7606       ad9144       adis16060_w  adis16220    spi-ad9250
ad5421       ad5791       ad7780       ad9361       adis16080    adis16240    spidev
ad5446       ad7192       ad7791       ad9467       adis16130    adis16260
ad5449       ad7266       ad7793       ad9517       adis16136    adis16400
ad5504       ad7280       ad7816       ad9523       adis16201    adis16480
ad5624r      ad7298       ad7887       ad9548       adis16203    adxrs450

Study the Cadence SPI device driver

For devices that are only read from or written to, the userspace SPI device driver is the most convenient.

Kernel doc on spidev: "The simplest way to arrange to use this driver is to just list it in the spi_board_info for a device as the driver it should use:  the "modalias" entry is "spidev", matching the name of the driver exposing this API."  An example is 

static struct spi_board_info gsia18s_spi_devices[] = {
{ /* User accessible spi0, cs0 used for communication with MSP RTC */
.modalias = "spidev",
.bus_num = 0,
.chip_select = 0,
.max_speed_hz = 5000000,
.mode = SPI_MODE_0,
},
...

or

static struct spi_board_info bfin_spi_board_info[] __initdata = {
#if IS_ENABLED(CONFIG_SPI_SPIDEV)
{
.modalias = "spidev",
.max_speed_hz = 3125000,     /* max spi clock (SCK) speed in HZ */
.bus_num = 0,
.chip_select = 1,
},
#endif


This spi_board_info array is used in

static void __init gsia18s_board_init(void) {
...
at91_add_device_spi(gsia18s_spi_devices, ARRAY_SIZE(gsia18s_spi_devices));
...

at91_add_device_spi does a lot of Atmel specific ops besides spi_register_board_info(), but blackfin is simpler:

static int __init ad7160eval_init(void)
{
printk(KERN_INFO "%s(): registering device resources\n", __func__);
...
spi_register_board_info(bfin_spi_board_info, ARRAY_SIZE(bfin_spi_board_info));
...
}
arch_initcall(ad7160eval_init);

arch_initcall is a level 3 initcall (levels run from 0 through 7), fired during kernel startup.  spi_register_board_info() handles ALL SPI devices declared in the spi_board_info array.  The bus_num field is important here, because spi_register_board_info() matches each board_info against a global spi_master_list (in spi.c; remember that Linux does not support slave mode), and creates a device on the matching master.  So as the spidev document says, I can easily create a spi_board_info structure for the Zynq SPI peripheral 1:

static struct spi_board_info zed_spi_board_info[] = {
  { .modalias = "spidev",
    .bus_num = 1,           /* Because I am using SPI1 */
    .chip_select = 1,       /* SS[1], because SS[0] is shorted to VCC */
    .max_speed_hz = 800000, /* limited by sensor, Table 2 */
    .mode = SPI_MODE_0,     /* 0 polarity, rising edge */
  },
};

But without a matching master driver, this board info element will go lonely.  The global spi_master_list is populated in spi_register_master().  On the Zedboard, this may be called through the Zynq QSPI driver probe: module_platform_driver(zynq_qspi_driver); --> platform_driver_register() --> spi_register_master(), in <kernel>/drivers/spi/spi_zynq_qspi.c.  But QSPI is different from Cadence SPI, so I cannot use this device driver.  In the same folder, though, there is the Cadence SPI driver (master mode only) source: spi-cadence.c (named "cdns-spi") written by Xilinx, based on the Blackfin SoC SPI driver (spi_bfin5xx.c).  The cdns_spi device complies with the Linux device driver model, so the following picture shows how the device driver gets to the Cadence custom data, starting with the spi_master device pointer.

In cdns_spi above, rx_bytes is the number of bytes yet TO BE READ from the HW and tx_bytes is the number of bytes yet TO BE WRITTEN to the HW.

HW initialization in cdns_spi_init_hw(struct cdns_spi *xspi) sets the SPI peripheral registers to known values and then enables the HW controller.  SPI clock configuration is done through the clock divisor written to the control register in cdns_spi_config_clock_freq().  Because the divisor is a power of 2, the driver picks the fastest baud clock that does not exceed the requested rate, which means that the actual baud clock is slower than the requested rate unless the request lands exactly on one of the available divisions of the reference clock.  The clock configuration is broken out because EACH transfer may use a different baud rate (remember that multiple slaves may be on the same peripheral, with only the CS pin being dedicated).
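
To make the divider selection concrete, here is a minimal sketch of how such a search could look.  It is not the driver's code: the function names are made up for illustration, and the divider range (divide-by-4 through divide-by-256) and the 166,666,666 Hz reference clock used in the note below are assumptions that should be checked against the Zynq TRM and the actual clock setup.

```c
#include <stdint.h>

/* Assumed divider exponents: 1 means divide-by-4, 7 means divide-by-256. */
#define BAUD_DIV_MIN 1u
#define BAUD_DIV_MAX 7u

/* Hypothetical helper: return the smallest exponent whose baud clock does
 * not exceed req_hz, or BAUD_DIV_MAX if even the slowest setting is still
 * too fast. */
unsigned pick_baud_shift(uint32_t ref_hz, uint32_t req_hz)
{
    unsigned shift = BAUD_DIV_MIN;

    while (shift < BAUD_DIV_MAX && ref_hz / (2u << shift) > req_hz)
        shift++;
    return shift;
}

/* Baud clock that results from a given exponent. */
uint32_t baud_hz(uint32_t ref_hz, unsigned shift)
{
    return ref_hz / (2u << shift);
}
```

With the assumed 166,666,666 Hz reference clock and an 800 kHz request, this search ends at divide-by-256, or roughly 651 kHz, which is in the same ballpark as the 667 kHz estimated from the scope trace later in this post.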

Send is assisted (complicated?) by the txbuf, so that cdns_spi_fill_tx_fifo() just writes maximum bytes allowed.  The function just copies a byte at a time from the tx_buf queue, and when the queue runs empty, shoves 0, for tx_bytes total.  The send is not DMAed, but at least a memory copy is avoided because the tx_buf points to struct spi_transfer's tx_buf.  The nature of SPI communication is that each byte transfer is an exchange between the master and slave, so tx_bytes is set the same as rx_bytes in cdns_transfer_one(), right before bytes are shoved into the HW FIFO and then the interrupts are enabled for SPI mode fault and TX FIFO going below water mark, which is the more interesting case of course--although it is strange that they are using TX FIFO NOT full event instead of RX FIFO NOT empty.  I will have to look at the Cadence datasheet for their logic's behavior.  It would make sense if the interrupt is asserted only after a byte is sent (and therefore a byte is received during the exchange), so that it is valid to read the RX FIFO in the ISR.  Note that the ISR turns off the interrupt when a transfer is complete.

When the device is probed, the memory mapped registers are remapped to the kernel's virtual address:

res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
xspi->regs = devm_ioremap_resource(&pdev->dev, res);

Describe the Cadence SPI HW in DTS

ARM platform device resources are defined in DTS, as in this example:

/ {
    axi: amba@0 {
        spi1: ps7-spi@e0007000 {
            #address-cells = <1>;
            #size-cells = <0>;
            compatible = "xlnx,zynq-spi-r1p6";
            clock-names = "ref_clk", "pclk";
            clocks = <&clkc 26>, <&clkc 35>;
            interrupt-parent = <&gic>;
            interrupts = <0 49 4>;
            num-cs = <3>;
            reg = <0xe0007000 0x1000>;
            bus-num = <1>;
            speed-hz = <50000000>;

            spidev@0 {
                compatible="spidev";
                reg = <1>; //chipselect
                spi-max-frequency= <800000>;
                bus-num = <1>;
            };
        };
    };
};


Note that the HW is nested consistently with other memory-mapped HW on the Zedboard, such as the GPIO and USB.  Because I am using SPI1, the clock indices have to be 1 greater than in the SPI0 examples found on the web (clkc counts from 0); clkc 26 and 35 are "spi1" and "spi1_aper" respectively, among the clocks controlled through the system register.  Registers exposed through memory mapped IO are one of the device resource types:

#define IORESOURCE_IO 0x00000100 /* PCI/ISA I/O ports */
#define IORESOURCE_MEM 0x00000200
#define IORESOURCE_REG 0x00000300 /* Register offsets */
#define IORESOURCE_IRQ 0x00000400
#define IORESOURCE_DMA 0x00000800
#define IORESOURCE_BUS 0x00001000

Let's see if my kernel can successfully probe this HW using the above DTS entry.  Since I have a working DTS include chain (described here), I want to extend from this chain rather than modify any of the working DTS files.  So I rename the final DTS to DTSI, to include in a final DTS file and create my specialization file:

$ cd ~/work/zed/kernel/arch/arm/boot/dts
$ mv zynq-zed-adv7511.dts zynq-zed-adv7511.dtsi
$ touch zynq-zed-adv7511-spi1.dts

The final DTS should include the last DTSI in the include chain, and then describe the SPI HW described above

/dts-v1/;
#include "zynq-zed-adv7511.dtsi"
<pull in the HW definition given above>

DTSI should NOT define the DTS version, so the first line should be REMOVED from zynq-zed-adv7511.dtsi before building.  Rebuilding the DTB manually is possible, but Buildroot does it for you if you change the DTS name in the kernel section of the Buildroot config.  The target is still expecting the old DTB file name, so I just copy the new DTB file from the <BR2>/out/images to /var/lib/tftpboot as the old name to fool the target.

When I try out this DTB, the kernel boots but of course I don't see the cdns-spi device.  So to debug, I turn on CONFIG_PROC_DEVICETREE in kernel defconfig, to see whether the DTB shows up correctly.  With the new kernel, this is what I see:

# cat /proc/device-tree/amba@0/ps7-spi@e0007000/compatible 
cdns,spi-r1p6

So it appears that the DTB is correct.  Furthermore, I see the device enumerated under /sys:

$ cat /sys/devices/amba.0/e0007000.ps7-spi/modalias 
of:Nps7-spiT<NULL>Cxlnx,zynq-spi-r1p6

Initially, I was hoping to see this device get probed, but I did not see any evidence of probing.  I was hoping to match this device with the spidev described in the above DTS, to leave the kernel source code unchanged.  To dig into why the master device is not getting probed as expected, I stepped into the kernel, but finding the place where the cdns-spi driver is getting rejected long before probing (which happens in platform_drv_probe) is not easy.  After some noodling around, I started with spi_match_master_to_boardinfo(), which matches against the spi_board_info's bus_num.  When stopped there, I saw in the debugger expressions view that the spi_master_list had only 1 node (because that node pointed back to the spi_master_list), and that node is the cdns SPI device, which registered cdns_transfer_one() as the transfer_one method of a SPI driver.

I used (cross) gdb to translate symbol <--> address, as you can see in this example:

(gdb) info symbol 0xc084be38
spi_master_list in section .data

(gdb) info symbol 0xc033e6f4
cdns_transfer_one in section .text

I also saw that this master SPI device had bus_num = -1, causing the match to fail.  I confirmed that even in /proc/device-tree, the bus_num are not populated, as you can see here:

# cat /proc/device-tree/amba@0/ps7-spi@e0007000/bus-num

When I removed bus-num from DTS, I saw both the master SPI device and spidev appear in /dev and /sys/class:

# ls -l /sys/class/spi_master
spi32766 -> ../../devices/amba.0/e0007000.ps7-spi/spi_master/spi32766

# ls -l /sys/class/spidev/
spidev32766.1 -> ../../devices/amba.0/e0007000.ps7-spi/spi_master/spi32766/spi32766.1/spidev/spidev32766.1

Clearly, the chip select is getting picked up from DTS, but "bus-num" is ineffectual.  Since bus_num is actually unnecessary for me, let's just proceed for now.

Write some bytes to the Cadence driver through spidev

The spidev device is owned by root--which seems to be the default policy for Buildroot udev.

# ls -lo /dev/spidev32766.1 
crw-------    1 root     153,   0 Jan  1 00:00 /dev/spidev32766.1

Therefore, if I write a few bytes to the device, I can see which device driver functions are hit.

# echo hello > /dev/spidev32766.1

What I observed is:
  1. The first is prepare_transfer_hardware(), which just turns on the HW.
  2. Then prepare_message() configures the SPI clock polarity and phase.
  3. set_cs() selects the slave on chip_select 1.
  4. We start transfer_one().
  5. set_cs() deselects the slave.
  6. The ISR is called.
Here is an example stack trace in JTAG debugger:

0xc033dfcc cdns_spi_chipselect(): drivers/spi/spi-cadence.c, line 176
0xc033aa6c spi_set_cs(): drivers/spi/spi.c, line 579
0xc033b4e8 spi_transfer_one_message(): drivers/spi/spi.c, line 781
0xc033bd8c spi_pump_messages(): drivers/spi/spi.c, line 959
0xc0045694 kthread_worker_fn(): kernel/kthread.c, line 576
0xc00454e4 kthread(): kernel/kthread.c, line 207
0xc000ed78 ret_from_fork(): arch/arm/kernel/entry-common.S, line 91

The SS[1] pin should therefore go low for the whole duration of the exchange, wherein the CLK will be shaking at spi-max-frequency, as you can see in the snapshot of the CLK and SS[1] pin when I ran

# echo ABC > /dev/spidev32766.1


Since I am writing 'ABC', I should see 32 clocks (8 bits times 4 bytes: the 3 characters + \n).  Oddly, the clock frequency is not the requested 1 MHz, but 667 kHz (32 clocks / 48 usec).  The data does vary for each byte, as you can see below, when I probe the MOSI pin:

Just for fun, let's see if I can confirm the data.  With 0 polarity and 0 phase (SPI mode 0), the data should be sampled at the rising edge of the clock.  Therefore, the above pattern reads 32'b0100_0001_0100_0010_0100_0011_0000_1010, or {0x41, 0x42, 0x43, 0x0A}, which translates to (using the ASCII table) 'ABC\n'!
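
The same bit-to-byte bookkeeping can be done in a few lines of C instead of by eye.  This is only an illustrative sketch; bits_to_bytes() is a made-up helper name, and it assumes the bits are written MSB first, the way they appear on the scope.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack a string of '0'/'1' characters (MSB first) into bytes.
 * Any other character, such as the '_' separators, is skipped.
 * Returns the number of complete bytes written to out. */
size_t bits_to_bytes(const char *bits, uint8_t *out, size_t out_len)
{
    size_t n = 0;
    unsigned acc = 0, count = 0;

    for (; *bits != '\0' && n < out_len; bits++) {
        if (*bits != '0' && *bits != '1')
            continue;
        acc = (acc << 1) | (unsigned)(*bits - '0');
        if (++count == 8) {
            out[n++] = (uint8_t)acc;
            acc = 0;
            count = 0;
        }
    }
    return n;
}
```

Feeding it the pattern read off the MOSI trace yields the 4 bytes 0x41, 0x42, 0x43, 0x0A.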

Read a few bytes from the Cadence driver through spidev

To read, Cadence device driver stuffs 0x00 into the HW and exchanges bytes with the slave, so transfer_one() will still be called upon the following command:

# od -vAn -N4 -tx4 /dev/spidev32766.1
 ffffffff

After I solder the sensor, the read value becomes more interesting:

# od -vAn -N4 -tx4 /dev/spidev32766.1
 a3567406
# od -vAn -N4 -tx4 /dev/spidev32766.1
 c3567106
# od -vAn -N4 -tx4 /dev/spidev32766.1
 23557606

[NOTE: I realized later that od flips the bytes]
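
The flip comes from od -tx4 printing each group of 4 bytes as one 32-bit word in host byte order, and the ARM core here is little-endian.  A minimal sketch of that assembly, with le32_word() being a made-up name for illustration:

```c
#include <stdint.h>

/* Assemble 4 bytes, in the order they were received, into the 32-bit
 * word that a little-endian host prints for them: the first byte ends
 * up as the LEAST significant byte of the word. */
uint32_t le32_word(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

So a frame received as 06 76 55 23 (the order spidev_test prints later in this post) shows up in od as 23557606.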

According to Honeywell Technical Note: SPI Communication with Honeywell Digital Output Pressure Sensors, Figure 3, I can extract the following data from the 4 byte readout, with data[0] being the LSB of the last reading above (0x23557606):
  • status = data[3] >> 6 = 0 for the last reading: valid data.  Apparently, it takes a few readings for the sensor data to become valid?
  • bridge data[13:0] = (data[3] & 0x3F) << 8 | data[2] = 0x2355
  • temperature data[10:0] = (data[1] << 3) | (data[0] >> 5) = 0x7606 >> 5 = 944
The pressure conversion formula is: (data - data_min) * (Pmax - Pmin) / (data_max - data_min) + Pmin.  Except for the data, all other values are fixed for a given sensor model.  The HSCDAND030PGSA3 measures gage pressure between 0 and 30 psi.  The data should range from 0 to 2^14 - 1.  Therefore, the conversion works out to 0x2355/0x3FFF = 55% of full scale (about 16.6 psi).  This seems wrong, given that the sensor is open to the atmosphere...

The temperature conversion formula is T_C = temp_data * 200 / 2047 - 50 = 42 C.  This is the temperature used for calibration, not necessarily the measured temperature!

Writing and reading at the same time with spidev driver

Fundamentally, the SPI protocol is based on the exchange of bits between the master and the slave on the MOSI and MISO lines.  The spidev read() and write() system calls used above are convenient for 1-way communication, but to support bi-directional communication, the spidev specific ioctl must be used, as in this example from spidev_test.c (in <kernel>/Documentation/spi/ folder):

struct spi_ioc_transfer tr = {
.tx_buf = (unsigned long)tx,
.rx_buf = (unsigned long)rx,
.len = ARRAY_SIZE(tx),
.delay_usecs = delay,
.speed_hz = speed,
.bits_per_word = bits,
};
ret = ioctl(fd, SPI_IOC_MESSAGE(1), &tr);

That is, the transfer sets up the send and receive buffers (of equal size!), and ioctl() does the exchange for the caller.  This example comes with the standard kernel along with the Makefile, but is not built by default.  Normally, a cross-compilation is necessary, but since I am using Buildroot, I have to jump through more hoops to get it on my Zedboard.  Lucky for me, there are other people who are a step ahead of me and patched Buildroot already, but in a manner that requires kernel >= 3.15.  Another piece of luck: the kernel I downloaded from ADI (to pick up the ADV7511 driver) is version 3.15:

# uname -a
Linux zed 3.15.0 #2 SMP PREEMPT Sun Dec 28 13:26:46 PST 2014 armv7l GNU/Linux
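
Going back to the spidev ioctl itself: the excerpt from spidev_test.c can be boiled down to a small helper that clocks out 4 dummy bytes and captures the 4 bytes the sensor returns.  This is a sketch under assumptions, not code from the kernel tree: read_sensor_frame() is a made-up name, and it assumes 8-bit words and a device node opened elsewhere with open(..., O_RDWR).

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

/* Hypothetical helper: exchange 4 bytes with the slave behind fd.
 * The sensor has no data-in pin, so the transmitted bytes are zeros.
 * Returns 0 on success, -1 on failure (errno is set by ioctl). */
int read_sensor_frame(int fd, uint32_t speed_hz, uint8_t frame[4])
{
    uint8_t tx[4] = {0};
    struct spi_ioc_transfer tr;

    memset(&tr, 0, sizeof(tr));
    tr.tx_buf = (unsigned long)tx;      /* bytes shifted out on MOSI */
    tr.rx_buf = (unsigned long)frame;   /* bytes captured from MISO */
    tr.len = sizeof(tx);                /* both buffers have this size */
    tr.speed_hz = speed_hz;
    tr.bits_per_word = 8;

    return ioctl(fd, SPI_IOC_MESSAGE(1), &tr) < 0 ? -1 : 0;
}
```

On my target this would be called with a descriptor for /dev/spidev32766.1 and a speed of 800000; the frame bytes come back in the order they were received on the wire.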

Cross compiling spidev_test in Buildroot (only for kernel >= 3.15)

First, create a Buildroot package called spidev_test, which will hold its own config and makefile:

~/work/zed/buildroot/package$ mkdir spidev_test
~/work/zed/buildroot/package$ touch spidev_test/Config.in
~/work/zed/buildroot/package$ touch spidev_test/spidev_test.mk

The config file should list its dependency and some blurb

config BR2_PACKAGE_SPIDEV_TEST
bool "spidev_test"
depends on BR2_LINUX_KERNEL
help
 SPI testing utility (using spidev driver).

 This package builds and installs the userspace 'spidev_test'
 command. It is up to the user to ensure that the kernel
 configuration has all the suitable options enabled to allow a
 proper operation of 'spidev_test'.

 https://www.kernel.org/doc/Documentation/spi/spidev_test.c

comment "spidev_test needs Linux kernel to be built"
depends on !BR2_LINUX_KERNEL

The makefile lists the build and install rule (in this case to /usr/sbin):

####################################################################
#
# spidev_test
#
####################################################################

# Source taken from the Linux kernel tree
SPIDEV_TEST_SOURCE =
SPIDEV_TEST_VERSION = $(call qstrip,$(BR2_LINUX_KERNEL_VERSION))
SPIDEV_TEST_DEPENDENCIES = linux

define SPIDEV_TEST_BUILD_CMDS
$(TARGET_MAKE_ENV) $(TARGET_CC) $(TARGET_CFLAGS) -o $(@D)/spidev_test \
$(LINUX_DIR)/Documentation/spi/spidev_test.c
endef

define SPIDEV_TEST_INSTALL_TARGET_CMDS
$(INSTALL) -D -m 755 $(@D)/spidev_test \
$(TARGET_DIR)/usr/sbin/spidev_test
endef

$(eval $(generic-package))

Then insert the spidev_test package into Buildroot package config, as highlighted below:

  source "package/rt-tests/Config.in"
source "package/spidev_test/Config.in"
  source "package/strace/Config.in"

When I run make xconfig again in the <BR2> folder, I see the new spidev_test package option under "Debugging, profiling and benchmark", and I can select it.  Since Buildroot does not support smart incremental builds, I nuked the whole output/build/ folder and rebuilt everything (takes ~30 minutes on my old Core2 Duo, 4 GB).  Now spidev_test is available in /usr/sbin on the target, and I can run it against my device:

# spidev_test --device /dev/spidev32766.1 --speed 800000
spi mode: 0x0
bits per word: 8
max speed: 800000 Hz (800 KHz)

06 76 55 23 3B 2A 
91 9D 95 48 CE CA 
A4 67 65 52 33 B2 
A9 19 D9 54 8C EC 
AA 46 76 55 23 3B 
2A 91 9D 95 48 CE 
CA A4 

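Since the sensor only has 4 bytes of meaningful data, the rest is just garbage.  Note that the bytes are flipped from what od printed!  This means that my interpretation above was wrong.  Let's do this one again, now with data[3] being the first byte received (0x06):
  • status = data[3] >> 6 = 0 for the last reading: valid data.
  • bridge data[13:0] = (data[3] & 0x3F) << 8 | data[2] = 0x676
  • temperature data[10:0] = (data[1] << 3) | (data[0] >> 5) = 0x5523 >> 5 = 681
The pressure is (0x676/0x3FFF) * 30 + 0 = 1654/16383 * 30 = 10% of the 30 psi full scale = 3 psi?  Note that if the sensor output spans 10% to 90% of 2^14 counts (as the "A" transfer-function code in Honeywell part numbers suggests; check the datasheet) rather than 0 to 2^14 - 1, then data_min would be about 1638 counts and this reading would be practically 0 psi gage, as expected for a sensor open to the atmosphere.  T_C = temp_data * 200 / 2047 - 50 = 681 * 200 / 2047 - 50 = 16.5 C, which makes more sense.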
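
The unpacking and the two conversion formulas can be collected into a few C helpers.  This is a minimal sketch, with names (hsc_reading, hsc_decode, and so on) invented for illustration.  Unlike the bullets above, it indexes the frame in wire order, so f[0] is the first byte received.  The count range is passed in as parameters because which range applies to this part (0 to 2^14 - 1, or roughly 1638 to 14745 for a 10% to 90% transfer function) is an assumption to verify against the datasheet.

```c
#include <stdint.h>

struct hsc_reading {
    unsigned status;  /* 2 status bits; 0 means valid data */
    unsigned bridge;  /* 14-bit pressure count */
    unsigned temp;    /* 11-bit temperature count */
};

/* Unpack a 4-byte frame, f[0] being the first byte received. */
struct hsc_reading hsc_decode(const uint8_t f[4])
{
    struct hsc_reading r;

    r.status = f[0] >> 6;
    r.bridge = ((unsigned)(f[0] & 0x3F) << 8) | f[1];
    r.temp   = ((unsigned)f[2] << 3) | (f[3] >> 5);
    return r;
}

/* (data - data_min) * (Pmax - Pmin) / (data_max - data_min) + Pmin */
double hsc_pressure(unsigned bridge, double data_min, double data_max,
                    double p_min, double p_max)
{
    return (bridge - data_min) * (p_max - p_min) / (data_max - data_min)
           + p_min;
}

/* T_C = temp_data * 200 / 2047 - 50 */
double hsc_temp_c(unsigned temp)
{
    return temp * 200.0 / 2047.0 - 50.0;
}
```

For the frame 06 76 55 23, hsc_decode() gives status 0, a bridge count of 1654 and a temperature count of 681; hsc_pressure(1654, 0, 16383, 0, 30) comes out at about 3.0 psi, and hsc_temp_c(681) at about 16.5 C.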

QSPI

How is QSPI different than SPI?  There is a QSPI flash memory on Zedboard that is driven by the spi-zynq-qspi driver.  Since the Cadence SPI driver is a generic SPI driver (no assumption about the device nature), the 2 device drivers will be awkward to compare.  Even in DTS, the QSPI has entries that are flash memory specific as children (just as spidev was a child of the spi1 node):

qspi0: qspi@e000d000 {
        #address-cells = <1>;
        #size-cells = <0>;
        bus-num = <0>;
        compatible = "xlnx,zynq-qspi-1.00.a", "xlnx,ps7-qspi-1.00.a";
        interrupt-parent = <&gic>;
        interrupts = <0 19 4>;
        clock-names = "ref_clk", "aper_clk";
        clocks = <&clkc 10>, <&clkc 43>;
        is-dual = <0>;
        num-chip-select = <1>;
        reg = <0xe000d000 0x1000>;
        xlnx,fb-clk = <0x1>;
        xlnx,qspi-clk-freq-hz = <0xbebc200>; //200 MHz
        xlnx,qspi-mode = <0x0>;
        primary_flash: ps7-qspi@0 {
                #address-cells = <1>;
                #size-cells = <1>;
                compatible = "st,m25p80";
                reg = <0x0>;
                spi-max-frequency = <50000000>;
                partition@0x00000000 {
                        label = "boot";
                        reg = <0x00000000 0x00500000>;
                };
                partition@0x00500000 {
                        label = "bootenv";
                        reg = <0x00500000 0x00020000>;
                };
...
         };
};

That the HW clock source, interrupt line, and the register address are different is understandable.  But apparently, QSPI has a notion of dual that I did not see before.

Still, it is better to have some source code to compare.  Confusingly, the driver ignores the SPI mode specified in DTS:

master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_RX_DUAL | SPI_RX_QUAD |
   SPI_TX_DUAL | SPI_TX_QUAD;

The driver works similarly to the Cadence device driver.  It is significant that despite the high speed memory access usage of the device driver, this device driver memcpys.

A sensor protocol driver

Userspace access to an SPI device through the SPI master driver framework and spidev is convenient, saving work for both the implementer and the user.  But the user has to know the device message protocol.  The protocol drivers take away such flexibility in return for hiding the low level details of the device message.  I will write such a device driver for the Honeywell sensor, but since this blog entry is getting too long, the work will continue in another blog entry.

I2C

As mentioned in my writeup on the Zedboard HDMI reference design, the ADI ADV7511 HDMI driver chip is controlled through I2C logic in the FPGA fabric, rather than a CPU I2C peripheral.  I think ADI chose this path to make ADV7511 available on as many FPGA platforms as possible, rather than just on Zynq.  This means that ADI did not write code that shows an example of how to push a lot of data through the on-CPU I2C, although the HDMI device driver in <kernel>/drivers/gpu/drm/adi_axi_hdmi makes I2C calls for the low speed control path.  For cheap low speed sensors (and what sensors are not cheap these days?), this is fine.

But to learn how to control an I2C device from the CPU, let's add the I2C peripheral in the Vivado PS7 configurator wizard.  Even though the Zynq device has 2 SPI and 2 I2C peripherals that can potentially be placed on different pins, Zedboard constraints mean that only I2C0 can be put in MIO.  Trying to enable I2C1, for example, will trip a constraint check--as you can see below, where Ethernet 0 lights up red.
This is a Zedboard specific constraint, which does NOT show up in the generic IO map above, but is indicated in the board schematic, as shown below.  Note that MIO pins 52 and 53 are used up for the Ethernet HW.  Similarly, trying to put I2C0 on MIO 46 and 47 is disallowed, because there is already SD HW on those pins (SD write protect and card detect).
The 2 I2C pins are on Bank 501 which is on 1.8V, and trace to the 2 MIO buttons on the Zedboard, shown in the middle of the picture below.

If I take out the button, I should be able to solder a pair of wires to any I2C device--which seem to run on either Vdd_IO = 1.8 V or 2.8 V.  When I expand the Vivado PS7 configurator MIO pin assignment, I see that MIO50 (going to PB1 above) is the clock, and the other is the data.
The pull-up resistor is disabled by default (as you can see above), but I2C REQUIRES pull-up (ACK is logic low, so if the receiver is NOT there to pull it down, the SDA should go high).  I tried enabling the internal pull-up resistor, but the resistance value is apparently too high for high speed I2C clock (400 kHz); a lower resistance (150 Ohm) is required for a stronger pull-up.  So I soldered an external 150 Ohm resistor across the SDA and SCL to the VADJ (1.8 V here) to achieve 400 kHz I2C frequency.

After generating the bitstream and exporting the HW description to XSDK, the I2C_0 peripheral's registers show up with this memory mapping to CPU1: 0xE0004000~0xE0004FFF (the same on CPU0).  To see Xilinx examples, recreate the BSP for CPU1 from XSDK.  Then the "Import Examples" link appears in system.mss (the BSP description), as shown here:
Then I choose the scenario that best fits my application, which is the CPU1 master driving a slave WITHOUT any interrupt (note that I did NOT check the interrupt box above).

The very first function call in the generated example--XIicPs_CfgInitialize--resets the I2C peripheral.
  • Write 0x00000000 to the control register, at XIICPS_CR_OFFSET from the I2C register base address (0xE0004000 for I2C 0).  The control register consists of:
    • Stage A and B (chained) clock dividers.
      • Input clock to the I2C HW is APB CPU_1x clock (1/4th of the CPU clock), 166.75 MHz
      • The actual achievable clock rate seems to be input clock Hz / (22 * (div_A + 1) * (div_B + 1)), but I found that this formula yields a clock rate about 25% slower than expected when measured on the scope.
      • The maximum I2C clock frequency supported is 400 kHz.
    • clear FIFO bit.
    • slave monitor bit (1: monitor)
    • HOLD bit: 0 => transfer terminates as soon as all data has been transferred.  If writing a long byte sequence (the TX FIFO is 16 bytes deep), use this bit to avoid terminating the transfer in the middle.
    • ACKEN bit: 1 => ACK transmitted, 0 => NACK transmitted
    • Addressing mode: 1 => normal (7-bit) address; 0 => 10 bit
    • Master bit: 1 => master
    • Transfer direction bit: 0 => transmitter.  To be changed for each read/write?
  • Write the timeout to the TIME_OUT register (offset 0x1C).  When the accessed slave holds the SCL line low for longer than the timeout period, preventing the I2C interface in master mode from completing the current transfer, an interrupt is generated and the TO interrupt flag is set.  The maximum value is 255; the write is unnecessary if the desired value is the same as the reset value (0x1F).
  • Write 0x2FF to the interrupt DISABLE register (offset 0x28)--the dual of the interrupt ENABLE register (offset 0x24); 0x2FF covers ALL interrupts the HW supports.  In case I use I2C interrupts in the future, they are:
    • 0x200: arbitration lost
    • 0x080: RX underflow
    • 0x040: TX overflow
    • 0x020: RX overflow
    • 0x010: monitored slave ready
    • 0x008: transfer timeout
    • 0x004: transfer NACK
    • 0x002: more data
    • 0x001: IXR_COMP; TX complete 
Next, the I2C is self-tested--consisting of reading back the registers just written, and the slave monitor pause register SLV_PAUSE (PoR value 0).
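
The divider formula in the list above is easy to turn into a search for the divider pair that gets closest to a target SCL rate without exceeding it.  This is only a sketch: the function names are mine, and the field widths (2 bits for div_A, 6 bits for div_B) are my reading of the control register layout, so check them against the TRM before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths: div_A is 2 bits (0..3), div_B is 6 bits (0..63). */
#define I2C_DIV_A_MAX 3u
#define I2C_DIV_B_MAX 63u

/* SCL rate in Hz for one divider pair, using the formula quoted above. */
static uint32_t i2c_scl_hz(uint32_t input_hz, uint32_t div_a, uint32_t div_b)
{
    return input_hz / (22u * (div_a + 1u) * (div_b + 1u));
}

/*
 * Pick the divider pair that gives the fastest SCL not exceeding target_hz.
 * Returns the achieved rate, or 0 if even the largest dividers are too fast
 * (in which case *div_a and *div_b are left untouched).
 */
static uint32_t i2c_pick_dividers(uint32_t input_hz, uint32_t target_hz,
                                  uint32_t *div_a, uint32_t *div_b)
{
    uint32_t best = 0;

    assert(div_a && div_b);
    for (uint32_t a = 0; a <= I2C_DIV_A_MAX; a++) {
        for (uint32_t b = 0; b <= I2C_DIV_B_MAX; b++) {
            uint32_t hz = i2c_scl_hz(input_hz, a, b);

            if (hz <= target_hz && hz > best) {
                best = hz;
                *div_a = a;
                *div_b = b;
            }
        }
    }
    return best;
}
```

With the 166.75 MHz input clock quoted above and a 400 kHz target, the search lands on div_A = 0 and div_B = 18, just under 399 kHz on paper; the scope measurement mentioned above is the real arbiter.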

Master writing to the slave

  1. The I2C bus is busy if the BA bit of the HW status register (offset 0x4) is set.
  2. Write to the control register to set up SCL speed and addressing mode.
  3. Set the MS, ACKEN, and CLR_FIFO bits and clear the RW bit (to indicate a write).   Turn on data complete and arbitration lost bits in the interrupt enable register.
  4. If required (the number of bytes to transfer exceeds the FIFO depth), set the HOLD bit. Otherwise write the first byte of data to the I2C Data register.
  5. Fill the FIFO with data to send
  6. Write the slave address into the I2C address register. This initiates the I2C transfer.
  7. Continue to load the remaining data to be sent to the slave by writing to the I2C Data register.  But avoid overflowing the TX FIFO by waiting until the FIFO empties, by checking the TXDV bit of the status register (offset 0x4).  The data is pushed in the FIFO each time the host writes to the I2C Data register.
  8. When transfer is completed, the IXR_COMP bit of the interrupt status register (offset 0x10) goes high.  Error during transfer aborts the transfer and sets the corresponding bit (listed above) in the interrupt status register.
  9. If HOLD bit was set earlier, unset it now.
Some I2C devices support auto-increment, so that only the starting address is written at the beginning, and subsequent register values are written in succession.  The Zynq I2C HW itself supports a consecutive write if the address register is written again before the end of the current transfer; this is DIFFERENT from the I2C slave device itself supporting consecutive write.

Master reading from the slave

  1. Write to the Control register to set up the SCL speed and addressing mode.
  2. Set MS, ACKEN, CLR_FIFO bits, and RW bit.  Turn on the "more data", RX overflow, data complete, and arbitration lost bits in the interrupt enable register.
  3. If the host wants to hold the bus after the data is received, it must also set the HOLD bit.
  4. Write the slave address in the I2C Address register. This initiates the I2C transfer.
  5. Write the number of requested bytes in the Transfer Size register.  But the maximum transfer size supported is 252 bytes.
  6. As long as there is valid data to read (RXDV bit in the status register is set), read from the data FIFO.  If the number of bytes remaining to receive is less than the I2C receive pipeline depth (16), turn off the HOLD bit BEFORE reading out the FIFO.
A complex read where the slave's register address is required for a read can then consist of a write and then a read, with the HOLD bit on, to avoid sending the STOP bit in the middle.
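
The control register setup shared by the write and read sequences above can be sketched as a single helper.  The bit masks are what I recall from the Xilinx bare-metal header xiicps_hw.h, and the helper name is hypothetical; treat both as assumptions to verify against your BSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control register bits (assumed from xiicps_hw.h; verify before use). */
#define CR_RD_WR    0x0001u  /* 1 => master receiver, 0 => master transmitter */
#define CR_MS       0x0002u  /* 1 => master */
#define CR_ACKEN    0x0008u  /* 1 => ACK transmitted */
#define CR_HOLD     0x0010u  /* 1 => keep the bus after the FIFO drains */
#define CR_CLR_FIFO 0x0040u  /* 1 => clear the FIFO */

#define I2C_FIFO_DEPTH 16u

/*
 * Start from the current control register value (which already carries the
 * clock dividers and addressing mode) and return the value to write back
 * before a master transfer of nbytes.
 */
static uint32_t i2c_master_cr(uint32_t cr, bool is_read, uint32_t nbytes)
{
    assert(nbytes > 0);

    cr |= CR_MS | CR_ACKEN | CR_CLR_FIFO;

    if (is_read)
        cr |= CR_RD_WR;
    else
        cr &= ~CR_RD_WR;

    /* HOLD keeps the transfer alive when it does not fit in the FIFO. */
    if (nbytes > I2C_FIFO_DEPTH)
        cr |= CR_HOLD;
    else
        cr &= ~CR_HOLD;

    return cr;
}
```

On real HW the returned value would be written to the control register before the slave address is written to the address register, since the address write is what starts the transfer.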

USB

The USB HW on Zedboard is described in <kernel>/arch/arm/boot/dts/zynq.dtsi:

usb: usb@e0002000 {
  compatible = "xlnx,ps7-usb-1.00.a", "xlnx,zynq-usb-1.00.a";
  reg = <0xe0002000 0x1000>;
  interrupts = <0 21 4>;
  interrupt-parent = <&gic>;
  clocks = <&clkc 28>;
  dr_mode = "host";
  phy_type = "ulpi";
};

The "compatible" field is for Linux to find a matching driver.  In the interrupts specification triplet, the first cell selects the GIC interrupt type: 0 means SPI (shared peripheral interrupt) and 1 means PPI (private peripheral interrupt), which is why nearly every on-chip peripheral just puts in 0.  The last cell, 4, means the interrupt is active high level triggered.  The middle cell is the SPI number; the interrupt number that Linux reports depends on how the HW interrupt controller maps it (NOT necessarily in a 1-1 fashion).  I look up clock number 28 (counting from 0) among the clocks available through the system control registers:

slcr: slcr@f8000000 {
  #address-cells = <1>;
  #size-cells = <1>;
  compatible = "xlnx,zynq-slcr", "syscon";
  reg = <0xf8000000 0x1000>;
  ranges ;
  clkc: clkc {
    #clock-cells = <1>;
    clock-output-names = "armpll", "ddrpll", "iopll", "cpu_6or4x", "cpu_3or2x",
      "cpu_2x", "cpu_1x", "ddr2x", "ddr3x", "dci",
      "lqspi", "smc", "pcap", "gem0", "gem1",
      "fclk0", "fclk1", "fclk2", "fclk3", "can0",
      "can1", "sdio0", "sdio1", "uart0", "uart1",
      "spi0", "spi1", "dma", "usb0_aper", "usb1_aper",
      "gem0_aper", "gem1_aper", "sdio0_aper", "sdio1_aper", "spi0_aper",
      "spi1_aper", "can0_aper", "can1_aper", "i2c0_aper", "i2c1_aper",
      "uart0_aper", "uart1_aper", "gpio_aper", "lqspi_aper", "smc_aper",
      "swdt", "dbg_trc", "dbg_apb";
    compatible = "xlnx,ps7-clkc";
    ps-clk-frequency = <33333333>;
    fclk-enable = <0xf>;
    reg = <0x100 0x100>;
  };
};
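
To double check the two lookups for the USB node (clock index 28 and the interrupt triplet <0 21 4>), here is a small host-side sketch.  The name table is copied from clock-output-names above; the helper names are mine, and the +32 offset for SPI (and +16 for PPI) is the standard ARM GIC numbering, which I am assuming applies here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* clock-output-names from the clkc node, in the same order. */
static const char *const clk_names[] = {
    "armpll", "ddrpll", "iopll", "cpu_6or4x", "cpu_3or2x",
    "cpu_2x", "cpu_1x", "ddr2x", "ddr3x", "dci",
    "lqspi", "smc", "pcap", "gem0", "gem1",
    "fclk0", "fclk1", "fclk2", "fclk3", "can0",
    "can1", "sdio0", "sdio1", "uart0", "uart1",
    "spi0", "spi1", "dma", "usb0_aper", "usb1_aper",
    "gem0_aper", "gem1_aper", "sdio0_aper", "sdio1_aper", "spi0_aper",
    "spi1_aper", "can0_aper", "can1_aper", "i2c0_aper", "i2c1_aper",
    "uart0_aper", "uart1_aper", "gpio_aper", "lqspi_aper", "smc_aper",
    "swdt", "dbg_trc", "dbg_apb",
};

/* Name behind "clocks = <&clkc N>", or NULL if N is out of range. */
static const char *clkc_name(size_t index)
{
    if (index >= sizeof(clk_names) / sizeof(clk_names[0]))
        return NULL;
    return clk_names[index];
}

#define GIC_SPI 0u  /* shared peripheral interrupt */
#define GIC_PPI 1u  /* private peripheral interrupt */

/* GIC interrupt ID for the first two cells of an "interrupts" triplet. */
static uint32_t gic_irq_id(uint32_t type, uint32_t number)
{
    assert(type == GIC_SPI || type == GIC_PPI);
    return number + (type == GIC_SPI ? 32u : 16u);
}
```

clkc_name(28) returns "usb0_aper", and gic_irq_id(GIC_SPI, 21) returns 53, which agrees with the "irq 53" that the zynq-ehci driver prints in the boot log.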

Some of the USB device and subsystem initialization is done through the __initcall constructor magic discussed in the previous section.

module_init(ehci_hcd_init)
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver


zynq_dr_init() --> platform_driver_register() --> zynq_dr_of_probe() --> otg_ulpi_create() --> ulpi_init()
ULPI transceiver vendor/product ID 0x0451/0x1507
Found TI TUSB1210 ULPI transceiver.
ULPI integrity check: passed.

ehci_zynq_drv_probe() --> usb_hcd_zynq_probe() --> ehci_zynq_otg_start_host() 
zynq-ehci zynq-ehci.0: Xilinx Zynq USB EHCI Host Controller
zynq-ehci zynq-ehci.0: new USB bus registered, assigned bus number 1
zynq-ehci zynq-ehci.0: irq 53, io mem 0x00000000
zynq-ehci zynq-ehci.0: USB 2.0 started, EHCI 1.00

hub 1-0:1.0: USB hub found
hub 1-0:1.0: 1 port detected
usbcore: registered new interface driver usb-storage
usbcore: registered new interface driver usbserial
usbcore: registered new interface driver usbserial_generic
usbserial: USB Serial support registered for generic
usbcore: registered new interface driver ftdi_sio
usbserial: USB Serial support registered for FTDI USB Serial Device
...

hidraw: raw HID events driver (C) Jiri Kosina
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
...
usb 1-1: new high-speed USB device number 2 using zynq-ehci
...
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 4 ports detected
usb 1-1.1: new low-speed USB device number 3 using zynq-ehci
input: USB Optical Mouse as /devices/amba.0/e0002000.usb/zynq-ehci.0/usb1/1-1/1-1.1/1-1.1:1.0/0003:0461:4D81.0001/input/input0
hid-generic 0003:0461:4D81.0001: input,hidraw0: USB HID v1.11 Mouse [USB Optical Mouse] on usb-zynq-ehci.0-1.1/input0
usb 1-1.2: new low-speed USB device number 4 using zynq-ehci
input: Microsoft Natural® Ergonomic Keyboard 4000 as /devices/amba.0/e0002000.usb/zynq-ehci.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:045E:00DB.0002/input/input1
microsoft 0003:045E:00DB.0002: input,hidraw1: USB HID v1.11 Keyboard [Microsoft Natural® Ergonomic Keyboard 4000] on usb-zynq-ehci.0-1.2/input0
input: Microsoft Natural® Ergonomic Keyboard 4000 as /devices/amba.0/e0002000.usb/zynq-ehci.0/usb1/1-1/1-1.2/1-1.2:1.1/0003:045E:00DB.0003/input/input2
microsoft 0003:045E:00DB.0003: input,hidraw2: USB HID v1.11 Device [Microsoft Natural® Ergonomic Keyboard 4000] on usb-zynq-ehci.0-1.2/input1

Update: ADI kernel 3.19 did away with the Xilinx drivers in favor of the Marvell chipidea driver

As part of the effort to mainline the Xilinx/ADI kernel, the Xilinx drivers were tossed, and replaced with the Marvell driver <kernel>/drivers/usb/chipidea/ci_hdrc_usb2.c.  The only problem is that the dr_mode was dropped from <kernel>/arch/arm/boot/dts/zynq.dtsi, so I patched up the DTSI file:

                ps7_usb_0: ps7-usb@e0002000 {
                        clocks = <&clkc 28>;
                        compatible = "xlnx,ps7-usb-1.00.a", "xlnx,zynq-usb-1.00.a";
                        dr_mode = "host";
                        interrupt-parent = <&ps7_scugic_0>;
                        interrupts = <0 21 4>;
                        phy_type = "ulpi";
                        reg = <0xe0002000 0x1000>;
                } ;

After the change, my keyboard and mouse are correctly detected, according to dmesg:

usb 1-1.1: new low-speed USB device number 3 using ci_hdrc
input: Logitech USB Keyboard as /devices/soc0/amba@0/e0002000.usb/ci_hdrc.0/usb1/1-1/1-1.1/1-1.1:1.0/0003:046D:C31D.0001/input/input0
hid-generic 0003:046D:C31D.0001: input,hidraw0: USB HID v1.10 Keyboard [Logitech USB Keyboard] on usb-ci_hdrc.0-1.1/input0
input: Logitech USB Keyboard as /devices/soc0/amba@0/e0002000.usb/ci_hdrc.0/usb1/1-1/1-1.1/1-1.1:1.1/0003:046D:C31D.0002/input/input1
hid-generic 0003:046D:C31D.0002: input,hidraw1: USB HID v1.10 Device [Logitech USB Keyboard] on usb-ci_hdrc.0-1.1/input1
usb 1-1.2: new low-speed USB device number 4 using ci_hdrc
...
input: Logitech USB Trackball as /devices/soc0/amba@0/e0002000.usb/ci_hdrc.0/usb1/1-1/1-1.2/1-1.2:1.0/0003:046D:C408.0003/input/input2
hid-generic 0003:046D:C408.0003: input,hidraw2: USB HID v1.10 Mouse [Logitech USB Trackball] on usb-ci_hdrc.0-1.2/input0

Ethernet

The Zynq design I've been using for the HDMI display enabled ARM Linux target accesses the GigE HW through direct memory mapped registers, described in chapter 16 of the Xilinx Zynq TRM.  The HW maps [0xE000B000, 0xE000C000) for the ps7_ethernet_0 peripheral, accessed directly from the PS7 (the Zynq-7000 processing system), with the MIO pins shown below (copied from ps7_init.html in the Vivado exported HW project in Xilinx SDK):
MIO Pin   Peripheral   Signal    IO Type       Speed   Pullup     Direction
MIO 16    Enet 0       tx_clk    LVCMOS 1.8V   fast    disabled   out
MIO 17    Enet 0       txd[0]    LVCMOS 1.8V   fast    disabled   out
MIO 18    Enet 0       txd[1]    LVCMOS 1.8V   fast    disabled   out
MIO 19    Enet 0       txd[2]    LVCMOS 1.8V   fast    disabled   out
MIO 20    Enet 0       txd[3]    LVCMOS 1.8V   fast    disabled   out
MIO 21    Enet 0       tx_ctl    LVCMOS 1.8V   fast    disabled   out
MIO 22    Enet 0       rx_clk    LVCMOS 1.8V   fast    disabled   in
MIO 23    Enet 0       rxd[0]    LVCMOS 1.8V   fast    disabled   in
MIO 24    Enet 0       rxd[1]    LVCMOS 1.8V   fast    disabled   in
MIO 25    Enet 0       rxd[2]    LVCMOS 1.8V   fast    disabled   in
MIO 26    Enet 0       rxd[3]    LVCMOS 1.8V   fast    disabled   in
MIO 27    Enet 0       rx_ctl    LVCMOS 1.8V   fast    disabled   in
MIO 52    Enet 0       mdc       LVCMOS 1.8V   slow    disabled   out
MIO 53    Enet 0       mdio      LVCMOS 1.8V   slow    disabled   inout

These MIO pins are programmable directly from the CPU using memory mapped registers, such as MIO_PIN_52 at 0xF80007D0, which is NOT in any memory mapping assigned to other peripherals--which makes sense.

Xilinx Ethernet MAC PS core concepts

  • Control: net_ctrl
    • TXEN: net_ctrl[3]
    • RXEN: net_ctrl[2]
  • Full/half duplex mode: In half duplex mode (at least in 10/100 Mbit half duplex mode), the transmitter can shake the RX wires, whereas full duplex mode cannot.  Transmitters and receivers interact with upstream data sink/source through FIFO, even for DMA.
  • DMA block
    • RX and TX buffer descriptors point to a memory area in kernel
    • DMA controller performs 4 operations: RX buffer manager read/write, TX buffer manager read/write, RX data DMA write, TX data DMA read
    • Transfer may burst 1, 4, 8, 16 DWORD
    • Receive buffer queue pointer: rx_qbar
      • Buffer descriptor structure: TRM table 16-2, but in the device driver, looks like:
        struct xemacps_bd {
          u32 addr; //DMA controller will read/write here
          u32 ctrl;
        };
      • Descriptor[0][1] bit marks the last descriptor in a circular queue
      • Descriptor[0][0] indicates ownership
      • Only the final buffer contains the received frame description--except SoF (start of frame) and EoF (end of frame) bits; in case of error, there can be sequence of SoF without a matching EoF
    • Only good received frames are written out to DMA, but if DMA runs out of buffer descriptors, a partial frame can appear in memory, and an interrupt will be triggered with BUFFNA (rx_status[0]) set, so that the SW can free the DMA memory to prevent receive overrun--which will itself trigger an interrupt, and the buffer currently being written is recycled.  If dma_cfg[24] is set, the HW will discard the received packet proactively.  The SW can also make room by writing net_ctrl[18] to flush a packet from the receive buffer--if the RX DMA is IDLE.
  • Frame check sequence.  Receive descriptor[1][13] indicates FCS validity
  • Errors
    • alignment/CRC (FCS): ignored if net_cfg[26] FALSE
    • Short frame ignored if net_cfg[16]
    • long frame
    • jabber
    • receive symbol error: carrier extension error only detected during minimum slot time
  • Filter
    • HW supports up to 4 addresses, activated when the last 2 bytes of the Ethernet address are written.
    • Up to 4 types supported per destination address.  The type filter register[31] must be asserted.
    • No broadcast: net_cfg[5]
    • Destination address hash matching
    • copy all (AKA promiscuous) bit: net_cfg[4].  Does NOT apply to error frames if net_cfg[26] is set
    • Pause frame: net_cfg[23]
    • Discard non-VLAN frame: net_cfg[2]
  • VLAN
    • RX buffer description status holds VLAN frame information
  • Wake on LAN: wake_on_lan register
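
The receive buffer descriptor ring from the DMA block bullets above can be modeled on the host like this.  It is a sketch under stated assumptions: the struct mirrors xemacps_bd, the mask names are hypothetical, and I am reading "Descriptor[0][0]" and "Descriptor[0][1]" as bits 0 (ownership) and 1 (wrap) of the address word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-word layout as struct xemacps_bd in the driver. */
struct bd {
    uint32_t addr;  /* buffer address plus the ownership and wrap bits */
    uint32_t ctrl;  /* status written back by the MAC */
};

#define RXBUF_OWN_MASK  0x00000001u  /* bit 0: set once a frame was written */
#define RXBUF_WRAP_MASK 0x00000002u  /* bit 1: last descriptor in the ring */
#define RXBUF_ADDR_MASK 0xFFFFFFFCu  /* word-aligned buffer address */

/* Hand every buffer to the HW (ownership = 0) and mark the last descriptor. */
static void rx_ring_init(struct bd *ring, size_t n, const uint32_t *buf_addrs)
{
    assert(ring && buf_addrs && n > 0);
    for (size_t i = 0; i < n; i++) {
        assert((buf_addrs[i] & ~RXBUF_ADDR_MASK) == 0);
        ring[i].addr = buf_addrs[i];
        ring[i].ctrl = 0;
    }
    ring[n - 1].addr |= RXBUF_WRAP_MASK;
}

/* True once the HW has filled this descriptor and handed it back to the SW. */
static bool rx_bd_has_frame(const struct bd *bd)
{
    return (bd->addr & RXBUF_OWN_MASK) != 0;
}

/* Index of the descriptor that follows i, honoring the wrap bit. */
static size_t rx_ring_next(const struct bd *ring, size_t i)
{
    return (ring[i].addr & RXBUF_WRAP_MASK) ? 0 : i + 1;
}
```

The wrap bit is what makes the queue circular: the HW needs only the base address (rx_qbar) and walks forward until it sees the bit, so the ring length never has to be programmed separately.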

HW interface to the Linux kernel

Zynq GigE is described to the Linux kernel through the DTS entry (<kernel>/arch/arm/boot/dts/zynq-zed.dtsi):

eth: eth@e000b000 {
  compatible = "xlnx,ps7-ethernet-1.00.a";
  reg = <0xe000b000 0x1000>;
  interrupts = <0 22 4>;
  interrupt-parent = <&gic>;
  #address-cells = <0x1>;
  #size-cells = <0x0>;

  clock-names = "ref_clk", "aper_clk";
  clocks = <&clkc 13>, <&clkc 30>;

  xlnx,enet-clk-freq-hz = <0x17d7840>;
  xlnx,enet-reset = "MIO 11";
  xlnx,enet-slcr-1000mbps-div0 = <0x8>;
  xlnx,enet-slcr-1000mbps-div1 = <0x1>;
  xlnx,enet-slcr-100mbps-div0 = <0x8>;
  xlnx,enet-slcr-100mbps-div1 = <0x5>;
  xlnx,enet-slcr-10mbps-div0 = <0x8>;
  xlnx,enet-slcr-10mbps-div1 = <0x32>;
  xlnx,eth-mode = <0x1>;
  xlnx,has-mdio = <0x1>;
  xlnx,ptp-enet-clock = <111111115>;
};

Note that the GigE clock is 25 MHz.  Even though Ethernet itself is a serial device (the TX and RX are twisted differential pairs), MII Ethernet phy is NOT a serial device; the MII standard uses 4 TX wires in parallel; you can see above that after auto-negotiation with a 100 Mbps switch, the clock is set to 100M/4 = 25 MHz.  This device shows up in /sys like this:

xemacps (Xilinx Ethernet MAC PS) is the platform driver for eth@e000b000 above, as you can see below:

static struct of_device_id xemacps_of_match[] = {
{ .compatible = "xlnx,ps7-ethernet-1.00.a", },
{ /* end of table */}
};
MODULE_DEVICE_TABLE(of, xemacps_of_match);

We can check the match table worked by checking the /sys after the kernel starts:

$ ls -l bus/mdio_bus/devices/
d748dd80:00 -> ../../../devices/amba.0/e000b000.eth/net/eth0/d748dd80:00

$ ls -l devices/amba.0/e000b000.eth/
driver -> ../../../bus/platform/drivers/xemacps
subsystem -> ../../../bus/platform

So the dmesg output makes sense: "XEMACPS mii bus" is the name assigned to the mii_bus and used in <kernel>/drivers/net/phy/mdio_bus.c:mdiobus_register().

libphy: XEMACPS mii bus: probed

xemacps e000b000.eth: pdev->id -1, baseaddr 0xe000b000, irq 54
...


xemacps e000b000.eth: Set clk to 25000000 Hz
xemacps e000b000.eth: link up (100/FULL)
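
The 25 MHz in the log above is consistent with the 100 Mbps divider pair in the DTS node, provided the IO PLL that feeds the GEM clock runs at 1000 MHz.  That PLL rate is my assumption (it is not stated anywhere above), and the helper names are hypothetical; the sketch just divides the PLL rate by div0 * div1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gem_div {
    uint32_t speed_mbps;
    uint32_t div0;
    uint32_t div1;
};

/* xlnx,enet-slcr-*-div0 / div1 values from the eth DTS node. */
static const struct gem_div gem_divs[] = {
    { 1000u, 0x8u, 0x1u },
    {  100u, 0x8u, 0x5u },
    {   10u, 0x8u, 0x32u },
};

/* GEM reference clock in Hz for a link speed, or 0 for an unknown speed. */
static uint32_t gem_ref_clk_hz(uint32_t pll_hz, uint32_t speed_mbps)
{
    for (size_t i = 0; i < sizeof(gem_divs) / sizeof(gem_divs[0]); i++) {
        if (gem_divs[i].speed_mbps == speed_mbps) {
            assert(gem_divs[i].div0 != 0 && gem_divs[i].div1 != 0);
            return pll_hz / (gem_divs[i].div0 * gem_divs[i].div1);
        }
    }
    return 0;
}
```

Under that assumption the three pairs give 125 MHz, 25 MHz, and 2.5 MHz for 1000, 100, and 10 Mbps, and xlnx,enet-clk-freq-hz = <0x17d7840> is the same 25,000,000 written in hex.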

Note that the IRQ seen by the kernel is 32 PLUS the IRQ assigned to the GIC in DTS (22).  But wait!  Checking the /sys more carefully, Marvell 88E1510 driver thinks it is the driver for the Ethernet device as well:

$ ls -l bus/mdio_bus/drivers/Marvell\ 88E1510
d748dd80:00 -> ../../../../devices/amba.0/e000b000.eth/net/eth0/d748dd80:00

This driver is showing up because of the PHY that eth0 is controlling (through the MDIO interface pins on MIO pins 52 and 53):

&eth {
        phy-handle = <&phy0>;
        phy-mode = "rgmii-id";

        phy0: phy@0 {
                compatible = "marvell,88e1510";
                device_type = "ethernet-phy";
                reg = <0x0>;
                marvell,reg-init=<3 16 0xff00 0x1e 3 17 0xfff0 0x0a>;
        };
};

phy-mode of rgmii-id is described in Wikipedia.  "&eth" is aliased to ethernet0 in all Zynq DTS.  So the statement above in zynq-zed.dtsi (a header file in DTS speak, to describe the Zedboard specialization) tacks on phy@0 as a child node of eth@e000b000, as you can see below:

$ ls /proc/device-tree/amba@0/eth@e000b000/
#address-cells                reg
#size-cells                   xlnx,enet-clk-freq-hz
clock-names                   xlnx,enet-reset
clocks                        xlnx,enet-slcr-1000mbps-div0
compatible                    xlnx,enet-slcr-1000mbps-div1
interrupt-parent              xlnx,enet-slcr-100mbps-div0
interrupts                    xlnx,enet-slcr-100mbps-div1
local-mac-address             xlnx,enet-slcr-10mbps-div0
name                          xlnx,enet-slcr-10mbps-div1
phy-handle                    xlnx,eth-mode
phy-mode                      xlnx,has-mdio
phy@0                         xlnx,ptp-enet-clock

So in summary, <>/drivers/net/phy/marvell.c is the PHY driver, and <kernel>/drivers/net/ethernet/xilinx/xilinx_emacps.c is the MDIO MAC driver for eth0.  Since eth0 is the parent device of the PHY, let's talk about the eth0 first.  It was statically compiled and registered as a platform driver at the bottom of xilinx_emacps.c:

static struct platform_driver xemacps_driver = {
.probe   = xemacps_probe,
.remove  = xemacps_remove,
.driver  = {
.name  = DRIVER_NAME,
.of_match_table = xemacps_of_match,
.pm = XEMACPS_PM,
},
};

module_platform_driver(xemacps_driver);

Eth probe()

The platform bus core calls the probe() function--hooked up to xemacps_probe()--because the of_match_table matched against the compatible field in DTS.  The probe function starts allocating resource for the device right away:

 r_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 r_irq = platform_get_resource(pdev, IORESOURCE_IRQ, 0);

platform_get_resource() actually does NOT allocate the said resource; those resources were probably  allocated and added by the core (platform_device_add_resources) BEFORE probe is called (Q: but exactly when?), and platform_get_resource() merely returns those resources.  The memory resource is remapped to a virtual address:

lp->baseaddr = devm_ioremap_resource(&pdev->dev, r_mem);

The device uses not 1 but 4 spinlocks; a sequence diagram showing when each of these locks is taken would have been much more illuminating.

spin_lock_init(&lp->tx_lock);
spin_lock_init(&lp->rx_lock);
spin_lock_init(&lp->nwctrlreg_lock);
spin_lock_init(&lp->mdio_lock);

Ethernet_dev is like an abstract class, with various operations the network interface driver supplies:

ndev = alloc_etherdev(sizeof(*lp));
ndev->netdev_ops = &netdev_ops;
ndev->watchdog_timeo = TX_TIMEOUT;
ndev->ethtool_ops = &xemacps_ethtool_ops;
ndev->base_addr = r_mem->start;
ndev->features = NETIF_F_IP_CSUM | NETIF_F_SG;
netif_napi_add(ndev, &lp->napi, xemacps_rx_poll, XEMACPS_NAPI_WEIGHT);

rc = register_netdev(ndev);

Note the special callback for the NAPI receive (which is also described in LDD 3rd edition, p. 526): we will see this in the eth ISR below.

Next, clocks are enabled:

lp->aperclk = devm_clk_get(&pdev->dev, "aper_clk");
lp->devclk = devm_clk_get(&pdev->dev, "ref_clk");
rc = clk_prepare_enable(lp->aperclk);
rc = clk_prepare_enable(lp->devclk);

Next, eth probe() consults the DTS setting:

rc = of_property_read_u32(lp->pdev->dev.of_node, "xlnx,has-mdio", &lp->has_mdio);
lp->phy_node = of_parse_phandle(lp->pdev->dev.of_node, "phy-handle", 0);
lp->gmii2rgmii_phy_node = of_parse_phandle(lp->pdev->dev.of_node,
"gmii2rgmii-phy-handle", 0);
rc = of_get_phy_mode(lp->pdev->dev.of_node);

Finally, the eth driver starts writing to the memory mapped registers, described in the Zynq 7000 TRM (Technical Reference Manual) GEM chapter (chapter 16) to configure the MDC clock:

regval = (MDC_DIV_224 << XEMACPS_NWCFG_MDC_SHIFT_MASK);
xemacps_write(lp->baseaddr, XEMACPS_NWCFG_OFFSET, regval);

regval = XEMACPS_NWCTRL_MDEN_MASK;
xemacps_write(lp->baseaddr, XEMACPS_NWCTRL_OFFSET, regval);

The same constants are defined verbatim (sometimes even the comments) in the Xilinx bare-metal XEmacPs driver (xemacps_hw.h).  xemacps_read/write() is just readl/writel_relaxed(), which just calls LDR and STR assembly instructions.  According to LDD 3rd edition, we are supposed to use ioread32/iowrite32 functions (and ioread32_rep/iowrite32_rep for multiple double words).  I guess it's still OK because it's ARM.

Next, MDIO bus is created (but it's called mii_bus, because MDIO and MII are synonymous--as well as SMI; in this lingo, eth, which manages the MII/MDIO device is called SME--station management entity).  Remember: the PHY is on the MDIO bus but eth device is on the platform bus, and OWNS the MDIO bus.

rc = xemacps_mii_init(lp);

This is the mdio_bus linking the device_driver and the device we saw above, in /sys/bus.  Be careful not to confuse the bus functions with the device functions.  Q: How are they different from the device functions?

lp->mii_bus->name  = "XEMACPS mii bus";
lp->mii_bus->read  = &xemacps_mdio_read;
lp->mii_bus->write = &xemacps_mdio_write;
lp->mii_bus->reset = &xemacps_mdio_reset;
lp->mii_bus->priv = lp;
lp->mii_bus->parent = &lp->ndev->dev;

lp->mii_bus->irq = kmalloc(sizeof(int)*PHY_MAX_ADDR, GFP_KERNEL);
for (i = 0; i < PHY_MAX_ADDR; i++) lp->mii_bus->irq[i] = PHY_POLL;

DTS PHY is required for registering the OF MDIO bus.

if (lp->phy_node) {
if (of_mdiobus_register(lp->mii_bus, np))

Marvell PHY gets probed at this time.

PHY probe()

<>/drivers/net/phy/marvell.c is the jack of all Marvell PHYs.  The PHY driver module init declares a table of all Marvell 88E series phys with associated callback functions, and registers them en masse:

static struct mdio_device_id __maybe_unused marvell_tbl[] = {
...
{ MARVELL_PHY_ID_88E1510, MARVELL_PHY_ID_MASK },
{ }
};

MODULE_DEVICE_TABLE(mdio, marvell_tbl);

static struct phy_driver marvell_drivers[] = {
... {
.phy_id = MARVELL_PHY_ID_88E1510,
.phy_id_mask = MARVELL_PHY_ID_MASK,
.name = "Marvell 88E1510",
.features = PHY_GBIT_FEATURES,
.flags = PHY_HAS_INTERRUPT,
.config_aneg = &m88e1510_config_aneg,
.read_status = &marvell_read_status,
.ack_interrupt = &marvell_ack_interrupt,
.config_intr = &marvell_config_intr,
.did_interrupt = &m88e1121_did_interrupt,
.resume = &genphy_resume,
.suspend = &genphy_suspend,
.driver = { .owner = THIS_MODULE },
},
};

static int __init marvell_init(void)
{
return phy_drivers_register(marvell_drivers,
ARRAY_SIZE(marvell_drivers));

}
module_init(marvell_init);

Note the liberal use of functions that are commonly defined for multiple Marvell phys, and even the phy library (<>/drivers/net/phy/phy_device.c); genphy stands for "generic PHY".  The phy library provides the probe() function.

int phy_driver_register(struct phy_driver *new_driver)
{
  int retval;
  new_driver->driver.name = new_driver->name;
  new_driver->driver.bus = &mdio_bus_type;
  new_driver->driver.probe = phy_probe;
...

Marvell PHY drivers like the 88E1510 shown above do NOT provide their own probe().  Despite being a generic PHY library, libphy can manipulate MDIO (MII) defined registers, as in this example from genphy_suspend():

value = phy_read(phydev, MII_BMCR);
phy_write(phydev, MII_BMCR, value | BMCR_PDOWN);

I have not yet gotten a good explanation of the MII registers.

The phy_id and phy_id_mask can be used as a last resort to match up found device/device driver:

static int mdio_bus_match(struct device *dev, struct device_driver *drv)
{
  struct phy_device *phydev = to_phy_device(dev);
  struct phy_driver *phydrv = to_phy_driver(drv);
  if (of_driver_match_device(dev, drv))
    return 1;
  if (phydrv->match_phy_device)
    return phydrv->match_phy_device(phydev);
  return (phydrv->phy_id & phydrv->phy_id_mask)
      == (phydev->phy_id & phydrv->phy_id_mask);
}

In the of subsystem (the DTB world), the match is decided by the .compatible field in DTS, which is "marvell,88e1510", but there is no matching hard coded string in the kernel code, nor a match_phy_device() callback, so the match is probably falling back all the way to the ID and mask at the bottom.  MDIO bus can query a PHY for its ID (<>/drivers/of/of_mdio.c).
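
The fallback at the bottom of mdio_bus_match() is a plain masked compare.  Here is a host-side sketch of it; the two constants are what I recall from include/linux/marvell_phy.h, and composing the 32-bit ID from the two 16-bit ID registers (MII registers 2 and 3) is how I understand libphy reads it, so verify both against your tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values from include/linux/marvell_phy.h. */
#define MARVELL_PHY_ID_MASK    0xfffffff0u
#define MARVELL_PHY_ID_88E1510 0x01410dd0u

/* Combine the two 16-bit PHY identifier registers into one 32-bit ID. */
static uint32_t phy_id_from_regs(uint16_t physid1, uint16_t physid2)
{
    return ((uint32_t)physid1 << 16) | physid2;
}

/* Same comparison as the last statement of mdio_bus_match(). */
static bool phy_id_matches(uint32_t drv_id, uint32_t drv_mask, uint32_t dev_id)
{
    assert(drv_mask != 0);
    return (drv_id & drv_mask) == (dev_id & drv_mask);
}
```

The mask drops the low 4 bits, which carry the silicon revision, so one driver entry covers every revision of the same PHY model.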

"marvell,reg-init" is read in marvell_of_reg_init():
  • reg_page: 3
  • reg: 16
  • mask: 0xFF00
  • val_bits: 0x1E
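
Each group of 4 cells in "marvell,reg-init" is a read-modify-write recipe: the existing register value is ANDed with the mask, then ORed with val_bits.  The sketch below replays the two tuples from the DTS against a simulated register file; the array and function names are hypothetical, and the real marvell_of_reg_init() additionally switches the PHY page through a page register, which I have collapsed into a 2-D array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHY_PAGES 8
#define PHY_REGS  32

/* Simulated paged PHY register file (stands in for MDIO reads/writes). */
static uint16_t phy_regs[PHY_PAGES][PHY_REGS];

struct reg_init {
    uint16_t page;
    uint16_t reg;
    uint16_t mask;      /* bits of the old value to keep */
    uint16_t val_bits;  /* bits to OR in */
};

/* marvell,reg-init = <3 16 0xff00 0x1e 3 17 0xfff0 0x0a> */
static const struct reg_init zed_reg_init[] = {
    { 3, 16, 0xff00, 0x001e },
    { 3, 17, 0xfff0, 0x000a },
};

static void apply_reg_init(const struct reg_init *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t val = 0;

        assert(t[i].page < PHY_PAGES && t[i].reg < PHY_REGS);
        /* A zero mask means "do not read, just write val_bits". */
        if (t[i].mask)
            val = (uint16_t)(phy_regs[t[i].page][t[i].reg] & t[i].mask);
        val = (uint16_t)(val | t[i].val_bits);
        phy_regs[t[i].page][t[i].reg] = val;
    }
}
```

So the first tuple keeps the upper byte of page 3 register 16 and forces the lower byte to 0x1E, and the second keeps the upper 12 bits of register 17 and forces the low nibble to 0xA.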
The device driver does NOT talk to the PHY directly, but rather through the MDIO bus:

static inline int phy_read(struct phy_device *phydev, u32 regnum)
{
return mdiobus_read(phydev->bus, phydev->addr, regnum);
}
static inline int phy_write(struct phy_device *phydev, u32 regnum, u16 val)
{
return mdiobus_write(phydev->bus, phydev->addr, regnum, val);
}

Rest of the eth probe()

Resuming eth probe() again, the HW's MAC address is read with peripheral register read, and checked for validity.

eth creates a tasklet to reclaim the TX DMA buffers (tasklet function: xemacps_tx_poll), and a single thread work queue (a work queue that only runs on 1 CPU) to check TX timeout (work queue function: xemacps_reinit_for_txtimeout).

The device power management is activated and enabled.  I want to understand the PM state machine.

pm_runtime_set_active(&pdev->dev);
pm_runtime_enable(&pdev->dev);

Finally, the interrupt line is requested with devm_request_irq().  I first read this as a threaded interrupt (one that splits the ISR into IRQ acknowledgement in the ISR itself and actual handling in a kernel thread) with a NULL thread handler, and wondered what the point was.  But devm_request_irq() is just devm_request_threaded_irq() with the thread function set to NULL, which gives an ordinary, non-threaded handler.  Anyway, the ISR may be invoked from this point on.  The HW is NOT yet initialized; probe() just set up the device driver, but eth open() actually initializes the HW.  See the subsection on open() below.

xemacps_open()

The eth driver is opened in ip_auto_config (change the device flag to IFF_UP, at which time the network device's ndo_open virtual method is called), as you can see in this stack trace:

ARM Cortex-A9 MPCore #0 (Breakpoint)
0xc0347830 xemacps_open(): .../ethernet/xilinx/xilinx_emacps.c, line 2010
0xc04a7ee4 __dev_open(): net/core/dev.c, line 1261
0xc04a81c4 __dev_change_flags(): net/core/dev.c, line 5404
0xc04a82c0 dev_change_flags(): net/core/dev.c, line 5473
0xc0812ce0 ip_auto_config(): net/ipv4/ipconfig.c, line 236
0xc00089bc do_one_initcall(): init/main.c, line 696
0xc07dbcf4 kernel_init_freeable(): init/main.c, line 762
0xc0557904 kernel_init(): init/main.c, line 840
0xc000ed78 ret_from_fork(): arch/arm/kernel/entry-common.S, line 91
0x00000000
<select to see more frames>

xemacps_open() calls xemacps_descriptor_init(), which creates a hard coded number (256) of TX/RX skb (socket buffer) entries and buffer descriptors (see the XEmacPs core concepts earlier) that will be around for the whole life of the driver (therefore dma_alloc_coherent() is used).  This confirms that the buffer descriptors are shared between the CPU and the MAC peripheral.

struct ring_info {
struct sk_buff *skb;
dma_addr_t mapping;
size_t len;
};

xemacps_descriptor_init() also allocates (netdev_alloc_skb) a fixed sized (XEMACPS_RX_BUF_SIZE = 1536 byte) skb, and then dma maps that memory (dma_map_single).  Note that TX skb is NOT allocated by the driver; the higher layer should.  The driver just maps the TX buffer descriptors.

Next, the driver registers the pm (power management) runtime again.

The driver tells the XEmacPs HW about the DMA descriptors in xemacps_init_hw(), as it initializes the XEmacPs dma_cr (DMA control register):

xemacps_write(lp->baseaddr, XEMACPS_RXQBASE_OFFSET, lp->rx_bd_dma);
xemacps_write(lp->baseaddr, XEMACPS_TXQBASE_OFFSET, lp->tx_bd_dma);

But don't you think it's strange: the send skb should be created by the higher layer and loaned to the driver, rather than the other way around!

xemacps_init_hw() also enables the HW interrupt:

regval  = XEMACPS_IXR_ALL_MASK;
xemacps_write(lp->baseaddr, XEMACPS_IER_OFFSET, regval);

Finally, the driver probes MII--the control interface to the PHY--at which point the PHY HW is initialized.

dma_map_single()

What happens under the hood when we map CPU accessible memory to the DMA?  For ARM, what I see in the debugger is that the architecture specific code to map the page to the device is called (<>/arch/arm/mm/dma-mapping.c), and eventually this assembly code is called:

ENTRY(v7_dma_map_area)
add r1, r1, r0
teq r2, #DMA_FROM_DEVICE
beq v7_dma_inv_range
b v7_dma_clean_range
ENDPROC(v7_dma_map_area)

The CPU then syncs the cache.
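The branch in v7_dma_map_area can be restated in plain C. This is only a user-space model of the decision, not kernel code: the enum values match the kernel's enum dma_data_direction, while the function and type names (map_area, cache_op, range) are made up for illustration.

```c
#include <assert.h>

/* Same numeric values as the kernel's enum dma_data_direction. */
enum dma_dir {
    DIR_BIDIRECTIONAL = 0,
    DIR_TO_DEVICE = 1,
    DIR_FROM_DEVICE = 2,
};

/* Illustrative stand-ins for v7_dma_clean_range / v7_dma_inv_range. */
enum cache_op { OP_CLEAN_RANGE, OP_INV_RANGE };

struct range { unsigned long start, end; };

/* Model of v7_dma_map_area(start, size, dir): compute the end address,
 * then pick invalidate only when the device is the one writing memory. */
enum cache_op map_area(unsigned long start, unsigned long size,
                       enum dma_dir dir, struct range *out)
{
    assert(out != 0);
    out->start = start;
    out->end = start + size;            /* add r1, r1, r0 */
    return dir == DIR_FROM_DEVICE       /* teq r2, #DMA_FROM_DEVICE */
        ? OP_INV_RANGE                  /* beq v7_dma_inv_range */
        : OP_CLEAN_RANGE;               /* b v7_dma_clean_range */
}
```

So an RX buffer (DMA_FROM_DEVICE) gets its cache lines invalidated, while a TX buffer gets dirty lines cleaned (written back) so the MAC reads what the CPU wrote.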

eth ISR

The ISR irqreturn_t xemacps_interrupt(int irq, void *dev_id) is essentially a do-while loop, checking for pending interrupts and acknowledging them. For TXCOMPL and TX_ERR interrupts, it schedules the DMA buffer reclaim tasklet.  For FRAMERX (RX complete), it will schedule a NAPI poll.

As discussed in the XEmacPs concepts section earlier, an interrupt may also come in because of rx overrun.  The ISR tries to flush a (the first?  last?) packet from the receive buffer, to make room.

spin_lock(&lp->nwctrlreg_lock);
regctrl = xemacps_read(lp->baseaddr, XEMACPS_NWCTRL_OFFSET);
regctrl |= XEMACPS_NWCTRL_FLUSH_DPRAM_MASK;
xemacps_write(lp->baseaddr, XEMACPS_NWCTRL_OFFSET, regctrl);
spin_unlock(&lp->nwctrlreg_lock);

It also schedules NAPI, because clearly, the kernel is not sucking out the received frames fast enough.
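The dispatch logic of the ISR can be modeled in user space. This is a sketch under stated assumptions: the bit positions below are invented (they are not the real XEMACPS_IXR_* values), and fake_mac, isr_actions and handle_irq are hypothetical names. It only shows the read/acknowledge/dispatch loop described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define IXR_FRAMERX    (1u << 0)
#define IXR_TXCOMPL    (1u << 1)
#define IXR_TX_ERR     (1u << 2)
#define IXR_RX_OVERRUN (1u << 3)

struct fake_mac {
    uint32_t isr;               /* pending interrupt status */
};

struct isr_actions {
    bool tx_reclaim_scheduled;  /* tasklet_schedule() in the driver */
    bool napi_scheduled;        /* NAPI poll requested */
    bool rx_flush_requested;    /* flush one packet on overrun */
};

/* Keep looping until no status bit is pending, like the do-while ISR. */
void handle_irq(struct fake_mac *mac, struct isr_actions *act)
{
    uint32_t pending = mac->isr;

    while (pending) {
        mac->isr &= ~pending;   /* acknowledge what we just read */

        if (pending & (IXR_TXCOMPL | IXR_TX_ERR))
            act->tx_reclaim_scheduled = true;
        if (pending & IXR_FRAMERX)
            act->napi_scheduled = true;
        if (pending & IXR_RX_OVERRUN) {
            act->rx_flush_requested = true;
            act->napi_scheduled = true;  /* the kernel is falling behind */
        }
        pending = mac->isr;     /* anything new arrive meanwhile? */
    }
}
```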

NAPI receive

During eth probe(), int xemacps_rx_poll(struct napi_struct *napi, int budget) was registered as the NAPI handler.  In it, the while loop elucidates the correct usage of the NAPI receive:

spin_lock(&lp->rx_lock);
while (1) {
	[check receive status]
	[write 1 to the RX status bit to clear it; Zynq TRM p. 1155]
	if (regval & XEMACPS_RXSR_HRESPNOK_MASK) //Huh?
		dev_err(&lp->pdev->dev, "RX error 0x%x\n", regval);
	work_done += xemacps_rx(lp, budget - work_done);
	if (work_done >= budget)
		break;
	napi_complete(napi);
	/* We disabled RX interrupts in interrupt service
	 * routine, now it is time to enable it back.
	 */
	xemacps_write(lp->baseaddr, XEMACPS_IER_OFFSET, XEMACPS_IXR_FRAMERX_MASK);
...
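The budget accounting in the poll handler can be tried out with a toy model. This is a simplified sketch, not the driver's loop: fake_nic, fake_rx and fake_poll are hypothetical, and the model ignores locking and the status-register handling. It keeps only the NAPI contract: process at most budget frames, and re-enable the RX interrupt only when the ring has been drained.

```c
#include <stdbool.h>

struct fake_nic {
    int rx_pending;        /* frames waiting in the RX ring */
    bool rx_irq_enabled;   /* the ISR disabled this before scheduling NAPI */
    bool napi_scheduled;
};

/* Stand-in for xemacps_rx(): consume up to 'quota' frames. */
int fake_rx(struct fake_nic *nic, int quota)
{
    int n = nic->rx_pending < quota ? nic->rx_pending : quota;

    nic->rx_pending -= n;
    return n;
}

/* Stand-in for the poll handler: returns the amount of work done. */
int fake_poll(struct fake_nic *nic, int budget)
{
    int work_done = fake_rx(nic, budget);

    if (work_done < budget) {
        /* Ring drained: leave polling mode and go back to interrupts. */
        nic->napi_scheduled = false;   /* napi_complete() */
        nic->rx_irq_enabled = true;    /* write FRAMERX to the IER */
    }
    /* Otherwise the budget is used up and the core will poll again. */
    return work_done;
}
```

With 10 frames pending and a budget of 4, the first two polls each return 4 and stay in polling mode; the third returns 2 and turns the interrupt back on.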


xemacps_rx

There is something strange in the receive: if the buffer descriptor's ownership bit (address[0]) is 0, it will create a new skbuf and dma_map_single() that memory.  That is:

new_skb = netdev_alloc_skb(lp->ndev, XEMACPS_RX_BUF_SIZE);
/* Get dma handle of skb->data */
new_skb_baddr = (u32) dma_map_single(lp->ndev->dev.parent,
new_skb->data, XEMACPS_RX_BUF_SIZE, DMA_FROM_DEVICE);
skb = lp->rx_skb[lp->rx_bd_ci].skb;
dma_unmap_single(lp->ndev->dev.parent,
lp->rx_skb[lp->rx_bd_ci].mapping,
lp->rx_skb[lp->rx_bd_ci].len, DMA_FROM_DEVICE);

Since we preallocated all DMA buffers up front in open() above, why is this necessary?  It's confusing at first, but after staring at the code more, I realized that the newly allocated skb is the REPLACEMENT for the skb the driver is going to loan out to the upper layer!

It occurs to me that performance can perhaps be increased by oversizing the pool of pre-allocated skbs, and juggling the spares against the used skbs, rather than asking for a new page every time, but perhaps the putative performance improvement is too small to justify...
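The replace-then-loan pattern is easy to see in a user-space sketch, with malloc() standing in for netdev_alloc_skb() and no DMA mapping at all. Everything here (ring_slot, ring_init, ring_loan_out, ring_free, the 4-entry ring) is hypothetical and only illustrates the ownership hand-off.

```c
#include <stdlib.h>

#define RING_SIZE   4      /* the driver uses 256; kept small here */
#define RX_BUF_SIZE 1536   /* same size as XEMACPS_RX_BUF_SIZE */

struct ring_slot {
    unsigned char *buf;    /* plays the role of ring_info.skb */
};

/* Pre-allocate one buffer per slot, as open() does. Returns 0 or -1. */
int ring_init(struct ring_slot *ring, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ring[i].buf = malloc(RX_BUF_SIZE);
        if (!ring[i].buf) {
            while (i > 0)
                free(ring[--i].buf);
            return -1;
        }
    }
    return 0;
}

/* Hand the filled buffer in slot idx to the caller and install a fresh
 * replacement. Returns NULL (slot untouched) if no replacement could be
 * allocated. The caller owns, and must free, the returned buffer. */
unsigned char *ring_loan_out(struct ring_slot *ring, size_t idx)
{
    unsigned char *fresh = malloc(RX_BUF_SIZE);
    unsigned char *filled;

    if (!fresh)
        return NULL;
    filled = ring[idx].buf;
    ring[idx].buf = fresh;
    return filled;
}

void ring_free(struct ring_slot *ring, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(ring[i].buf);
}
```

The slot is never left empty: the HW always has a buffer to write into, while the upper layer keeps the old one for as long as it likes.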

Before moving on, unmapping the DMA pinned memory reveals interesting details about cache management.  dma_unmap_single(..., DMA_FROM_DEVICE) is where the magic happens, through the kernel's dma_map_ops virtual interface:

struct dma_map_ops *ops = get_dma_ops(dev);
...
ops->unmap_page(dev, addr, size, dir, attrs);

On ARM, the unmap_page() method is arm_dma_unmap_page:

if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
  __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
     handle & ~PAGE_MASK, size, dir);

Most DMA HW cannot afford to skip the synchronization (some may, and set DMA_ATTR_SKIP_CPU_SYNC).  So in the normal case, the outer cache (L2 cache) is invalidated:

if (dir != DMA_TO_DEVICE)
outer_inv_range(paddr, paddr + size);

dma_cache_maint_page(page, off, size, dir, dmac_unmap_area);

What's even more interesting is how the actual invalidation happens on a specific platform.  On my ARM cache controller, the cp15 (co-processor 15, which is the MMU HW) specific instruction is called:

__asm__("mrc p15, 1, %0, c0, c0, 1" : "=r" (l2ctype));

for (set = 0; set < CACHE_SET_SIZE(l2ctype); set++) {
for (way = 0; way < CACHE_WAY_PER_SET; way++) {
set_way = (way << 29) | (set << 5);
__asm__("mcr p15, 1, %0, c7, c11, 2" : : "r"(set_way));
}
}

dsb();

I will have to read more on the MMU and cache coherency to understand how invalidating the L2 cache also invalidates the L1 cache.

Note the last dsb() barrier call.  Starting with ARM architecture version 7 (ARMv7), the kernel uses the dedicated barrier instructions (DSB here), while for older architectures a cp15 (MCR) operation is used.
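The operand built in the set/way loop above is just two bit fields packed into one word. A small helper makes the layout explicit; l2_set_way is a name made up here, and the field positions are taken directly from the expression in the loop.

```c
#include <stdint.h>

/* Pack a cache set index and way index the same way the loop does:
 * way in bits [31:29] (so at most 8 ways), set starting at bit 5. */
uint32_t l2_set_way(uint32_t set, uint32_t way)
{
    return (way << 29) | (set << 5);
}
```

For example, set 3 of way 7 encodes as 0xE0000060, and every (set, way) pair is visited exactly once by the nested loops.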

#if __LINUX_ARM_ARCH__ >= 7
#define isb(option) __asm__ __volatile__ ("isb " #option : : : "memory")
#define dsb(option) __asm__ __volatile__ ("dsb " #option : : : "memory")
#define dmb(option) __asm__ __volatile__ ("dmb " #option : : : "memory")
#else
#define isb(x) __asm__ __volatile__ ("" : : : "memory")
#define dsb(x) __asm__ __volatile__ ("mcr p15, 0, %0, c7, c10, 4" \
   : : "r" (0) : "memory")
#define dmb(x) __asm__ __volatile__ ("" : : : "memory")
#endif

I know from dmesg that my __LINUX_ARM_ARCH__ is 7:

CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=18c5387d

Because this string comes from:

#define MODULE_ARCH_VERMAGIC_ARMVSN "ARMv" __stringify(__LINUX_ARM_ARCH__) " "

Sending: xemacps_start_xmit()

The stack trace shows an example of sending an Ethernet frame, in this case an ARP message.

ARM Cortex-A9 MPCore #0 (Breakpoint)
0xc0346fc8 xemacps_start_xmit(): .../ethernet/xilinx/xilinx_emacps.c, line 2210
0xc04a6f98 dev_hard_start_xmit(): net/core/dev.c, line 2620
0xc04c3020 sch_direct_xmit(): net/sched/sch_generic.c, line 129
0xc04a750c __dev_queue_xmit(): net/core/dev.c, line 2732
0xc04a7828 dev_queue_xmit(): net/core/dev.c, line 2906
0xc04fceec arp_send(): net/ipv4/arp.c, line 688
0xc04fd7bc arp_solicit(): net/ipv4/arp.c, line 706
0xc04aeb14 neigh_probe(): net/core/neighbour.c, line 878
0xc04b0374 __neigh_event_send(): net/core/neighbour.c, line 1031
0xc04b0690 neigh_resolve_output(): include/net/neighbour.h, line 373
0xc04d246c ip_finish_output(): include/net/dst.h, line 421
0xc04d40a8 ip_output(): net/ipv4/ip_output.c, line 343
0xc04d38bc ip_local_out_sk(): include/net/dst.h, line 458
0xc04d4bb8 ip_send_skb(): include/net/ip.h, line 116
0xc04f774c udp_send_skb(): net/ipv4/udp.c, line 808
0xc04f9878 udp_sendmsg(): net/ipv4/udp.c, line 1026
0xc0503494 inet_sendmsg(): net/ipv4/af_inet.c, line 740
0xc048ce88 sock_sendmsg(): net/socket.c, line 634
0xc048cee8 kernel_sendmsg(): net/socket.c, line 687
0xc0535bd0 xs_send_kvec(): net/sunrpc/xprtsock.c, line 399
0xc0535c6c xs_sendpages(): net/sunrpc/xprtsock.c, line 465
0xc0535f2c xs_udp_send_request(): net/sunrpc/xprtsock.c, line 636
0xc0533d04 xprt_transmit(): net/sunrpc/xprt.c, line 937
0xc0531078 call_transmit(): net/sunrpc/clnt.c, line 1861
0xc0539968 __rpc_execute(): net/sunrpc/sched.c, line 751
0xc0539d0c rpc_async_schedule(): net/sunrpc/sched.c, line 825
0xc003e584 process_one_work(): kernel/workqueue.c, line 2227
0xc003f378 worker_thread(): kernel/workqueue.c, line 2353
0xc00454e4 kthread(): kernel/kthread.c, line 207
0xc000ed78 ret_from_fork(): arch/arm/kernel/entry-common.S, line 91

Unbeknownst to me, there was an RPC service running, and it scheduled work on a kernel thread (SOFTIRQ_TX?), which eventually calls gather send (xs_send_kvec).  I learned from this page that dev_queue_xmit() is the kernel's interface to a network device driver, but it turns out that is only roughly true: firstly, the dispatch to the network device's ndo_start_xmit() method happens in dev_hard_start_xmit(); secondly, long before we get to this point (so far up that I cannot make out exactly how far up), the device that will handle the skb has already been decided.

In xemacps_start_xmit(), a new dma mapping is made, to zero-copy send the skb that was created in the higher layer.  To begin sending, eth just flips a bit in the net_ctrl register.

spin_lock_irqsave(&lp->nwctrlreg_lock, flags);
regval = xemacps_read(lp->baseaddr, XEMACPS_NWCTRL_OFFSET);
xemacps_write(lp->baseaddr, XEMACPS_NWCTRL_OFFSET,
(regval | XEMACPS_NWCTRL_STARTTX_MASK));
spin_unlock_irqrestore(&lp->nwctrlreg_lock, flags);

In sch_direct_xmit(), the close interaction of the device driver with upper network layer is evident: depending on the xmit status, the network layer may retransmit:

if (dev_xmit_complete(ret)) {
/* Driver sent out skb successfully or skb was consumed */
ret = qdisc_qlen(q);
} else if (ret == NETDEV_TX_LOCKED) {
/* Driver try lock failed */
ret = handle_dev_cpu_collision(skb, txq, q);
} else {
/* Driver returned NETDEV_TX_BUSY - requeue skb */
if (unlikely(ret != NETDEV_TX_BUSY))
net_warn_ratelimited("BUG %s code %d qlen %d\n",
    dev->name, ret, q->q.qlen);
ret = dev_requeue_skb(skb, q);
}
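The three-way branch above can be exercised in isolation. In this sketch the numeric values of the return codes are written from memory of include/linux/netdevice.h for kernels of this vintage, so treat them as assumptions and check your own tree; next_step and classify_xmit are hypothetical names.

```c
#include <stdbool.h>

/* Assumed values; verify against include/linux/netdevice.h. */
#define NET_XMIT_MASK    0x0f   /* qdisc codes live below this */
#define NETDEV_TX_OK     0x00   /* driver took care of the packet */
#define NETDEV_TX_BUSY   0x10   /* driver TX path was busy */
#define NETDEV_TX_LOCKED 0x20   /* driver TX lock was already taken */

enum next_step { STEP_DONE, STEP_CPU_COLLISION, STEP_REQUEUE };

/* Same test as dev_xmit_complete(): anything below the mask counts as
 * "the skb was consumed", whether it was sent or dropped. */
bool xmit_complete(int rc)
{
    return rc < NET_XMIT_MASK;
}

/* What sch_direct_xmit() decides to do with the driver's return code. */
enum next_step classify_xmit(int rc)
{
    if (xmit_complete(rc))
        return STEP_DONE;
    if (rc == NETDEV_TX_LOCKED)
        return STEP_CPU_COLLISION;
    return STEP_REQUEUE;   /* NETDEV_TX_BUSY, or an unexpected code */
}
```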

Note that xmit_complete does NOT mean that the HW has actually written out the bits on the wire; it just means that the skb has been handed to the HW, in which case (as is true for xemacps) the skbs are still on loan, and should be recovered in the tx_bdreclaim tasklet, scheduled in xemacps_interrupt():

if (regisr & (XEMACPS_IXR_TXCOMPL_MASK |
XEMACPS_IXR_TX_ERR_MASK)) {
tasklet_schedule(&lp->tx_bdreclaim_tasklet);
}

PHY management

So far, we've talked almost exclusively about the MAC HW.  How does the PHY show up in the driver code besides in HW startup, when we connect the PHY?

One place is ioctl support:

switch (cmd) {
case SIOCGMIIPHY:
case SIOCGMIIREG:
case SIOCSMIIREG:
return phy_mii_ioctl(phydev, rq, cmd);
#ifdef CONFIG_XILINX_PS_EMAC_HWTSTAMP
case SIOCSHWTSTAMP:
return xemacps_hwtstamp_ioctl(ndev, rq, cmd);
#endif

Another place is ethtool support, to change properties like speed, duplex, autonegotiation.

From the stack trace, the kernel has a phy library (<>/drivers/net/phy/phy_device.c) which apparently has a state machine, requiring periodic PHY status update.

ARM Cortex-A9 MPCore #0 (Step Into)
0xc0346c50 xemacps_mdio_read(): include/linux/pm_runtime.h, line 213
0xc034397c mdiobus_read(): drivers/net/phy/mdio_bus.c, line 263
0xc0342958 genphy_update_link(): include/linux/phy.h, line 577
0xc03449f8 marvell_read_status(): drivers/net/phy/marvell.c, line 704
0xc0341bb8 phy_state_machine(): include/linux/phy.h, line 666
0xc003e584 process_one_work(): kernel/workqueue.c, line 2227
0xc003f378 worker_thread(): kernel/workqueue.c, line 2353
0xc00454e4 kthread(): kernel/kthread.c, line 207
0xc000ed78 ret_from_fork(): arch/arm/kernel/entry-common.S, line 91
0x00000000

Apparently, even talking to the PHY goes through the XEmacPs register (PHY maintenance register).
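What goes over the MDIO wires is a standard IEEE 802.3 clause 22 management frame: a start pattern, a read/write opcode, a 5-bit PHY address, a 5-bit register address, a turnaround pattern, and 16 bits of data. The helper below packs those fields into a 32-bit word. That the PHY maintenance register takes the frame in exactly this layout is an assumption here (check the register description in the TRM before relying on it), and mdio_frame / mdio_frame_data are hypothetical names.

```c
#include <stdint.h>

enum mdio_op {
    MDIO_OP_WRITE = 1,   /* opcode bits 01 */
    MDIO_OP_READ  = 2,   /* opcode bits 10 */
};

/* Pack a clause 22 management frame into one 32-bit word. */
uint32_t mdio_frame(enum mdio_op op, unsigned phy_addr, unsigned reg_addr,
                    uint16_t data)
{
    return (1u << 30)                              /* start of frame: 01 */
         | ((uint32_t)op << 28)                    /* read or write */
         | ((uint32_t)(phy_addr & 0x1f) << 23)     /* which PHY on the bus */
         | ((uint32_t)(reg_addr & 0x1f) << 18)     /* which PHY register */
         | (2u << 16)                              /* turnaround: 10 */
         | data;                                   /* payload (writes) */
}

/* Extract the 16 data bits, e.g. after a read has completed. */
uint16_t mdio_frame_data(uint32_t frame)
{
    return (uint16_t)(frame & 0xffff);
}
```

Reading register 1 (the basic status register that genphy_update_link() polls) of PHY address 0 would then be the word 0x60060000.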

SDIO

Some Zynq device must support WIFI over SDIO, because the Xilinx Linux driver wiki page lists the Atheros 6kl, in <kernel>/drivers/net/wireless/ath/ath6kl/sdio.c.  I was trying to find out how WIFI over SDIO worked, and even bought an EoL WIFI card, the WL11-SD (sporting an integration of the CGUYS CG100V3 SDIO interface IC, TI TNETW1100B WIFI application processor, and Maxim MAX2821A 802.11b PHY), to better understand SDIO drivers, as you can see below:
Unlike the completely integrated AR600x, this Ambicom WL11-SD is an amalgamation of 3 chips, 2 of which I cannot get the datasheet for.  Similarly frustrating, I cannot seem to find a datasheet for the AR600x chips, but let's study the code to learn the SW side of things AFTER reading the architecture "document".  Resulting terminology:
  • BMI (bootloader message interface): used to download an application to ATH6KL, to provide patches to code that is already resident on ATH6KL, and generally to examine and modify state.  The Host has an opportunity to use BMI only once during bootup.  Once the Host issues BMI_DONE command, this opportunity ends.
  • Physical interconnect (bus): The HIF layer relies on underlying interconnect-specific and platform-specific software to drive a hardware controller of some sort.
  • HIF (host interconnect framework): HTC calls into the HIF layer when it needs to access the chipset address space. An HIF implementation exists for each combination of platform and interconnect API (e.g., HIF for Linux standard SDIO/MMC stack). HIF abstracts away register and memory access details and provides an interconnect-independent and platform-independent API for use (mainly) by HTC.
  • HTC (host <--> target communication): The wireless device driver calls into HTC to handle message transport. HTC does not understand the contents of messages it transports (only WMI understands the contents of control messages), but it does understand the mechanics of messaging with the AR600x chipset. It handles flow control and knows which chipset addresses must be read and written to relay messages.
    • Endpoint?
    • Tag?
    • Cookie?
    • Credit?
  • WMI (wireless module interface): host <--> target messaging protocol--the commands and requests/events.
    • sub-type?
  • cfg80211
  • Wireless device driver: handles both the vendor specific proprietary ioctls and the standard ones defined under wireless extensions, and implements the CFG80211 APIs (supporting nl80211 based applications).  Basically, shuttles data between the HTC layer and IP stack.
  • VIF: ? Atheros 600x only supports up to 3
  • WiPhy?

Driver's firmware interface

Firstly, the driver records the locations of its firmware and data files in the kernel image (in the .modinfo section), as you can see in the example below:

#define __MODULE_INFO(tag, name, info)  \
static const char __UNIQUE_ID(name)[]  \
  __used __attribute__((section(".modinfo"), unused, aligned(1)))  \
  = __stringify(tag) "=" info


/* Optional firmware file (or files) needed by the module
 * format is simply firmware file name.  Multiple firmware
 * files require multiple MODULE_FIRMWARE() specifiers */
#define MODULE_FIRMWARE(_firmware) MODULE_INFO(firmware, _firmware)

MODULE_FIRMWARE(AR6004_HW_1_3_FW_DIR "/" AR6004_HW_1_3_FIRMWARE_FILE);
MODULE_FIRMWARE(AR6004_HW_1_3_BOARD_DATA_FILE);
MODULE_FIRMWARE(AR6004_HW_1_3_DEFAULT_BOARD_DATA_FILE);

This location should be relative to the ?, as explained here.
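The string that MODULE_FIRMWARE() leaves behind is plain compile-time string pasting, which can be reproduced outside the kernel. This sketch drops the section attribute and the unique-name machinery; MODINFO_STRING, modinfo_value and the EXAMPLE_* path are made up for illustration and are not real firmware locations.

```c
#include <string.h>

/* Two-level stringify, like the kernel's __stringify(). */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

/* Same "tag=info" shape that __MODULE_INFO() produces. */
#define MODINFO_STRING(tag, info) STRINGIFY(tag) "=" info

/* Hypothetical firmware location, for illustration only. */
#define EXAMPLE_FW_DIR  "example_hw/1.0"
#define EXAMPLE_FW_FILE "fw.bin"

/* Adjacent string literals are concatenated by the compiler. */
const char example_fw_entry[] =
    MODINFO_STRING(firmware, EXAMPLE_FW_DIR "/" EXAMPLE_FW_FILE);

/* Return the part after '=', or NULL if the entry has no '='. */
const char *modinfo_value(const char *entry)
{
    const char *eq = strchr(entry, '=');

    return eq ? eq + 1 : NULL;
}
```

The resulting entry is the single string "firmware=example_hw/1.0/fw.bin", which is the kind of record tools read back out of .modinfo.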

In my Zedboard kernel, the FW related kernel configs are:

CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE="ad9517.stp adau1761.bin imageon_edid.bin"
CONFIG_EXTRA_FIRMWARE_DIR="firmware"
# CONFIG_CYPRESS_FIRMWARE is not set
# CONFIG_DRM_LOAD_EDID_FIRMWARE is not set
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"

This list does NOT cover the Atheros FW listed in <kernel>/drivers/net/wireless/ath/ath6kl/sdio.c, so I looked into how the adau1761.bin (the FW for the ADAU1761 I2C sound codec chip) is loaded (or whether it's even loaded), and found it in <>/sound/soc/codecs/adau1761.c:adau1761_codec_probe():

  ret = adau17x1_load_firmware(adau, codec->dev, ADAU1761_FIRMWARE);

which calls the firmware API described in LDD3:

  ret = request_firmware(&fw, name, dev);

I cannot yet make out whether the FW is compiled into the kernel image, or loaded from the user space with some helper.  I'd guess the former, but this is another case where the kgdbwait would have helped.  I might have to enable CONFIG_ATH6KL_DEBUG and CONFIG_ATH6KL_TRACING in kernel config... And why doesn't "debug", "silent", or "ignore_loglevel" boot args change what I see in dmesg?

SDIO device data structure (<>/include/linux/mmc/sdio_func.h)

SDIO belongs to another card called MMC?  Wikipedia suggests they are competing technologies.

struct sdio_func {
struct mmc_card *card; /* the card this device belongs to */
struct device dev; /* the device */
sdio_irq_handler_t *irq_handler; /* IRQ callback */
unsigned int num; /* function number */
...


ath6kl driver probe()

The device driver is registered through module_init(), called from __initcall infrastructure:

static const struct sdio_device_id ath6kl_sdio_devices[] = {
...

MODULE_DEVICE_TABLE(sdio, ath6kl_sdio_devices);


static struct sdio_driver ath6kl_sdio_driver = {
.name = "ath6kl_sdio",
.id_table = ath6kl_sdio_devices,
.probe = ath6kl_sdio_probe,
...

static int __init ath6kl_sdio_init(void) {
ret = sdio_register_driver(&ath6kl_sdio_driver);
...

module_init(ath6kl_sdio_init);

That is, the device driver's module init registers the driver with the SDIO bus core, which is presumably initialized before the drivers.  One of the responsibilities of a bus is to match up discovered devices against drivers (the match interface described in LDD3).  For SDIO, the match boils down to checking all IDs an SDIO device driver declared (see the ath6kl_sdio_devices[] table above) against the device's class, vendor, and device fields:

ids = sdrv->id_table;
if (ids) {
  while (ids->class || ids->vendor || ids->device) {
    if (sdio_match_one(func, ids))
      return ids;
    ids++; /* No match?  Try the next table element */
  }
}
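The table walk can be tried in user space with stand-in types. This is a sketch assuming that each ID field is either an exact value or an "any" wildcard (all ones), which is how SDIO ID tables are normally written; sdio_ident, match_one, match_device and the ID values in the usage note are all made up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLASS_ANY 0xffu     /* wildcard for the 8-bit class field */
#define ID_ANY    0xffffu   /* wildcard for the 16-bit vendor/device fields */

struct sdio_ident {
    uint8_t  class_id;
    uint16_t vendor;
    uint16_t device;
};

/* One table entry matches if every non-wildcard field is equal. */
bool match_one(const struct sdio_ident *id, const struct sdio_ident *dev)
{
    if (id->class_id != CLASS_ANY && id->class_id != dev->class_id)
        return false;
    if (id->vendor != ID_ANY && id->vendor != dev->vendor)
        return false;
    if (id->device != ID_ANY && id->device != dev->device)
        return false;
    return true;
}

/* Walk a table terminated by an all-zero entry, as the bus code does.
 * Returns the first matching entry, or NULL if the driver does not
 * claim this device. */
const struct sdio_ident *match_device(const struct sdio_ident *table,
                                      const struct sdio_ident *dev)
{
    if (!table)
        return NULL;
    for (const struct sdio_ident *id = table;
         id->class_id || id->vendor || id->device; id++) {
        if (match_one(id, dev))
            return id;
    }
    return NULL;
}
```

A driver table of { CLASS_ANY, 0x1234, 0x0001 } and { CLASS_ANY, 0x1234, 0x0002 } would claim a card reporting vendor 0x1234, device 0x0002 through its second entry, and ignore any card from another vendor.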

If there IS a match, the kernel invokes probe(struct sdio_func *func, const struct sdio_device_id *id) [note that sdio_func is SD-speak for "device", according to SD Specifications Part E1, SDIO Simplified Specification], which does the following:
  • kzalloc(GFP_KERNEL--NOT GFP_DMA) 32KB DMA buffer
  • Init spin locks and mutex
  • Init scatter request list, free list, and asynchronous write queue
  • Init asynchronous write work queue
  • Init "irq_wq"--not sure what this is for
  • Free up all (64) bus request (tokens?)
  • Create ath6kl "core" (refactored code)
    • Create new ath6kl_cfg80211: a new wiphy (wireless hardware description) for use with cfg80211 (cfg80211 is the Linux 802.11 configuration API).  According to the cfg80211 author, cfg80211 sits above the MAC80211 device driver.
    • Init skb_queues (psq, apsdq, mcastpsq??)
    • ap_country_code
    • Too many objects to list all...
  • Connect common operation function pointers to driver's implementation.  HIF = host interface (SDIO/USB)?
    ar->hif_ops = &ath6kl_sdio_ops;
  • Setup mailbox (just a bunch of registers) to the SDIO device
  • Configure SDIO
    • sdio_claim_host(func): exclusively take a bus (called the host here; dependent on func): reuses mmc_claim_host(func->card->host)
    • Enable 4-bit ASYNC interrupt with FUNC0 CMD52 (IO_RW_DIRECT) to the interrupt mode register CCCR_SDIO_IRQ_MODE_REG.  In the end, this calls the mmc_host_ops.request() method, but due to the "pure virtual" interface arrangement, it's hard to figure out from code browsing which class is actually implementing this interface.
    • sdio_set_block_size(): mmc sdio lib function
    • sdio_release_host(func)
  • ath6kl_core_init(ar, ATH6KL_HTC_TYPE_MBOX)
    • What is the difference between MBOX and PIPE?  Probably completely HW specific
    • htc_mbox_attach: setup HTC ops (function pointers)
    • Create "ath6kl" singlethread_workqueue
    • ath6kl_bmi_init()
    • ath6kl_hif_power_on(): see the implementations below
    • ath6kl_bmi_get_target_info() --> ath6kl_hif_bmi_read
    • ath6kl_init_hw_params(): based on HW version, fill the HW data structure with hard-coded table values (hw_list), which includes fields like clock rate and FW locations
    • ath6kl_htc_create()
      • "create" mbox for HW
      • kzalloc
      • Init spin_locks for HTC, RX, TX
      • Init lists: free_ctrl_txbuf, free_ctrl_rxbuf, cred_dist_list
      • setup host interface
        • Disable HIF interrupt with hif_read_write_sync().  There are apparently 4 interrupt status registers:
        • struct ath6kl_irq_enable_reg { u8 int_status_en; u8 cpu_int_status_en; u8 err_int_status_en; u8 cntr_int_status_en;} __packed;
    • ath6kl_init_fetch_firmwares()
      • ath6kl_fetch_board_file()
      • ath6kl_fetch_testmode_file()
      • ath6kl_fetch_fw_apin() or ath6kl_fetch_fw_api1()
    • ath6kl_wmi_init()
    • ath6kl_cookie_init()
    • ath6kl_debug_init()
    • ath6kl_init_hw_start()
    • ath6kl_cfg80211_init()
    • ath6kl_debug_init_fs()
    • Add an initial station interface
    • ath6kl_recovery_init()
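The hif_ops hookup in the probe steps above is the usual C way of building a "pure virtual" interface: a struct of function pointers that the core calls through without knowing which bus is behind it. A minimal user-space sketch follows; every name in it (fake_dev, fake_hif_ops, the two backends, the wrappers) is hypothetical and only mirrors the shape of the pattern.

```c
#include <stddef.h>

enum bus_kind { BUS_NONE, BUS_SDIO, BUS_USB };

struct fake_dev;

/* The "interface": one function pointer per operation. */
struct fake_hif_ops {
    int (*power_on)(struct fake_dev *dev);
    int (*power_off)(struct fake_dev *dev);
};

struct fake_dev {
    const struct fake_hif_ops *hif_ops;  /* set once, in probe() */
    enum bus_kind powered_by;            /* which backend ran last */
};

/* Two "implementations" of the interface. */
static int sdio_power_on(struct fake_dev *dev)
{
    dev->powered_by = BUS_SDIO;
    return 0;
}

static int usb_power_on(struct fake_dev *dev)
{
    dev->powered_by = BUS_USB;
    return 0;
}

static int any_power_off(struct fake_dev *dev)
{
    dev->powered_by = BUS_NONE;
    return 0;
}

const struct fake_hif_ops fake_sdio_ops = {
    .power_on = sdio_power_on,
    .power_off = any_power_off,
};

const struct fake_hif_ops fake_usb_ops = {
    .power_on = usb_power_on,
    .power_off = any_power_off,
};

/* Bus-independent wrappers: the core only ever calls these. */
int fake_hif_power_on(struct fake_dev *dev)
{
    return dev->hif_ops->power_on(dev);
}

int fake_hif_power_off(struct fake_dev *dev)
{
    return dev->hif_ops->power_off(dev);
}
```

This is also why code browsing alone does not reveal the implementer: the call site names only the pointer, and the target is whatever probe() stored there at run time.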
When removing the module, the asynchronous write work queue has to be synced; it must mean that close() doesn't do that:

ath6kl_stop_txrx(ar_sdio->ar);
cancel_work_sync(&ar_sdio->wr_async_work);

ath6kl_core_cleanup(ar_sdio->ar);
ath6kl_core_destroy(ar_sdio->ar);

NOT open()/close(), but ndo_open()/ndo_stop()

The driver does NOT implement open() and close() methods?!  This is where running with high printk verbosity, or turning on all event tracing, may help...

After browsing the network packet data flow description and finding ndo_start_xmit(skb, dev) in <>/net/core/dev.c:dev_hard_start_xmit(), I realized that network devices do not have open()/close() but net_device_ops::ndo_init()/ndo_open()/ndo_stop() interface callbacks.  net_device_ops interface (described in LKN--Linux Kernel Networking Appendix, p. 500) is huge!  ath6kl device implements only a fraction of this interface (in <>/drivers/net/wireless/ath/ath6kl/main.c), including ndo_open() and ndo_stop(), which are called when the network device is going up/down.

static const struct net_device_ops ath6kl_netdev_ops = {
.ndo_open               = ath6kl_open,
.ndo_stop               = ath6kl_close,
.ndo_start_xmit         = ath6kl_data_tx,
.ndo_get_stats          = ath6kl_get_stats,
.ndo_set_features       = ath6kl_set_features,
.ndo_set_rx_mode = ath6kl_set_multicast_list,
};

Note how the open and stop (close below) are slightly different:

ath6kl_open():

struct ath6kl_vif *vif = netdev_priv(dev);

set_bit(WLAN_ENABLED, &vif->flags);

if (test_bit(CONNECTED, &vif->flags)) {
netif_carrier_on(dev);
netif_wake_queue(dev);
} else
netif_carrier_off(dev);

ath6kl_close():

struct ath6kl_vif *vif = netdev_priv(dev);

netif_stop_queue(dev);

ath6kl_cfg80211_stop(vif);

clear_bit(WLAN_ENABLED, &vif->flags);
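The asymmetry between the two callbacks is easier to see as a small state model. This sketch replaces the kernel helpers with plain fields: the flag bit values are made up, fake_vif / fake_open / fake_close are hypothetical, and whatever ath6kl_cfg80211_stop() does internally is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical bit assignment for the two vif flags used above. */
#define FLAG_WLAN_ENABLED (1u << 0)
#define FLAG_CONNECTED    (1u << 1)

struct fake_vif {
    unsigned flags;
    bool carrier_on;      /* netif_carrier_on()/off() */
    bool queue_running;   /* netif_wake_queue()/stop_queue() */
};

/* Model of ndo_open: mark the interface enabled, but only report a
 * carrier and start the TX queue if we are already connected. */
int fake_open(struct fake_vif *vif)
{
    vif->flags |= FLAG_WLAN_ENABLED;
    if (vif->flags & FLAG_CONNECTED) {
        vif->carrier_on = true;
        vif->queue_running = true;
    } else {
        vif->carrier_on = false;
    }
    return 0;
}

/* Model of ndo_stop: always stop the queue and clear the enabled flag.
 * The cfg80211 teardown step is omitted from this model. */
int fake_close(struct fake_vif *vif)
{
    vif->queue_running = false;
    vif->flags &= ~FLAG_WLAN_ENABLED;
    return 0;
}
```

Bringing the interface up before association therefore leaves the carrier off and the queue untouched, while bringing it down unconditionally stops transmission.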

power_on()/power_off()

These are NOT standard power management operations, because they take struct ath6kl.  power_on() is called from ath6kl_init_hw_start() and ath6kl_init_hw_restart(), which are called from a few places, including ath6kl_core_init() and ath6kl_cfg80211_resume().