Showing posts with label Zedboard. Show all posts

Jun 24, 2015

Bare metal code to read ADC on Zynq

In a previous blog entry, I studied interacting with the on-chip ADC peripheral using the Xilinx XADC device driver.  I was able to understand the device driver and get results out, but I prefer real-time HW control, with minimum code between the HW and the data collection.  The Xenomai patch will yield nearly real-time behavior (as I reported in a real-time jitter test of various Linux options), and the Linux device driver framework is a wonderful piece of SW (literally; whenever I stare at the Linux kernel code, I think: "how did they pull all of this off?"), but I just don't want to run all that code just to read a few bits, when the HW itself is relatively simple.

Aside from my minimalist preference, there is a concrete benefit to bare metal code: certifiability; when I was in the defense industry for a few years after school, the software certification cost was running at about $100~$150 per LOC.  I am sure it is more now.

Status (so you don't have to read the whole thing)

  • Able to read the dedicated analog input and 15 other auxiliary channels in channel sequencing mode, kicked off from a timer interrupt (to turn off XADC when not in use).
  • Have a thermistor circuit that can connect to the Zedboard.
  • Reading out the XADC seems to take too long, even though it's just a bunch of memory mapped register read/write operations.  I think it may be due to the XADC wizard performing a read over the DRP FIFO internally...

XADC peripheral

To recap, the XADC on Zynq is (as described in the Zynq TRM) a pair of 12-bit, 1M sample/s ADCs, with up to 17 external analog input channels.  There is a not-so-high-quality on-chip reference voltage that is good enough for the internal temperature and voltage sensors, but the TRM recommends an external 1.25 V reference IC.  The HW has these potentially useful features:
  1. Remembering the maximum/minimum values in dedicated registers--until the next XADC reset.
  2. Interrupting based on threshold breach--so that the SW filtering can be bypassed.  The maskable alarm is latched in the XADCIF_INT_STS register, and is cleared by writing to the same register.
  3. Averaging in HW: if the system's anti-aliasing filter BW is higher than the application's sampling BW, sampling the ADC fast and then averaging in HW may be the only option.
Q: why does UG480 say: "auxiliary analog inputs do not require any user-specified constraints or pin locations...  All configuration is automatic when the analog inputs are connected to the top level of the design."?

ADC sampling is divided into 3 phases:
  1. Acquisition: the capacitor in the ADC HW is charged from the input source.  Depending on the input impedance, this may take longer than the default 4 ADCCLK.
  2. Conversion: the charge in the capacitor is converted to digital value, and written out to the appropriate register.
  3. Readout: SW should read out from the register--unless a streaming arrangement is made ahead of time.
In sequencing mode, the acquisition for the next channel is overlapped with the conversion of the current channel (the pipeline concept).  If the acquisition requires more time, the acquisition can be stretched from 4 to 10 ADCCLKs by asserting the ACQ bit in the control register, which delays the start of the next conversion accordingly.
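As a rough sketch of what the acquisition setting means for throughput: a conversion takes 26 ADCCLKs in continuous mode (4 of them acquisition), and stretching the acquisition to 10 ADCCLKs is assumed here to add the 6 extra cycles to each conversion.  The function name and the 26 MHz ADCCLK in the usage note are my own illustration, not something taken from the BSP.

```c
#include <stdint.h>

/* ADCCLK cycles per conversion in continuous mode: 26 by default,
 * 4 of which are acquisition.  Asserting ACQ stretches acquisition
 * from 4 to 10 cycles, so 6 more cycles are assumed per conversion. */
enum { XADC_CYCLES_DEFAULT = 26, XADC_CYCLES_ACQ_EXTRA = 6 };

/* Conversions per second for a given ADCCLK frequency in Hz. */
static uint32_t xadc_sample_rate_hz(uint32_t adcclk_hz, int acq_asserted)
{
    uint32_t cycles = XADC_CYCLES_DEFAULT
                    + (acq_asserted ? XADC_CYCLES_ACQ_EXTRA : 0);
    return adcclk_hz / cycles;
}
```

With an assumed 26 MHz ADCCLK this gives the advertised 1M sample/s by default, and noticeably less once ACQ is asserted.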

Control interface

The peripheral can be accessed through one of 2 buses (you have to choose):
  • PS-XADC: directly from the processor, through the APB bus.  Commands are serialized through a 15-word deep FIFO, and again through a read FIFO on the way back.  This is a lower speed interface, but more convenient than setting up the AXI channel to the XADC.
  • PS-to AXI ADC: through the AXI bus.
It appears that to enable the PS-XADC interface, the XADCIF_CFG[ENABLE] bit has to be asserted, so that the interface arbiter chooses the PS-XADC (instead of PL-JTAG).  The serial interface is clocked by the PCAP_2x clock from the PS clocking subsystem (nominally 200 MHz), divided by 4 * XADCIF_CFG[TCLK_RATE] to keep the serializer clock at or below 50 MHz.  The interface is complicated slightly by its SPI-like behavior: the HW pushes data into the read FIFO (read through the XADCIF_RDFIFO register) only when the write FIFO is written to, so a NOOP command (XADCIF_CMD[29:26] = 4'b0000) is necessary to clock out the result of a single read.  In a continuous read case, such a NOOP would not be required.  I can see that concretely in the XSDK BSP generated examples.

XSDK BSP generated PS XADC example

Q: why is the INTR_ID in the generated code 39?
A: BSP thinks I am using sysmon feature to talk to the XADC.  xparameters.h:

#define XPS_SYSMON_INT_ID 39

It also thinks my base address is different than what I was assigned!  The XADC interface configuration is apparently not the same as the AXI4Lite interface to the XADC wizard IP.

#define XPAR_PS7_XADC_0_BASEADDR 0xF8007100

This register is shown in the Zynq TRM, XADCIF_CFG, in section B.16, Device Configuration Interface.  What is confusing about this at first is that this is NOT the direct interface to the XADC HW itself, but rather the SPI-like FIFO to push the commands to.  To control the XADC from the CPU, several writes to the device configuration register blocks were required:
  1. Write a magic value (0x757BDF0D) to the unlock register (offset 0x34 from the configuration register base address of 0xF8007000).  This write is a simple memory mapped register write.
  2. And then the example writes to the XADCIF_CFG register (0xF8007100), turning on the following bits:
    1. ENABLE
    2. CFIFOTH = 0xF
    3. DFIFOTH = 0xF
  3. Clear the miscellaneous control (MCTL), to release XADC reset
To self-test the XADC:
  1. Reset XADC by bouncing the reset bit in MCTL register just discussed above.
  2. Write some values into the alarm threshold (like VCCINT_UPPER, which is at 0x51 in the XADC internal register banks) and read it back.  This tests the command FIFO to the XADC.  Internally, this is what happens during write/read:
    1. Write
      1. The command is formatted into a 32-bit JTAG message
        1. Most significant byte: write (0x08) or read (0x04)
        2. register address left shifted by 16 bits (address bits are position [25:16])
        3. 16-bit data in the 2 least significant bytes
      2. Write into the command FIFO (@ 0xF8007110).
      3. Read the Read FIFO (@ 0xF8007114) after any write since for each write, one location of Read FIFO gets updated.  Throw the read value away.
    2. Read
      1. Create a dummy data (0) to write into the write FIFO: 0x04000000 | (reg offset << 16)
      2. Write into the CMDFIFO
      3. Read from the RDFIFO TWICE, and keep only the 2nd read
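The command word layout described in the list above can be sketched as a pair of helpers.  The macro and function names are hypothetical (they are not the BSP's names); the bit positions are the ones given above: command in the top byte, register address in bits [25:16], data in the low 16 bits.

```c
#include <stdint.h>

#define XADC_CMD_NOOP   0x00000000u  /* dummy write that clocks out a result */
#define XADC_CMD_READ   0x04000000u
#define XADC_CMD_WRITE  0x08000000u
#define XADC_ADDR_MASK  0x03FF0000u  /* register address, bits [25:16] */

/* Command word that writes 'data' to XADC internal register 'reg'. */
static uint32_t xadc_fmt_write(uint32_t reg, uint16_t data)
{
    return XADC_CMD_WRITE | ((reg << 16) & XADC_ADDR_MASK) | data;
}

/* Command word that requests a read of XADC internal register 'reg'. */
static uint32_t xadc_fmt_read(uint32_t reg)
{
    return XADC_CMD_READ | ((reg << 16) & XADC_ADDR_MASK);
}
```

For the VCCINT_UPPER threshold at 0x51, writing 0x1234 would be the word 0x08511234, and reading it back would start with 0x04510000 pushed into the command FIFO.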
To configure the XADC for real now,
  1. Set sequencer mode to the default mode (value 0 to CFG1[15:12]), to prevent alarms from tripping.
  2. Disable all alarms (value 0 to CFG1[11:0]).
  3. Restore the on-chip temperature and voltage alarms.
  4. Register SCU interrupt handler.  This interrupt handler reads the XADC int status register @ 0xF8007104 (which only tells you about the FIFO threshold, overtemp, or alarm)--these are not what I am interested in.
  5. Change the sequencer mode again.
  6. Read internal registers, like temperature (0x00) and Vccint (0x1)
While it IS possible to control the XADC HW directly through the built-in control path in the PS, it appears that going through the memory mapped XADC wizard register will be simpler (next).

XSDK BSP generated sysmon example

To reset the XADC wizard IP (I guess Xilinx calls it sysmon), the example writes 0xA to the XADC wizard base address.  The self test consists of writing and reading back the Vccint alarm threshold--over memory mapped 32-bit register access.  To configure the sequencer, write the safe (default) sequencer mode to the CFR1 (configuration register 1) bits [15:12] @ offset 0x304 from the XADC wizard base address.  This is apparently necessary before enabling channels (by writing to the group of 8 registers @ offset 0x320 ~ 0x33C).  The format of these channel sequence registers is NOT given in PG091 (XADC wizard IP documentation), but rather in UG480 (XADC HW documentation), Chapter 4, Automatic Channel Sequencer.  To add even more confusion, the sequence 8 and 9 configuration registers are NOT even contiguous with the rest, but in what PG091 calls the "test register" block: @ offset 0x318 and 0x31C, and therefore the bits of these registers are not even documented.

XADC clock rate is configurable by changing the clock divisor in XADC wizard configuration register 2, @ offset 0x308 (documented in UG480, table 3-6).

Config register @ offset 0x300 MUX bit must be asserted--with an associated input channel--to use the external mux feature.

The BSP generated example only checks for EOS (end of sequence), and reads out all channels.  When reading the converted value, one generally reads from blocks at offset 0x200 through 0x2B8.  What is NOT obvious from the documentation that is clarified in the example code is that while the XADC input is shared for different channels, the conversion results are stored in the respective readout registers according to the channel.  Curiously, the example does NOT right shift the 32 bit readout value, even though the documentation says that the data is MSB justified.

I/O

As you can see in this image, the XADC can become an IO pin hog.  If I wanted to sample 16 channels, I would have to sacrifice 32 I/O pins (UG480: "all analog input channels are differential and require two package balls...  The analog inputs of the ADC use a differential sampling scheme to reduce the effects of common-mode noise signals")!
Vp/Vn pins should be grounded if not used.  Normally, the Vivado XADC wizard will only expose used pins, according to the channel selection mode.  But it seems that the XADC wizard always instantiates the Vp/Vn pins.

A way to reduce the I/O pin count is to use an external multiplexer.  For example, when in simultaneous mode, two 3-bit multiplexer chips (something I have to solder outside the Zynq chip) can select among the 8 pairs of input channels, as you can see below:
XADC wizard configuration does not seem to expose this feature, because I cannot select a pair of inputs into the XADC wizard module...

Zedboard XADC

That the Zedboard is almost the "Cadillac of the Zynq eval boards" can be seen when you see the external reference voltage available for ADC, and that the analog ground is decoupled from the digital ground with a ferrite bead.

Reference voltage

ADCs need a reference voltage; sometimes for the external circuit, but often for the internal digitization implementation as well.  The XADC requires a 1.25 V reference, and the reference IC should be placed as close as possible to the reference pins and connected directly to the VREFP input, using the decoupling capacitors recommended in the reference IC data sheet.

Zedboard is generating the 1.25V Vref, as you can see here:
This Vref is available for circuits that need high quality reference voltage (like thermistors) on the Zedboard's XADC header (pin 11):
If I want to use some other reference voltage than the Zedboard generated one above, I can just lift R186 below, and connect pin 11 above to the desired reference voltage.
For XADC on Zynq, Vref pin should be tied to ground if internal reference is to be used.

The DXP/N (XADC-DXP/N: pins 7/12; according to Zynq TRM Table 2-13 PL Pin Summary, these are temperature sensing diode pins) are a bit of an anachronism; according to a Xilinx forum discussion, they should not be used any more.

XADC input header on Zedboard

As shown in the XADC header pin assignment above, there are 3 differential pairs of ADC input accessible on the Zedboard's XADC header (DXP/N are completely independent of XADC):
  1. XADC-VP/N: dedicated analog input pair on pins 2/1.  There is only 1 such dedicated analog input on the XADC.  When not used, these should be shorted to ground.
  2. XADC-AD0_P/N: pins 3/6
  3. XADC-AD8_P/N: pins 8/7
Note that the pairs 0 and 8 can be SIMULTANEOUSLY sampled if using the simultaneous selection mode in the XADC wizard.  These are low-pass filtered on Zedboard with R=100 Ohm and C=1nF as shown below, so that the filter corner is at 1/(RC) = 10^7 rad/s, or roughly 1.6 MHz.
This is far too wide for low frequency signals like those from an accelerometer, for which a bandwidth on the order of 100 Hz is more appropriate.  A 10 kOhm resistor and a 1 uF capacitor give 1/(RC) = 100 rad/s (about 16 Hz), and a 1 kOhm resistor with the same capacitor gives about 160 Hz.  Either is easily possible with an extra 1 uF capacitor across the P/N pins, and a series resistor on the positive pin.
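When comparing filter values like these, it helps to keep in mind that 1/(RC) is an angular frequency in rad/s, and the corner in Hz is 1/(2*pi*RC).  A small sketch, treating the input as an ideal first-order RC (single series resistor, single capacitor, no loading), which is a simplification of the actual Zedboard circuit:

```c
/* Corner frequency of an ideal first-order RC low-pass filter, in Hz. */
static double rc_corner_hz(double r_ohm, double c_farad)
{
    const double pi = 3.14159265358979323846;
    return 1.0 / (2.0 * pi * r_ohm * c_farad);
}
```

For example, rc_corner_hz(100.0, 1e-9) lands near 1.6 MHz, while rc_corner_hz(1000.0, 1e-6) lands near 160 Hz.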

An FMC connector breakout board allows access to all other differential input pairs, all of them on bank 35 (VADJ).

Analog ground

When I looked at the accelerometer signal on a scope, I connected the analog ground to the Zedboard's digital ground.  It is NOT advisable to use the digital ground as an analog ground reference for XADC, because the ground "shakes" in response to the high frequency digital currents.  In an effort to improve the ADC performance, a dedicated supply and ground reference is provided on the Zedboard: a ferrite bead filters out the high frequency noise from the common ground, and the filtered ground is fed to the Zynq's analog ground input pin GNDADC_0, as you can see below.
The jumper J12 lets you bypass this ferrite bead, although I cannot imagine why you would want to do that.  It is this ground that I want to use for the analog ground of the accelerometer, so I should short JP12's pin 1 to the common analog ground output of the accelerometer AND the negative pins of the XADC's differential input pins XADC-VN, XADC-AD0_N, XADC-AD8_N in the XADC header.

I cannot directly connect the accelerometer output's 3.3V to the XADC positive pins, because the maximum voltage difference between the XADC P/N pair is 1 V; I have to use a voltage divider instead; something like 2.5 kOhm and 1 kOhm should work.
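A quick check of that divider, as a sketch that ignores the loading of the XADC input and of any filter capacitor (the function name is mine):

```c
/* Output of an unloaded resistive divider: vin is applied across
 * r_top + r_bottom, and the output is taken across r_bottom. */
static double divider_out_v(double vin, double r_top, double r_bottom)
{
    return vin * r_bottom / (r_top + r_bottom);
}
```

divider_out_v(3.3, 2500.0, 1000.0) comes out a little over 0.94 V, which keeps a 3.3 V signal just inside the 1 V input range.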

Including XADC IP in a Zynq HW design

While Zedboard itself is ready for XADC, the XADC is an optional IP in Zynq; the "Ubuntu on Zedboard" Zynq reference HW design I have been using for the bare metal, hard-real-time SW development till now LACKS the XADC block.  To include it, open the system.bd (block diagram) in a Vivado project--> click the "Add IP" icon (the one with the + symbol) on the left toolbar -->  Enter "xadc" in the search window --> double-click on XADC Wizard --> double-click on the resulting IP block to configure it.  To understand the configuration options for the XADC Wizard, read Xilinx document PG091, rather than UG480 which explains the HW level primitive that the wizard encapsulates.  The following setup targets human-machine interface application that uses accelerometer and mic, and 1 thermistor for ambient temperature sensing.
  • Basic tab
    • Leave the interface as AXI4Lite, to control XADC from the SW through memory mapped registers
    • Startup channel: sequencer, with MUX, with Vp/n and all Vaux pairs enabled (except for the vaux4 pair, which cannot be selected for some reason, as shown below)
      • Single channel mode is inefficient if many channels need to be sampled, because setting up a channel, and then waiting for the conversion takes many clock cycles (26 ADCCLKs for signal acquisition and conversion in a continuous mode, but commanding the XADC over the shared and slow AXI4Lite bus will take many more cycles).  The biggest problem with the single channel mode is that a channel is selected (fixed) in the wizard, so there is no chance to monitor multiple channels without using an external mux.
      • In all products I worked on, there are MANY ADC channels to monitor, so either a sequencer or simultaneous (sequencing) mode are used.
        • In simultaneous mode, 8 pairs of channels can be converted simultaneously.  The pairs are hard coded in HW to 0/8, 1/9, ..., 7/15.  Note that the Vp/Vn pair cannot be sampled in this mode.  In the XADC wizard IP configuration window, selecting simultaneous selection does NOT add another mux input, adding to the confusion.
        • Independent ADC is necessary if you need to monitor the Zynq die voltage/temperature constantly, even while monitoring external channels.
    • Temp bus is for monitoring the DRAM internal temperature, which I am going to ignore for now.
    • Timing mode: event mode vs. continuous; since temperature does not change that fast, there is no need to continuously sample the channels at maximum throughput.  Actually, I've used continuous sampling in the past mostly for convenience.
      • Event mode trigger: convst_in.  Since the IP acts on the OR of the convst_in and the CONVST register (which I will control), I connected convst_in to a Const IP module to pull it up.
  • ADC setup
    • I enabled external mux even though I only want to sample 1 channel (external thermistor) for now, because a real application will have many channels. I will use the dedicated analog input channel Vp/n.
  • Alarms tab: turn off all temperature and voltage alarms, to reduce the pin count of the core. 
    • Note: a simple control application implements its own alarm in SW, but some safety critical application may want a SW independent alarm trip (that can inhibit the circuit that is driving significant current/voltage).
When using the mux, the SW cannot control which channel to sequence: the XADC picks the next MUX channel, as you can see from this screenshot of UG480:
My understanding is that the core remembers the current channel, and advances to the next one.  XADC can (in simultaneous sampling mode) sample 2 simultaneous inputs, with 2 muxes.

The resulting XADC interface, with only the pins I connected (see explanation below), is shown:
Since the interrupt goes high for any of the interrupt conditions that can be enabled through the interrupt enable register, the eoc_out, alarm_out, eos_out seem redundant.  But because I am only interested in the eoc for now (or eos, if I use the channel sequencer in the future), I connect eoc_out to the only free pin left in the interrupt concatenator (sys_concat_intc[2]).  As explained in a previous blog, the Zynq interrupt ID for this interrupt line would be 89.

The existing "Ubuntu on Zedboard" already contains an AXI interconnect for the AXI Lite interfaces.  I keep adding more IPs to this multiplexer, as shown in a previous blog.  I just add another AXI4Lite port, and connect that to the s_axi_lite interface shown above.  As before, I let Vivado assign register address for this new module, and obtain 0x43C10000 as the base address.

Although I will only sample 1 channel in this investigation and therefore do NOT need a MUX, I will bring out the MUX to the top level for posterity's sake.  muxaddr_out should be used with an external 5-bit mux, to switch between channels.  While this approach saves pins, it presents a problem for a multi-axis device such as an accelerometer, whose 3 axes should ideally be sampled simultaneously.  Even with the simultaneous sequencing mode, XADC cannot sample more than 2 channels at the same time.  To expose the muxaddr_out bus to the top level, right click on the bus in the system diagram view --> make external --> rename to XADC_mux.  Similarly, make the ADC input channel Vp_Vn external (and optionally, rename the interface to XADC_in).

Conversion can be started by driving convst_in high, but I left it disconnected because I will start the conversion from the SW.

After generating the HDL wrapper (right click on the block design --> Sources --> Hierarchy view --> system.bd --> Generate HDL wrapper), these are the new interface wires to the system:

  input XADC_in_v_n;
  input XADC_in_v_p;
  output [4:0]XADC_mux;

These wires should be brought out to the top level Verilog file verbatim, so the block diagram's pins can be constrained.  Therefore, the top level module will just copy these wires verbatim as well.  Finally, they have to be constrained.  This is what PG091 said about constraining the XADC input pins: "VP/VN and 16 VAUXP/VAUXN pin pairs do not need LOC constraints to be specified in XDC.  VP/VN is a dedicated input and VAUXP/VAUXN I/Os are dual mode I/O for 7 series FPGAs.
Vivado tool performs placement of these analog inputs automatically, but VP/VN and 16 VAUXP/VAUXN pin pairs need analog I/O standard constraints for the implementation."  These are the Zynq XADC VP/N pins I read off the Zedboard schematic:
  • VP: L11
  • VN: M12
Since these reserved pins are in bank 0, where all other pins seem to be connected to 3.3 V, I guessed at the IO standard.  Contrary to the documentation, I found that I had to locate the Vp/n pins for implementation (DRC: design rule check) to pass.

set_property  -dict {PACKAGE_PIN  L11   IOSTANDARD LVCMOS33} [get_ports XADC_in_v_p];
set_property  -dict {PACKAGE_PIN  M12   IOSTANDARD LVCMOS33} [get_ports XADC_in_v_n];

The mux output can be placed anywhere, so I chose the unused OLED pins.
  • XADC_mux[0] --> V20
  • XADC_mux[1] --> U20
  • XADC_mux[2] --> V19
  • XADC_mux[3] --> V18
  • XADC_mux[4] --> AB22
VGA R, G, B channels mix multiple bits into analog value as you can see below, so the resistors will have to be removed if the mux will actually be used in the future.

set_property  -dict {PACKAGE_PIN  V20   IOSTANDARD LVCMOS33} [get_ports {XADC_mux[0]}];
set_property  -dict {PACKAGE_PIN  U20   IOSTANDARD LVCMOS33} [get_ports {XADC_mux[1]}];
set_property  -dict {PACKAGE_PIN  V19   IOSTANDARD LVCMOS33} [get_ports {XADC_mux[2]}];
set_property  -dict {PACKAGE_PIN  V18   IOSTANDARD LVCMOS33} [get_ports {XADC_mux[3]}];
set_property  -dict {PACKAGE_PIN  AB22  IOSTANDARD LVCMOS33} [get_ports {XADC_mux[4]}];

After a successful bitstream generation and export (of the hardware definition and the bitstream), the memory mapped registers for xadc_wiz_0 appear at 0x43C10000, which the SW can then access.

A thermistor circuit driven by 1.25 Vref

For this experiment, I will use a 10 KOhm thermistor (this is the resistance at 20 C), which has an inverse relationship between the measured temperature and the resistance.  The input to the XADC Vp should be the high voltage side of the thermistor and the Vn should be the analog ground, in a voltage divider configuration, like this:
The 5 KOhm value shown above is just an example.  To properly size the voltage divider resistor, we have to know the expected operating temperature range, and more importantly, the nominal resistance of the thermistor at the 2 extremes.  The Vp-Vn into the XADC will be the greatest when the thermistor resistance is high--that is, when the temperature at the thermistor is low.  Let's say we want to measure between 0 C and 50 C.  An NTC thermistor I have has the following resistance values:
  • @ 0 C,  32336 Ohm
  • @ 50 C, 3635 Ohm
  • @ 100 C, 700 Ohm
If I want Vp-Vn to be 1.0 V at 0 C, then I have to solve the voltage divider equation for R: 32336/(32336 + R) = 1.0/1.25 => 32336 * 1.25 = 32336 + R => R = 32336 * (0.25) = 8084.  Using a 10 kOhm resistor, these are the Vp-Vn values I would see:
  • @ 0 C, Vref * 32336 / (32336 + 10k) = 0.955 V
  • @ 50 C, Vref * 3635 / (3635 + 10k) = 0.333 V
The ln(Rthermistor) curve is NON-linear, and the slope flattens out at higher temperature.  At the highest temperature, the thermistor resistance is roughly 3 kOhm, so putting a 1 uF capacitor across Vp-Vn would give me a low-pass filter with a corner of 1/(3k * 1E-6) ~ 330 rad/s (about 53 Hz), which would reject a lot of high frequency noise.  A larger capacitor would do an even better job of high frequency noise rejection.
The low-pass resistor 100 kOhm introduces a bias term (on the order of 1/10th) during transient.  A larger resistor (and a smaller capacitor) would reduce the error.
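The sizing arithmetic above, and its inverse (which the FW eventually needs in order to turn a measured Vp-Vn back into a thermistor resistance), can be sketched as follows.  This assumes the arrangement described earlier: the fixed resistor from Vref to Vp, the thermistor from Vp to analog ground, and no loading from the XADC input.  The function names are hypothetical.

```c
/* Fixed (top) resistor that makes Vp-Vn equal v_target when the
 * thermistor (bottom leg) has resistance r_therm. */
static double fixed_resistor_for(double vref, double v_target, double r_therm)
{
    return r_therm * (vref - v_target) / v_target;
}

/* Thermistor resistance recovered from a measured Vp-Vn, given the
 * fixed resistor actually fitted.  Requires 0 < v_meas < vref. */
static double thermistor_r_from_v(double vref, double v_meas, double r_fixed)
{
    return r_fixed * v_meas / (vref - v_meas);
}
```

fixed_resistor_for(1.25, 1.0, 32336.0) reproduces the 8084 Ohm figure, and thermistor_r_from_v is simply the same divider equation solved for the other leg.  Converting the resistance to a temperature still needs the thermistor's own table or curve fit, which is not covered here.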

Bare metal C++ to read XADC wizard

Most ADC examples you will find on the web use continuous sequencing, which is easier to program than "sequence only when needed" but wastes power.  In this code, I kick off a channel sequencing (sampling multiple channels in sequential order) from a system tick timer.  Since the XADC HW should be completely deterministic, the data should be available at a deterministic time WRT when it is needed (extra delay between ADC sampling and control value computation is a seldom discussed but subtle contributor to control loop instability).

The XADC wizard HW interface is completely over the AXI4Lite shared bus, shown above.  Using the base register address assigned by Vivado, a generic macro to access the 32-bit register is:

#define XADC_WIZARD(offset) (*(volatile uint32_t*)(0x43C10000u + (offset)))

POST (power-on-self-test)

It is a good practice to perform some POST in FW, by explicitly resetting the HW to the default state and performing a basic "wiggle toe" test.

    XADC_WIZARD(0) = 0xA;//reset is active for 16 clock cycles; see PG091, SRR
    //POST the XADC wizard
    XADC_WIZARD(0x340) = 0x55;//This is the test value the BSP example used
    Q_ASSERT(0x55 == XADC_WIZARD(0x340));
    XADC_WIZARD(0) = 0xA;//reset is active for 16 clock cycles; see PG091, SRR

In case I am writing to the wrong register that does NOT remember the values I wrote, the assert will park the FW in an error state, so that it will be obvious to all.

Initialize the XADC for 16 channels

To configure the sequencer, it must first be put into the default (safe) mode:

#define XADC_SEQUENCE_SAFE      (0 << 12 | 0xFFF)
#define XADC_SEQUENCE_OFF       (3 << 12 | 0xFFF)
#define XADC_SEQUENCE_CONTINOUS (2 << 12 | 0xFFF)
#define XADC_SEQUENCE_1PASS     (1 << 12 | 0xFFF)

    //Disable channel sequencer before configuring the sequencer
    XADC_WIZARD(0x304) = XADC_SEQUENCE_SAFE;

Select the 16 channels.  Given the Zynq's power and flexibility, I hope to read nearly all of the 16 channels.

    XADC_WIZARD(0x320) = 1 << 11;//Vp/Vn pair
    XADC_WIZARD(0x324) = 0xFFFF & ~(1 << 4);//Enable all Vaux pairs, except Vaux4 (16-bit register)
    //Leave averaging, input-mode (bi/unipolar), and acquisition time at default
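A small sketch of how those two channel-select values can be built, assuming (per my reading of UG480) that bit 11 of the first register selects Vp/Vn, that bit n of the second register selects VAUXn, and that only the low 16 bits of each register are meaningful.  The names are hypothetical.

```c
#include <stdint.h>

#define XADC_SEQ_VP_VN  (1u << 11)  /* Vp/Vn bit in the first channel-select register */

/* 16-bit enable mask for the auxiliary channel-select register:
 * every VAUX channel enabled except the one numbered 'skip' (0..15). */
static uint16_t xadc_aux_mask_except(unsigned skip)
{
    return (uint16_t)(0xFFFFu & ~(1u << skip));
}
```

xadc_aux_mask_except(4) yields 0xFFEF, i.e. all 16 auxiliary pairs minus Vaux4.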

Then enable interrupts:

    XADC_WIZARD(0x60) = XADC_WIZARD(0x60);//Clear any pending interrupts
    XADC_WIZARD(0x5C) = 1u << 31;//PG091, GIER (global interrupt enable) register
    XADC_WIZARD(0x68) = 1 << 4;//PG091, IPIER register, bit EOS

Note that I am only interested in the end of sequencing (vs. end of conversion for each channel).

Enable external mux, which is necessary to save I/O pin counts:

    //Enable external mux and connect to Vp/n
    XADC_WIZARD(0x300) = 0 << 12 // no averaging
    | 1 << 11 //enable mux
    | 0 << 9 //event driven sampling: doesn't work?
    | 3 << 0; // mux input channel = Vp/n

Finally, put the XADC in low power mode until needed:

    //Turn off all alarms and enable calibration; see UG480 Table 3-9
    XADC_WIZARD(0x304) = XADC_SEQUENCE_OFF;//Write to the config register 1

    XADC_WIZARD(0x308) = 3 << 4; //power down ADC to save power

Start ADC acquisition and conversion from a system tick handler

My system tick timer handles timeout driven processing, so I want that to be jitter-free.  Even though starting the sequence itself is deterministic and can potentially be done before I handle the timeout event in the FW, the timeout handling code for all my state machines may take more than the sequencing (of the 16 channels).  If so, the timeout handler may get interrupted by the completion of the ADC sequencing.  While I write the code to be re-entrant in general, avoiding multi-thread contention is a good practice in general.  If sampling ADC and then running a control algorithm is deemed higher priority, I can easily move the following code to BEFORE my timeout handler.

//process the system tick (Q_TIMEOUT_SIG)

//Start the ADC conversion AFTER systick processing is done, to
//keep the Q_TIMEOUT_SIG jitter-free
XADC_WIZARD(0x308) = 0;//Start the XADC clocks

//Start another pass through sequence; UG480
XADC_WIZARD(0x304) = XADC_SEQUENCE_1PASS;

It takes MANY clock cycles to acquire and convert the 16 channels, as you can see in the scope capture below, where the 1st pulse is the timeout interrupt handling (at the end of which I start the acquisition, as explained above):
Eyeballing from the scope capture timeline, it takes about 19 usec for acquisition and conversion to complete, and another 5 usec to read out the ADC values from the registers (code given below).

Handle the EOS (completion of ADC sequencing) interrupt

Please remember from the above discussion that in my HW design that uses an FPGA to CPU interrupt concatenator, the XADC interrupt was mapped to interrupt ID 89, which I enum to INT_ID_XADC.  The first thing I should do is acknowledge the interrupt (turn it off)

case INT_ID_XADC: {
uint32_t status = XADC_WIZARD(0x60);//PG091, IPISR register
XADC_WIZARD(0x60) = status;//acknowledge interrupt
XADC_WIZARD(0x304) = XADC_SEQUENCE_OFF;

If the interrupt is EOS, I can read out the data.

if(status & (1 << 4)) {//EOS (end of sequence)
XADC_val[0] = XADC_WIZARD(0x20C);//PG091 table 2-4, Vp/Vn
XADC_val[1] = XADC_WIZARD(0x240);
XADC_val[2] = XADC_WIZARD(0x244);
XADC_val[3] = XADC_WIZARD(0x248);
XADC_val[4] = XADC_WIZARD(0x24C);
XADC_val[5] = XADC_WIZARD(0x254);
XADC_val[6] = XADC_WIZARD(0x258);
XADC_val[7] = XADC_WIZARD(0x25C);
XADC_val[8] = XADC_WIZARD(0x260);
XADC_val[9] = XADC_WIZARD(0x264);
XADC_val[10] = XADC_WIZARD(0x268);
XADC_val[11] = XADC_WIZARD(0x26C);
XADC_val[12] = XADC_WIZARD(0x270);
XADC_val[13] = XADC_WIZARD(0x274);
XADC_val[14] = XADC_WIZARD(0x278);
XADC_val[15] = XADC_WIZARD(0x27C);

XADC_WIZARD(0x304) = XADC_SEQUENCE_OFF;
XADC_WIZARD(0x308) = 3 << 4;//Stop the XADC clocks
}
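The 16 reads above follow a regular pattern: Vp/Vn at 0x20C, then VAUXn at 0x240 + 4*n, skipping Vaux4.  A hedged sketch of generating the same offsets in a loop (names are my own; on the target each offset would be passed to XADC_WIZARD(), which is left out here so the sketch can be checked on a PC):

```c
#include <stdint.h>

#define XADC_OFF_VP_VN  0x20Cu  /* Vp/Vn result register offset */
#define XADC_OFF_VAUX0  0x240u  /* VAUX0 result register offset */

/* Result register offset of auxiliary channel n (0..15). */
static uint32_t xadc_vaux_offset(unsigned n)
{
    return XADC_OFF_VAUX0 + 4u * n;
}

/* Fill out[] with the offsets in readout order: Vp/Vn first, then
 * every VAUX channel except 'skip'.  Returns the number of entries. */
static unsigned xadc_readout_offsets(uint32_t out[16], unsigned skip)
{
    unsigned count = 0;
    out[count++] = XADC_OFF_VP_VN;
    for (unsigned n = 0; n < 16; ++n) {
        if (n != skip) {
            out[count++] = xadc_vaux_offset(n);
        }
    }
    return count;
}
```

The table only needs to be built once at init; the interrupt handler can then walk it, which keeps the handler short without changing which registers are read.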

It was not obvious at first why the sequencer could not be left in a single pass mode, until I found a cryptic passage in the datasheet while debugging: the sequencing starts when the mode CHANGES to single pass mode!  Stopping the clock is the PRINCIPAL method of saving power, but I have to wait until the sequencing is complete.

Result without any circuit connected to the Vp/n input (just reading noise)

This is what I see in the JTAG debugger:

XADC_val long unsigned int [16] [0x000020b7, 0x000020d2, 0x000020c3, 0x000020d1, 0x000020c7, 0x000020d0, 0x000020d5, 0x000020d1, 0x000020bd, 0x000020ca, 0x000020b5, 0x000020c6, 0x000020d7, 0x000020ca, 0x000020bd, 0x000020c9]

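Since the XADC is a 12-bit ADC, the leading '2' looks strange at first.  It stops being strange if the status registers are 16 bits wide with the 12-bit result MSB justified in bits [15:4], which is how UG480 describes them: 0x20B7 >> 4 = 0x20B, with the bottom 4 bits left over.  The values change with every read, so I think the read values are indeed noise.

Note that "MSB justified" in the Xilinx documentation then refers to the 16-bit register, NOT to the 32-bit word read over the AXI bus: the upper 16 bits of every readout above are 0.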
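If the readout is interpreted the way UG480 lays out the status registers (a 16-bit register with the 12-bit result in bits [15:4]), the conversion to a 12-bit code and then to volts can be sketched like this.  The unipolar 0 to 1 V transfer function is assumed, and the function names are hypothetical.

```c
#include <stdint.h>

/* 12-bit ADC code from a 32-bit AXI readout, assuming the result sits
 * in bits [15:4] of the low 16 bits. */
static uint16_t xadc_raw_to_code12(uint32_t raw)
{
    return (uint16_t)((raw & 0xFFFFu) >> 4);
}

/* Volts for a 12-bit code, assuming unipolar mode with 1.0 V full scale. */
static double xadc_code12_to_volts(uint16_t code)
{
    return code / 4096.0;
}
```

Under that assumption the first reading above, 0x20B7, is code 0x20B, or roughly 0.13 V on a floating input.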

Removing the XADC Linux device driver from the kernel

The XADC device driver can be removed from the DTS file, but that will leave the driver unused in the kernel.  For a thorough excision, I changed the kernel config like this:

# CONFIG_XILINX_XADC is not set

May 27, 2015

Xilinx FMC105 breakout board JTAG chain breakage workaround

I love the Avnet Zedboard; for learning a computing system from the ground up, this medium priced board is the best thing I have found so far; the flexibility to modify the HW forces me to really understand the low level concepts.  The Zedboard has lots of I/O possibilities, but exposes ALL I/O pins available on Zynq (that's the CPU/FPGA package that runs the Zedboard) to the FMC connector.  For serious dorking with Zedboard, a breakout board to access all those pins is necessary.  For $150, the Xilinx FMC105 board does exactly that.  But when I mounted the FMC105 board on my Zedboard, I found that the JTAG stopped working, and discovered that it is because the FMC105 board also brings out the JTAG TDI/TDO pins, but leaves them open--leading to a broken JTAG chain.  Since I want to keep using the USB JTAG connector on the Zedboard, I had to short the TDI/TDO pins with a jumper, as you can see on the lower left corner of my setup, shown below:

Another handy thing to know: although it is hard to get at 3.3V on the Zedboard (as a protection measure, VADJ can only be either 1.8 V or 2.5 V--unless you solder on a jumper), there are a few 3.3 V pins on the FMC105.  One of them is shown above, where I connected a red cable to my voltage regulator chip (not shown above).  The JTAG pins above also have a 3.3 V source.

May 20, 2015

Debugging Zedboard HDMI resolution 1360x768

Problem

I don't see the penguin logo on the 1360x768 HDMI monitor.  After debugging, I discovered that I can actually see something (can run the Qt pathstroke demo), but if the monitor is NOT on when the board boots, the resolution initialization is incorrect (1024x768 instead of 1360x768).

dmesg shows:


[drm] Initialized drm 1.1.0 20060810
drivers/gpu/drm/adi_axi_hdmi/axi_hdmi_drv.c:axi_hdmi_platform_probe[176]
platform 70e00000.axi_hdmi: Driver axi-hdmi requests probe deferral
...
adv7511-hdmi-snd fpga-axi@0:adv7511_hdmi_snd: adv7511 <-> 75c00000.axi-spdif-tx mapping ok
...
Console: colour dummy device 80x30
Console: switching to colour frame buffer device 128x48

axi-hdmi 70e00000.axi_hdmi: fb0:  frame buffer device
axi-hdmi 70e00000.axi_hdmi: registered panic notifier
[drm] Initialized axi_hdmi_drm 1.0.0 20120930 on minor 0

The "Console: switching to colour..." log above is from drivers/tty/vt/vt.c, do_bind_con_driver()

ADI diagnostic tool

To run this tool, get python

git clone https://github.com/analogdevicesinc/diagnostic_report.git

sed 's/@PREFIX@/\/usr/' adi-diagnostic-report.desktop.in > adi-diagnostic-report.desktop

install -d /usr/bin
install -d /usr/share/adi_diagnostic_report/
install ./adi_diagnostic_report /usr/bin/
install ./adi_diagnostic_report.glade /usr/share/adi_diagnostic_report/
xdg-desktop-menu install adi-diagnostic-report.desktop

In the end, it just wants to gather the following output:

dmesg

...

uname -a

Linux zed 3.19.0 #1 SMP PREEMPT Sat May 9 10:20:22 PDT 2015 armv7l GNU/Linux

/etc/os-release:
NAME=Buildroot
VERSION=2015.05-rc1-00029-g89f96ea-dirty
ID=buildroot
VERSION_ID=2015.05-rc1
PRETTY_NAME="Buildroot 2015.05-rc1"

Bitstream information: /sys/kernel/debug/adi_diagnostic/info

does NOT exist

/sys/kernel/debug

Nothing here!
Clock information: /sys/kernel/debug/clk/clk_summary
/sys/kernel/debug/adi_diagnostic/clock_monitor
Board status signals: /sys/kernel/debug/adi_diagnostic/status_monitor

Video out information: /sys/class/drm/*/status

# find  /sys/class/drm/
# find  /sys/class/drm/ -exec file {} \;
/sys/class/drm/: directory
/sys/class/drm/card0-HDMI-A-1: symbolic link to ../../devices/soc0/fpga-axi@0/70e00000.axi_hdmi/drm/card0/card0-HDMI-A-1
/sys/class/drm/card0: symbolic link to ../../devices/soc0/fpga-axi@0/70e00000.axi_hdmi/drm/card0
/sys/class/drm/controlD64: symbolic link to ../../devices/soc0/fpga-axi@0/70e00000.axi_hdmi/drm/controlD64
/sys/class/drm/version: drm 1.1.0 20060810

# file *
device:    symbolic link to ../../card0
dpms:      ASCII text
edid:      empty
enabled:   ASCII text
modes:     empty
power:     directory
status:    ASCII text
subsystem: symbolic link to ../../../../../../../class/drm

# cd /sys/devices/soc0/fpga-axi@0/70e00000.axi_hdmi/drm/card0/card0-HDMI-A-1
# cat dpms
On
# cat enabled
enabled
# cat status
disconnected

IIO device information: iio_info

/proc/config.gz
/proc/interrupts
/proc/iomem
/proc/cmdline

Device register settings (regmap): /sys/kernel/debug/regmap/*


ADI tools source revisions:
mount
/media/boot/VERSION: does NOT exist
FMC FRU EEPROMs: /sys/devices/*/eeprom
/var/log/Xorg.0.log
/proc/device-tree
/sbin/ifconfig -a
/sbin/route -n
/sys/bus/{platform,i2c,spi}/

Mar 31, 2015

State machine based Qt5 GUI on Zedboard

In a previous blog entry, I explored creating a minimal embedded Linux distribution containing the Qt5 framework, and writing and debugging a "Hello world" Qt GUI application.  Whenever possible, I write all my SW within an event-driven, hierarchical state machine framework called QP.  But since Qt is also an event-driven framework in its own right, meshing the 2 together is not straightforward.  When creating a WPF MVVM (model-view-view model) GUI application with state machines, I could update the WPF view model from a special active object (I called it the GuiStateMachine) in response to any update events (of interest to the GUI) from ALL other active objects.  Apparently, you cannot do that in Qt, because in the official Qt-QP integration example, the singleton GUI state machine runs in Qt context.  So unlike in my WPF-QP integration, the events delivered to the GUI state machine (active object, really) are transformed into Qt events and shoved into Qt's event delivery mechanism.  The Qt-QP reference application is available for mingw, but I cross-compile for the Zynq (ARM Cortex A9), so I am going to modify the reference application for my situation.

Create the DPP Qt Widgets project

The reference application creates the QP Qt library first.  But on my system, one Qt GUI is the only application (I am an embedded SW engineer, not a desktop SW engineer), so I will not bother with a separate library, and just put all code in 1 Qt widgets application, in the qpcpp/examples/qt/arm/buildroot folder.

~/work/Dorking/QP/qpcpp/examples/qt$ mkdir -p arm/buildroot

Then in Qt Creator (the previous blog entry discussed how to get and install the Qt Creator FROM qt.io rather than as a Debian package)
  1. Click "New Project" button, and then choose the "Qt Widgets Application" template.
  2. Following the reference application example, I create a project called "dpp-gui" in the /mnt/work/Dorking/QP/qpcpp/examples/qt/arm/buildroot folder just created.
  3. Next, I choose the zedbr2 kit I created in the  previous blog entry.
  4. In a departure from the example, I create my GUI as a QMainWindow (vs. QDialog).  Also unlike the example, I WILL use the form.  But I will still call the main class "Gui", to follow the example.
Qt Creator can already build this empty main class, which is always a good first step.

Preprocessor include path and defines in qmake project file

At minimum, the project must include the QP include/, qep/source/, qf/source/, and the QP port folders.  Unlike in other IDEs, build variables like include paths are NOT a project property; I write these directly into the project (.pro) file in a text editor, using a qmake variable, like this:

QP_ROOT = ../../../../..

INCLUDEPATH += $$QP_ROOT/include \
    $$QP_ROOT/qep/source \
    $$QP_ROOT/qf/source \
    $$QP_ROOT/ports/qt




Qt itself has a state machine infrastructure, which is redundant for a QP state machine application, so I turn off Qt's state machine feature in the qmake .pro file:

DEFINES += QT_NO_STATEMACHINE

Add sources to the project and tailor to my needs

QP platform independent sources

In Qt Creator, right click on Sources --> Add Existing Directory --> Browse to the qpcpp/qep/source/ folder --> Start Parsing, to expand the folder and unselect the unnecessary files, as shown below (I do not use FSM, only HSM):
I later learned that you can also include the header files, and Qt Creator will correctly pull them into the HEADERS variable, so qep_pkg.h should have been checked in the above screenshot.

I add qpcpp/qf/source folder similarly, without leaving out any files this time.

Note on updating to the QP 5 API

When copying examples written for QP API 4.5 or earlier, the following changes are required:
  • Delete the deprecated call to QS_RESET()
  • QTimeEvt ctor now takes the owning active object as the 1st argument.  In C++, that would show up as the "this" pointer if the timer belongs to an active object.  In exchange, the armX method of the QTimeEvt--which should be used instead of the postIn() method--now does NOT take an active object.
  • Q_NEW now takes ctor arguments, to call the PLACEMENT new operator (i.e. unlike the regular new, it does NOT hit the heap) of the type being created.  While this is great for a single process usage of the memory pool, the virtual table you get with the new operator is dangerous when the memory pool spans multiple processes (through shared memory)--as will be the case for me.  The danger lies in the possibility of different compiler versions laying out the virtual table differently (C++ compilers are notorious for this, even among different versions).  I decide to play it safe here and turn off QEvent's CTOR and VIRTUAL features in qep_port.h, as shown below (and pay the price of having to initialize the memory pool objects myself):
// don't define QEvent to avoid conflict with Qt
#define Q_NQEVENT    1

// do NOT provide QEvt constructors
#undef Q_EVT_CTOR

// do NOT provide QEvt virtual destructor
#undef Q_EVT_VIRTUAL

QP Qt port sources

Because Qt is multi-platform code, the example QP port to mingw Qt still works for embedded ARM.  I just have to include the qpcpp/ports/qt/ folder, like I have done for the qep/ and qf/ folders above.  But since the PixelLabel is only necessary for the fly-and-shoot example, I excluded it.

SOURCES += ... \
$$QP_ROOT/ports/qt/guiapp.cpp \
$$QP_ROOT/ports/qt/qf_port.cpp


HEADERS += gui.h \
$$QP_ROOT/ports/qt/qep_port.h \
$$QP_ROOT/ports/qt/qf_port.h \
$$QP_ROOT/ports/qt/tickerthread.h \
$$QP_ROOT/ports/qt/aothread.h \
$$QP_ROOT/ports/qt/guiapp.h \
$$QP_ROOT/ports/qt/guiactive.h

Unlike the example Qt integration on mingw, setting the stack size to 4 KB prevents the QThread from starting, so I commented that call out and let QThread use the default thread stack size for now.

   //thread->setStackSize(stkSize);

Application support files

The final step in mating QP to an application is to specify functions that QP calls for certain events (startup, onClockTick, onAssert, etc) and the application state machine calls (like updating the philosopher stats from the Table state machine).  Unlike the port files, which can theoretically be shared between different QP-Qt projects (again, I will only have 1), the application specific files are coupled to the application logic.  For the DPP application, dpp.h and the bsp header/source files are such files, so I add them to the first lines of SOURCES and HEADERS in the qmake pro file:

SOURCES += main.cpp gui.cpp bsp.cpp philo.cpp table.cpp \
...


HEADERS += gui.h bsp.h dpp.h \
...



dpp.h contains the application specific event class TableEvt.  To turn off the event polymorphism feature, I take in only the signal number in the TableEvt constructor.

When I examine bsp.cpp, I see that the philosopher states (THINKING/HUNGRY/EATING) are displayed with QPixmaps showing 3 different PNG files, and the table state (PAUSED/SERVING) is displayed with a text on a button.  The images for the philosopher states are in res folder,  pointed to by the gui.qrc (Qt resource) file.  So I add this file to the project (Add Existing File).  I also copied the entire res/ folder from the mingw example folder, so that when I click on one of the PNG files in the resource, I see the image in the Qt Creator, like this:

In the qmake pro file, the resource shows up like this:

RESOURCES += gui.qrc

To update the files to the latest QP API, I make the changes discussed above, in "Note on updating to the QP 5 API" section.

UI

Instead of just blindly copying the QDialog based UI from the example, I went through the trouble of copying the buttons and labels from the example UI to the QMainWindow based UI, all to preserve the possibility of using the top menu and the bottom status bars in the future.  In Qt Creator's Designer View, the UI looks like this:
Note that all widgets I copied are in the central widget; that is, the north, south, east, west widget areas do not exist.

I wire the signals emitted from the widgets to the 3 slots defined in gui.cpp constructor:

...
    QObject::connect(m_quitButton, SIGNAL(clicked()), this, SLOT(onQuit()));
    QObject::connect(m_pauseButton, SIGNAL(pressed()), this, SLOT(onPausePressed()));
    QObject::connect(m_pauseButton, SIGNAL(released()), this, SLOT(onPauseReleased()));
    QObject::connect(this, SIGNAL(finished(int)), this, SLOT(onQuit()));
    } // setupUi

The UI designer just lays out the widgets (and possibly statically connects signals to slots).  The code behind the UI is in gui.cpp, which I copied from the example.  After this step, my gui.cpp code is the same as the example, except for Gui parent being QMainWindow instead of QDialog.

State machines

The philosopher and the table state machines drive the application logic.  The Qt integration example has the 2 state machine implementations generated by the QM state charting tool, but I do NOT want to generate my code, so I copy philo.cpp and table.cpp from another example (examples/arm/vanilla/gnu/dpp-at91sam7s-ek) that does not yet use the new style of coding the state transition.  I also added these 2 files to the project.  But I later found out that weird crashes can occur if I update the GUI in a non-GUI thread.  Examples of the crash:

QObject::startTimer: Timers cannot be started from another thread
QBasicTimer::stop: Failed. Possibly trying to stop from a different thread
QObject::connect: Cannot queue arguments of type 'QTextBlock'
(Make sure 'QTextBlock' is registered using qRegisterMetaType().)

valgrind  --undef-value-errors=no --leak-check=yes dpp-gui > dpp_valgrind.txt 2>&1

I added the Desktop kit to the project, in the Projects toolbar icon, and reproduced the problem even on Ubuntu.  More errors:



QApplication: Object event filter cannot be in a different thread.
QWidget::repaint: Recursive repaint detected

This is why in the Qt integration example, the table active object is the ONLY active object that derives from the GuiQActive class, which is supplied in the port.

class Table : public QP::GuiQActive {
...

Application main

I copied main.cpp verbatim from the example, which gives the table GuiQActive object NO event queue (because events to the GUI go through the Qt event delivery mechanism).  So the following code snippet is correct:

    DPP::AO_Table->start((uint_fast8_t)(N_PHILO + 1),
                         //GuiQActive does not need event queue
                         //&l_tableQueueSto[0], Q_DIM(l_tableQueueSto),
                         (QP::QEvt const **)0, (uint32_t)0,
                         (void *)0, (uint_fast16_t)0);

Build and debug on the target

  1. Leveraging the hard work of setting up the cross-compile in the previous blog entry, I build the target ELF file easily by clicking on the build icon (the hammer).  The debug target is still only 2.3 MB on the disk.
  2. Following the workaround for the cross-debug not working, I copy the ELF file to the target's /root folder.
  3. I start the gdbserver on the copied app, specifying the mouse device (note that this application does NOT use the keyboard, but the keyboard device is event1)

    gdbserver localhost:1234 /root/dpp-gui -plugin evdevmouse:/dev/input/event0
  4. In Qt Creator, attach to the remote gdbserver (menu --> Debug --> Start Debugging --> Attach to Remote Debug Server), specifying the port and the ELF file, as you can see in this example:

I see 5 Homer icons happily taking turns eating, thinking, being hungry!

Feb 26, 2015

Zynq AMP: Linux on CPU0 and bare metal on CPU1

When I first started playing around with Zedboard, I set a goal to investigate ways to integrate all computing that I've ever done in an expensive (I've never worked on something that sold for less than $100K--actually more like $500K) hardware into an SoC.  Studying how to run 2 bare metal C applications on each Zynq ARM CPU (xapp 1079) was the first step, and I learned about some of the Linux kernel and device drivers after that.  When I studied xapp 1079, I had trouble thoroughly understanding its companion reference app xapp 1078, in which the app on CPU1 is kicked off from Linux running on CPU0.  But my half-year long detour through the various Linux subsystems just paid off serendipitously, because I found a Linux kernel module that may obviate the need for xapp 1078 altogether (actually will make xapp 1078 seem like a giant head-fake; maybe not as bad as the James Clark's WebTV venture during the height of the dot-com boom, but still right up there).

remoteproc kernel module

There are 2 reasons to keep zynq_remoteproc as a module rather than compiling into the kernel:
  1. Since I am hosting the root file system on NFS, this module should NOT start until the NFS rootfs is mounted.  Modules seem to start AFTER NFS mounting.
  2. To start/stop CPU1, this module should be probed and removed
NOTE TO SELF: after modifying and compiling the kernel module and doing a modules_install, the modules still need to be copied to the NFS export!

When Xilinx made a marketing push to AMP (asymmetric multi-processing) a couple of years ago, they put out (rather quietly) an application note ug978 that launched FreeRTOS on CPU1 from Linux running on CPU0.  I will try to use zynq_remoteproc module--the specialization of the generic Linux remoteproc module--as verbatim as possible (<kernel>/drivers/remoteproc/zynq_remoteproc.c), to launch my own bare metal C++ application on CPU1.

Firstly, the module has to be built.  I added the following lines to my kernel defconfig:

CONFIG_RPMSG=y
CONFIG_REMOTEPROC=y
CONFIG_ZYNQ_REMOTEPROC=m

Next, the kernel has to be told about my desire to use the zynq_remoteproc driver, through DTS.  I added the following entry in zynq-zed-adv7511.dts:

remoteproc@1 {
     compatible = "xlnx,zynq_remoteproc";
     reg = < 0x1FE00000 0x200000 >;
     interrupt-parent = <&gic>;
     interrupts = < 0 37 0 0 38 0 >;
     firmware = "cpu1app.elf";
     ipino = <0>; //The only free ipino
     vring0 = <2>;
     vring1 = <3>;
};

Here, I am telling the kernel that I want to use the last 2 MB (out of 512 MB available on Zedboard) of the RAM for the bare metal app running on CPU1.  Please recall that the memory was declared in zynq-zed.dtsi, which is included by zynq-zed-adv7511.dts:

memory {
device_type = "memory";
reg = <0x000000000 0x20000000>;
};

To constrain the Linux kernel to only 510 MB without having to change the above DTS entry, I add "mem=510M" in the U-Boot kernel bootargs.  Without it, the module cannot allocate coherent DMA mapping for the last 2 MB because the following code in zynq_remoteproc probe will fail (I tried it already):

ret = dma_declare_coherent_memory(&pdev->dev, local->mem_start,
local->mem_start, local->mem_end - local->mem_start + 1,
DMA_MEMORY_IO);
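The 510 MB / 2 MB split used in the DTS entries and the bootargs above is simple arithmetic, and the following sketch spells it out.  The macro and function names are hypothetical; the numbers come from the memory node (512 MB of DDR starting at 0) and the remoteproc reg property (a 2 MB carve-out), and the sketch assumes the carve-out sits at the very top of the DDR.

```c
#include <stdint.h>

#define DDR_BASE   0x00000000u  /* start of DDR, from the memory node */
#define DDR_SIZE   0x20000000u  /* 512 MB, from the memory node */
#define CPU1_SIZE  0x00200000u  /* 2 MB carve-out, from the remoteproc reg */

/* Start address of the carve-out when it is placed at the very top of DDR. */
static inline uint32_t cpu1_carveout_base(void)
{
    return DDR_BASE + DDR_SIZE - CPU1_SIZE;
}

/* Number of megabytes left for Linux, i.e. the N to pass as mem=<N>M. */
static inline uint32_t linux_mem_mb(void)
{
    return (DDR_SIZE - CPU1_SIZE) >> 20;
}
```

cpu1_carveout_base() evaluates to 0x1FE00000 and linux_mem_mb() to 510, which match the reg property in the remoteproc node and the mem=510M boot argument.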

In Xilinx document ug978, the CPU1 application was placed in the boot partition, right next to BOOT.bin--which lives on my SD card.  For convenience during development, I want to put the application ELF file on the NFS export.  Many Linux distributions seem to put firmware in /lib/firmware, but according to the hard coded paths in fw_path string array (<>/drivers/base/firmware_class.c), /lib/firmware/updates/ is also a possibility, as well as a custom path specified in the "path" module parameter.  This folder is conveniently accessible on my NFS host, making development iteration easier.

I can just compile this DTS in bash and move the DTB into the TFTP download folder, because I am downloading the kernel over TFTP:

~/work/zed/kernel/arch/arm/boot/dts$ ~/work/zed/kernel/scripts/dtc/dtc -I dts -O dtb -o zynq-zed-adv7511.dtb  zynq-zed-adv7511.dts
~/work/zed/kernel/arch/arm/boot/dts$ sudo mv zynq-zed-adv7511.dtb  /var/lib/tftpboot/

Of course, there is no cpu1app ELF file in /lib/firmware, BUT the modprobe fails for a different reason if ipino in DTS is anything other than 0:

CPU0: IPI handler 0x5 already registered to ipi_cpu_stop
zynq_remoteproc 1fe00000.remoteproc: IPI handler already registered
zynq_remoteproc 1fe00000.remoteproc: Deleting the irq_list
CPU1: Booted secondary processor
CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
zynq_remoteproc 1fe00000.remoteproc: Can't power on cpu1 -1
zynq_remoteproc: probe of 1fe00000.remoteproc failed with error -1

This code is stopping the probe():

ret = set_ipi_handler(local->ipino, ipi_kick, "Firmware kick");
if (ret) {
dev_err(&pdev->dev, "IPI handler already registered\n");
goto irq_fault;
}

Reading set_ipi_handler(), I realized that 0 (IPI_WAKEUP) is the only available IPI handler number, so I changed DTS.  I do NOT plan to use virtio, so I simply commented out anything related to vring in zynq_remoteproc with CONFIG_ZYNQ_IPC #ifdef.

Simplest bare metal (actually uses the Xilinx stand-alone BSP) CPU1 application: blinks

Since bare metal AMP was demonstrated in xapp 1079, it may be easiest to pick up from there.  But briefly, building a stand-alone (no OS) application for CPU1 involves the following high-level steps:
  1. Create a standalone BSP specialized for AMP CPU1 (when creating the Xilinx BSP project in xsdk, select ps_cortexa9_1 as the CPU).  Since I did not install the FreeRTOS template, the only OS choice I get is standalone--hence the project name "standalone_bsp_1".
  2. Compile an ELF executable that targets CPU1, depends on the BSP just created above, and is hard coded to some load address.
Since the CPU1 BSP will NOT be used for FSBL, there is an opportunity to reduce the code size (compared to the CPU0 BSP) by NOT selecting any libraries--such as xilffs or xilrsa, as I've done below:
Since I am NOT interested in debugging the BSP, I have an opportunity to increase the optimization level and remove the debug (-g) flag in the BSP setting.  But this is important: the USE_AMP=1 preprocessor define in the BSP setting (right click on the BSP project in Eclipse --> Board Support Package settings) changes some BSP code relative to the default BSP:
  • GIC (generic interrupt controller) distributor is disabled
  • L2 cache invalidation is disabled in boot.S, and instead, virtual address 0x20000000 is mapped to 0x0 and marked as non-cacheable (while MMU is disabled of course).  xapp 1079 comments this out, so I did too.
  • Recently, John McDougall added more AMP code in boot.S to:
    • Mark the Linux DDR region as unassigned/reserved to the MMU, which is a private resource of CPU1
    • Mark the CPU1 DDR as inner (L1) cached only
  • L2 cache is NOT turned back on (because it was not invalidated in the first place!)
Marking certain sections of the DDR as reserved and the last part of the DDR as inner cached only is done in boot.S, when USE_AMP=1:

#if USE_AMP==1
// /* In case of AMP, map virtual address 0x20000000 to 0x00000000  and mark it as non-cacheable */
// ldr r3, =0x1ff /* 512 entries to cover 512MB DDR */
// ldr r0, =TblBase /* MMU Table address in memory */
// add r0, r0, #0x800 /* Address of entry in MMU table, for 0x20000000 */
// ldr r2, =0x0c02 /* S=b0 TEX=b000 AP=b11, Domain=b0, C=b0, B=b0 */
//mmu_loop:
// str r2, [r0] /* write the entry to MMU table */
// add r0, r0, #0x4 /* next entry in the table */
// add r2, r2, #0x100000 /* next section */
// subs r3, r3, #1
// bge mmu_loop /* loop till 512MB is covered */

/* Mark Linux DDR [0x00000000, 0x1FE00000) as unassigned/reserved */
ldr r3, =0x1fd  /* counter=509 to cover 510MB DDR */
ldr r0, =TblBase /* MMU Table address in memory */
ldr r2, =0x0000  /* S=b0 TEX=b000 AP=b00, Domain=b0, C=b0, B=b0 */
mmu_loop:
str r2, [r0]    /* write the entry to MMU table */
add r0, r0, #0x4 /* next entry in the table */
add r2, r2, #0x100000 /* next section */
subs r3, r3, #1     //counter--
bge mmu_loop    /* loop till Linux DDR MB covered */

/* Mark CPU1 DDR [0x1FE00000, 0x20000000) as inner cached only */
ldr r3, =0x1  /* counter=1 to cover 2MB DDR */
movw r2, #0x4de6  /* S=b0 TEX=b100 AP=b11, Domain=b1111, C=b0, B=b1 */
movt r2, #0x1FE0      /* S=b0, Section start for address 0x1FE00000 */
mmu_loop1:
str r2, [r0]    /* write the entry to MMU table */
add r0, r0, #0x4 /* next entry in the table */
add r2, r2, #0x100000 /* next section */
subs r3, r3, #1     //counter--
bge mmu_loop1    /* loop till CPU1 DDR MB is covered */
#endif
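The magic numbers in the listing above (0x0c02, 0x0000, 0x4de6) are first-level MMU section entries.  As a cross-check of the field comments, here is a small sketch that composes such an entry from its fields, assuming the ARMv7-A short-descriptor section layout; the function name is hypothetical and is not part of the Xilinx BSP.

```c
#include <stdint.h>

/* Compose an ARMv7-A short-descriptor "section" (1 MB) entry. */
static inline uint32_t mmu_section_entry(uint32_t base, uint32_t tex,
                                         uint32_t ap, uint32_t domain,
                                         uint32_t c, uint32_t b, uint32_t s)
{
    return (base & 0xFFF00000u)      /* bits [31:20]: section base address */
         | ((s      & 0x1u) << 16)   /* S: shareable */
         | ((tex    & 0x7u) << 12)   /* TEX[2:0]: memory type extension */
         | ((ap     & 0x3u) << 10)   /* AP[1:0]: access permission */
         | ((domain & 0xFu) << 5)    /* domain number */
         | ((c      & 0x1u) << 3)    /* C: cacheable */
         | ((b      & 0x1u) << 2)    /* B: bufferable */
         | 0x2u;                     /* bits [1:0] = 0b10 marks a section */
}
```

With TEX=b100, AP=b11, Domain=b1111, C=0, B=1 and base 0x1FE00000, this yields 0x1FE04DE6, the same value the movw/movt pair loads into r2.  The Linux region is written with entries whose low two bits are 0b00, which is a fault (unmapped) entry rather than a section, so CPU1 faults if it touches that range.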

For the application, I copy the xapp 1079 CPU1 application as a new project "cpu1app" and start modifying.  Besides the application logic itself, the linker script (lscript.ld) specifies where the code/data sections will be placed in memory (DDR, to be specific, by CPU0--but that is not the concern of the linker script).  xapp1079 reserved 0x02000000 through 0x02ffffff (16 MB) for CPU1, but as shown in the DTS above, I want to allocate CPU1 memory at 0x1FE00000.  So I change the ps7_ddr_0_S_AXI_BASEADDR location and size in the linker script editor, like this:

MEMORY
{
   ps7_ddr_0_S_AXI_BASEADDR : ORIGIN = 0x1fe00000, LENGTH = 0x200000
}

Since the linker places all sections into the DDR, there is no reason to even mention other on-chip memory (BRAM at 0x0 and OCM at 0xFFFC0000).  I don't know the correct stack and heap size yet, so I'll just leave them alone (8 KB each).

_STACK_SIZE = DEFINED(_STACK_SIZE) ? _STACK_SIZE : 0x2000;
_HEAP_SIZE = DEFINED(_HEAP_SIZE) ? _HEAP_SIZE : 0x2000;

The simplest app I can think of is a blinker.  Recently, John McDougall introduced a sleep method using CPU1's private timer (which seems to be called SCU timer--I don't yet see the connection to the snoop control unit).  John McDougall's code for initializing the SCU timer and calling a sleep on it is in this download (in design/src/apps/app_cpu1/scu_sleep.[ch]).  My main() simply calls the SCU timer init and then sleep for 1 second over and over.

#include "xparameters.h" /* XPAR_XGPIOPS_0_DEVICE_ID */
#include "xgpiops.h"     /* XGpioPs driver API */
#include "xstatus.h"     /* XST_SUCCESS, XST_FAILURE */

#define GPIO_DEVICE_ID   XPAR_XGPIOPS_0_DEVICE_ID
#define LED_DELAY 10000000
#define OUTPUT_PIN 7 /* Pin connected to LED/Output */
XGpioPs Gpio; /* The driver instance for GPIO Device. */

/* Configure the LED pin as an output and toggle it forever. */
static int GpioOutputExample(void)
{
    volatile int Delay; /* volatile keeps the busy-wait loops from being optimized away */

    XGpioPs_SetDirectionPin(&Gpio, OUTPUT_PIN, 1);
    XGpioPs_SetOutputEnablePin(&Gpio, OUTPUT_PIN, 1);
    XGpioPs_WritePin(&Gpio, OUTPUT_PIN, 0x0);

    while (1) {
        XGpioPs_WritePin(&Gpio, OUTPUT_PIN, 0x1);
        for (Delay = 0; Delay < LED_DELAY; Delay++);
        XGpioPs_WritePin(&Gpio, OUTPUT_PIN, 0x0);
        for (Delay = 0; Delay < LED_DELAY; Delay++);
    }
    return XST_SUCCESS; /* never reached */
}

int main(void)
{
    int Status;
    XGpioPs_Config *ConfigPtr;

    /* Look up the GPIO configuration and initialize the driver instance. */
    ConfigPtr = XGpioPs_LookupConfig(GPIO_DEVICE_ID);
    if (ConfigPtr == NULL) {
        return XST_FAILURE;
    }
    Status = XGpioPs_CfgInitialize(&Gpio, ConfigPtr,
                                   ConfigPtr->BaseAddr);
    if (Status != XST_SUCCESS) {
        return XST_FAILURE;
    }
    Status = GpioOutputExample();
    if (Status != XST_SUCCESS) {
        return XST_FAILURE;
    }

    return XST_SUCCESS;
}

WITHOUT the USE_AMP=1 modifications I made to boot.S above, I can launch this program from xsdk (Xilinx SW development IDE), and I can see the blinking LED.

xsdk builds the ELF file with ease, and I moved that file into a new folder /lib/firmware within the NFS exported root for the target.  When I rebooted Zedboard, I was greeted with what seems like a minor success in dmesg output:

CPU1: shutdown
 remoteproc0: 1fe00000.remoteproc is available
 remoteproc0: Note: remoteproc is still under development and considered experimental.
 remoteproc0: THE BINARY FORMAT IS NOT YET FINALIZED, and backward compatibility isn't yet guaranteed.

As dmesg suggests, Linux first shut down CPU1.  Silently, it tries to load the firmware through this chain: zynq_remoteproc_probe() --> rproc_add() --> rproc_add_virtio_devices() --> request_firmware_nowait() --> INIT_WORK(&fw_work->work, request_firmware_work_func) --> request_firmware_work_func() --> _request_firmware() --> fw_get_filesystem_firmware() --> fw_read_file_contents().  request_firmware_work_func() should also do post-FW load work (like booting the remote proc) through the fw_work->cont function pointer to rproc_fw_config_virtio(), but that is bombing out because there is no rproc_find_rsc_table <-- rproc_elf_find_rsc_table()

The debugger does NOT respond when CPU1 is halted (as in this case), so I had to rely on printk.  I came to appreciate the value of out-of-tree module compilation:

~/work/zed/kernel/drivers/remoteproc$ make -C /mnt/work/zed/buildroot/output/build/linux-custom ARCH=arm M=`pwd` modules

Having the target's modules folder on NFS export (/export/root/zedbr2/lib/modules/3.15/kernel/drivers/remoteproc in this case) made the otherwise printk based debugging much faster (it still took a few days to navigate through all the source and try different hypotheses).  Finally, I realized that my executable does not have the .resource_table section the ELF loader is looking for.  I put an empty resource table (note that num=1 below) as its own section (which is what the remoteproc module looks for after the ELF loader parses the ELF file) in lscript.ld:

.resource_table : {
   __rtable_start = .;
   *(.rtable)
   __rtable_end = .;
} > ps7_ddr_0_S_AXI_BASEADDR

The C program can have the global data as the resource table content:

#define RAM_ADDR 0x1fe00000
struct resource_table {//Just copied from linux/remoteproc.h
u32 ver;//Must be 1 for remoteproc module!
u32 num;
u32 reserved[2];
u32 offset[1];
} __packed;
enum fw_resource_type {
RSC_CARVEOUT = 0,
RSC_DEVMEM = 1,
RSC_TRACE = 2,
RSC_VDEV = 3,
RSC_MMU = 4,
RSC_LAST = 5,
};
struct fw_rsc_carveout {
u32 type;//from struct fw_rsc_hdr
u32 da;
u32 pa;
u32 len;
u32 flags;
u32 reserved;
u8 name[32];
} __packed;

__attribute__ ((section (".rtable")))
const struct rproc_resource {
    struct resource_table base;
    //u32 offset[4];
    struct fw_rsc_carveout code_cout;
} ti_ipc_remoteproc_ResourceTable = {
.base = { .ver = 1, .num = 1, .reserved = { 0, 0 },
.offset = { offsetof(struct rproc_resource, code_cout) },
},
.code_cout = {
   .type = RSC_CARVEOUT, .da = RAM_ADDR, .pa = RAM_ADDR, .len = 1<<19,
   .flags=0, .reserved=0, .name="CPU1CODE",
},
};
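To see why the offset[] entry matters, here is a simplified host-side sketch of how a loader can walk such a table: read num, look up the byte offset of entry i, and read the type word found there.  This is NOT the kernel's code; the function names are hypothetical and the bounds checks are only illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets inside the table header, matching struct resource_table:
 * ver (4) + num (4) + reserved[2] (8), followed by the offset[] array. */
enum { RSC_NUM_OFF = 4, RSC_OFFSETS_OFF = 16 };

/* Read one 32-bit word at a byte offset without alignment assumptions. */
static inline uint32_t rd32(const uint8_t *p, size_t off)
{
    uint32_t v;
    memcpy(&v, p + off, sizeof v);
    return v;
}

/* Return the type of resource entry idx, or -1 when idx is out of range
 * or the entry would lie outside the table_len bytes of the table. */
static inline int rsc_entry_type(const void *table, size_t table_len,
                                 uint32_t idx)
{
    const uint8_t *p = table;
    size_t slot;
    uint32_t entry_off;

    if (table_len < RSC_OFFSETS_OFF || idx >= rd32(p, RSC_NUM_OFF))
        return -1;
    slot = RSC_OFFSETS_OFF + (size_t)idx * 4u;  /* where offset[idx] lives */
    if (slot + 4u > table_len)
        return -1;
    entry_off = rd32(p, slot);
    if ((size_t)entry_off + 4u > table_len)
        return -1;
    return (int)rd32(p, entry_off);  /* first word of every entry is its type */
}
```

For the table above, the header is 5 words long, so offsetof(struct rproc_resource, code_cout) is 20 and the word at byte 20 is RSC_CARVEOUT (0).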

With this change, my program is copied to the correct location in the DRAM, and I can dynamically start/stop CPU1 by probing and removing the module, like this:

# rmmod zynq_remoteproc
# modprobe kernel/drivers/remoteproc/zynq_remoteproc.ko

This driver shows up in /sys/module/zynq_remoteproc/ and /sys/devices/1fe00000.remoteproc.  But the zynq_remoteproc probe does NOT boot the remote processor; it merely loads the firmware.  Indeed, it cannot, because the firmware loading completes asynchronously from module probing.  Supposedly, the rpmsg module probe should call rproc_boot(), so I tried the following:

# modprobe kernel/drivers/rpmsg/virtio_rpmsg_bus.ko

But the module's probe still does NOT get called (note that I crossed out CONFIG_RPMSG=y from my defconfig above)!  I could not figure out how to get the virtio device probed, and for that matter, another determined engineer could not either, so I just added a single-threaded work queue to call rproc_boot() after the firmware is loaded.

struct zynq_rproc_pdata {
struct irq_list mylist;
struct rproc *rproc;
u32 ipino;
#ifdef CONFIG_ZYNQ_IPC
u32 vring0;
u32 vring1;
#endif
u32 mem_start;
u32 mem_end;

//Need my own workqueue rather than a shared work queue because I will block for completion
struct workqueue_struct* wq;
struct work_struct boot_work;
};

static void boot_cpu1(struct work_struct *work) {
struct zynq_rproc_pdata* local =
container_of(work, struct zynq_rproc_pdata, boot_work);
struct rproc* rproc = local->rproc;
int err;

wait_for_completion(&rproc->firmware_loading_complete);
dev_info(&rproc->dev, "firmware_loading_complete\n");
err = rproc_boot(rproc);
if(err)
dev_err(&rproc->dev, "rproc_boot %d\n", err);
}

static int zynq_remoteproc_probe(struct platform_device *pdev)
{
...
ret = rproc_add(local->rproc);
if (ret) {
dev_err(&pdev->dev, "rproc registration failed\n");
goto rproc_fault;
}

INIT_WORK(&local->boot_work, boot_cpu1);
local->wq = create_singlethread_workqueue("znq_remoteproc boot");
if (!local->wq) { /* returns NULL on failure, not an ERR_PTR */
dev_err(&pdev->dev, "create_singlethread_workqueue failed\n");
ret = -ENOMEM;
goto rproc_fault;
}
queue_work(local->wq, &local->boot_work);
...
}


static int zynq_remoteproc_remove(struct platform_device *pdev)
{
struct zynq_rproc_pdata *local = platform_get_drvdata(pdev);
u32 ret;

dev_info(&pdev->dev, "%s\n", __func__);
rproc_shutdown(local->rproc);
destroy_workqueue(local->wq);
...

With this change, my cpu1app runs on boot:

 remoteproc0: firmware_loading_complete
 remoteproc0: powering up 1fe00000.remoteproc
 remoteproc0: Read /lib/firmware/cpu1app.elf 0
 remoteproc0: firmware: direct-loading firmware cpu1app.elf
 remoteproc0: assign_firmware_buf, flag 5 state 0
 remoteproc0: Booting fw image cpu1app.elf, size 150445
zynq_remoteproc 1fe00000.remoteproc: iommu not found
 remoteproc0: rsc: type 0
 remoteproc0: phdr: type 1 da 0x1fe00000 memsz 0xd890 filesz 0x8058
 remoteproc0: rproc_da_to_va 1fe00000 -->   (null)
 remoteproc0: rproc_da_to_va 1fe0800c -->   (null)
zynq_remoteproc 1fe00000.remoteproc: zynq_rproc_start
 remoteproc0: remote processor 1fe00000.remoteproc is now up

I can also debug my app in the xsdk JTAG debugger.  This debugger stack trace is proof that I am running Linux on CPU0 and my bare metal application on CPU1:

ARM Cortex-A9 MPCore #0 (Suspended)
0xc0020428 cpu_v7_do_idle(): arch/arm/mm/proc-v7.S, line 74
0xc0013d1c arm_cpuidle_simple_enter(): arch/arm/kernel/cpuidle.c, line 18
0xc03d08b8 cpuidle_enter_state(): drivers/cpuidle/cpuidle.c, line 104
0xc03d09ac cpuidle_enter(): drivers/cpuidle/cpuidle.c, line 159
0xc0060ad0 cpu_startup_entry(): kernel/sched/idle.c, line 154
0xc0573fac rest_init(): init/main.c, line 397
0xc07ebba4 start_kernel(): init/main.c, line 652
0x00008074
0x00008074
ARM Cortex-A9 MPCore #1 (Suspended)
0x1fe00594 GpioOutputExample(): ../src/xgpiops_polled_example.c, line 93
0x1fe005f4 main(): ../src/xgpiops_polled_example.c, line 113
0x1fe02264 _start()

rmmod zynq_remoteproc does not work; the remove() method is not even called, most likely because rproc_boot() takes a reference on the driver module that only rproc_shutdown() releases.  As a result, I cannot stop cpu1app; it starts at system bootup and keeps running, which is OK for an embedded application.  Another approach would be to create another module that boots and stops zynq_remoteproc, but I don't know how to get a handle to the existing zynq_remoteproc instance...

Better alternative: provide "up" device attribute to read/write

If I provide a sysfs file for the userspace to write to, the firmware will probably have been loaded already by the time the user writes '1' to the attribute file.  So I created the store/show methods of "up" attribute as shown here:

static ssize_t up_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count) {
struct rproc *rproc = container_of(dev, struct rproc, dev);
int err;
//struct platform_device *pdev = to_platform_device(dev);
//struct zynq_rproc_pdata *local = platform_get_drvdata(pdev);
if(buf[0] == '0') { //want to shut down
rproc_shutdown(rproc);
} else { // bring up
err = rproc_boot(rproc);
if(err) //report the boot failure to the writer
return err;
}
return count;
}
static ssize_t up_show(struct device *dev,
    struct device_attribute *attr, char *buf) {
struct rproc *rproc = container_of(dev, struct rproc, dev);
return sprintf(buf, "%d\n", rproc->state);
}
static DEVICE_ATTR_RW(up);

And in probe, I can register this file:

... ret = rproc_add(local->rproc);
if (ret) {
dev_err(&pdev->dev, "rproc registration failed\n");
goto rproc_fault;
}

ret = device_create_file(&local->rproc->dev, &dev_attr_up);
return ret;

When I probe this module, I can read the "up" file:

# cat  /sys/devices/1fe00000.remoteproc/remoteproc0/up
 0

I then start the cpu1app by writing 1 to the file:

# echo 1 > /sys/devices/1fe00000.remoteproc/remoteproc0/up
 remoteproc0: powering up 1fe00000.remoteproc
 remoteproc0: Read /lib/firmware/cpu1app.elf 0
 remoteproc0: firmware: direct-loading firmware cpu1app.elf
 remoteproc0: assign_firmware_buf, flag 5 state 0
 remoteproc0: Booting fw image cpu1app.elf, size 150445
zynq_remoteproc 1fe00000.remoteproc: iommu not found
 remoteproc0: rsc: type 0
 remoteproc0: phdr: type 1 da 0x1fe00000 memsz 0xd890 filesz 0x8058
 remoteproc0: rproc_da_to_va 1fe00000 -->   (null)
 remoteproc0: rproc_da_to_va 1fe0800c -->   (null)
zynq_remoteproc 1fe00000.remoteproc: zynq_rproc_start
 remoteproc0: remote processor 1fe00000.remoteproc is now up

And the up file now reads 2, which means RPROC_RUNNING (and the LED is blinking!).

# cat  /sys/devices/1fe00000.remoteproc/remoteproc0/up
 2
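
The number printed by the "up" file is the raw value of rproc->state.  As a minimal user-space sketch for decoding it, assuming the enum rproc_state ordering from include/linux/remoteproc.h in kernels of this era (newer kernels append more states), and with rproc_state_name being a hypothetical helper that is not part of the driver:

```c
/* Local mirror of the kernel's enum rproc_state.
 * ASSUMPTION: the values match include/linux/remoteproc.h for the kernel
 * used in this post; check the header of the kernel you actually build. */
enum rproc_state {
    RPROC_OFFLINE   = 0,
    RPROC_SUSPENDED = 1,
    RPROC_RUNNING   = 2,
    RPROC_CRASHED   = 3,
};

/* Hypothetical helper: turn the number read from the "up" file into text. */
static const char *rproc_state_name(int state)
{
    switch (state) {
    case RPROC_OFFLINE:   return "offline";
    case RPROC_SUSPENDED: return "suspended";
    case RPROC_RUNNING:   return "running";
    case RPROC_CRASHED:   return "crashed";
    default:              return "unknown";
    }
}
```

With this mapping, the 0 read right after probing is RPROC_OFFLINE, and the 2 read after booting is RPROC_RUNNING.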

To stop CPU1, I have to do 2 things in succession: write 0 to the "up" file, and then remove the module:

# echo 0 > /sys/devices/1fe00000.remoteproc/remoteproc0/up
zynq_remoteproc 1fe00000.remoteproc: zynq_rproc_stop
 remoteproc0: stopped remote processor 1fe00000.remoteproc

# rmmod zynq_remoteproc
zynq_remoteproc 1fe00000.remoteproc: zynq_remoteproc_remove
zynq_remoteproc 1fe00000.remoteproc: Deleting the irq_list
 remoteproc0: releasing 1fe00000.remoteproc
CPU1: Booted secondary processor

At this point, Linux has been restarted on the 2nd processor (note the "CPU1: Booted secondary processor" message).  Done this way, I can restart the app by modprobing the module and then writing 1 to the "up" file again.
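
The echo commands above can also be issued from a program.  Here is a minimal user-space sketch, assuming the "up" attribute behaves as described in this post ('0' as the first character shuts down, anything else boots); set_rproc_up is a hypothetical helper name, and the path is passed in by the caller rather than hard-coded:

```c
#include <stdio.h>

/* Hypothetical helper: write "1\n" (boot) or "0\n" (shut down) to the
 * given attribute file, the same bytes that `echo 1 > file` would send.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int set_rproc_up(const char *attr_path, int up)
{
    FILE *f = fopen(attr_path, "w");
    int ok;

    if (!f)
        return -1;

    ok = (fputs(up ? "1\n" : "0\n", f) != EOF);

    /* fclose flushes the buffer, so a failed sysfs write shows up here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling set_rproc_up("/sys/devices/1fe00000.remoteproc/remoteproc0/up", 1) would be the equivalent of the echo 1 command shown earlier, and passing 0 would be the equivalent of echo 0.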