Feb 28, 2015

Zynq inter-processor interrupts

I started thinking about AMP (asymmetric multi-processing) cores communicating via OCM (on-chip memory) when I first started playing around with Linux on Zynq.  Although I made sure early on that the Zynq OCM already had a device driver, it took me all this time to get comfortable enough with the Linux kernel and device drivers to reach this point, where I can start a bare metal application on CPU1 from Linux on CPU0.  In this blog, I study the logical next step: inter-processor interrupts.

Learning from existing code

<kernel>/drivers/irqchip/irq-gic.c: interrupt related functions in Linux kernel

What can I learn from existing kernel functions?  Firstly, every IRQ register change is done while holding a spinlock (irq_controller_lock).

To disable (mask) an interrupt (gic_mask_irq), a '1' is written to the appropriate bit in ICDICER0 (0xF8F01180; 0x180 relative to the ICD base 0xF8F01000) ~ ICDICER2 (0xF8F01188; 0x188 relative to the ICD base).  Writing 0 to these registers has no effect; to enable forwarding the interrupt again, a '1' is written to the matching bit in the set-enable registers ICDISER0 ~ ICDISER2 (GIC_DIST_ENABLE_SET, 0x100 relative to the ICD base), as shown in this example:

static void gic_unmask_irq(struct irq_data *d)
{
    u32 mask = 1 << (gic_irq(d) % 32);

    raw_spin_lock(&irq_controller_lock);
    if (gic_arch_extn.irq_unmask)
        gic_arch_extn.irq_unmask(d);
    writel_relaxed(mask, gic_dist_base(d) + GIC_DIST_ENABLE_SET + (gic_irq(d) / 32) * 4);
    raw_spin_unlock(&irq_controller_lock);
}
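The bank and bit arithmetic used by the mask/unmask functions can be tried on a host machine.  This is only a sketch under my own naming: gic_bank_addr() and gic_bank_mask() are hypothetical helpers, not kernel functions, and the addresses are the Zynq ones quoted above.

```c
#include <stdint.h>

/* Zynq-7000 GIC distributor base and the two enable register banks. */
#define ICD_BASE        0xF8F01000u
#define ICDISER_OFFSET  0x100u  /* set-enable: writing 1 unmasks the interrupt */
#define ICDICER_OFFSET  0x180u  /* clear-enable: writing 1 masks the interrupt */

/* Hypothetical helper: address of the 32-bit bank register holding this ID's bit. */
static uint32_t gic_bank_addr(uint32_t bank_offset, uint32_t irq)
{
    return ICD_BASE + bank_offset + (irq / 32u) * 4u;
}

/* Hypothetical helper: single-bit mask for this ID within its bank register. */
static uint32_t gic_bank_mask(uint32_t irq)
{
    return 1u << (irq % 32u);
}
```

For example, interrupt ID 52 lands in the second bank (ICDISER1/ICDICER1) at bit 20, so unmasking it means writing 0x00100000 to 0xF8F01104.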

On Zynq (arch/arm/mach-zynq), irq_mask/irq_unmask methods are mask_msi_irq()/unmask_msi_irq() in <kernel>/drivers/pci/msi.c, which handle the plain MSI (message signaled interrupt) case and the MSI-X case.  Zynq does NOT seem to use these extensions.

ICCIAR/GIC_INT_ACK (0xF8F0010C): interrupt acknowledge register; reading the ID acknowledges the pending interrupt

ICCEOIR/GIC_EOI (0xF8F00110): end of interrupt register; write the interrupt ID from GIC_INT_ACK.
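The value read from ICCIAR carries more than the interrupt ID.  As a rough sketch (the helper names are mine; the field layout is the ARM GIC v1 one, with the interrupt ID in bits [9:0] and, for SGIs, the requesting CPU in bits [12:10]), it can be decoded like this:

```c
#include <stdint.h>

#define GIC_SPURIOUS_ID 1023u  /* returned when nothing is pending */

/* Hypothetical helper: interrupt ID field, bits [9:0] of the ICCIAR value. */
static uint32_t icciar_int_id(uint32_t iar)
{
    return iar & 0x3FFu;
}

/* Hypothetical helper: source CPU field, bits [12:10]; meaningful for SGIs only. */
static uint32_t icciar_cpu_id(uint32_t iar)
{
    return (iar >> 10) & 0x7u;
}

/* Hypothetical helper: nonzero if the read returned the spurious interrupt ID. */
static int icciar_is_spurious(uint32_t iar)
{
    return icciar_int_id(iar) == GIC_SPURIOUS_ID;
}
```

Because the CPU field is part of the value, the whole word read from ICCIAR (not just the ID) should be written back to ICCEOIR when the interrupt is an SGI.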

Interrupt handling in the XSDK standalone BSP

Being lower level, the Xilinx BSP may give a better example of IRQ handling.  I started with an example interrupt driven program auto-generated from the BSP summary page: the interrupt driven GPIO example.  The 1st interrupt related function is SetupInterruptSystem(), which is specific to GPIO (i.e. not generic for all interrupts).  But most of the lower level calls inside it are generic.
  1. Point all XSCUGIC_MAX_NUM_INTR_INPUTS (95) interrupt handlers to the stub handler (which just increments the interrupt controller's UnhandledInterrupts counter)
  2. DistInit (initialize distributor): do nothing if USE_AMP (Linux is the interrupt distributor master)
  3. Write 0xF0 to ICCPMR (CPU interrupt priority mask register; 0x4 relative to the CPU interface base address 0xF8F00100).  Why would we set the interrupt priority threshold to 0xF0?
  4. Write 0x7 to ICCICR (CPU interface control register), having to do with secure interrupts (don't care to learn about this for now).
  5. Set the interrupt handler as the ISR for HW's IRQ vector (vs other HW defined exceptions, such as FIQ, RESET, ABORT, SWI).
    1. Therefore, we know that on Zynq, interrupt handling is a 2-step process: ALL GIC interrupts (95 of them) are handled by this ISR, which then demultiplexes into the handlers that will be defined for different types of interrupts.
    2. In xparameters, shared interrupt IDs start at 32 (saw this before, where the interrupt number defined in HW design shows up with 32 added to it).
  6. GPIO interrupt ID is 52 (XPAR_XGPIOPS_0_INTR defined in xparameters.h), and its handler is registered with XScuGic_Connect()
    1. Q: is there a number set aside for the 16 software generated interrupts?
  7. Some peripherals like GPIO can configure the interrupt type (edge/level) through peripheral specific register(s).
  8. The 3rd leg of chained interrupt handler is the peripheral specific ISR, written by the application, which does NOT seem to have to acknowledge the interrupt (done by the 1st and 2nd ISRs).
  9. Peripheral specific interrupt is enabled to the 2nd multiplexer
  10. HW IRQ interrupt is enabled by Xil_ExceptionEnableMask(XIL_EXCEPTION_IRQ); 
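The stub-filled handler table and the 2-step dispatch described in the list above can be modeled in plain C on a host.  This is a simplified sketch, not the BSP source: all names are hypothetical, and the interrupt ID is passed in as an argument where the real top-level ISR would read it from ICCIAR.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_INTR 95u  /* mirrors XSCUGIC_MAX_NUM_INTR_INPUTS */

typedef void (*intr_handler_t)(void *ref);

struct vector_entry {
    intr_handler_t handler;
    void *ref;
};

static struct vector_entry vector_table[MAX_INTR];
static unsigned unhandled_interrupts;

/* Default handler: only counts interrupts nobody registered for. */
static void stub_handler(void *ref)
{
    (void)ref;
    unhandled_interrupts++;
}

/* Step 1 of the list: point every entry at the stub. */
static void table_init(void)
{
    for (uint32_t i = 0; i < MAX_INTR; i++) {
        vector_table[i].handler = stub_handler;
        vector_table[i].ref = NULL;
    }
}

/* Plays the role of XScuGic_Connect(): install a handler for one ID. */
static int table_connect(uint32_t id, intr_handler_t h, void *ref)
{
    if (id >= MAX_INTR || h == NULL)
        return -1;
    vector_table[id].handler = h;
    vector_table[id].ref = ref;
    return 0;
}

/* Plays the role of the single IRQ-vector ISR: look up the ID and dispatch. */
static void top_level_isr(uint32_t int_id)
{
    if (int_id < MAX_INTR)
        vector_table[int_id].handler(vector_table[int_id].ref);
}

/* Example device handler: bumps a counter passed as the callback reference. */
static void count_handler(void *ref)
{
    (*(unsigned *)ref)++;
}
```

Connecting count_handler to ID 52 and then dispatching IDs 52 and 40 increments the caller's counter once and the unhandled counter once.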

Interrupting CPU1 from CPU0

Exposing an interrupt write attribute to the userspace, to interrupt CPU1

With the knowledge gained from studying Linux and Xilinx BSP code, let's send the interrupt from Linux.  The Linux kernel already provides a function to raise software interrupt to any CPU.  For example, to raise IRQ number "irqnum" to CPU1:

gic_raise_softirq(cpumask_of(1), irqnum)

Under the hood, this writes "(gic_cpu_map[1] << 16) | irqnum" to address "GIC0 distributor base address + 0xF00".  The Linux kernel code is valid for the Zynq GIC because it is based on the ARM GIC architecture.  Interrupts are NOT vectored in HW; instead, there is an interrupt distributor that implements (configurable) priority (and serializes interrupts targeting multiple CPUs).  The SGI (software generated interrupt) being raised above is explained in Zynq TRM section 7.2.1: SGIs range from 0 to 15, and are raised by writing to the ICDSGIR (Software Generated Interrupt) register at 0xF8F01F00, or 0xF00 relative to the ICD (interrupt controller distributor at 0xF8F01000).  gic_cpu_map[1] above corresponds to the target filter being 0 (use the specified target list) and the target being CPU1.
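To make the register layout concrete, here is a small host-side sketch of how an ICDSGIR value is put together.  The function and enum names are hypothetical; the field positions are the ARM GIC ones (target list filter in bits [25:24], CPU target list in bits [23:16], SGI ID in bits [3:0]), and I am assuming that the CPU1 entry of the target list is bit 1.

```c
#include <stdint.h>

#define ICDSGIR_ADDR 0xF8F01F00u  /* ICD base 0xF8F01000 + 0xF00 */

/* Target list filter field of ICDSGIR. */
enum sgi_target_filter {
    SGI_TO_LIST   = 0,  /* deliver to the CPUs named in the target list */
    SGI_TO_OTHERS = 1,  /* deliver to every CPU except the sender */
    SGI_TO_SELF   = 2   /* deliver to the sender only */
};

/* Hypothetical helper: compose the 32-bit value to write to ICDSGIR. */
static uint32_t icdsgir_value(enum sgi_target_filter filter,
                              uint32_t cpu_mask, uint32_t sgi_id)
{
    return (((uint32_t)filter & 0x3u) << 24) |
           ((cpu_mask & 0xFFu) << 16) |
           (sgi_id & 0xFu);
}
```

With these assumptions, raising SGI 2 on CPU1 corresponds to writing 0x00020002, and raising SGI 0 on CPU0 corresponds to writing 0x00010000.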

[BTW, section 7.4.2 seems VERY important; in particular, I need to better understand this sentence: "If the interrupt is active in the GIC (because the CPU interface has acknowledged the interrupt), then the software ISR determines the cause by checking the GIC registers first and then polling the I/O Peripheral interrupt status registers."]

Assuming that writing to this register does raise the software interrupt to CPU1, there is currently no way for a USERSPACE application to raise this interrupt.  The zynq_remoteproc device with which I flexibly booted a bare metal application on CPU1 now has an attribute file that the userspace can get to, as demonstrated in the last blog.  I can create another attribute for the userspace app to write to, with this code:

#include <linux/irqchip/arm-gic.h>

static ssize_t irq_store(struct device *dev, struct device_attribute *attr,
        const char *buf, size_t count)
{
    u8 irqnum;

    /* parse the whole decimal string, so that 10 to 15 are accepted too */
    if (kstrtou8(buf, 10, &irqnum) || irqnum >= 16) {
        dev_err(dev, "Invalid soft IRQ num: %s", buf);
        return -EINVAL;
    }
    gic_raise_softirq(cpumask_of(1), irqnum); /* raise the SGI on CPU1 */
    return count;
}
static DEVICE_ATTR_WO(irq);

static int zynq_remoteproc_probe(struct platform_device *pdev)
{
    ...
    ret = device_create_file(&local->rproc->dev, &dev_attr_irq);
    if (ret) {
        dev_err(&pdev->dev, "device_create_file %s %d\n",
            dev_attr_irq.attr.name, ret);
        goto attr_up_err;
    }
    return ret;
attr_up_err:
    device_remove_file(&local->rproc->dev, &dev_attr_up);
    ...

The device sysfs folder now has an "irq" file (next to the "up" file created in the last blog entry):

# ls /sys/devices/1fe00000.remoteproc/remoteproc0/
irq     power   uevent  up

Catching the interrupt on CPU1 bare metal application

The bare metal cpu1app will have to install interrupt handler and enable HW interrupt, with this code copied mostly from the BSP auto-generated GPIO example:

#include "xil_exception.h"
#include "xil_io.h"
#include "xscugic.h"
#include "xgpiops.h"

//Intc (XScuGic), Gpio (XGpioPs) and OUTPUT_PIN are defined elsewhere in cpu1app

int ledon = 1;
static void on_SGI(void *CallBackRef) {
    //reading interrupt status acknowledges pending interrupt
    //(XScuGic_InterruptHandler has already read ICCIAR before calling this
    //handler, so this 2nd read is probably not needed)
#define ICCIAR (XPAR_PS7_SCUGIC_0_BASEADDR | 0x10C)
    u32 status = Xil_In32(ICCIAR);
    (void)status;
    XGpioPs_WritePin(&Gpio, OUTPUT_PIN, ledon ^= 1);//toggle LED
}

//Mostly copied from BSP auto-generated xgpiops_int_example
#define INTC_DEVICE_ID XPAR_PS7_SCUGIC_0_DEVICE_ID
static int SetupInterruptSystem(void) {
    int Status;
    XScuGic_Config *IntcConfig; //GIC config

    Xil_ExceptionInit();

    IntcConfig = XScuGic_LookupConfig(INTC_DEVICE_ID);
    if (IntcConfig == NULL) {
        return XST_FAILURE;
    }
    Status = XScuGic_CfgInitialize(&Intc, IntcConfig, IntcConfig->CpuBaseAddress);
    if (Status != XST_SUCCESS) {
        return XST_FAILURE;
    }

    //connect to the HW
    Xil_ExceptionRegisterHandler(XIL_EXCEPTION_ID_INT,//== XIL_EXCEPTION_IRQ
        (Xil_ExceptionHandler)XScuGic_InterruptHandler, &Intc);
#define SGI_NUM 2
    Status = XScuGic_Connect(&Intc, SGI_NUM,
        (Xil_ExceptionHandler)on_SGI, (void *)&Intc);
    if (Status != XST_SUCCESS) {
        return XST_FAILURE;
    }
    XScuGic_Enable(&Intc, SGI_NUM);

    // Enable interrupts in the Processor.
    Xil_ExceptionEnableMask(XIL_EXCEPTION_IRQ);
    return XST_SUCCESS;
}
int main(void)
{
    ...
    XGpioPs_WritePin(&Gpio, OUTPUT_PIN, ledon);

    SetupInterruptSystem();

    while(1) {
        volatile int Delay;
        for (Delay = 0; Delay < 10000000; Delay++);
    }
    return XST_SUCCESS;
}

The idea is to initially turn on the LED, and toggle it only in the ISR.

Userspace test

To see if the interrupt can be delivered to CPU1, I first boot the bare metal application as I did in the last blog entry

# echo 1 > /sys/devices/1fe00000.remoteproc/remoteproc0/up

The MIO LED is lit when cpu1app starts.  To get ready to examine the interrupt status registers, I bring up the Xilinx JTAG debugger (see this previous blog entry for how), and then write to the irq file

# echo 2 > /sys/devices/1fe00000.remoteproc/remoteproc0/irq

The LED turns off!  And then on, and off, every time I run the above command!
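A userspace program can do the same thing as the echo command by writing the SGI number to the attribute file.  The helper below is a hypothetical sketch (raise_sgi() is my own name, and the path is passed in by the caller); whether numbers above 9 are accepted depends on how the kernel-side store handler parses its input.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write an SGI number (0 to 15) as decimal text to a
 * sysfs-style attribute file.  Returns 0 on success, -1 on any failure.
 */
static int raise_sgi(const char *attr_path, unsigned sgi)
{
    FILE *f;
    int ok;

    if (sgi > 15)
        return -1;  /* only 16 software generated interrupts exist */

    f = fopen(attr_path, "w");
    if (f == NULL)
        return -1;

    ok = fprintf(f, "%u\n", sgi) > 0;
    if (fclose(f) != 0)  /* the write is flushed to the file here */
        ok = 0;
    return ok ? 0 : -1;
}
```

On the board, calling raise_sgi("/sys/devices/1fe00000.remoteproc/remoteproc0/irq", 2) should have the same effect as the echo command above.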

Bonus: putting the CPU1 into WFE while waiting for the interrupt

My typical real-time SW is completely event driven, so that the main loop does not need to do any work.  In this case, putting CPU1 to sleep while waiting for an interrupt will save power.  Changing main()'s infinite while loop to sleep is trivial, thanks to the WFE instruction available on ARMv7 and later:

while(1) {
    asm("WFE" : : : );
}

The LED still toggles in response to my writing 2 into the irq attribute file, so WFE works as expected.

In fact, sending an event itself can be a poor man's way of interrupting the bare metal application on CPU1, if CPU1 is normally waiting for a command from CPU0.  It is a poor man's way because the SEV instruction wakes up ALL processors; but in a 2 processor situation, 1 is already awake, so it is not much of a hit except for possibly 1 unnecessary context switch.

Interrupting Linux from bare metal

Sending the interrupt from CPU1

As shown in the Linux code, SENDING the interrupt is much easier than receiving the interrupt.  Xilinx BSP makes it almost trivial:

static void on_SGI(void *CallBackRef) {
    //reading interrupt status acknowledges pending interrupt
#define ICCIAR (XPAR_PS7_SCUGIC_0_BASEADDR | 0x10C)
    u32 status = Xil_In32(ICCIAR);
    XGpioPs_WritePin(&Gpio, OUTPUT_PIN, ledon ^= 1);//toggle LED

    status = XScuGic_SoftwareIntr(&Intc, 0, XSCUGIC_SPI_CPU0_MASK);
    //TODO check error
}

This test code raises SW interrupt 0 on CPU0, which zynq_remoteproc is already listening for.

Catching the interrupt in the Linux kernel

In the last blog, I found out (rather painfully) that the zynq_remoteproc module already installs a Linux IPI (inter-processor interrupt) handler that doesn't do any work, and that 0 (IPI_WAKEUP) was the only remaining unassigned IPI number (because the Linux SMP IPI table only goes up to 7) even though Zynq has a whopping 16 possible software interrupt numbers:

static void ipi_kick(void)
{
    dev_info(&remoteprocdev->dev, "KICK Linux because of pending message\n");
    //schedule_work(&workqueue);
}

Leaving aside the utility of the current kernel module, I wanted to see if the interrupt is caught at all.  So I rebuilt cpu1app and ran it again.  This time, when I sent soft IRQ 2 to CPU1, CPU1 raised IRQ 0 back to CPU0, and I saw this in the command prompt:

KICK Linux because of pending message

So it does work!

Propagating the interrupt to the userspace: unnecessary?

The ways to alert the userspace application that is waiting for an event from CPU1 might be application specific:
  • If only 1 application were waiting for some kind of data, sending a signal may be the easiest.
  • If the kernel module does not know how many userspace applications want the data, a netlink socket broadcast may be more appropriate.
Perhaps independent of how to wake up a userspace application, if the data rate is high, maybe the message should be sent over DMA, and the DMA controller may raise a DMA done interrupt, which CPU0 can catch and handle.

A caution: Zynq OCM is already used by kernel

Linux kernel suspend (part of pm subsystem) runs the last stage of suspend from OCM (after powering off the DDR?).  In ADI kernel's arch/arm/mach-zynq/pm.c zynq_pm_suspend_init(), zynq_sys_suspend_sz number of bytes are copied into the OCM base.  zynq_sys_suspend_sz is calculated in <kernel>/arch/arm/mach-zynq/suspend.S:

ENTRY(zynq_sys_suspend_sz)
.word . - zynq_sys_suspend

which means: zynq_sys_suspend_sz is the size of the assembly function that starts at ENTRY(zynq_sys_suspend) in the same file (line 50).  Just counting the lines from that point to the .word label above (line 182), and subtracting empty and comment lines, I'd say it's about 100 lines of assembly, so I'd ballpark the suspend code to be ~400 bytes (assuming this code is ARM--I don't see anything that indicates the code is THUMB).
I would guess it's a good practice to avoid the 1st page of the OCM.  Therefore, I will try to constrain my usage of OCM to start at 0xFFFC1000.
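The arithmetic behind that choice can be checked with a few lines of host C.  This is only a sketch with assumptions of mine: the OCM is taken to be mapped at 0xFFFC0000 (which is what the 0xFFFC1000 start address implies), a page is 4 KB, and each ARM instruction is 4 bytes; the function names are hypothetical.

```c
#include <stdint.h>

#define OCM_HIGH_BASE    0xFFFC0000u  /* assumed OCM base address */
#define PAGE_SIZE_BYTES  0x1000u      /* assumed 4 KB page */

/* Round addr up to the next multiple of align (align must be a power of 2). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* First page-aligned OCM address past the bytes reserved by the kernel. */
static uint32_t ocm_first_free(uint32_t reserved_bytes)
{
    return align_up(OCM_HIGH_BASE + reserved_bytes, PAGE_SIZE_BYTES);
}
```

With about 100 ARM instructions (roughly 400 bytes) copied to the OCM base, ocm_first_free(100 * 4) gives 0xFFFC1000, so the suspend code fits comfortably inside the 1st page.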

13 comments:

  1. Hello! Excellent article! I particularly found the information on how to use software-based interrupts on Standalone useful for my application. I've read some of the material you mentioned in chapter 7 of the Zynq TRM, but I have yet to implement anything. For my work, I run separate copies of FreeRTOS on each core instead of Linux, however. Have you done any work with Standalone on one core and Standalone on the other?

    1. I think Xilinx xapp1079 should be an excellent resource for you.

    2. I already reviewed xapp1079 when I initially got Standalone to run in asymmetric multiprocessing. FreeRTOS simply extends the functionality of Standalone. I was asking because I wanted to know if you've done work with Standalone running on both cores, using both hardware and software interrupts.

    3. I thought maybe you want symmetric-multi-processing stand-alone code, which would be extremely hard to do and I believe unnecessary (that's what a modern OS does!). But reading your question again, it seems you want to run 2 separate stand-alone code on the separate cores--which is what xapp1079 shows.

      To answer your question directly, I don't see myself using multiple cores for stand-alone usage for quite some time, although Xilinx is now trying to market ultra-scale, in which case I would still want to run Linux on at least 2 out of 4 cores. If I need a lot of math, I was thinking about putting the DSP on FPGA, or just going to something like DaVinci...

    4. dear Heny Choi,

      Thanking you for presenting excellent resource !!

      Can you please tell me how to develop the program for linux using interrupt related function for zynq!! I tried very much with SDK but it seems i am very far away from success. I will be really grateful for this!

    5. Can you be more specific? Are you asking how to raise an interrupt for the Linux device driver for your particular device?

    6. My question is I want to write interrupt handlers codes for the interrupt coming from from various resources like Push button, my PL logic, DMA transfer completion using the prebuilt xilinx drivers.

      I want to use xilinx-SDk for this purpose.

    7. @Choi

      "I thought maybe you want symmetric-multi-processing stand-alone code, which would be extremely hard to do and I believe unnecessary (that's what a modern OS does!)."

      I wouldn't necessarily say that. There are applications for which smaller operating systems (e.g. real-time OS) could in fact be advantageous over relatively larger operating systems (e.g. Linux).

      "To answer your question directly, I don't see myself using multiple cores for stand-alone usage for quite some time, although Xilinx is now trying to market ultra-scale, in which case I would still want to run Linux on at least 2 out of 4 cores."

      Are you referring to the Zynq Ultrascale? I'll have to look more deeply into this.

      @kaushal

      In case you're still looking for an answer, have you looked into "scugic" examples, assuming you're using Standalone for your application?

    8. RTOS (even the FreeRTOS) IS an OS--and a good one to boot--don't you think? It's even supported! My point is that as a high level system integrator, I just don't have the time to be dorking with the OS cores, and I don't have to, because they are so good.

      There are apparently no eval board for the Zynq Ultrascale right now, but if you wind up studying it, I would love to hear about your experience. These days, I am too busy reviewing DSP textbooks to play around with Zynq any more. I cannot seem to find a very high paying gig in big metal processing in deeply embedded space (although it's certainly fun--in a similar sense that VLSI is fun), and thought I should pay more attention to Cypress PSOC for the type of HW I want to make.

      I was rather perplexed by kaushal's question: it seemed to suggest that the Xilinx/ADI supplied device drivers do not work (at least for raising interrupts), which I would doubt based on my experience. I would NOT use them because I want to handle the low level device interface from a hard real-time pure C++ code running on CPU1, but NOT because the Xilinx/ADI supplied device drivers do not work.

    9. @Henry

      "There are apparently no eval board for the Zynq Ultrascale right now, but if you wind up studying it, I would love to hear about your experience. These days, I am too busy reviewing DSP textbooks to play around with Zynq any more. I cannot seem to find a very high paying gig in big metal processing in deeply embedded space (although it's certainly fun--in a similar sense that VLSI is fun), and thought I should pay more attention to Cypress PSOC for the type of HW I want to make."

      There is a good chance I will. I'm actually graduate student, and a lot of the work on which I've been doing research has to do with Zynq. What do you mean by "big metal processing"? Assuming you meant "bare metal processing", from your experience, are you finding few people are using Standalone in their applications?

      "I was rather perplexed by kaushal's question: it seemed to suggest that the Xilinx/ADI supplied device drivers do not work (at least for raising interrupts), which I would doubt based on my experience. I would NOT use them because I want to handle the low level device interface from a hard real-time pure C++ code running on CPU1, but NOT because the Xilinx/ADI supplied device drivers do not work."

      I assumed he wanted to write his interrupts using Standalone for the low-level control.

    10. dear Henry and Andrew Powell
      First of all thanking you very much for your answers!

      In case you're still looking for an answer, have you looked into "scugic" examples, assuming you're using Standalone for your application?
      yes this is the first thing i have done. But I will be more glad if Henry could put more light on Catching the interrupt in linux.

      I have absoultey NO complain for the driver that xilinx is providing us.
      I am currently using ZC702 and subsequently we have plan to change to Zc706, as far as i think it should not be problem to migrate the design from ZC702 to Zc706.
      ====================================================================
      (i am not an expert in linux.)
      While compiling my linux kernel i could see that Xilinx provides us the kernel-module-drivers for cdma, dma, GIC and many more (for me these are relevant).
      I could also see (/proc/interrupts) that whenever i am raising my interrupt these module driver can recognise it and SOMEHOW counting the number of interrupt occured. WELL NOW using all these I would like to write very basic program say writing Hallo-world whenever I am raising my inteerupt.

      I will be also glad if you could guide me and How could i access *.ko driver modules. for example. if i want to write user space program using xilinx_axidma.ko.

      Please let me if i could not pur my words in clear format!

      Thanking you again for your kind help!.

    11. For anyone looking for a solution to propagate SGIs (IPIs) to user space in a UIO manner, you might have a look here: https://github.com/cptn-popcorn/user_sgi. It gets particularly interesting on the Zynq with as many as 16 SGIs available.

  2. Hi,

    We are using Zynq7000 in AMP mode, Linux(CPU 0) and Freertos (CPU 1).

    Need to send interrupt from CPU 0 to CPU 1.

    In Linux Application when I use #include <linux/irqchip/arm-gic.h>

    fatal error: linux/irqchip/arm-gic.h: No such file or directory

    Do you have any suggestion, Is any kernel configuration is needed??

    Thanks
    Prasanna