Dec 25, 2015

Mainline U-Boot on the stm32f429 discovery board

Emcraft has been slowly mainlining STM32F4 support into the U-Boot git tree, targeting the STM32F429 discovery board.  As a bonus, the mainline Linux kernel now has an stm32_defconfig that should work on this board as well.  I shelled out $36 (including tax and shipping) to Avnet, attempted to build for this board, and ran into an error specific to a toolchain with hardware floating point support:

~u-boot$ make stm32f429-discovery_defconfig
~u-boot$ make
arm-buildroot-uclinux-uclibcgnueabihf-ld.bfd: error: /mnt/work/band/uClinux/BRm4/output/host/usr/bin/../lib/gcc/arm-buildroot-uclinux-uclibcgnueabihf/5.3.0/libgcc.a(_udivmoddi4.o) uses VFP register arguments, u-boot does not
arm-buildroot-uclinux-uclibcgnueabihf-ld.bfd: failed to merge target specific data of file /mnt/work/band/uClinux/BRm4/output/host/usr/bin/../lib/gcc/arm-buildroot-uclinux-uclibcgnueabihf/5.3.0/libgcc.a(_udivmoddi4.o)

The culprit was "__aeabi_uldivmod", pointed out by this recent patch, which is already mainlined.  So I searched for source files having that undefined symbol, and found it in arch/arm/cpu/armv7m/stm32f4/timer.c, which I fixed like this:

#include <div64.h>
ulong get_timer(ulong base)
{
	/* was: return (get_ticks() / (CONFIG_SYS_HZ_CLOCK / CONFIG_SYS_HZ)) - base; */
	return (ulong)lldiv(get_ticks(), CONFIG_SYS_HZ_CLOCK / CONFIG_SYS_HZ)
		- base;
}
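Why does a plain "/" cause trouble?  On a 32-bit ARM core, dividing a 64-bit value compiles to a call into libgcc (__aeabi_uldivmod), and that helper was built for the hard-float ABI.  A 64-by-32 divide helper such as lldiv() sidesteps that call.  Below is a minimal sketch of the idea, using textbook shift-and-subtract long division.  The name sketch_div64_32 is my own invention, and this is NOT how U-Boot's div64 code is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 64-by-32 divide (not U-Boot's implementation).
 * Restoring long division: one quotient bit per iteration, using only
 * shifts, compares and subtractions, so no 64-bit divide is emitted.
 */
uint64_t sketch_div64_32(uint64_t dividend, uint32_t divisor, uint32_t *rem_out)
{
	uint64_t quotient = 0;
	uint64_t rem = 0;	/* stays below divisor after every step */
	int bit;

	assert(divisor != 0);

	for (bit = 63; bit >= 0; bit--) {
		/* bring down the next dividend bit */
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quotient |= (uint64_t)1 << bit;
		}
	}
	if (rem_out != NULL)
		*rem_out = (uint32_t)rem;
	return quotient;
}
```

With this shape, get_timer() only needs the quotient, so the remainder pointer can be NULL.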

Understanding the Kconfig style <board>_defconfig

Just like the Linux kernel, make <board>_defconfig yields the top level .config file--which should not be hand-edited--used in the main make step.  The .config will then control both what source files get pulled into the executable, and what the CPP sees (through #define's).

Along with this scant description in configs/stm32f429-discovery_defconfig:

CONFIG_ARM=y
CONFIG_TARGET_STM32F429_DISCOVERY=y
CONFIG_SYS_PROMPT="U-Boot > "
# CONFIG_CMD_SETEXPR is not set

board/st/stm32f429-discovery/Kconfig has the defaults [if TARGET_STM32F429_DISCOVERY] for SYS_BOARD, SYS_VENDOR, SYS_SOC ("stm32f4"), and SYS_CONFIG_NAME ("stm32f429-discovery").  From this, scripts/Makefile.autoconf generates include/config.h by substituting SYS_CONFIG_NAME into a template string, yielding:

#define CONFIG_BOARDDIR board/st/stm32f429-discovery
#include <config_defaults.h>
#include <config_uncmd_spl.h>
#include <configs/stm32f429-discovery.h>
#include <asm/config.h>
#include <config_fallbacks.h>

include/configs/stm32f429-discovery.h has the board specific configurations, like:

#define CONFIG_SYS_FLASH_BASE 0x08000000
#define CONFIG_SYS_INIT_SP_ADDR 0x10010000
#define CONFIG_SYS_TEXT_BASE 0x08000000
#define CONFIG_SYS_ICACHE_OFF

Linker script is auto-generated from template u-boot.lds

Similar to how the u-boot.lds was generated in the Emcraft case I covered earlier, the top level Makefile looks in several places for the u-boot.lds template:

ifndef LDSCRIPT
ifeq ($(wildcard $(LDSCRIPT)),)
LDSCRIPT := $(srctree)/board/$(BOARDDIR)/u-boot.lds
endif
ifeq ($(wildcard $(LDSCRIPT)),)
LDSCRIPT := $(srctree)/$(CPUDIR)/u-boot.lds
endif
ifeq ($(wildcard $(LDSCRIPT)),)
LDSCRIPT := $(srctree)/arch/$(ARCH)/cpu/u-boot.lds
endif
endif

In this example, that is arch/arm/cpu/u-boot.lds (the last option).  But unlike in the old Emcraft build, the explicit MEMORY segment definition is gone from the linker script, and has been replaced by the "-Ttext" argument on the linker command line.  The vector table is supplied in arch/arm/lib/vectors_m.S (note the "_m"):

   .section  .vectors
ENTRY(_start)
.long CONFIG_SYS_INIT_SP_ADDR @ 0 - Reset stack pointer
.long reset @ 1 - Reset
.long __invalid_entry @ 2 - NMI
.long __hard_fault_entry @ 3 - HardFault
.long __mm_fault_entry @ 4 - MemManage
...

where CONFIG_SYS_INIT_SP_ADDR is supplied in the stm32f429-discovery.h, and the reset vector is supplied in arch/arm/cpu/armv7m/start.S:

.globl reset
.type reset, %function
reset:
b _main

Note that ENTRY(_start) was already specified in the linker script.

Reset vector to board init function array

The arch/arm/lib/crt0.S:_main therefore does the heavy lifting of setting up the C runtime, including setting up the global_data structure as before--in board_init_f_mem(ulong top) function [which is passed the (8-byte aligned according to EABI requirement) stack pointer configured by the CONFIG_SYS_INIT_SP_ADDR above].

common/board_f.c:board_init_f(flag) calls the init functions (that can run before code relocation?) defined in the array:

static init_fnc_t init_sequence_f[] = {
...

arch/arm/cpu/armv7m/stm32f4/soc.c:arch_cpu_init

__weak int arch_cpu_init() is the first architecture specific CPU initialization, and starts to configure clocks straight away; the STM32_FLASH register is written INSIDE configure_clocks(), right after the PLL is ready.  In contrast, Emcraft's u-boot for its stm32f7-som board prepared external flash access first and initialized the systick timer, before configuring the clocks.
  • In clock.c:configure_clocks(), the discovery board uses HSI (high speed internal) as the PLL source, rather than the HSE (high speed external) used for the Emcraft SOM board (clock_setup).
  • The mainline u-boot code seems to use the setbits_le32() / clrbits_le32() / writel() macros extensively, whereas the Emcraft code is more of the straight bit-bang against the raw RCC (reset and clock control) registers.
Compared to the Emcraft code, the MPU setup in the generic stm32f4/soc.c is much simpler: it just grants full access to a strongly ordered, shareable 4 GB region.
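To make the setbits/clrbits style concrete, here is a small host-side sketch of the read-modify-write idea behind those macros.  The sketch_* names are mine, the "register" is just a local variable, and I ignore the little-endian conversion that the real _le32 macros take care of.

```c
#include <stdint.h>

/* Hypothetical helpers: read-modify-write on a 32-bit register word. */
static inline void sketch_setbits32(volatile uint32_t *reg, uint32_t set)
{
	*reg |= set;
}

static inline void sketch_clrbits32(volatile uint32_t *reg, uint32_t clr)
{
	*reg &= ~clr;
}

/* Replace a whole bit field: clear the old value, then set the new one. */
static inline void sketch_clrsetbits32(volatile uint32_t *reg,
				       uint32_t clr, uint32_t set)
{
	*reg = (*reg & ~clr) | set;
}

/* Demo on a fake register that starts out as 0x0000000F. */
uint32_t sketch_rmw_demo(void)
{
	volatile uint32_t fake_reg = 0x0000000Fu;

	sketch_clrbits32(&fake_reg, 0x3u);		/* -> 0x00C */
	sketch_setbits32(&fake_reg, 1u << 8);		/* -> 0x10C */
	sketch_clrsetbits32(&fake_reg, 0xF0u, 0x50u);	/* -> 0x15C */
	return fake_reg;
}
```

The appeal over raw bit-banging is that every access names its intent (set, clear, or replace a field), which makes the clock setup code easier to audit.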

init device model

This wasn't in the Emcraft code.  There doesn't seem to be any stm32f(4) specific device model code?

bootstage_add_record: evolution of show_boot_progress()

In the previous blog entry, show_boot_progress() was just an optional (weak function) hook for architecture specific handling of the checkpoints.  bootstage_add_record() now records to a static array of boot stage records (BOOTSTAGE_ID_COUNT = 215 currently) before calling show_boot_progress().
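Roughly, the record-then-notify flow could look like the sketch below.  I am assuming the table is indexed by the stage ID and keeps only the first timestamp per ID; all sketch_* names and the table size of 8 are made up for illustration, and the real function takes more arguments.

```c
enum { SKETCH_ID_COUNT = 8 };	/* made-up size; the real table is far larger */

struct sketch_record {
	unsigned long time_us;	/* 0 means "not recorded yet" */
	const char *name;
};

static struct sketch_record sketch_records[SKETCH_ID_COUNT];
static int sketch_last_progress = -1;

/* Stand-in for the weak show_boot_progress() hook. */
static void sketch_show_boot_progress(int id)
{
	sketch_last_progress = id;
}

/* Record the stage (first hit only), then notify the progress hook. */
unsigned long sketch_add_record(int id, const char *name, unsigned long time_us)
{
	if (id >= 0 && id < SKETCH_ID_COUNT) {
		struct sketch_record *rec = &sketch_records[id];

		if (!rec->time_us) {
			rec->time_us = time_us;
			rec->name = name;
		}
	}
	sketch_show_boot_progress(id);
	return time_us;
}

unsigned long sketch_recorded_time(int id)
{
	return (id >= 0 && id < SKETCH_ID_COUNT) ? sketch_records[id].time_us : 0;
}

int sketch_last_shown(void)
{
	return sketch_last_progress;
}
```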

board_early_init_f: setup GPIO for UART

In general, we want to configure a console as soon as possible.  The assignment of the TX/RX pins to the serial console is board-specific, and for the discovery board, is in the stm32f429-discovery.c:

static const struct stm32_gpio_dsc usart_gpio[] = {
{STM32_GPIO_PORT_X, STM32_GPIO_PIN_TX}, /* TX */
{STM32_GPIO_PORT_X, STM32_GPIO_PIN_RX}, /* RX */
};

But with slightly more work, the arch/arm/include/asm/arch-stm32f4/gpio.h accommodates all STM32 UART variants through #define CONFIG_STM32_USART, like this:

#if (CONFIG_STM32_USART == 1)
#define STM32_GPIO_PORT_X   STM32_GPIO_PORT_A
#define STM32_GPIO_PIN_TX   STM32_GPIO_PIN_9
#define STM32_GPIO_PIN_RX   STM32_GPIO_PIN_10
#define STM32_GPIO_USART    STM32_GPIO_AF7
...

This works if both the TX and RX pins are on the same GPIO port group, which is not the case for the Emcraft stm32f7-som board!
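Since each descriptor already carries its own port, one way around the shared STM32_GPIO_PORT_X would be a separate port per pin.  The sketch below shows that shape with invented types and INVENTED pin assignments (they are not the Emcraft board's real pins).

```c
/* Hypothetical port/pin descriptor, mirroring the {port, pin} pairs above. */
enum sketch_port { SKETCH_PORT_A, SKETCH_PORT_B, SKETCH_PORT_C };

struct sketch_gpio_dsc {
	enum sketch_port port;
	unsigned int pin;
};

/* Made-up example: TX and RX live on different GPIO ports. */
static const struct sketch_gpio_dsc sketch_usart_gpio[] = {
	{ SKETCH_PORT_B, 6 },	/* TX (invented) */
	{ SKETCH_PORT_A, 10 },	/* RX (invented) */
};

/* Bitmask of ports in use, e.g. to decide which GPIO clocks to enable. */
unsigned int sketch_ports_used(const struct sketch_gpio_dsc *dsc,
			       unsigned int count)
{
	unsigned int mask = 0;
	unsigned int i;

	for (i = 0; i < count; i++)
		mask |= 1u << dsc[i].port;
	return mask;
}
```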

timer_init

The Emcraft U-Boot uses systick as the timer (reload value = 0xFFFFFF-1, with external clock = 12 MHz), but the stm32f4-discovery board uses TIM2 driven by the internal clock.  Both Emcraft's U-Boot port and the stm32f4-discovery U-Boot compensate for counter wrap-around, as long as the timer's source clock keeps ticking (which is a problem when the processor goes to deep sleep).
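Wrap compensation for a down-counter can be sketched like this.  I assume a 24-bit counter that reloads to 0x00FFFFFF (a simplification, not either port's exact reload value) and that the function is polled at least once per counter period; the names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_RELOAD	0x00FFFFFFu		/* assumed reload value */
#define SKETCH_PERIOD	(SKETCH_RELOAD + 1u)	/* ticks per full wrap */

static uint32_t sketch_last = SKETCH_RELOAD;	/* previous raw reading */
static uint64_t sketch_ticks;			/* accumulated tick count */

/*
 * Fold a raw down-counter reading into a monotonic 64-bit tick count.
 * If the new reading is larger than the previous one, the counter must
 * have passed zero and reloaded, so one period is added back in.
 */
uint64_t sketch_get_ticks(uint32_t now)
{
	uint32_t elapsed;

	if (now <= sketch_last)
		elapsed = sketch_last - now;
	else
		elapsed = sketch_last + SKETCH_PERIOD - now;

	sketch_ticks += elapsed;
	sketch_last = now;
	return sketch_ticks;
}
```

If the source clock stops (deep sleep) or the poll interval exceeds one period, whole periods go missing and this scheme cannot detect it.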

dram_init

Emcraft's stm32f7-som puts the SDRAM on the 1st bank (at 0xC0000000), while the stm32f4-discovery board puts the SDRAM on the 2nd bank (at 0xD0000000).  The rest of the code--including the wait time during the setup--is quite similar.  After SDRAM is brought up, we can (optionally) run a DRAM test.

New in mainline (vs Emcraft code): relocate U-Boot to SDRAM

gd->mon_len apparently is the RAM space required for U-Boot code, data, and BSS.  gd->start_addr_sp points to the BEGINNING of contiguous pages for the U-Boot:

static int reserve_uboot(void)
{
gd->relocaddr -= gd->mon_len;
gd->relocaddr &= ~(4096 - 1);
gd->start_addr_sp = gd->relocaddr;

return 0;
}

reserve_malloc() decreases the gd->start_addr_sp by TOTAL_MALLOC_LEN from above, and we also need a global bd_t struct before the malloc area that gd->bd will point to.  bi_arch_number is part of the bd_t and may be potentially meaningful for Linux, but apparently is optional?  Below the bd_t is gd_t (global data), fdt blob, and the 16 byte aligned stack (pointer?).
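The top-down carving can be tried on a host with plain arithmetic.  In this sketch the RAM top, image size, and malloc size are numbers I made up; only the subtract-then-align pattern comes from the code above.

```c
#include <stdint.h>

/* Place the U-Boot image just below ram_top, on a 4 KB page boundary. */
uint32_t sketch_reserve_uboot(uint32_t ram_top, uint32_t mon_len)
{
	uint32_t relocaddr = ram_top - mon_len;

	return relocaddr & ~(uint32_t)(4096 - 1);
}

/*
 * Reserve len bytes below addr and round down to align
 * (align is assumed to be a power of two).
 */
uint32_t sketch_reserve_below(uint32_t addr, uint32_t len, uint32_t align)
{
	return (addr - len) & ~(align - 1u);
}
```

For example, with a hypothetical RAM top of 0xD0800000 and a 0x23456 byte image, the image lands at 0xD07DC000; a 1 MB malloc area below that starts at 0xD06DC000, and each further reservation (bd_t, gd_t, fdt, stack) moves the pointer further down.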

After calculating all these space requirements, relocation begins:
  • reloc_fdt()
  • setup_reloc()
    • copy global data to gd->new_gd
If you are wondering "where is the rest of relocation?", remember that board_init_f was called from the assembly function _main.

Relocation continues after returning from board_init_f()

_main uses a clever trick to accomplish relocation: of course we have to memcpy the contents from the flash/SRAM to SDRAM.  But by changing the LR from the nominal address AFTER relocate_code (in arch/arm/lib/relocate.S) to the corresponding address in the relocated copy, the CPU will magically start running from the relocated address after returning from relocate_code().  Vector relocation requires not only copying the content, but also telling the CPU about the new vector table address by writing to the V7M_SCB_VTOR register.
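The LR adjustment boils down to adding one offset.  A sketch with hypothetical names and example addresses (the flash base matches CONFIG_SYS_TEXT_BASE above; the SDRAM destination is invented):

```c
#include <stdint.h>

/*
 * Map an address in the original (link-time) image to the same spot in
 * the relocated copy.  Bit 0 (the Thumb bit of a return address) is
 * carried along unchanged because both bases are even.
 */
uint32_t sketch_relocated(uint32_t link_addr, uint32_t text_base,
			  uint32_t relocaddr)
{
	uint32_t reloc_off = relocaddr - text_base;

	return link_addr + reloc_off;
}
```

So a return address of 0x08000125 in flash would become 0xD07DC125 if the image were copied to 0xD07DC000.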

c_runtime_cpu_setup; is this what finally pulls the relocation trigger, as I described above?

.globl c_runtime_cpu_setup
c_runtime_cpu_setup:
mov pc, lr

After relocation, _main continues with:
  • zeroes out BSS (between __bss_start and __bss_end, defined in the linker)
  • coloured_LED_init()
  • red_led_on()
  • board_init_r(gd_t* new_gd, ulong dest_addr)

board_init_r(): board init AFTER relocation

Like the board_init_f(), board_init_r() just runs through an array of init functions, this time defined in board_r.c.
  • initr_trace: I did not turn on CONFIG_TRACE
  • initr_reloc does NOT relocate (it's done already!), but merely sets the gd->flags: GD_FLG_RELOC | GD_FLG_FULL_MALLOC_INIT
  • initr_caches: just calls weak function enable_caches().  This is my chance to call the Emcraft's cache initialization code!
  • initr_reloc_global_data?
  • init_barrier: no-op for ARM
  • initr_malloc
  • bootstage_relocate: copy the bootstage names, because "name" pointer is still pointing to the strings in the old .text sections (which have been copied, but the pointers are still dangling)
  • board_init: gd->bd->bi_boot_params = CONFIG_SYS_SDRAM_BASE + 0x100
  • stdio_init_tables,
  • initr_serial
  • initr_announce
  • power_init_board
  • initr_flash: even though U-Boot booted off the discovery board's internal flash, now this internal flash is registered with U-Boot, so it can write to this flash (for U-Boot self-update, for example).
  • initr_secondary_cpu: yet more initialization that requires full environment and flash? [not the discovery board]
  • stdio_add_devices
  • initr_jumptable: I saw the jump table briefly in the previous blog entry, but honestly I still don't understand it.
  • console_init_r
  • misc_init_r: reads the CPU serial number and stores into "serial#" environment variable
  • interrupt_init: no-op
  • initr_enable_interrupts: no-op
  • initr_ethaddr (if CONFIG_CMD_NET)
  • initr_net (if CONFIG_CMD_NET)
Finally, as in the Emcraft code studied in the previous blog entry, run_main_loop is the last init function to be called, and expected never to return.

Dec 24, 2015

u-boot on STM32F7

I am in the wearable gadget industry now, and that means goodbye to the bulky and power hungry Xilinx/Altera chips that I have been playing around with for over 5 years.  Even though Google is trying to breathe new life into MIPS, for much smaller players like myself at Jawbone, it means ARM Cortex M is the only architecture to consider.  My latest dorking interest is driving a sexy display from a Cortex M.  I narrowed down the uC choice to either M4 or M7 because an FPU is necessary for faster time to market algorithm development.  I have also chosen uClinux as the FW platform, knowing that I will need a bigger battery to feed power hungry DRAM.
  • Volatile memory
    • ESMT M12L2561616A-6BIG2K: 32 MB 166 MHz 16-bit SDRAM on the STM32F7 SOM sucks 3 mA even during the self refresh mode (which happens at high rate)
    • Micron MT46H32M16: a 64 MB 5 ns 16-bit device sucks 0.3 mA when not in use.
  • Non-volatile memory
    • Micron MT29F1G08ABADAH4, Spansion S34ML01G100BHi000: 128 MB, 8-bit NAND flash.  1 mA current draw when not in use
    • Spansion S29GL128S10DHI010: 16 MB NOR flash draws only 0.1 mA when not in use.  This product family scales up to 128 MB.
This is the 1st blog in a series leading up to the ultimate goal of running a sexy GUI on an ARM Cortex M7.

Emcraft has trail-blazed uClinux on Cortex M3/4/7 for 6+ years (since about the same time I started playing around with FPGA, interestingly).  I ordered the STM32F7 starter kit (~$200 with tax and shipping), and downloaded the source and documentation.

Emcraft's u-boot on STM32F427

The first step in a board bringup is to run the bootloader.  u-boot is the dominant SSBL (2nd stage bootloader) for Linux (FSBL--the 1st stage bootloader--is the on-chip ROM code).  The Emcraft STM32F SoM comes pre-loaded with u-boot and linux, but since I plan to JTAG debug the code to really understand what is going on, I cloned Emcraft u-boot source, which was forked off u-boot mainline about 6 years ago.  Unmodified, the u-boot.bin image can be built with the following steps.  But first I need a toolchain.  Buildroot can create a customized toolchain (for example with hardware floating point turned on), but for now it is convenient to use a prebuilt Code Sourcery toolchain downloaded from Emcraft.

Building u-boot for the stm32f-som board

To cross compile using the code sourcery toolchain, I need to set a few environment variables (like CROSS_COMPILE) and put the binaries in the path.  Emcraft simplified the setup by assuming that I will symlink the toolchain to the released source folder's (linux-cortexm-1.14.2 currently) tools/ folder, with a script called ACTIVATE.sh that sets up the following environment variables:

export PATH=$TOOLS_PATH/bin:$CROSS_PATH:$PATH
export CROSS_COMPILE=arm-uclinuxeabi-
export CROSS_COMPILE_APPS=arm-uclinuxeabi-
export MCU=STM32F7

To configure u-boot build for a given board:

~band/uClinux/u-boot$ make stm32f7-som_config

This produces 2 makefiles (include/autoconf.mk and include/autoconf.mk.dep) from the board configuration file in include/configs/stm32f7-som.h.  This adds the // to CPPFLAGS (through PLATFORM_CPPFLAGS created in cpu/arm_cortexm3/config.mk).

The board specific code is in the board/emcraft/stm32f7-som folder.

~/band/uClinux/u-boot$ ls board/emcraft/stm32f7-som/
board.c  board.o  Makefile

Again, note that the CPU specific defines like CONFIG_MEM_NVM_BASE are supplied in the PLATFORM_CPPFLAGS template in the CPU specific makefile cpu/arm_cortexm3/config.mk.  The values are resolved at compile time through the -D values supplied by autoconf.mk (product of the make <board>_config step mentioned above) as shown here:

~band/uClinux/u-boot$ make
...
arm-uclinuxeabi-gcc  -g  -Os   -g2 -mthumb -mcpu=cortex-m3 -fsigned-char -O2 -fno-builtin-puts -fno-common -ffixed-r8 -D__KERNEL__ -I/mnt/work/band/uClinux/u-boot/include -fno-builtin -ffreestanding -isystem /mnt/work/band/uClinux/arm-2010q1/bin/../lib/gcc/arm-uclinuxeabi/4.4.1/include -pipe  -DCONFIG_ARM -D__ARM__ -DCONFIG_MEM_NVM_BASE="0x08000000" -DCONFIG_MEM_NVM_LEN="(1024 * 1024 * 1)" -DCONFIG_MEM_NVM_UBOOT_OFF="0x0" -DCONFIG_MEM_RAM_BASE="0x20000000" -DCONFIG_MEM_RAM_LEN="(20 * 1024)" -DCONFIG_MEM_RAM_BUF_LEN="(88 * 1024)" -DCONFIG_MEM_MALLOC_LEN="(16 * 1024)" -DCONFIG_MEM_STACK_LEN="(4 * 1024)" -I/mnt/work/band/uClinux/u-boot/cpu/arm_cortexm3 -Wall -Wstrict-prototypes -fno-stack-protector   -o board.o board.c -c

I noticed that Emcraft has removed -nostdinc from CPPFLAGS.  There seems to be no need for the extra features in Cortex M4/M7, so targeting M3 is OK.  We don't even use the "-thumb2" option.  Also note the 1 MB on-chip flash memory is at 0x0800_0000 rather than the customary 0x0, as dictated by the STM32F7's memory map (from the device datasheet):
U-boot is modular, and the build creates a library for each module.  The libraries given to the linker are the u-boot submodules, many of them consisting of platform specific sources:
  • libcommon: cmd_bdinfo.o cmd_boot.o cmd_bootm.o cmd_flash.o cmd_help.o cmd_load.o cmd_mem.o cmd_net.o cmd_nvedit.o cmd_pcmcia.o cmd_version.o command.o console.o dlmalloc.o env_common.o env_envm.o exports.o flash.o image.o lcd.o main.o memsize.o s_record.o stdio.o xyzModem.o
  • libgeneric: crc, display, div64, gunzip, lmb, ldiv, net_utils, string, strmhz, time, vsprintf, zlib
  • libarm_cortexm3: cpu, cmd_cptf, timer
  • libstm32: clock, cpu, envm, wdt, fsmc, soc
  • libarm: board, bootm, cache, cache-cp15, interrupts, reset
  • libnet: bootp, eth, net, rarp, tftp, stm32_eth?
  • libgpio: stm32f2_gpio
  • libmtd: cfi_flash
  • libpcmcia: rpx_pcmcia, tqm8xx_pcmcia
  • librtc: date
  • libserial: stm32_usart
  • libdisk: part
  • libusb_phy: twl4030
  • libvideo: stm32f4_lcdfb, videomodes
I omitted empty libraries from the above list; the u-boot build seems to create libraries for ALL possible submodules, and lets a library be empty if the feature is not selected.

Finally, the cross ld links the startup code cpu/arm_cortexm3/start.o with the above architecture and SoC specific libraries and the architecture-independent code into the final executable u-boot:

arm-uclinuxeabi-ld -Bstatic -T u-boot.lds $UNDEF_SYM cpu/arm_cortexm3/start.o --start-group cpu/arm_cortexm3/libarm_cortexm3.a cpu/arm_cortexm3/stm32/libstm32.a lib_arm/libarm.a  board/emcraft/stm32f7-som/libstm32f7-som.a ...  --end-group -L /mnt/work/band/uClinux/arm-2010q1/bin/../lib/gcc/arm-uclinuxeabi/4.4.1/thumb2 -lgcc -Map u-boot.map -o u-boot

The start/end-group semantics relieves us from keeping track of the object/library order (else you may get an unresolved symbol even though you have the full list of objects).  

The linker script used above (u-boot.lds) is generated from the template cpu/arm_cortexm3/u-boot.lds linker script, filled with the variable values resolved by the CPP (through a helper header file include/u-boot/u-boot.lds.h), as you can see below (note the -E option, which stops gcc right after the preprocess stage):

arm-uclinuxeabi-gcc -E [whole bunch of CPP options] -include /mnt/work/band/uClinux/u-boot/include/u-boot/u-boot.lds.h  - </mnt/work/band/uClinux/u-boot/cpu/arm_cortexm3/u-boot.lds > u-boot.lds

The u-boot.lds.h is NOT a template file (in that there are no variables to fill); we just needed SOME header file to feed to "gcc -E".  The values of the linker script variables are defined in the board specific header file include/configs/stm32f7-som.h, as in this example:

#define CONFIG_MEM_NVM_BASE 0x08000000

The resulting linker script just defines the NVM, RAM, MALLOC (heap) and STACK sections.  The final executable u-boot.bin itself is just an objcopy'ed binary of u-boot that can be stored in the on-chip flash of the uC. 

arm-uclinuxeabi-objcopy --gap-fill=0xff -O binary u-boot u-boot.bin

The resulting u-boot.bin is 100 KB, which is tiny if you are used to targets capable of full-blown embedded Linux, but huge if you are used to writing a bare metal C deeply embedded FW. 

The volatile memory sections in RAM are mapped across the physical device boundary, as you can see below:
Note that the stack begins where the heap ends; the _start() reset vector calls it "_armboot_start", and sets it to _mem_stack_base defined in cpu/arm_cortexm3/u-boot.lds.  The board config needs to tell u-boot the heap size, as stm32f7-som.h has done:

#define CONFIG_SYS_MALLOC_LEN CONFIG_MEM_MALLOC_LEN

The rest of RAM holds global/static data (modifiable variables); during C runtime initialization (in _start() reset vector), data is memcpy'ed from NVM (&_data_lma_start) to RAM (&_data_start), because the ".data" section in the cpu/arm_cortexm3/u-boot.lds declares both the load address and the target address this way:

.data :
{
_data_start = .;
_data_lma_start = LOADADDR(.data);
*(.data)
. = ALIGN(4);
#if ! (defined(CONFIG_MEM_RAMCODE_BASE) && defined(CONFIG_MEM_RAMCODE_LEN))
*(.ramcode)
#endif
_data_end = .;
} >RAM AT>NVM
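In C terms, the runtime setup implied by this section amounts to a copy and a clear.  The sketch below runs on a host with ordinary arrays standing in for NVM and RAM; the function names are mine, and the real _start() works on the linker symbols (_data_lma_start, _data_start, and so on) rather than on arguments.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy initialized data from its load address (LMA, in NVM) to its
 * run address (VMA, in RAM), then zero the .bss region.
 */
void sketch_crt_init(unsigned char *data_start,
		     const unsigned char *data_lma_start, size_t data_len,
		     unsigned char *bss_start, size_t bss_len)
{
	memcpy(data_start, data_lma_start, data_len);
	memset(bss_start, 0, bss_len);
}

/* Host demo: returns 1 when .data was copied and .bss was cleared. */
int sketch_crt_demo(void)
{
	static const unsigned char nvm[4] = { 0xDE, 0xAD, 0xBE, 0xEF };
	unsigned char ram_data[4] = { 0 };
	unsigned char ram_bss[4] = { 0x55, 0x55, 0x55, 0x55 };

	sketch_crt_init(ram_data, nvm, sizeof(ram_data),
			ram_bss, sizeof(ram_bss));

	return memcmp(ram_data, nvm, sizeof(nvm)) == 0 &&
	       ram_bss[0] == 0 && ram_bss[3] == 0;
}
```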

Because of the ordering in the linker script, ".data" section comes before the ".bss" section.  There is a variable "monitor_flash_len", which is the number of bytes between the end of the ".bss" and _armboot_start (where the heap ends); what is it used for??

If the code has CONFIG_MEM_RAMCODE_BASE, then the ".ramcode" will be memcpy'ed to RAM (at the address CONFIG_MEM_RAMCODE_BASE) as well.  But this BSP (stm32f7-som) does NOT use RAMCODE.

Next, to understand the board specific code better, let's use the Eclipse CDT indexer.

Browsing u-boot code in Eclipse CDT

I studied Linux kernel and device drivers partly by browsing the code in Eclipse CDT, letting the CDT do the heavy lifting of searching and indexing the code.  I could just setup a C Makefile project using the host GNU toolchain to analyze the code, but I was curious if I could get a better result using the cross arm toolchain.  If I have a successfully built Buildroot rootfs, I could register the toolchain used during that build with Eclipse (AKA Buildroot Eclipse integration feature).  But until then, I can get by with a generic GNU ARM Eclipse cross toolchain, which can be obtained in Eclipse menu --> Help --> Install new software --> Add, and then specifying the plugin URI (http://gnuarmeclipse.sourceforge.net/updates) as shown below:
I have a Segger J-Link ARM debugger (~ $400), so I check that package, but other than the cross compiler and the STM32Fx support, the other plugins are optional.  The cross compiler plugin allows me to choose the "Cross ARM GCC" toolchain while creating an out-of-tree (to avoid polluting the u-boot repository with the Eclipse project file) C makefile project for u-boot, as shown below:
For some reason, the cross ARM GCC plugin does not find the binaries for the toolchain (or maybe the plugin does NOT include the binaries in the first place?), so I just specify the Code Sourcery toolchain downloaded from Emcraft, as shown below.
The key to creating an out-of-tree project is to add the (actual) source folder(s) as linked folders by right clicking on the project --> New --> Folder --> Advanced --> Link to alternate location, as shown in the example below
In the resource filters, I often specify a recursive file exclude filter for "*test*".  Since a multi-platform project like u-boot has many sources that are irrelevant for my platform, I hide them from the indexer by right-clicking on those folders --> Resource configurations --> Exclude from build.  The cpu/ and board/ folders (reorganized to arch/ folder in the u-boot mainline) contain everything u-boot supports (MANY!), so it is essential to exclude all folders under those EXCEPT my platform, for the indexing to be of value.  With this care, the indexer will run through the code in < 10 seconds, which is almost 2 orders of magnitude faster than indexing the latest Linux kernel tree, so I can appreciate the complexity of the Linux kernel.  After indexing completes, I can start reading the code from the PoR reset vector defined in the .vectors section of the ROM (placed at 0x08000000 by the u-boot.lds linker script discussed above) in cpu/arm_cortexm3/start.c:

unsigned int vectors[] __attribute__((section(".vectors"))) = {
[0] = (unsigned long)&_mem_stack_end,
[1] = (unsigned int)&_start,
[2 ... 165] = (unsigned int)&default_isr
};

The 2nd entry is the PoR vector, defined right below this simple vector table, and the default_isr is just a while(1) loop--which hangs the uC.  Unless more vectors are registered later, u-boot will not handle interrupts.  _start() is a C function in cpu/arm_cortexm3/start.c that sets up the most basic C environment (copies .data and .ramcode, zeros .bss) before yielding to lib_arm/board.c:start_armboot().  start_armboot() runs the init functions hard coded in the init_sequence array, as you can see:

for (init_fnc_ptr = init_sequence; *init_fnc_ptr; ++init_fnc_ptr) {
if ((*init_fnc_ptr)() != 0) {
hang ();
}
}

This array is also in lib_arm/board.c, and does NOT use weak reference, so the only thing a board can do is use the #define to control what to run/not.  I show the elements that stm32f7-som does NOT define in a gray font.

init_fnc_t *init_sequence[] = {
#if defined(CONFIG_ARCH_CPU_INIT)
arch_cpu_init, /* basic arch cpu dependent setup */
#endif
board_init, /* basic board dependent setup */
#if defined(CONFIG_USE_IRQ)
interrupt_init, /* set up exceptions */
#endif
#if !defined(CONFIG_ARCH_CPU_INIT)
/*
* `arch_cpu_init` always calls `timer_init`,
* no need to call it twice
*/
timer_init, /* initialize timer */
#endif
#ifdef CONFIG_FSL_ESDHC
get_clocks,
#endif
env_init, /* initialize environment */
init_baudrate, /* initialze baudrate settings */
serial_init, /* serial communications setup */
console_init_f, /* stage 1 init of console */
display_banner, /* say that we are here */
#if defined(CONFIG_DISPLAY_CPUINFO)
print_cpuinfo, /* display cpu info (and speed) */
#endif
#if defined(CONFIG_DISPLAY_BOARDINFO)
checkboard, /* display board info */
#endif
#if defined(CONFIG_HARD_I2C) || defined(CONFIG_SOFT_I2C)
init_func_i2c,
#endif
dram_init, /* configure available RAM banks */
#if defined(CONFIG_CMD_PCI) || defined (CONFIG_PCI)
arm_pci_init,
#endif
display_dram_config,
NULL,
};

I am not sure why nand_init(), onenand_init(), AT91F_DataflashInit() are not handled in the above list.  But stdio and console (console is inited in 2 stages) are inited next--as soon as possible after the above "minimal" components are ready.

Next, even though the u-boot design principles claim u-boot is single threaded, interrupts ARE enabled.  U-Boot does NOT blindly load the Linux kernel; it merely loads the environment variables "loadaddr" and "bootfile", and spins in a background loop, waiting for a console command:

for (;;) {
main_loop ();
}

In the common/main.c:main_loop(), if bootcmd is already defined, the boot command can run right away:

s = getenv ("bootcmd");
if (bootdelay >= 0 && s && !abortboot (bootdelay)) {
... run_command (s, 0);
}

Building a GNU toolchain for ARM Cortex M7 with Buildroot

The Code Sourcery toolchain downloaded from Emcraft website is for M3 and 5 years old.  I plan to build a lot of software for the target (like the kernel, the libc, and Qt).  Buildroot is a nice framework to consolidate all those configuration and build activities in 1 place.  The only question is whether Buildroot can handle MMU-less configurations, and I aim to find out in this blog entry.

I first download the latest Buildroot:

~/band/uClinux$ git clone git://git.buildroot.net/buildroot
~/band/uClinux$ cd buildroot
~/band/uClinux/buildroot$ git checkout 2015.11

I actually tried the top of the tree, but there is a problem with noMMU case there, so I am just using the 2015.11 for now.

Buildroot stores several reference board configs--such as the Zedboard that I have been playing around with for the past 5 years--in the configs/ folder, which does not contain any reference Cortex M boards.  So I just copied the Zedboard config to use for the Emcraft eval board I bought, and started modifying it.

~band/uClinux/buildroot/configs$ cp zedboard_defconfig emcraft_defconfig
~band/uClinux/buildroot/configs$ cd ..
~band/uClinux/buildroot$ make emcraft_defconfig
~band/uClinux/buildroot$ make xconfig

Currently (December 2015), only Cortex M3 is supported, so let's try this option.

Without MMU support, shared (AKA dynamic) libraries are impossible, which has far reaching consequences such as precluding glibc--in turn a prerequisite for many of the packages I am interested in: mesa3D, Qt5.  Emcraft has ported Qt4 widgets modules to uClinux, and I will first try to understand their u-boot, uClinux, and rootfs before looking into whether porting Qt5 to uClinux is desirable.

Add Cortex M7 Buildroot architecture

I patched arch/Config.in.arm this way:

@@ -156,6 +159,13 @@ config BR2_cortex_m3
        bool "cortex-M3"
        select BR2_ARM_CPU_HAS_THUMB
        select BR2_ARM_CPU_HAS_THUMB2
+config BR2_cortex_m4
+       bool "cortex-M4"
+       select BR2_ARM_CPU_HAS_THUMB
+       select BR2_ARM_CPU_HAS_VFPV4
+       select BR2_ARM_CPU_HAS_THUMB2

 config BR2_fa526
        bool "fa526/626"
        select BR2_ARM_CPU_HAS_ARM
@@ -426,6 +436,7 @@ config BR2_GCC_TARGET_CPU
        default "cortex-a12"    if BR2_cortex_a12
        default "cortex-a15"    if BR2_cortex_a15
        default "cortex-m3"     if BR2_cortex_m3
+       default "cortex-m4"     if BR2_cortex_m4
        default "fa526"         if BR2_fa526
        default "marvell-pj4"   if BR2_pj4

        default "strongarm"     if BR2_strongarm

The main idea with this mod is that Cortex M4 is similar to Cortex M3, except for the hardware floating point addition.  While Cortex M4 implements only FPV4 (according to the ARM reference manual), Cortex M7 implements the FPV5, which gcc currently does not seem to support.

I stay away from fancy features like the DSP (which is kind of like Neon, but I don't know how much).

A patch for using the LDMIA correctly on Cortex M

Buildroot was apparently forcing ARM mode for Cortex M (and consequently using LDMIA instruction inappropriately for thumb2; the constraint is explained on the ARM reference, but copied here for convenience "If an LDM instruction is used to load the PC, ensure that bit 0 of the loaded value is set to 1; otherwise a fault exception occurs.  For both the STM and LDM instructions, the stack pointer should not be included in the register list.  Also be aware that if you have an LDM instruction that has the LR in the register list, you cannot include the PC in the list."), requiring a recently published patch to work around the problem.

Minimal Buildroot config for Cortex M4 development

Here's my Buildroot defconfig that makes use of the newly defined Cortex M4 to build just the toolchain:

BR2_arm=y
BR2_cortex_m4=y
BR2_ARM_FPU_VFPV4=y
BR2_ENABLE_DEBUG=y
BR2_TOOLCHAIN_BUILDROOT_LOCALE=y
# BR2_UCLIBC_INSTALL_UTILS is not set
BR2_BINUTILS_VERSION_2_25_X=y
BR2_GCC_VERSION_5_X=y
BR2_TOOLCHAIN_BUILDROOT_CXX=y
BR2_PACKAGE_HOST_ELF2FLT=y
BR2_PACKAGE_HOST_GDB=y
BR2_PACKAGE_HOST_GDB_TUI=y
BR2_PACKAGE_HOST_GDB_PYTHON=y
BR2_GDB_VERSION_7_10=y
BR2_ENABLE_LOCALE_PURGE=y
BR2_ENABLE_LOCALE_WHITELIST="en_US"
BR2_ECLIPSE_REGISTER=y
BR2_TARGET_GENERIC_HOSTNAME="uClinux"
BR2_TARGET_GENERIC_ISSUE="Welcome!"
BR2_ROOTFS_DEVICE_CREATION_STATIC=y
BR2_TARGET_GENERIC_ROOT_PASSWD="****"
BR2_TARGET_GENERIC_GETTY_PORT="ttyPS0"
BR2_PACKAGE_BUSYBOX_SHOW_OTHERS=y
BR2_TARGET_ROOTFS_CPIO=y
BR2_TARGET_ROOTFS_CPIO_GZIP=y
BR2_TARGET_ROOTFS_CPIO_UIMAGE=y

The BR2_ECLIPSE_REGISTER option registers the toolchain to Eclipse, so that Eclipse will present it as a toolchain choice during project creation (even for the Makefile project).  Note that I did not even enable the kernel or the bootloader yet.  Although not captured above (why??), I've turned on the following toolchain features:
  • EABIhf (hardware floating point)
  • Do NOT strip symbols (useful when debugging libraries like Qt later)
  • NPTL (Native POSIX threading)
  • Enable compiler tls (thread-local-storage) support
I also wanted the following features, but could not get past build errors:
  • No separate code and data (BR2_BINFMT_FLAT_SEP_DATA) => have to keep the code and data in the same section, to avoid "unsupported compiler option" related to keeping the code and data in different sections.
After the build, the toolchain is in <BR2>/output/host/usr/bin.

~/band/uClinux/BRm4$ ls output/host/usr/bin/
2to3                                                     arm-linux-gcc-nm
aclocal                                                  arm-linux-gcc-ranlib
aclocal-1.15                                             arm-linux-gcov
arm-buildroot-uclinux-uclibcgnueabihf-addr2line          arm-linux-gcov-tool
...

The long prefix before the short tool name (like addr2line above) is what the CROSS_COMPILE environment variable should be set to when cross-compiling with this toolchain.  I already built U-Boot with the Code Sourcery toolchain above, but let's rebuild it with this toolchain.

Modifying Emcraft's u-boot for the Buildroot GCC toolchain

Unlike for the Linux kernel, Buildroot does not provide an option to build a local u-boot folder (only git repositories or tarballs), so for now I will build my local clone of Emcraft's u-boot using the Emcraft makefile rather than Buildroot.  First, I make a symlink in Emcraft's tools/ folder:

~/band/uClinux/linux-cortexm-1.14.2/tools$ ln -s /mnt/work/band/uClinux/BRm4/output/host/usr br2

Then I copy the existing ACTIVATE.sh script to br2.sh and modify it for the Buildroot toolchain:

TOOLCHAIN=br2
export INSTALL_ROOT=`pwd`
TOOLS_PATH=$INSTALL_ROOT/tools
CROSS_PATH=$TOOLS_PATH/$TOOLCHAIN/bin
export PATH=$TOOLS_PATH/bin:$CROSS_PATH:$PATH

# Path to cross-tools
export CROSS_COMPILE=arm-buildroot-uclinux-uclibcgnueabihf-
export CROSS_COMPILE_APPS=arm-buildroot-uclinux-uclibcgnueabihf-

# Define the MCU architecture
export MCU=STM32F7

Note that the shortcut "arm-linux-" softlinks do NOT work.
I can source this script to set up the environment.

henry@w540:~/band/uClinux/linux-cortexm-1.14.2$ . ./br2.sh 

Before running the same make commands as before, I want to avoid stripping out the debugging symbols.

Including debugging symbols into u-boot

I confirmed that example/standalone/hello_world has debugging symbols, but u-boot does not.  Running the following command shows that the debug symbols have been stripped from the final ELF.

$ BRm4/output/host/usr/bin/arm-linux-readelf --debug-dump  u-boot

This is because the linker script discards every section other than those explicitly accounted for in the script:

SECTIONS
{
...
 } >STACK

/DISCARD/ :
{
*(*)
}
}

The build shows that the Emcraft u-boot needs to be modified slightly for the latest gcc: the functions that weak aliases point to can no longer be inline, so for example, I removed "inline" from the following function:

void inline __show_boot_progress (int val) {}
void show_boot_progress (int val) __attribute__((weak, alias("__show_boot_progress")));
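
Here is a minimal sketch of what the pair looks like once "inline" is gone.  The last_progress variable and the raise_checkpoint() helper are my own additions for illustration (they are not in Emcraft's tree), and the sketch assumes gcc on an ELF target, where the alias attribute is available.

```c
/* Last value seen by the default stub; added only so the sketch is observable. */
int last_progress = -1;

/* Default implementation.  No "inline": the alias below needs a real,
 * emitted symbol to point at. */
void __show_boot_progress(int val)
{
    last_progress = val;
}

/* Weak alias: resolves to the stub above unless another object file
 * supplies a strong definition of show_boot_progress(). */
void show_boot_progress(int val)
    __attribute__((weak, alias("__show_boot_progress")));

/* Hypothetical caller that raises a checkpoint the way u-boot code does. */
int raise_checkpoint(int val)
{
    show_boot_progress(val);
    return last_progress;
}
```

With no board-specific override linked in, raise_checkpoint(65) ends up in the stub through the alias.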

The other problem is that a nonconstant expression is no longer allowed for a linker script memory region origin, as in cpu/arm_cortexm3/u-boot.lds.  So I did the addition in stm32f7-som.h:

#define CONFIG_MEM_RAM_BUF_BASE (CONFIG_MEM_RAM_BASE + CONFIG_MEM_RAM_LEN)
#define CONFIG_MEM_MALLOC_BASE (CONFIG_MEM_RAM_BUF_BASE + \
CONFIG_MEM_RAM_BUF_LEN)
#define CONFIG_MEM_STACK_BASE (CONFIG_MEM_MALLOC_BASE + CONFIG_MEM_MALLOC_LEN)

These defines are exported to make through cpu/arm_cortexm3/config.mk:

PLATFORM_CPPFLAGS += -DCONFIG_MEM_RAM_BASE=$(CONFIG_MEM_RAM_BASE)
PLATFORM_CPPFLAGS += -DCONFIG_MEM_RAM_BUF_BASE=$(CONFIG_MEM_RAM_BUF_BASE)
PLATFORM_CPPFLAGS += -DCONFIG_MEM_MALLOC_BASE=$(CONFIG_MEM_MALLOC_BASE)
PLATFORM_CPPFLAGS += -DCONFIG_MEM_STACK_BASE=$(CONFIG_MEM_STACK_BASE)

Finally, the linker script cpu/arm_cortexm3/u-boot.lds can make use of these defines:

MEMORY
{
NVM (r): ORIGIN = CONFIG_MEM_NVM_BASE, LENGTH = NVM_LEN
RAM (rw): ORIGIN = CONFIG_MEM_RAM_BASE,
    LENGTH = CONFIG_MEM_RAM_LEN
RAM_BUF (r): ORIGIN = CONFIG_MEM_RAM_BUF_BASE,  /* <board>.h */
LENGTH = CONFIG_MEM_RAM_BUF_LEN
MALLOC (r): ORIGIN = CONFIG_MEM_MALLOC_BASE, /* <board>.h */
      LENGTH = CONFIG_MEM_MALLOC_LEN
STACK (r): ORIGIN = CONFIG_MEM_STACK_BASE, /* <board>.h */
      LENGTH = CONFIG_MEM_STACK_LEN
#if defined(CONFIG_MEM_RAMCODE_BASE) && defined(CONFIG_MEM_RAMCODE_LEN)
RAMCODE (rw): ORIGIN = CONFIG_MEM_RAMCODE_BASE, \
LENGTH = CONFIG_MEM_RAMCODE_LEN
#endif
}

SECTIONS
{
.vectors CONFIG_MEM_NVM_BASE :
...

Note that the resulting linker script merely has the symbols substituted with values; it did NOT perform the math itself--but this is apparently OK!

MEMORY
{
 NVM (r): ORIGIN = 0x08000000, LENGTH = ((1024 * 1024 * 1) - 0x0)
 RAM (rw): ORIGIN = 0x20000000,
       LENGTH = (20 * 1024)
 RAM_BUF (r): ORIGIN = (0x20000000 + (20 * 1024)),
   LENGTH = (88 * 1024)
 MALLOC (r): ORIGIN = ((0x20000000 + (20 * 1024)) + (88 * 1024)),
          LENGTH = (16 * 1024)
 STACK (r): ORIGIN = (((0x20000000 + (20 * 1024)) + (88 * 1024)) + (16 * 1024)),
         LENGTH = (4 * 1024)
}
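
To check the math that the linker ends up doing, here is a small sketch that repeats the same chain of additions in C.  The base and length values are copied from the preprocessed script above; struct mem_region, region_end(), and regions_are_contiguous() are hypothetical helpers of mine, not part of u-boot.

```c
#include <stdint.h>

/* RAM base and lengths, copied from the preprocessed linker script. */
#define CONFIG_MEM_RAM_BASE     0x20000000u
#define CONFIG_MEM_RAM_LEN      (20u * 1024u)
#define CONFIG_MEM_RAM_BUF_LEN  (88u * 1024u)
#define CONFIG_MEM_MALLOC_LEN   (16u * 1024u)
#define CONFIG_MEM_STACK_LEN    (4u * 1024u)

/* Derived bases: the same additions as in stm32f7-som.h. */
#define CONFIG_MEM_RAM_BUF_BASE (CONFIG_MEM_RAM_BASE + CONFIG_MEM_RAM_LEN)
#define CONFIG_MEM_MALLOC_BASE  (CONFIG_MEM_RAM_BUF_BASE + CONFIG_MEM_RAM_BUF_LEN)
#define CONFIG_MEM_STACK_BASE   (CONFIG_MEM_MALLOC_BASE + CONFIG_MEM_MALLOC_LEN)

/* Hypothetical helper type: one entry of the MEMORY block. */
struct mem_region {
    const char *name;
    uint32_t origin;
    uint32_t length;
};

const struct mem_region sram_map[] = {
    { "RAM",     CONFIG_MEM_RAM_BASE,     CONFIG_MEM_RAM_LEN     },
    { "RAM_BUF", CONFIG_MEM_RAM_BUF_BASE, CONFIG_MEM_RAM_BUF_LEN },
    { "MALLOC",  CONFIG_MEM_MALLOC_BASE,  CONFIG_MEM_MALLOC_LEN  },
    { "STACK",   CONFIG_MEM_STACK_BASE,   CONFIG_MEM_STACK_LEN   },
};

/* First address past the end of a region. */
uint32_t region_end(const struct mem_region *r)
{
    return r->origin + r->length;
}

/* Returns 1 if every region starts exactly where the previous one ends. */
int regions_are_contiguous(const struct mem_region *map, int count)
{
    for (int i = 1; i < count; i++)
        if (map[i].origin != region_end(&map[i - 1]))
            return 0;
    return 1;
}
```

The origins work out to 0x20005000 (RAM_BUF), 0x2001B000 (MALLOC), and 0x2001F000 (STACK), with the stack region ending at 0x20020000, i.e. 128 KB in total.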

I also changed the target CPU to cortex-m4 (Buildroot compiled cross gcc 5.x did not support -mcpu=cortex-m7):

PLATFORM_RELFLAGS += -g2 -mthumb -mcpu=cortex-m4 -fsigned-char -O2 -fno-builtin-puts -fno-common -ffixed-r8

The newly built u-boot is 633 KB (more than 6x the original size) due to the debug symbols, but u-boot.bin is still the same size.  Loading this in gdb lets you view functions by name.  Will this successfully run on the target?  Let's find out with an ICD (in-circuit debugger).

JTAG Debugging Emcraft U-Boot in gdb

The STM32F7-SOM starter kit does NOT include SWD debugger HW.  But I prefer to learn code by setting breakpoints (in HW, if possible), so skip this section if you don't want to mess with a HW debugger.

Using the Segger J-Link

I already have a Segger J-Link (roughly $400 before tax and shipping), so I tried to connect the JLinkGDBServer (64-bit DEB installer downloaded from the Segger website) to the target, like this:

henry@w540:~$ JLinkGDBServer -device STM32F746NG
...
Listening on TCP/IP port 2331
Connecting to target...Cache: Separate I- and D-cache.

J-Link found 2 JTAG devices, Total IRLen = 9
JTAG ID: 0x5BA00477 (Cortex-M4)
Connected to target
Waiting for GDB connection...

Before I could start the cross gdb, I had to install libexpat1:i386 (because the Emcraft toolchain is 32-bit):

$ sudo apt-get install libexpat1:i386

Then I pointed cross-gdb (-tui not supported) at the u-boot ELF file:

$ arm-uclinuxeabi-gdb u-boot

Within gdb, I could connect to the JLinkGDBServer:

(gdb) target remote localhost:2331

And then I reset the target and load the ELF file:

(gdb) mo reset
(gdb) load
(gdb) where
#0  0x08000374 in _start ()

Then I can single-step through code.  While the Segger J-Link is nice, ST-Link is an order of magnitude cheaper, so I looked into it.

Using the ST-Link through openocd (0.9.0) on Ubuntu

ST only supports Windows, so I tried openocd--but with a couple of patches from Emcraft support.  After unzipping openocd-0.9.0, I applied the 2 patches before configuring and building it:

~/band/uC/openocd-0.9.0$ patch -p3 < ../../uClinux/emcraft/openocd-0.9.0-stmf7-p1.patch
~/band/uC/openocd-0.9.0$ patch -p3 < ../../uClinux/emcraft/openocd-0.9.0-stmf7-p2.patch
~/band/uC/openocd-0.9.0$ ./configure; make; sudo make install

Then you point openocd to a configuration file ("-f" argument):

source [find ../scripts/interface/stlink-v2.cfg]

set WORKAREASIZE 0x40000

set CHIPNAME STM32F756
source [find target/stm32f7x.cfg]

adapter_khz 150
#reset_config srst_only

$_TARGETNAME configure -event gdb-attach {
   echo "Halting target"
   halt
}

openocd then connects to the target through JTAG, and listens for a telnet connection on port 4444.  On Ubuntu, "sudo" is required to run openocd because of access to the HW.

/mnt/work/band/uClinux/emcraft$ sudo openocd -f olimex-arm-usb-ocd-h-stm32f7.cfg

The patched configuration files are in /usr/local/share/openocd/scripts.  While command line driven debugging is possible, graphical debugging is more productive.

Debugging u-boot in Eclipse through ST-Link v2

If possible, it is convenient to launch openocd directly from Eclipse.  But because of the "sudo" requirement, I started openocd outside Eclipse with the command given in the previous section, and specified the explicit path to the cross-gdb executable, leaving all other options at default.  This works because the default GDB port for openocd is 3333, and Eclipse uses the same port to connect from gdb to openocd.

After that, the Eclipse debugger front end works the same as for any other target.  I like to use the HW function breakpoint, available from the inverted triangle at the upper-right corner of the Breakpoints tab shown below.
As explained before, _start is the PoR vector for u-boot on ARM.

Debugging u-boot in Eclipse through Segger J-Link

J-Link Eclipse debugging works much the same as with the ST-Link.

Single-stepping the Emcraft u-boot built with the Buildroot toolchain

After downloading u-boot.bin to the target over JTAG, I saw that the target trips a hard fault and resets.  These are the notes I took while stepping through the u-boot init functions in the Eclipse debugger to learn about the FW.  U-Boot would print more to the console if DEBUG is defined, but it seems to be undefined in individual C files.
  • init_fnc array
    • arch_cpu_init(): architecture (ARM) specific CPU (STM32F7) init
      • prepare for external flash access
        • Unlock flash command register: Write the 2 hard coded key values to the KEYR
        • Set flash parallelism size to x32 (requires Vdd > 2.7 V)
      • Initialize the systick timer
      • Initialize clock
        • After reset, CPU clock is 16 MHz, and its flash wait state is 0.
        • "gd" is the pointer to the global data struct, holding (among others) a board-specific struct pointer "bd".
        • STM32F7-SOM runs off external 12 MHz oscillator, and achieves 200 MHz (if over-drive is turned on) with PLL
        • systick load register set to 0xFFFFFF - 1 (defined in <>/include/asm-arm/arch-cortexm3/hardware.h)
        • STM32F7 chips can over-drive to reach 200 MHz.  But the over-drive mode is not available when 1.8 V < Vdd < 2.1 V--which is the typically used voltage when running off battery.  Since enable_over_drive() spins in an infinite loop waiting for ODRDY and ODSWRDY, it means one of the following:
          • These ready bits are set no matter what when running on 1.8 V, or
          • Vdd is set briefly > 2.1 V even when running on battery
          • Vdd is 1.2 V when running on battery
        • FW spins on an infinite loop when changing the clock source (clock.c: clock_setup())
        • ART Accelerator is just an instruction cache to hide the flash wait states.
      • cortex_m3_soc_init(): MPU and cache are enabled.  Cortex MPU regions can overlap; where they do, the attributes of the highest-numbered enabled region win.  Emcraft used these regions:
enum mpu_rgn {
    MPU_RGN_4GB = 0,    /* 4GB address space region */
    MPU_RGN_SDRAM_CA,   /* Cacheable SDRAM */
    MPU_RGN_SDRAM_NC,   /* Non-cacheable SDRAM */
    MPU_RGN_ENVM_CA,    /* Cacheable eNVM */

    MPU_RGN_MAX = 15    /* RBAR[REGION] bits; actually - less */
};
              Q: why is there no region for the external flash?
              The attributes for each region are:
                    • 4 GB background:
                      • Access permission: 3, meaning full access (RW for both privileged and unprivileged)
                      • TEX=0, S=1, C=0, B=0 => strongly ordered, shareable  
                      • all sub-regions are disabled
                      • size = 31, meaning 4 GB
                      • enable = 0 at first
                    • 32 MB SDRAM
                      • base address: 0xC0000000
                      • Access permission: 3 (full)
                      • TEX=1, S=0, C=1, B=1 => normal, non-shareable, cached, buffered (see the TEX, C, B, and S encoding table).  This is possible because STM32F7 has the D/I cache.
                      • all sub-regions disabled
                      • size = 24 => N=25, so 2^N = 32 MB
                      • enable 
                    • 1 MB DMA SRAM
                      • base address: 0xC0000000 + 16 MB (half the size of the SDRAM) - 1 MB
                      • Access permission: 3 (full)
                      • TEX=1, S=0, C=0, B=0 => normal, non-shareable, neither cached nor buffered.
                      • all sub-regions disabled
                      • size = 19 => 1 MB
                      • enable
                    • 1 MB internal flash
                      • base address = 0x08000000
                      • access permission: 3 (full)
                      • if device memory type, TEX=0, S=0, C=0, B=1 => device, shareable; otherwise, TEX=1, S=0, C=1, B=1 => normal, not shareable, cached, buffered.
                      • all sub-regions disabled
                      • size = 19 => 1 MB
                      • enable
                    • MPU is finally enabled when MPU_CTRL[1:0] are asserted.  Bit 1 enables MPU even in hard fault/NMI handler.  Follow up with DSB and ISB.
                  • stm32f7_enable_cache(): follows closely the STM32F7 programming manual Cache Maintenance Operations section
                    • Invalidate I cache: write 0 to ICIALLU (0xE000EF50)
                    • Invalidate D cache:
                      • select the D-cache: CSSELR = 0
                      • DSB
                      • Find the number of cache ways and sets from CCSIDR:
                        • # sets = CCSIDR[27:13] + 1
                        • associativity (# ways) = CCSIDR[12:3] + 1
                        • line size = CCSIDR[2:0] (encoded as log2 of the number of words per line, minus 2)
                      • For each set and way combination, write 0 to DCISW[0] to invalidate
                    • Assert the DC and IC bits of CCR (configuration and control register) to turn on the L1 caches.
                • board_init()
                  • The biggest board specific component is the external flash and SDRAM.  TODO: correlate the Emcraft code against the NOR flash and SDRAM datasheet. 
                  • FMC (flexible memory controller--which is different from FSMC: flexible static memory controller) selects among NOR/PSRAM, NAND, and SDRAM with chip select
                    • GPIO pins routed to the memory device must be configured for that role (enum stm32f2_gpio_role:STM32F2_GPIO_ROLE_FMC)
                    • FMC is organized into individually selectable (up to) 64 MB banks, whose timings and wait states are programmed through separate registers.
                    • By comparing the GPIO pins routed to FMC functions in the ext_ram_fsmc_fmc_gpio array against the uC STM32F746NGH6 datasheet Table 11: FMC pin definition, I see that the NOR flash and the SDRAM both have a 16-bit data bus and a 23-bit address bus.
                • env_init
                • init_baudrate: default is CONFIG_BAUDRATE = 115200 defined in stm32f7-som.h
                • serial_init: board config's CONFIG_STM32_USART_PORT counts from 1 when mapping to the USART peripheral base register address.
                  • 1 start bit, 8 data bits, 1 stop bit
                  • overrun detection disabled
                • console_init_f (called before relocation) only flips the gd->have_console flag?
                • display_banner
                • print_cpuinfo
                • checkboard: just print the board rev
                • dram_init: TODO: cross reference against RM0385 chapter 13 and the device datasheet.  Since the board has only 1 SDRAM (on bank1), all commands have the CTB1 bit set.
                  • start clock, udelay(200), and wait for FMC_SDSR[BUSY] to be deasserted.  udelay is a busy loop (but one that avoids tripping the watchdog)
                  • precharge, udelay(100), and wait for !FMC_SDSR[BUSY]
                  • 7 cycles (why?) auto-refresh, 100 usec udelay, and wait for !FMC_SDSR[BUSY]
                  • Load BL and CAS to the SDRAM through the "load mode register" command, udelay(100), and wait for !FMC_SDSR[BUSY]
                  • Set to normal mode, and wait for !FMC_SDSR[BUSY]
                  • Write refresh timer value
                • display_dram_config
              • mem_malloc_init
              • flash_init: the only flash is at 0x60000000 (each subsequent NOR flash bank is another 0x4000000 up from there)
                • flash init (drivers/mtd/cfi_flash.c) uses the industry standard (either Intel or AMD) for querying the MTD device.  This code gets pulled in because stm32f7-som.h defines CONFIG_SYS_FLASH_CFI and CONFIG_FLASH_CFI_DRIVER.
                • DRAM is stopped (put into self-refresh mode) while writing to flash, and then goes through precharge before returning to normal mode.
                • hard fault (exception number 3 in xPSR[8:0]) seems to trip in flash_init() --> flash_get_size(): info->ext_addr = le16_to_cpu(qry.p_adr); when sp = 0x2001ff28.  In the mixed view, it looks like:
              1773       info->vendor = le16_to_cpu(qry.p_id);
              08004066:   ldrb.w  r2, [sp, #35]   ; 0x23
              0800406a:   ldrb.w  r0, [sp, #36]   ; 0x24
              1774       info->ext_addr = le16_to_cpu(qry.p_adr);
              0800406e:   ldrh.w  r3, [sp, #37]   ; 0x25

              So why is ldrb.w OK but not ldrh.w?  When dealing with a half-word (16-bit), the address CANNOT be odd (#37): unaligned access!  This happens because gcc 4.7 and later assume by default that the hardware supports unaligned access.  This bug was already patched in the mainline a long time ago, and the hard fault disappeared when I picked up the patch AND added the -mno-unaligned-access option to gcc in cpu/arm_cortexm3/config.mk:

              PLATFORM_RELFLAGS += -mthumb -mcpu=cortex-m4 -fsigned-char -O2 -fno-builtin-puts -fno-common -ffixed-r8 -mno-unaligned-access
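
For comparison, here is a minimal sketch of reading a little-endian 16-bit field in a way that never depends on alignment.  read_le16() is a hypothetical helper of mine, not the mainline patch itself; it only illustrates the idea of building the value from single bytes, which is what the ldrb.w loads above already do.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value from any address, odd or even.
 * The value is assembled from two single-byte reads, so the source
 * never asks for a halfword load at the given address. */
uint16_t read_le16(const void *addr)
{
    const uint8_t *b = (const uint8_t *)addr;
    return (uint16_t)(b[0] | (b[1] << 8));
}
```

A field sitting at an odd offset, like p_adr at sp + 37 in the listing above, could then be fetched with read_le16() on a byte pointer to it.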

              After this fix, start_armboot() flies through the rest of the initialization:
              • env_relocate: copy environment from text segment (internal flash) to the SRAM.
              • stdio_init
              • jumptable_init: global data struct's "jt" points to the jump table
              • console_init_r
              • enable_interrupts: does not actually do anything on this board?
              • eth_initialize: finally calls checkpoint (value 64)
              • infinite main_loop: displays the familiar "Hit any key to stop autoboot" prompt.
                • Within the mainloop, various commands are handled.  One of them is the "bootm" command, which loads the kernel into RAM (memmove_wd, which just calls memmove; this means that after the memory devices--no matter the kind--are setup, ARM just copies a byte at a time)
                • lmb_reserve
                • stm32f7_cache_sync_range: why?  And there was not a more generic support for cache invalidation?
                • boot_fn() = boot_os[images.os.os] = do_bootm_linux
                  • Setup various tags (defined in the stm32f7-som.h: memory,  cmdline, dmamem tag)
                  • announce_and_cleanup
                  • kernel_entry: literally the kernel's entry function.  This is an example of the U-Boot's hand-over interface to the Linux kernel.
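
The last step above is nothing more than a call through a function pointer.  Here is a minimal sketch of it, assuming the classic ARM boot convention (r0 = 0, r1 = machine ID, r2 = address of the boot parameters).  fake_kernel(), the seen struct, and jump_to_kernel() are hypothetical stand-ins so that the sketch can run on a host; the real kernel entry never returns.

```c
#include <stdint.h>

/* Kernel entry signature under the classic ARM boot convention:
 * first argument 0, then the machine ID, then the boot params address. */
typedef void (*kernel_entry_t)(int zero, int machid, uintptr_t params);

/* What the stand-in kernel received; only here to make the call observable. */
struct handoff_args {
    int zero;
    int machid;
    uintptr_t params;
};

struct handoff_args seen;

/* Hypothetical stand-in for the kernel's entry function. */
void fake_kernel(int zero, int machid, uintptr_t params)
{
    seen.zero = zero;
    seen.machid = machid;
    seen.params = params;
}

/* Final step of a bootm-style hand-over: call the entry point. */
void jump_to_kernel(kernel_entry_t entry, int machid, uintptr_t params)
{
    entry(0, machid, params);
}
```

On the target, entry would be the address that bootm copied the kernel image to, and params would come from bd->bi_boot_params.
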
              When I looked into the Linux kernel startup (the kernel main() function) almost 2 years ago, I knew that Linux could not boot itself, but did not understand the u-boot to Linux handover.

              initrd.txt (initial ramdisk) --> initial ramfs.  One valid version of an initramfs buffer is thus a single .cpio.gz file.  You can create a cpio archive that contains the early userspace image.  Your cpio archive should be specified in CONFIG_INITRAMFS_SOURCE and it will be used directly.  Or you can build the early userspace image directly from source (CONFIG_INITRAMFS_SOURCE; this does not mean *.[ch] files, but actually filesystem contents).

              early-userspace/
              If all required device and filesystem drivers are compiled into the kernel, no need for initrd.  init/main.c:init() will call prepare_namespace() to mount the final root filesystem, based on the root= option and optional init= to run some other init binary than listed at the end of init/main.c:init().   Otherwise (initramfs), prepare_namespace() must be skipped.  This means that [/init] must do all the work.  To maintain backwards compatibility, the /init binary will only run if it comes via an initramfs cpio archive.  If this is not the case, init/main.c:init() will run prepare_namespace() to mount the final root and exec one of the predefined init binaries.

              u-boot to Linux interface

              <>/include/asm-arm/u-boot.h:

              typedef struct bd_info {
                  int bi_baudrate; /* serial console baudrate */
                  unsigned long bi_ip_addr; /* IP Address */
                  struct environment_s        *bi_env;
                  ulong         bi_arch_number; /* unique id for this board */
                  ulong         bi_boot_params; /* where this board expects params */
                  struct /* RAM configuration */
                  {
              ulong start;
              ulong size;
                  } bi_dram[CONFIG_NR_DRAM_BANKS];
              } bd_t;

              The boot params are stored at the beginning of the external DRAM (0xC0000000 on this board), which is what bi_dram[0].start is pointing to.  The kernel will reside at offset 0x8000 from there.
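
As a rough illustration of what lands at that address, here is a sketch that builds a small ATAG list (core, memory, and command line tags) in a word buffer.  The ATAG_* values are the standard Linux ones; build_atags() is a hypothetical helper, the caller is assumed to supply a large enough buffer, and Emcraft's dmamem tag is not modeled.

```c
#include <stdint.h>
#include <string.h>

#define ATAG_NONE    0x00000000u
#define ATAG_CORE    0x54410001u
#define ATAG_MEM     0x54410002u
#define ATAG_CMDLINE 0x54410009u

/* Addresses from the text: params at the start of DRAM, kernel 0x8000 above. */
#define BOOT_PARAMS_ADDR 0xC0000000u
#define KERNEL_LOAD_ADDR (BOOT_PARAMS_ADDR + 0x8000u)

/* Writes a tag list into buf and returns the number of 32-bit words used.
 * Every tag starts with a two-word header: size in words, then tag ID. */
size_t build_atags(uint32_t *buf, uint32_t ram_start, uint32_t ram_size,
                   const char *cmdline)
{
    uint32_t *p = buf;
    size_t len = strlen(cmdline) + 1;               /* include the NUL */
    uint32_t words = 2u + (uint32_t)((len + 3) / 4); /* header + padded text */

    /* ATAG_CORE: header plus flags, page size, root device (all 0 here). */
    *p++ = 5; *p++ = ATAG_CORE; *p++ = 0; *p++ = 0; *p++ = 0;

    /* ATAG_MEM: header plus size and start of the RAM bank. */
    *p++ = 4; *p++ = ATAG_MEM; *p++ = ram_size; *p++ = ram_start;

    /* ATAG_CMDLINE: header plus the zero-padded command line. */
    p[0] = words;
    p[1] = ATAG_CMDLINE;
    memset(&p[2], 0, (words - 2u) * sizeof(uint32_t));
    memcpy(&p[2], cmdline, len);
    p += words;

    /* ATAG_NONE terminates the list; its size field is 0. */
    *p++ = 0; *p++ = ATAG_NONE;

    return (size_t)(p - buf);
}
```

On this board the buffer would be the memory at BOOT_PARAMS_ADDR, with ram_start taken from bi_dram[0].start and ram_size from bi_dram[0].size.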

              Using checkpoint to understand u-boot

              I was overwhelmed by the large body of code in u-boot, and spent the next week looking for online documentation, until I read about #define CONFIG_SHOW_BOOT_PROGRESS and checkpoints in the top level README.  When I read in u-boot design principle ("keep it debuggable") that u-boot is single threaded, I got the idea of setting a breakpoint in a centralized boot progress function, to understand u-boot in digestible chunks.  I was worried about understanding the interrupt until I saw this in include/configs/stm32f7-som.h:

              #undef CONFIG_USE_IRQ

              The various checkpoints mentioned in README are raised like this example:

              show_boot_progress (65);

              u-boot supplied a stub for this in common/main.c:

              /*
               * Board-specific Platform code can reimplement show_boot_progress () if needed
               */
              void inline __show_boot_progress (int val) {}
              void show_boot_progress (int val) __attribute__((weak, alias("__show_boot_progress")));

              To set a hardware breakpoint and examine the boot progress number, I overrode the above weak function in board/emcraft/stm32f7-som/board.c:

              void show_boot_progress(int val) {
              val = val;
              }

              Emcraft's u-boot config stm32f7-som.h does NOT turn on CONFIG_SHOW_BOOT_PROGRESS, so I inserted it and rebuilt u-boot.bin.  The init functions discussed in the previous section do NOT trip checkpoints, so it's good that I single-stepped through those functions.

              The first thing I noticed about the checkpoints is that the value does NOT increase strictly monotonically; these are just numbers to be referenced against documentation, and there was no promise of monotonicity in the first place.
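
Instead of stopping at a breakpoint each time, the override could also record the values for later inspection.  Here is a minimal sketch of that idea; checkpoint_log, checkpoint_count, MAX_CHECKPOINTS, and checkpoints_strictly_increasing() are hypothetical names of mine, not something in the Emcraft board file.

```c
#include <stddef.h>

#define MAX_CHECKPOINTS 32

/* Checkpoint values in the order they were raised. */
int checkpoint_log[MAX_CHECKPOINTS];
size_t checkpoint_count;

/* Strong definition: the linker picks this over the weak stub in
 * common/main.c.  Values beyond the log capacity are dropped. */
void show_boot_progress(int val)
{
    if (checkpoint_count < MAX_CHECKPOINTS)
        checkpoint_log[checkpoint_count++] = val;
}

/* Returns 1 if every recorded checkpoint is larger than the previous one. */
int checkpoints_strictly_increasing(void)
{
    for (size_t i = 1; i < checkpoint_count; i++)
        if (checkpoint_log[i] <= checkpoint_log[i - 1])
            return 0;
    return 1;
}
```

After boot, checkpoint_log can be dumped from gdb, and checkpoints_strictly_increasing() returns 0 as soon as one value is not larger than the one before it.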

              Diffing the Emcraft's u-boot against vanilla 2.6.33 u-boot

              One strategic place to start learning how u-boot runs on an MMU-less platform like Cortex M4/M7 is to understand Emcraft's port of u-boot--forked off from the mainline at v2.6.33.  So I git cloned the mainline u-boot and checked out that version:

              ~band/uClinux/u-boot$ git checkout v2.6.33

              I then created C makefile projects in Eclipse for the above branch and Emcraft's u-boot, and compared the 2 projects (highlight the 2 projects --> right click --> Compare --> Against each other).

              By placing reset_cpu() in RAM, there is a possibility for self-upgrading u-boot (how?):

              #ifdef CONFIG_ARMCORTEXM3_RAMCODE
              __attribute__((section(".ramcode")))
              __attribute__ ((long_call))
              #endif
              reset_cpu     (ulong addr);

              But perhaps self-upgrading u-boot is an advanced topic that can be deferred.

              In addition to the u-boot image, SPIFILIB is added to u-boot.bin in Makefile:

              ifeq ($(CONFIG_SYS_LPC18XX)$(CONFIG_SPIFI),yy)
              ifeq ($(CONFIG_SPIFILIB_IN_ENVM),y)
              SPIFILIB_DEP = cpu/arm_cortexm3/lpc18xx/spifilib/spifilib-envm.bin
              else
              SPIFILIB_DEP = cpu/arm_cortexm3/lpc18xx/spifilib/spifilib-dram.bin
              endif
              $(SPIFILIB_DEP): depend
              $(MAKE) -C cpu/arm_cortexm3/lpc18xx/spifilib $(notdir $(SPIFILIB_DEP)) -f spifilib.mk
              endif

              $(obj)u-boot.bin: $(obj)u-boot $(SPIFILIB_DEP)
              $(OBJCOPY) ${OBJCFLAGS} -O binary $< $@

              But to get more out of this, I need to get my hands dirtier.  Perhaps I should port the latest mainline U-Boot to the STM32F7-SOM board to learn more about U-Boot.  Since this blog entry is getting long, I will start another one.