I am in the wearable gadget industry now, and that means goodbye to the bulky, power-hungry Xilinx/Altera chips that I have been playing around with for over 5 years. Even though Google is trying to breathe new life into MIPS, for much smaller players like myself at Jawbone, ARM Cortex M is the only architecture to consider. My latest dorking interest is driving a sexy display from a Cortex M. I narrowed down the uC choice to either M4 or M7 because an FPU is necessary for faster time-to-market algorithm development. I have also chosen uClinux as the FW platform, knowing that I will need a bigger battery to feed power-hungry DRAM.
- Volatile memory
- ESMT M12L2561616A-6BIG2K: 32 MB 166 MHz 16-bit SDRAM on the STM32F7 SOM sucks 3 mA even during the self refresh mode (which happens at high rate)
- Micron MT46H32M16: a 64 MB 5 ns 16-bit device sucks 0.3 mA when not in use.
- Non-volatile memory
- Micron MT29F1G08ABADAH4, Spansion S34ML01G100BHi000: 128 MB, 8-bit NAND flash. 1 mA current draw when not in use
- Spansion S29GL128S10DHI010: 16 MB NOR flash draws only 0.1 mA when not in use. This product family scales up to 128 MB.
This is the 1st blog in a series leading up to the ultimate goal of running a sexy GUI on an ARM Cortex M7.
Emcraft has trail-blazed uClinux on Cortex M3/4/7 for 6+ years (since about the same time I started playing around with FPGA, interestingly). I ordered the STM32F7 starter kit (~$200 with tax and shipping), and downloaded the source and documentation.
Emcraft's u-boot on STM32F427
The first step in a board bringup is to run the bootloader. u-boot is the dominant SSBL (2nd stage bootloader) for Linux (the FSBL, the 1st stage bootloader, is the on-chip ROM code). The Emcraft STM32F SoM comes pre-loaded with u-boot and Linux, but since I plan to JTAG debug the code to really understand what is going on, I cloned the Emcraft u-boot source, which was forked off u-boot mainline about 6 years ago. Unmodified, the u-boot.bin image can be built with the steps below, but first I need a toolchain. Buildroot can create a customized toolchain (for example with hardware floating point turned on), but for now it is convenient to use a prebuilt Code Sourcery toolchain downloaded from Emcraft.
Building u-boot for the stm32f7-som board
To cross compile using the code sourcery toolchain, I need to set a few environment variables (like CROSS_COMPILE) and put the binaries in the path. Emcraft simplified the setup by assuming that I will symlink the toolchain to the released source folder's (linux-cortexm-1.14.2 currently) tools/ folder, with a script called ACTIVATE.sh that sets up the following environment variables:
export PATH=$TOOLS_PATH/bin:$CROSS_PATH:$PATH
export CROSS_COMPILE=arm-uclinuxeabi-
export CROSS_COMPILE_APPS=arm-uclinuxeabi-
export MCU=STM32F7
To configure u-boot build for a given board:
~/band/uClinux/u-boot$ make stm32f7-som_config
This produces 2 makefiles (include/autoconf.mk and include/autoconf.mk.dep) from the board configuration file in include/configs/stm32f7-som.h. This step is what feeds the board's CONFIG_ defines into CPPFLAGS (through PLATFORM_CPPFLAGS, built up in cpu/arm_cortexm3/config.mk).
The board-specific code is in the board/emcraft/stm32f7-som folder.
~/band/uClinux/u-boot$ ls board/emcraft/stm32f7-som/
board.c board.o Makefile
Again, note that the CPU-specific defines like CONFIG_MEM_NVM_BASE are supplied through the PLATFORM_CPPFLAGS template in the CPU-specific makefile cpu/arm_cortexm3/config.mk. The values are resolved at compile time through the -D options, whose values come from autoconf.mk (the product of the make <board>_config step mentioned above), as shown here:
~/band/uClinux/u-boot$ make
...
arm-uclinuxeabi-gcc -g -Os -g2 -mthumb -mcpu=cortex-m3 -fsigned-char -O2 -fno-builtin-puts -fno-common -ffixed-r8 -D__KERNEL__ -I/mnt/work/band/uClinux/u-boot/include -fno-builtin -ffreestanding -isystem /mnt/work/band/uClinux/arm-2010q1/bin/../lib/gcc/arm-uclinuxeabi/4.4.1/include -pipe -DCONFIG_ARM -D__ARM__ -DCONFIG_MEM_NVM_BASE="0x08000000" -DCONFIG_MEM_NVM_LEN="(1024 * 1024 * 1)" -DCONFIG_MEM_NVM_UBOOT_OFF="0x0" -DCONFIG_MEM_RAM_BASE="0x20000000" -DCONFIG_MEM_RAM_LEN="(20 * 1024)" -DCONFIG_MEM_RAM_BUF_LEN="(88 * 1024)" -DCONFIG_MEM_MALLOC_LEN="(16 * 1024)" -DCONFIG_MEM_STACK_LEN="(4 * 1024)" -I/mnt/work/band/uClinux/u-boot/cpu/arm_cortexm3 -Wall -Wstrict-prototypes -fno-stack-protector -o board.o board.c -c
I noticed that Emcraft has removed -nostdinc from CPPFLAGS. There seems to be no need for the extra features in Cortex M4/M7, so targeting M3 is OK. We don't even use the "-thumb2" option. Also note the 1 MB on-chip flash memory is at 0x0800_0000 rather than the customary 0x0, as dictated by the STM32F7's memory map (from the device datasheet):
U-boot is modular, and the build creates a library for each module. The libraries given to the linker are the u-boot submodules, many of them consisting of platform-specific sources:
- libcommon: cmd_bdinfo.o cmd_boot.o cmd_bootm.o cmd_flash.o cmd_help.o cmd_load.o cmd_mem.o cmd_net.o cmd_nvedit.o cmd_pcmcia.o cmd_version.o command.o console.o dlmalloc.o env_common.o env_envm.o exports.o flash.o image.o lcd.o main.o memsize.o s_record.o stdio.o xyzModem.o
- libgeneric: CRC, display, div64, gunzip, lmb, ldiv, net_utils, string, strmhz, time, vsprintf, zlib
- libarm_cortexm3: cpu, cmd_cptf, timer
- libstm32: clock, cpu, envm, wdt, fsmc, soc
- libarm: board, bootm, cache, cache-cp15, interrupts, reset
- libnet: bootp, eth, net, rarp, tftp, stm32_eth?
- libgpio: stm32f2_gpio
- libmtd: cfi_flash
- libpcmcia: rpx_pcmcia, tqm8xx_pcmcia
- librtc: date
- libserial: stm32_usart
- libdisk: part
- libusb_phy: twl4030
- libvideo: stm32f4_lcdfb, videomodes
In the above list, I omitted empty libraries; the u-boot build seems to create a library for EVERY possible submodule, and leaves the library empty if the feature is not selected.
Finally, the cross ld links the startup code cpu/arm_cortexm3/start.o with the above architecture- and SoC-specific libraries and the architecture-independent code into the final executable u-boot:
arm-uclinuxeabi-ld -Bstatic -T u-boot.lds $UNDEF_SYM cpu/arm_cortexm3/start.o --start-group cpu/arm_cortexm3/libarm_cortexm3.a cpu/arm_cortexm3/stm32/libstm32.a lib_arm/libarm.a board/emcraft/stm32f7-som/libstm32f7-som.a ... --end-group -L /mnt/work/band/uClinux/arm-2010q1/bin/../lib/gcc/arm-uclinuxeabi/4.4.1/thumb2 -lgcc -Map u-boot.map -o u-boot
The --start-group/--end-group semantics relieve us from keeping track of the object/library order (otherwise you may get an unresolved symbol even though you have given the full list of objects).
The linker script used above (u-boot.lds) is generated from the template cpu/arm_cortexm3/u-boot.lds linker script, filled with the variable values resolved by the CPP (through a helper header file include/u-boot/u-boot.lds.h), as you can see below (note the -E option, which stops gcc right after the preprocess stage):
arm-uclinuxeabi-gcc -E [whole bunch of CPP options] -include /mnt/work/band/uClinux/u-boot/include/u-boot/u-boot.lds.h - </mnt/work/band/uClinux/u-boot/cpu/arm_cortexm3/u-boot.lds > u-boot.lds
The u-boot.lds.h is NOT a template file (in that there are no variables to fill); we just needed SOME header file to feed to "gcc -E". The values of the linker script variables are defined in the board-specific header file include/configs/stm32f7-som.h, as in this example:
#define CONFIG_MEM_NVM_BASE 0x08000000
The resulting linker script just defines the NVM, RAM, MALLOC (heap) and STACK sections. The final executable u-boot.bin itself is just an objcopy'ed binary of u-boot that can be stored in the on-chip flash of the uC.
arm-uclinuxeabi-objcopy --gap-fill=0xff -O binary u-boot u-boot.bin
The resulting u-boot.bin is 100 KB, which is tiny if you are used to targets capable of full-blown embedded Linux, but huge if you are used to writing bare metal C for deeply embedded FW.
The volatile memory sections in RAM are mapped across the physical device boundary, as you can see below:
Note that the stack begins where the heap ends; the _start() reset vector calls this boundary "_armboot_start", and sets it to _mem_stack_base defined in cpu/arm_cortexm3/u-boot.lds. The board config needs to tell u-boot the heap size, as stm32f7-som.h has done:
#define CONFIG_SYS_MALLOC_LEN CONFIG_MEM_MALLOC_LEN
The rest of RAM holds global/static data (modifiable variables); during C runtime initialization (in _start() reset vector), data is memcpy'ed from NVM (&_data_lma_start) to RAM (&_data_start), because the ".data" section in the cpu/arm_cortexm3/u-boot.lds declares both the load address and the target address this way:
.data :
{
_data_start = .;
_data_lma_start = LOADADDR(.data);
*(.data)
. = ALIGN(4);
#if ! (defined(CONFIG_MEM_RAMCODE_BASE) && defined(CONFIG_MEM_RAMCODE_LEN))
*(.ramcode)
#endif
_data_end = .;
} >RAM AT>NVM
Because of the ordering in the linker script, ".data" section comes before the ".bss" section. There is a variable "monitor_flash_len", which is the number of bytes between the end of the ".bss" and _armboot_start (where the heap ends); what is it used for??
If the code has CONFIG_MEM_RAMCODE_BASE, then the ".ramcode" will be memcpy'ed to RAM (at the address CONFIG_MEM_RAMCODE_BASE) as well. But this BSP (stm32f7-som) does NOT use RAMCODE.
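The .data copy and .bss clearing described above can be sketched in portable C. This is a host-side illustration, not Emcraft's start.c: the function name crt_init_sketch and its parameters are hypothetical stand-ins for the linker symbols (such as _data_lma_start and _data_start), passed as arguments so that the logic can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical host-side sketch of C runtime initialization.
 * On the target, the arguments would come from linker symbols such as
 * &_data_lma_start (load address in NVM) and &_data_start (run address in RAM).
 */
void crt_init_sketch(unsigned char *data_run, const unsigned char *data_load,
                     size_t data_len, unsigned char *bss, size_t bss_len)
{
    assert(data_run != NULL && data_load != NULL && bss != NULL);

    /* .data: copy the initial values from flash (load address) to RAM. */
    memcpy(data_run, data_load, data_len);

    /* .bss: variables without an initializer must read as zero. */
    memset(bss, 0, bss_len);
}
```

On the real target this has to run before any C code touches a global variable, which is why it sits at the very top of the reset handler.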
Next, to understand the board specific code better, let's use the Eclipse CDT indexer.
Browsing u-boot code in Eclipse CDT
I studied Linux kernel and device drivers partly by browsing the code in Eclipse CDT, letting the CDT do the heavy lifting of searching and indexing the code. I could just setup a C Makefile project using the host GNU toolchain to analyze the code, but I was curious if I could get a better result using the cross arm toolchain. If I have a successfully built Buildroot rootfs, I could register the toolchain used during that build with Eclipse (AKA Buildroot Eclipse integration feature). But until then, I can get by with a generic GNU ARM Eclipse cross toolchain, which can be obtained in Eclipse menu --> Help --> Install new software --> Add, and then specifying the plugin URI (http://gnuarmeclipse.sourceforge.net/updates) as shown below:
I have a Segger J-Link ARM debugger (~ $400), so I check that package, but other than the cross compiler and the STM32Fx support, other plugins are optional. The cross compiler plugin allows me to choose the "Cross ARM GCC" toolchain while creating an out-of-tree (to avoid polluting the u-boot repository with the Eclipse project file) C makefile project for u-boot, as shown below:
For some reason, the cross ARM GCC plugin does not find the binaries for the toolchain (or maybe the plugin does NOT include the binaries in the first place?), so I just specify the Code Sourcery toolchain downloaded from Emcraft, as shown below.
The key to creating an out-of-tree project is to add the (actual) source folder(s) as linked folders by right clicking on the project --> New --> Folder --> Advanced --> Link to alternate location, as shown in the example below
In the resource filters, I often specify a recursive file exclude filter for "*test*". Since a multi-platform project like u-boot has many sources that are irrelevant for my platform, I hide them from the indexer by right-clicking on those folders --> Resource configurations --> Exclude from build. The cpu/ and board/ folders (reorganized into the arch/ folder in the u-boot mainline) contain everything u-boot supports (MANY!), so it is essential to exclude all folders under those EXCEPT my platform, for the indexing to be of value. With this care, the indexer will run through the code in < 10 seconds, which is almost 2 orders of magnitude faster than indexing the latest Linux kernel tree, so I can appreciate the complexity of the Linux kernel. After indexing completes, I can start reading the code from the PoR reset vector defined in the .vectors section of the ROM (placed at 0x08000000 by the u-boot.lds linker script discussed above) in cpu/arm_cortexm3/start.c:
unsigned int vectors[] __attribute__((section(".vectors"))) = {
[0] = (unsigned long)&_mem_stack_end,
[1] = (unsigned int)&_start,
[2 ... 165] = (unsigned int)&default_isr
};
The 2nd entry is the PoR vector, defined right below this simple vector table, and the default_isr is just a while(1) loop, which hangs the uC. Unless more vectors are registered later, u-boot will not handle interrupts. _start() is a C function in cpu/arm_cortexm3/start.c that sets up the most basic C environment (copies .data and .ramcode, zeros .bss) before yielding to lib_arm/board.c:start_armboot(). start_armboot() runs the init functions hard coded in the init_sequence array, as you can see:
for (init_fnc_ptr = init_sequence; *init_fnc_ptr; ++init_fnc_ptr) {
if ((*init_fnc_ptr)() != 0) {
hang ();
}
}
This array is also in lib_arm/board.c, and does NOT use weak reference, so the only thing a board can do is use the #define to control what to run/not. I show the elements that stm32f7-som does NOT define in a gray font.
init_fnc_t *init_sequence[] = {
#if defined(CONFIG_ARCH_CPU_INIT)
arch_cpu_init, /* basic arch cpu dependent setup */
#endif
board_init, /* basic board dependent setup */
#if defined(CONFIG_USE_IRQ)
interrupt_init, /* set up exceptions */
#endif
#if !defined(CONFIG_ARCH_CPU_INIT)
/*
* `arch_cpu_init` always calls `timer_init`,
* no need to call it twice
*/
timer_init, /* initialize timer */
#endif
#ifdef CONFIG_FSL_ESDHC
get_clocks,
#endif
env_init, /* initialize environment */
init_baudrate, /* initialze baudrate settings */
serial_init, /* serial communications setup */
console_init_f, /* stage 1 init of console */
display_banner, /* say that we are here */
#if defined(CONFIG_DISPLAY_CPUINFO)
print_cpuinfo, /* display cpu info (and speed) */
#endif
#if defined(CONFIG_DISPLAY_BOARDINFO)
checkboard, /* display board info */
#endif
#if defined(CONFIG_HARD_I2C) || defined(CONFIG_SOFT_I2C)
init_func_i2c,
#endif
dram_init, /* configure available RAM banks */
#if defined(CONFIG_CMD_PCI) || defined (CONFIG_PCI)
arm_pci_init,
#endif
display_dram_config,
NULL,
};
I am not sure why nand_init(), onenand_init(), AT91F_DataflashInit() are not handled in the above list. But stdio and console (console is inited in 2 stages) are inited next--as soon as possible after the above "minimal" components are ready.
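To experiment with this pattern off-target, here is a minimal host-side sketch of a NULL-terminated init table. It is not the u-boot code: run_init_sequence, init_ok, init_fail and the two tables are hypothetical names, and instead of calling hang() the sketch returns the index of the first failing step.

```c
#include <assert.h>
#include <stddef.h>

typedef int (init_fnc_t)(void);

static int steps_run;                       /* counts init functions called */

static int init_ok(void)   { steps_run++; return 0; }
static int init_fail(void) { steps_run++; return -1; }

/*
 * Walk a NULL-terminated table of init functions.
 * Returns -1 if every function succeeded, otherwise the index of the
 * first failure (the point where U-Boot itself would call hang()).
 */
int run_init_sequence(init_fnc_t *const *table)
{
    assert(table != NULL);
    for (int i = 0; table[i] != NULL; ++i) {
        if (table[i]() != 0)
            return i;
    }
    return -1;
}

/* Hypothetical tables, for illustration only. */
init_fnc_t *const good_sequence[] = { init_ok, init_ok, init_ok, NULL };
init_fnc_t *const bad_sequence[]  = { init_ok, init_fail, init_ok, NULL };
```

Because the table is a plain array, the only per-board customization point is the preprocessor, exactly as in the #if blocks above.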
Next, even though the u-boot design principles claim u-boot is single threaded, interrupts ARE enabled. U-Boot does NOT blindly load the Linux kernel; it merely loads the environment variables "loadaddr" and "bootfile", and spins in a background loop, waiting for a console command:
for (;;) {
main_loop ();
}
In the common/main.c:main_loop(), if bootcmd is already defined, the boot command can run right away:
s = getenv ("bootcmd");
if (bootdelay >= 0 && s && !abortboot (bootdelay)) {
... run_command (s, 0);
}
Building a GNU toolchain for ARM Cortex M7 with Buildroot
The Code Sourcery toolchain downloaded from Emcraft website is for M3 and 5 years old. I plan to build a lot of software for the target (like the kernel, the libc, and Qt).
Buildroot is a nice framework to consolidate all those configuration and build activities in 1 place. The only question is whether Buildroot can handle MMU-less configurations, and I aim to find out in this blog entry.
I first download the latest Buildroot:
~/band/uClinux$ git clone git://git.buildroot.net/buildroot
~/band/uClinux$ cd buildroot
~/band/uClinux/buildroot$ git checkout 2015.11
I actually tried the top of the tree, but there is a problem with the noMMU case there, so I am just using 2015.11 for now.
Buildroot stores several reference board configs (such as the Zedboard that I have been playing around with for the past 5 years) in the configs/ folder, which does not contain any reference Cortex M boards. So I just copied the Zedboard config as a starting point for the Emcraft eval board I bought, and started modifying it.
~/band/uClinux/buildroot/configs$ cp zedboard_defconfig emcraft_defconfig
~/band/uClinux/buildroot/configs$ cd ..
~/band/uClinux/buildroot$ make emcraft_defconfig
~/band/uClinux/buildroot$ make xconfig
Currently (December 2015), only Cortex M3 is supported, so let's try this option.
Without MMU support, shared (AKA dynamic) libraries are impossible, which has far reaching consequences such as precluding glibc--in turn a prerequisite for many of the packages I am interested in: mesa3D, Qt5. Emcraft has ported the Qt4 widgets modules to uClinux, and I will first try to understand their u-boot, uClinux, and rootfs before looking into whether porting Qt5 to uClinux is desirable.
Add Cortex M7 Buildroot architecture
I patched arch/Config.in.arm this way:
@@ -156,6 +159,13 @@ config BR2_cortex_m3
bool "cortex-M3"
select BR2_ARM_CPU_HAS_THUMB
select BR2_ARM_CPU_HAS_THUMB2
+config BR2_cortex_m4
+ bool "cortex-M4"
+ select BR2_ARM_CPU_HAS_THUMB
+ select BR2_ARM_CPU_HAS_VFPV4
+ select BR2_ARM_CPU_HAS_THUMB2
config BR2_fa526
bool "fa526/626"
select BR2_ARM_CPU_HAS_ARM
@@ -426,6 +436,7 @@ config BR2_GCC_TARGET_CPU
default "cortex-a12" if BR2_cortex_a12
default "cortex-a15" if BR2_cortex_a15
default "cortex-m3" if BR2_cortex_m3
+ default "cortex-m4" if BR2_cortex_m4
default "fa526" if BR2_fa526
default "marvell-pj4" if BR2_pj4
default "strongarm" if BR2_strongarm
The main idea with this mod is that Cortex M4 is similar to Cortex M3, except for the hardware floating point addition. While Cortex M4 implements only FPV4 (according to the ARM reference manual), Cortex M7 implements the FPV5, which gcc currently does not seem to support.
I stay away from fancy features like the DSP (which is kind of like Neon, but I don't know how much).
A patch for using the LDMIA correctly on Cortex M
Buildroot was apparently forcing ARM mode for Cortex M, and consequently using the LDMIA instruction inappropriately for Thumb-2. The constraint is explained in the ARM reference, but copied here for convenience: "If an LDM instruction is used to load the PC, ensure that bit 0 of the loaded value is set to 1; otherwise a fault exception occurs. For both the STM and LDM instructions, the stack pointer should not be included in the register list. Also be aware that if you have an LDM instruction that has the LR in the register list, you cannot include the PC in the list." Working around the problem required a recently published patch.
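The "bit 0 must be set" rule can be shown with a tiny host-side sketch. The helper names is_valid_thumb_pc and to_thumb_pc are hypothetical; the sample address 0x08000374 is simply the value gdb reports for _start later in this post, used here only as a convenient even address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a value loaded into the PC selects Thumb state; Cortex-M only runs Thumb. */
#define THUMB_BIT 0x1u

/* True if a value about to be loaded into the PC has the Thumb bit set. */
bool is_valid_thumb_pc(uint32_t value)
{
    return (value & THUMB_BIT) != 0;
}

/* Turn the (even) address of an instruction into a value that is safe to load into the PC. */
uint32_t to_thumb_pc(uint32_t code_address)
{
    assert((code_address & THUMB_BIT) == 0);   /* instructions are halfword aligned */
    return code_address | THUMB_BIT;
}
```

For example, to_thumb_pc(0x08000374) gives 0x08000375, which is the form a function address takes when it is stored in a vector table or popped into the PC.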
Minimal Buildroot config for Cortex M4 development
Here's my Buildroot defconfig that makes use of the newly defined Cortex M4 to build just the toolchain:
BR2_arm=y
BR2_cortex_m4=y
BR2_ARM_FPU_VFPV4=y
BR2_ENABLE_DEBUG=y
BR2_TOOLCHAIN_BUILDROOT_LOCALE=y
# BR2_UCLIBC_INSTALL_UTILS is not set
BR2_BINUTILS_VERSION_2_25_X=y
BR2_GCC_VERSION_5_X=y
BR2_TOOLCHAIN_BUILDROOT_CXX=y
BR2_PACKAGE_HOST_ELF2FLT=y
BR2_PACKAGE_HOST_GDB=y
BR2_PACKAGE_HOST_GDB_TUI=y
BR2_PACKAGE_HOST_GDB_PYTHON=y
BR2_GDB_VERSION_7_10=y
BR2_ENABLE_LOCALE_PURGE=y
BR2_ENABLE_LOCALE_WHITELIST="en_US"
BR2_ECLIPSE_REGISTER=y
BR2_TARGET_GENERIC_HOSTNAME="uClinux"
BR2_TARGET_GENERIC_ISSUE="Welcome!"
BR2_ROOTFS_DEVICE_CREATION_STATIC=y
BR2_TARGET_GENERIC_ROOT_PASSWD="****"
BR2_TARGET_GENERIC_GETTY_PORT="ttyPS0"
BR2_PACKAGE_BUSYBOX_SHOW_OTHERS=y
BR2_TARGET_ROOTFS_CPIO=y
BR2_TARGET_ROOTFS_CPIO_GZIP=y
BR2_TARGET_ROOTFS_CPIO_UIMAGE=y
The BR2_ECLIPSE_REGISTER option registers the toolchain to Eclipse, so that Eclipse will present it as a toolchain choice during project creation (even for the Makefile project). Note that I did not even enable the kernel or the bootloader yet. Although not captured above (why??), I've turned on the following toolchain features:
- EABIhf (hardware floating point)
- Do NOT strip symbols (useful when debugging libraries like Qt later)
- NPTL (Native POSIX threading)
- Enable compiler tls (thread-local-storage) support
I also wanted the following features, but could not get past build errors:
- No separate code and data (BR2_BINFMT_FLAT_SEP_DATA) => have to keep the code and data in the same section, to avoid "unsupported compiler option" related to keeping the code and data in different sections.
After the build, the toolchain is in <BR2>/output/host/usr/bin.
~/band/uClinux/BRm4$ ls output/host/usr/bin/
2to3 arm-linux-gcc-nm
aclocal arm-linux-gcc-ranlib
aclocal-1.15 arm-linux-gcov
arm-buildroot-uclinux-uclibcgnueabihf-addr2line arm-linux-gcov-tool
...
The long prefix before the short tool name (like addr2line above) is what the CROSS_COMPILE environment variable should be set to when cross-compiling using this toolchain. I already built U-Boot with the Code Sourcery toolchain above, but let's rebuild it with this toolchain.
Modifying Emcraft's u-boot for the Buildroot GCC toolchain
Unlike for the Linux kernel, Buildroot does not provide an option to build a local u-boot folder (only git repositories or tarballs), so I will build my local clone of the Emcraft's u-boot using the Emcraft makefile rather than Buildroot for now. First, I make a sym link in the Emcraft's tools/ folder:
~/band/uClinux/linux-cortexm-1.14.2/tools$ ln -s /mnt/work/band/uClinux/BRm4/output/host/usr br2
Then I copy the existing ACTIVATE.sh script to br2.sh and modify it for the Buildroot toolchain:
TOOLCHAIN=br2
export INSTALL_ROOT=`pwd`
TOOLS_PATH=$INSTALL_ROOT/tools
CROSS_PATH=$TOOLS_PATH/$TOOLCHAIN/bin
export PATH=$TOOLS_PATH/bin:$CROSS_PATH:$PATH
# Path to cross-tools
export CROSS_COMPILE=arm-buildroot-uclinux-uclibcgnueabihf-
export CROSS_COMPILE_APPS=arm-buildroot-uclinux-uclibcgnueabihf-
# Define the MCU architecture
export MCU=STM32F7
Note that the shortcut softlinks ("arm-linux-") do NOT work.
I can source this script to set up the environment.
henry@w540:~/band/uClinux/linux-cortexm-1.14.2$ . ./br2.sh
Before running the same make commands as before, I want to avoid stripping out the debugging symbols.
Including debugging symbols into u-boot
I confirmed that the example/standalone/hello_world has debugging symbols, but u-boot does not. Running the following command shows that the debug symbols have been stripped from the final ELF.
$ BRm4/output/host/usr/bin/arm-linux-readelf --debug-dump u-boot
This is because the linker script is dumping all other sections than those explicitly accounted for in the script:
SECTIONS
{
...
} >STACK
The compile effort shows that the Emcraft u-boot needs to be modified slightly for the latest gcc: a function that is the target of a weak alias can no longer be declared inline, so for example, I removed "inline" from the following function:
void inline __show_boot_progress (int val) {}
void show_boot_progress (int val) __attribute__((weak, alias("__show_boot_progress")));
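The weak-alias pattern after the fix can be tried on a host GCC that targets ELF (the alias attribute is a GNU extension and is an assumption about the toolchain). In this sketch the names default_show_boot_progress, last_progress and boot_progress_demo are hypothetical; only show_boot_progress comes from the u-boot source.

```c
#include <assert.h>

static int last_progress = -1;   /* records the most recent value, for illustration */

/*
 * Default implementation. It must be a real, non-inline function so that
 * the alias below has a symbol to refer to.
 */
void default_show_boot_progress(int val)
{
    last_progress = val;
}

/* Weak alias: a board file may provide a strong show_boot_progress() to override it. */
void show_boot_progress(int val)
    __attribute__((weak, alias("default_show_boot_progress")));

/* Small self-check that a call through the weak alias reaches the default. */
int boot_progress_demo(void)
{
    show_boot_progress(42);
    assert(last_progress == 42);
    return last_progress;
}
```

If another object file defines a non-weak show_boot_progress(), the linker picks that one and the default is silently bypassed, which is how a board hooks in its own progress display.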
The other problem is that a nonconstant expression is no longer allowed for a linker script memory region origin, as was done in cpu/arm_cortexm3/u-boot.lds. So I did the addition in stm32f7-som.h:
#define CONFIG_MEM_RAM_BUF_BASE (CONFIG_MEM_RAM_BASE + CONFIG_MEM_RAM_LEN)
#define CONFIG_MEM_MALLOC_BASE (CONFIG_MEM_RAM_BUF_BASE + \
CONFIG_MEM_RAM_BUF_LEN)
#define CONFIG_MEM_STACK_BASE (CONFIG_MEM_MALLOC_BASE + CONFIG_MEM_MALLOC_LEN)
These defines are exported to make through cpu/arm_cortexm3/config.mk:
PLATFORM_CPPFLAGS += -DCONFIG_MEM_RAM_BASE=$(CONFIG_MEM_RAM_BASE)
PLATFORM_CPPFLAGS += -DCONFIG_MEM_RAM_BUF_BASE=$(CONFIG_MEM_RAM_BUF_BASE)
PLATFORM_CPPFLAGS += -DCONFIG_MEM_MALLOC_BASE=$(CONFIG_MEM_MALLOC_BASE)
PLATFORM_CPPFLAGS += -DCONFIG_MEM_STACK_BASE=$(CONFIG_MEM_STACK_BASE)
Finally, the linker script cpu/arm_cortexm3/u-boot.lds can make use of these defines:
MEMORY
{
NVM (r): ORIGIN = CONFIG_MEM_NVM_BASE, LENGTH = NVM_LEN
RAM (rw): ORIGIN = CONFIG_MEM_RAM_BASE,
LENGTH = CONFIG_MEM_RAM_LEN
RAM_BUF (r): ORIGIN = CONFIG_MEM_RAM_BUF_BASE, /* <board>.h */
LENGTH = CONFIG_MEM_RAM_BUF_LEN
MALLOC (r): ORIGIN = CONFIG_MEM_MALLOC_BASE, /* <board>.h */
LENGTH = CONFIG_MEM_MALLOC_LEN
STACK (r): ORIGIN = CONFIG_MEM_STACK_BASE, /* <board>.h */
LENGTH = CONFIG_MEM_STACK_LEN
#if defined(CONFIG_MEM_RAMCODE_BASE) && defined(CONFIG_MEM_RAMCODE_LEN)
RAMCODE (rw): ORIGIN = CONFIG_MEM_RAMCODE_BASE, \
LENGTH = CONFIG_MEM_RAMCODE_LEN
#endif
}
SECTIONS
{
.vectors CONFIG_MEM_NVM_BASE :
...
Note that the resulting linker script merely has the symbols substituted with values; it did NOT perform the math itself--but this is apparently OK!
MEMORY
{
NVM (r): ORIGIN = 0x08000000, LENGTH = ((1024 * 1024 * 1) - 0x0)
RAM (rw): ORIGIN = 0x20000000,
LENGTH = (20 * 1024)
RAM_BUF (r): ORIGIN = (0x20000000 + (20 * 1024)),
LENGTH = (88 * 1024)
MALLOC (r): ORIGIN = ((0x20000000 + (20 * 1024)) + (88 * 1024)),
LENGTH = (16 * 1024)
STACK (r): ORIGIN = (((0x20000000 + (20 * 1024)) + (88 * 1024)) + (16 * 1024)),
LENGTH = (4 * 1024)
}
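To double check what the linker will evaluate these expressions to, the same chain of defines can be compiled on the host. The lengths below are copied from the -D options in the build log earlier in this post; mem_layout_end is a hypothetical helper added only for the check.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the -D options in the build log earlier in this post. */
#define CONFIG_MEM_RAM_BASE     0x20000000u
#define CONFIG_MEM_RAM_LEN      (20u * 1024u)
#define CONFIG_MEM_RAM_BUF_LEN  (88u * 1024u)
#define CONFIG_MEM_MALLOC_LEN   (16u * 1024u)
#define CONFIG_MEM_STACK_LEN    (4u * 1024u)

/* Same chaining as the defines added to stm32f7-som.h. */
#define CONFIG_MEM_RAM_BUF_BASE (CONFIG_MEM_RAM_BASE + CONFIG_MEM_RAM_LEN)
#define CONFIG_MEM_MALLOC_BASE  (CONFIG_MEM_RAM_BUF_BASE + CONFIG_MEM_RAM_BUF_LEN)
#define CONFIG_MEM_STACK_BASE   (CONFIG_MEM_MALLOC_BASE + CONFIG_MEM_MALLOC_LEN)

/* Hypothetical helper: first address past the stack region. */
uint32_t mem_layout_end(void)
{
    uint32_t end = CONFIG_MEM_STACK_BASE + CONFIG_MEM_STACK_LEN;
    assert(end > CONFIG_MEM_RAM_BASE);   /* the regions must not wrap around */
    return end;
}
```

The bases work out to 0x20005000 (RAM_BUF), 0x2001B000 (MALLOC) and 0x2001F000 (STACK), and the four regions together span 128 KB, ending at 0x20020000.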
I also changed the target CPU to cortex-m4 (Buildroot compiled cross gcc 5.x did not support -mcpu=cortex-m7):
PLATFORM_RELFLAGS += -g2 -mthumb -mcpu=cortex-m4 -fsigned-char -O2 -fno-builtin-puts -fno-common -ffixed-r8
The newly built u-boot is 633 KB (more than 6x original size) due to the debug symbols, but u-boot.bin is still the same size. Loading up this in gdb lets you view functions by name. Will this successfully run on the target? Let's find out within an ICD (in-circuit debugger).
JTAG Debugging Emcraft U-Boot in gdb
The STM32F7-SOM starter kit does NOT include SWD debugger HW. But I prefer to learn code by setting breakpoints (in HW, if possible), so skip this section if you don't want to mess with a HW debugger.
Using the Segger J-Link
I already have a Segger J-Link (roughly $400 before tax and shipping), so I tried to connect the JLinkGDBServer (64-bit DEB installer downloaded from the Segger website) to the target, like this:
henry@w540:~$ JLinkGDBServer -device STM32F746NG
...
Listening on TCP/IP port 2331
Connecting to target...Cache: Separate I- and D-cache.
J-Link found 2 JTAG devices, Total IRLen = 9
JTAG ID: 0x5BA00477 (Cortex-M4)
Connected to target
Waiting for GDB connection...
Before I could start the cross gdb, I had to install libexpat1:i386 (because emcraft toolchain is 32-bit)
$ sudo apt-get install libexpat1:i386
Then I pointed cross-gdb (-tui not supported) at the u-boot ELF file:
$ arm-uclinuxeabi-gdb u-boot
Within gdb, I could connect to the JLinkGDBServer:
(gdb) target remote localhost:2331
And then I reset the target and load the ELF file
(gdb) mo reset
(gdb) load
#0 0x08000374 in _start ()
Then I can single-step through code. While the Segger J-Link is nice, ST-Link is an order of magnitude cheaper, so I looked into it.
Using the ST-Link through openocd (0.9.0) on Ubuntu
ST only supports Windows, so I tried openocd--but with a couple of patches from Emcraft support. After unzipping openocd-0.9.0, I applied the 2 patches before auto configuring and building it
~/band/uC/openocd-0.9.0$ patch -p3 < ../../uClinux/emcraft/openocd-0.9.0-stmf7-p1.patch
~/band/uC/openocd-0.9.0$ patch -p3 < ../../uClinux/emcraft/openocd-0.9.0-stmf7-p2.patch
~/band/uC/openocd-0.9.0$ ./configure; make; sudo make install
Then you point openocd to a configuration file ("-f" argument):
source [find ../scripts/interface/stlink-v2.cfg]
set WORKAREASIZE 0x40000
set CHIPNAME STM32F756
source [find target/stm32f7x.cfg]
adapter_khz 150
#reset_config srst_only
$_TARGETNAME configure -event gdb-attach {
echo "Halting target"
halt
}
Then openocd connects to the target through JTAG, and listens for a telnet connection on port 4444. On Ubuntu, "sudo" is required to run openocd because of access to the HW.
/mnt/work/band/uClinux/emcraft$ sudo openocd -f olimex-arm-usb-ocd-h-stm32f7.cfg
The patched configuration files are in /usr/local/share/openocd/scripts. While command line driven debugging is possible, graphical debugging is more productive.
Debugging u-boot in Eclipse through ST-Link v2
If possible, it is convenient to launch openocd directly from Eclipse. But because of the "sudo" requirement, I started openocd outside Eclipse with the command given in the previous section, and specified explicit path to the cross-gdb executable, leaving all other options at default. This works because the default GDB port for openocd is 3333, and Eclipse uses the same port to connect from the gdb to openocd.
After that, the Eclipse debugger front end works the same as for any other target. I like to use the HW function breakpoint, available from the inverted triangle at the upper-right corner of the Breakpoints tab shown below.
As explained before, _start is the PoR vector for u-boot on ARM.
Debugging u-boot in Eclipse through Segger J-Link
J-Link Eclipse debugging works similarly to the ST-Link.
Single-stepping the Emcraft u-boot built with the Buildroot toolchain
After downloading u-boot.bin to the target over JTAG, I saw that the target trips a hard fault and resets. These are the notes I took while stepping through the u-boot init functions in the Eclipse debugger to learn about the FW. U-Boot would print more to the console if DEBUG is defined, but it seems to be undefined in individual C files.
- init_fnc array
- arch_cpu_init(): architecture (ARM) specific CPU (STM32F7) init
- prepare for external flash access
- Unlock flash command register: Write the 2 hard coded key values to the KEYR
- Set flash parallelism size to x32 (requires Vdd > 2.7 V)
- Initialize the systick timer
- Initialize clock
- After reset, CPU clock is 16 MHz, and its flash wait state is 0.
- "gd" is the pointer to the global data struct, holding (among others) a board-specific struct pointer "bd".
- STM32F7-SOM runs off external 12 MHz oscillator, and achieves 200 MHz (if over-drive is turned on) with PLL
- systick load register set to 0xFFFFFF - 1 (defined in <>/include/asm-arm/arch-cortexm3/hardware.h)
- STM32F7 chips can over-drive to reach 200 MHz. But the over-drive mode is not available when 1.8 V < Vdd < 2.1 V--which is the typically used voltage when running off battery. Since enable_over_drive() spins on an infinite loop waiting for ODRDY and ODSWRDY, it means either that:
- These ready bits are set no matter what when running on 1.8 V, or
- Vdd is set briefly > 2.1 V even when running on battery
- Vdd is 1.2 V when running on battery
- FW spins on an infinite loop when changing the clock source (clock.c: clock_setup())
- ART Accelerator is just an instruction cache to hide the flash wait states.
- cortex_m3_soc_init(): MPU and cache are enabled. Cortex MPU regions can overlap, and the policy for an overlapped address is resolved using the region number (the higher-numbered region wins). Emcraft used these regions:
enum mpu_rgn {
MPU_RGN_4GB= 0, /* 4GB address space region */
MPU_RGN_SDRAM_CA, /* Cacheable SDRAM */
MPU_RGN_SDRAM_NC, /* Non-cacheable SDRAM */
MPU_RGN_ENVM_CA, /* Cacheable eNVM */
MPU_RGN_MAX= 15 /* RBAR[REGION] bits; actually - less */
};
Q: why is there no region for the external flash?
The attributes for each region are:
- 4 GB background:
- Access permission: 3, meaning full access (RW for both privileged and unprivileged)
- TEX=0, S=1, C=0, B=0 => strongly ordered, shareable
- all sub-regions are disabled
- size = 31, meaning 4 GB
- enable = 0 at first
- 32 MB SDRAM
- base address: 0xC0000000
- Access permission: 3 (full)
- TEX=1, S=0, C=1, B=1 => normal, non-shareable, cached, buffered (see the TEX, C, B, and S encoding table). This is possible because STM32F7 has the D/I cache.
- all sub-regions disabled
- size = 24 => N=25, so 2^N = 32 MB
- enable
- 1 MB DMA SRAM
- base address: 0xC0000000 + 16 MB (half the size of the SDRAM) - 1 MB
- Access permission: 3 (full)
- TEX=1, S=0, C=0, B=0 => normal, non-shareable, neither cached nor buffered.
- all sub-regions disabled
- size = 19 => 1 MB
- enable
- 1 MB internal flash
- base address = 0x08000000
- access permission: 3 (full)
- if device memory type, TEX=0, S=0, C=0, B=1 => device, shareable; otherwise, TEX=1, S=0, C=1, B=1 => normal, not shareable, cached, buffered.
- all sub-regions disabled
- size = 19 => 1 MB
- enable
- MPU is finally enabled when MPU_CTRL[1:0] are asserted. Bit 1 enables MPU even in hard fault/NMI handler. Follow up with DSB and ISB.
- stm32f7_enable_cache(): follows closely the STM32F7 programming manual Cache Maintenance Operations section
- Invalidate I cache: write 0 to ICIALLU (0xE000EF50)
- Invalidate D cache:
- select D-cache: CSSELR = 0
- DSB
- Find the number of cache ways and sets from CCSIDR:
- # sets = CCSIDR[27:13] + 1
- associativity (# ways) = CCSIDR[12:3] + 1
- line size = CCSIDR[2:0], encoded as log2(words per line) - 2
- For each set and way combination, write the set/way pair to DCISW to invalidate that line
- Set the DC and IC bits of CCR (Configuration and Control Register) to turn on the L1 caches.
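The set/way walk can be sketched on a host as below. This is not Emcraft's code: the function names are mine, and the DCISW operand layout (way in bits [31:30], set in bits [13:5]) is the Cortex-M7 one. On target, the dcisw pointer would be the DCISW register at 0xE000EF60 and the loop would be bracketed by DSB/ISB; here it can point at an ordinary variable.

```c
#include <stdint.h>

struct cache_geom {
    uint32_t sets;
    uint32_t ways;
    uint32_t line_bytes;
};

/* Decode the CCSIDR geometry fields (ARMv7-M encoding). */
struct cache_geom ccsidr_decode(uint32_t ccsidr)
{
    struct cache_geom g;

    g.sets = ((ccsidr >> 13) & 0x7FFFu) + 1u;
    g.ways = ((ccsidr >> 3) & 0x3FFu) + 1u;
    /* LineSize = log2(words per line) - 2, so bytes = 2^(LineSize + 4). */
    g.line_bytes = 1u << ((ccsidr & 0x7u) + 4u);
    return g;
}

/* DCISW operand on the Cortex-M7: way in [31:30], set in [13:5]. */
uint32_t dcisw_operand(uint32_t set, uint32_t way)
{
    return ((way & 0x3u) << 30) | ((set & 0x1FFu) << 5);
}

/* Invalidate every line by set/way; returns the number of DCISW writes. */
uint32_t dcache_invalidate_all(uint32_t ccsidr, volatile uint32_t *dcisw)
{
    struct cache_geom g = ccsidr_decode(ccsidr);
    uint32_t writes = 0;

    for (uint32_t set = 0; set < g.sets; set++) {
        for (uint32_t way = 0; way < g.ways; way++) {
            *dcisw = dcisw_operand(set, way);
            writes++;
        }
    }
    return writes;
}
```

As an illustration, a CCSIDR whose geometry fields are 0x0003E019 decodes to 32 sets, 4 ways and 32-byte lines, that is a 4 KB cache needing 128 DCISW writes.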
- board_init()
- The biggest board specific component is the external flash and SDRAM. TODO: correlate the Emcraft code against the NOR flash and SDRAM datasheet.
- FMC (flexible memory controller--which is different from FSMC: flexible static memory controller) selects among NOR/PSRAM, NAND, and SDRAM with chip select
- GPIO pins routed to the memory device must be configured for that role (enum stm32f2_gpio_role:STM32F2_GPIO_ROLE_FMC)
- FMC is organized into individually selectable (up to) 64 MB banks, whose timings and wait states are programmed through separate registers.
- By comparing the GPIO pins routed to FMC functions in the ext_ram_fsmc_fmc_gpio array against the STM32F746NGH6 datasheet Table 11 (FMC pin definition), I realized that the NOR flash and the SDRAM are both devices with 16 data bits and 23 address bits.
- env_init
- init_baudrate: default is CONFIG_BAUDRATE = 115200 defined in stm32f7-som.h
- serial_init: board config's CONFIG_STM32_USART_PORT counts from 1 when mapping to the USART peripheral base register address.
- 1 start bit, 8 data bits, 1 stop bit
- overrun detection disabled
- console_init_f (called before relocation) only flips the gd->have_console flag?
- display_banner
- print_cpuinfo
- checkboard: just print the board rev
- dram_init: TODO: cross reference against RM0385 chapter 13 and the device datasheet. Since the board has only 1 SDRAM (on bank1), all commands have the CTB1 bit set.
- start clock, udelay(200), and wait for FMC_SDSR[BUSY] to deassert. udelay is a busy loop (but one that avoids tripping the watchdog)
- precharge, udelay(100), and wait for !FMC_SDSR[BUSY]
- 7 cycles (why?) auto-refresh, udelay(100), and wait for !FMC_SDSR[BUSY]
- Load BL and CAS to the SDRAM through the "load mode register" command, udelay(100), and wait for !FMC_SDSR[BUSY]
- Set to normal mode, and wait for !FMC_SDSR[BUSY]
- Write refresh timer value
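Each step of the sequence above is one write to the FMC_SDCMR command register. The sketch below shows how those command words could be assembled, using the RM0385 field layout (MODE in [2:0], CTB1 in bit 4, NRFS in [8:5], MRD in [21:9]). The function names are mine, and the CAS latency and burst length in the usage note are placeholder values, not the ones Emcraft programs.

```c
#include <stdint.h>

/* FMC_SDCMR MODE field values (RM0385). */
enum sdram_cmd {
    SDRAM_CMD_NORMAL       = 0,
    SDRAM_CMD_CLK_ENABLE   = 1,
    SDRAM_CMD_PALL         = 2,  /* precharge all */
    SDRAM_CMD_AUTO_REFRESH = 3,
    SDRAM_CMD_LOAD_MODE    = 4
};

#define FMC_SDCMR_CTB1  (1u << 4)  /* target SDRAM bank 1 */

/* Build an FMC_SDCMR word for bank 1.
 * n_refresh is the number of auto-refresh cycles (1..16); the register
 * holds n_refresh - 1. mrd is the SDRAM mode register payload. */
uint32_t fmc_sdcmr(enum sdram_cmd mode, uint32_t n_refresh, uint32_t mrd)
{
    return ((uint32_t)mode & 0x7u)
         | FMC_SDCMR_CTB1
         | (((n_refresh - 1u) & 0xFu) << 5)
         | ((mrd & 0x1FFFu) << 9);
}

/* JEDEC SDR SDRAM mode register: burst length code in [2:0],
 * CAS latency in [6:4]. */
uint32_t sdram_mode_reg(uint32_t cas_latency, uint32_t burst_len_code)
{
    return ((cas_latency & 0x7u) << 4) | (burst_len_code & 0x7u);
}
```

Taking "7 cycles" literally, fmc_sdcmr(SDRAM_CMD_AUTO_REFRESH, 7, 0) gives 0xD3. With a placeholder CAS latency of 2 and burst length code 0, the load mode register command is fmc_sdcmr(SDRAM_CMD_LOAD_MODE, 1, sdram_mode_reg(2, 0)), which is 0x4014.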
- display_dram_config
- mem_malloc_init
- flash_init: the only flash is at 0x60000000 (each subsequent NOR flash bank sits 0x4000000, i.e. 64 MB, above the previous one)
- flash init (drivers/mtd/cfi_flash.c) uses the industry standard (either Intel or AMD) for querying the MTD device. This code gets pulled in because stm32f7-som.h defines CONFIG_SYS_FLASH_CFI and CONFIG_FLASH_CFI_DRIVER.
- DRAM is stopped (put into self-refresh mode) while writing to flash, and then goes through precharge before returning to normal mode.
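To sanity-check the flash addressing, here is a small sketch of the FMC NOR/PSRAM bank map and of the capacity implied by 23 address bits on a 16-bit bus. The function names are mine. The bank bases follow the STM32F7 memory map, where the four NOR/PSRAM sub-banks start at 0x60000000 and are 64 MB apart.

```c
#include <stdint.h>

#define FMC_NOR_BASE         0x60000000u
#define FMC_NOR_BANK_STRIDE  0x04000000u  /* 64 MB per NOR/PSRAM sub-bank */

/* Base address of NOR/PSRAM sub-bank 1..4. */
uint32_t fmc_nor_bank_base(unsigned bank)
{
    return FMC_NOR_BASE + (uint32_t)(bank - 1u) * FMC_NOR_BANK_STRIDE;
}

/* Capacity of a parallel NOR device whose address lines select
 * data_bits-wide words rather than bytes. */
uint32_t nor_capacity_bytes(unsigned addr_bits, unsigned data_bits)
{
    return (1u << addr_bits) * (data_bits / 8u);
}
```

nor_capacity_bytes(23, 16) comes out to 16 MB, which matches the 16 MB Spansion NOR flash, and it fits comfortably inside one 64 MB sub-bank.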
- hard fault (exception number 3 in xPSR[8:0]) seems to trip in flash_init() --> flash_get_size(): info->ext_addr = le16_to_cpu(qry.p_adr); when sp = 0x2001ff28. In the mixed view, it looks like:
1773 info->vendor = le16_to_cpu(qry.p_id);
08004066: ldrb.w r2, [sp, #35] ; 0x23
0800406a: ldrb.w r0, [sp, #36] ; 0x24
1774 info->ext_addr = le16_to_cpu(qry.p_adr);
0800406e: ldrh.w r3, [sp, #37] ; 0x25
So why is ldrb.w OK but not ldrh.w? When dealing with a half-word (16 bits), the address CANNOT be odd (#37): that is an unaligned access! This happened because gcc 4.7 started assuming hardware support for unaligned access by default. The bug was patched in the mainline a long time ago, and the hard fault disappeared when I picked up the patch AND added the "-mno-unaligned-access" flag to gcc in cpu/arm_cortexm3/config.mk:
PLATFORM_RELFLAGS += -mthumb -mcpu=cortex-m4 -fsigned-char -O2 -fno-builtin-puts -fno-common -ffixed-r8 -mno-unaligned-access
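The same idea can be written by hand: read the 16-bit field one byte at a time, so that no half-word load is ever issued to an odd address. This is only a sketch with names of my own choosing (read_le16, is_halfword_aligned), not the mainline patch.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value byte by byte; safe at any address. */
uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* A half-word access is naturally aligned only when the address is even. */
int is_halfword_aligned(uintptr_t addr)
{
    return (addr & 1u) == 0;
}
```

With sp = 0x2001ff28, the offset #37 lands on an odd address, so is_halfword_aligned(0x2001ff28 + 37) is false. Per the gcc documentation, when unaligned access is not enabled the compiler accesses words in packed structures a byte at a time, which is what read_le16() does explicitly.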
After this fix, start_armboot() flies through the rest of the initialization:
- env_relocate: copy environment from text segment (internal flash) to the SRAM.
- stdio_init
- jumptable_init: global data struct's "jt" points to the jump table
- console_init_r
- enable_interrupts: does not actually do anything on this board?
- eth_initialize: finally calls checkpoint (value 64)
- infinite main_loop: displays the familiar "Hit any key to stop autoboot" prompt.
- Within main_loop, various commands are handled. One of them is the "bootm" command, which loads the kernel into RAM (memmove_wd, which just calls memmove; this means that after the memory devices--no matter the kind--are set up, ARM just copies a byte at a time)
- lmb_reserve
- stm32f7_cache_sync_range: why? And was there no more generic support for cache invalidation?
- boot_fn() = boot_os[images.os.os] = do_bootm_linux
- Setup various tags (defined in the stm32f7-som.h: memory, cmdline, dmamem tag)
- announce_and_cleanup
- kernel_entry: literally the kernel's entry function. This is an example of the U-Boot's hand-over interface to the Linux kernel.
When I looked into the Linux kernel startup (the kernel main() function) almost 2 years ago, I knew that Linux could not boot itself, but did not understand the u-boot to Linux handover.
initrd.txt (initial ramdisk) --> initial ramfs. One valid version of an initramfs buffer is thus a single .cpio.gz file. You can create a cpio archive that contains the early userspace image. Your cpio archive should be specified in CONFIG_INITRAMFS_SOURCE and it will be used directly. Or you can build the early userspace image directly from source (CONFIG_INITRAMFS_SOURCE; this does not mean *.[ch] files, but actually filesystem contents).
early-userspace/
If all required device and filesystem drivers are compiled into the kernel, no need for initrd. init/main.c:init() will call prepare_namespace() to mount the final root filesystem, based on the root= option and optional init= to run some other init binary than listed at the end of init/main.c:init(). Otherwise (initramfs), prepare_namespace() must be skipped. This means that [/init] must do all the work. To maintain backwards compatibility, the /init binary will only run if it comes via an initramfs cpio archive. If this is not the case, init/main.c:init() will run prepare_namespace() to mount the final root and exec one of the predefined init binaries.
u-boot to Linux interface
<>/include/asm-arm/u-boot.h:
typedef struct bd_info {
int bi_baudrate; /* serial console baudrate */
unsigned long bi_ip_addr; /* IP Address */
struct environment_s *bi_env;
ulong bi_arch_number; /* unique id for this board */
ulong bi_boot_params; /* where this board expects params */
struct /* RAM configuration */
{
ulong start;
ulong size;
} bi_dram[CONFIG_NR_DRAM_BANKS];
} bd_t;
The boot params are stored at the beginning of the external DRAM (0xC0000000 on this board), which is what bi_dram[0].start is pointing to. The kernel will reside at offset 0x8000 from there.
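The boot params themselves are a list of ATAGs: each tag is a header of two 32-bit words (size in words, then tag id) followed by its payload, and the list ends with ATAG_NONE. Below is a host-runnable sketch of such a list with a core, memory and cmdline tag. build_atags() is my own illustrative function, not u-boot's setup code, and it leaves out Emcraft's dmamem tag.

```c
#include <stdint.h>
#include <string.h>

#define ATAG_NONE     0x00000000u
#define ATAG_CORE     0x54410001u
#define ATAG_MEM      0x54410002u
#define ATAG_CMDLINE  0x54410009u

/* Write a minimal ATAG list into buf (an array of 32-bit words).
 * Returns the number of words written, including the terminator. */
size_t build_atags(uint32_t *buf, uint32_t ram_start, uint32_t ram_size,
                   const char *cmdline)
{
    uint32_t *p = buf;
    size_t len = strlen(cmdline) + 1;  /* include the NUL */
    size_t words = (len + 3) / 4;      /* round up to whole words */

    /* ATAG_CORE: header + flags, pagesize, rootdev (all left 0 here). */
    *p++ = 5; *p++ = ATAG_CORE; *p++ = 0; *p++ = 0; *p++ = 0;

    /* ATAG_MEM: header + size, start. */
    *p++ = 4; *p++ = ATAG_MEM; *p++ = ram_size; *p++ = ram_start;

    /* ATAG_CMDLINE: header + zero-padded string. */
    *p++ = (uint32_t)(2 + words); *p++ = ATAG_CMDLINE;
    memset(p, 0, words * 4);
    memcpy(p, cmdline, len);
    p += words;

    /* ATAG_NONE terminates the list. */
    *p++ = 0; *p++ = ATAG_NONE;

    return (size_t)(p - buf);
}
```

On target, buf would be bi_boot_params. Under the ARM boot protocol the kernel entry is then called with r0 = 0, r1 = the machine id (bi_arch_number) and r2 = the address of this list.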
Using checkpoint to understand u-boot
I was overwhelmed by the large body of code in u-boot, and spent the next week looking for online documentation, until I read about #define CONFIG_SHOW_BOOT_PROGRESS and checkpoints in the top level README. When I read in the u-boot design principles ("keep it debuggable") that u-boot is single threaded, I got the idea of setting a breakpoint in a centralized boot progress function, to understand u-boot in digestible chunks. I was worried about understanding the interrupts until I saw this in include/configs/stm32f7-som.h:
The various checkpoints mentioned in README are raised like this example:
show_boot_progress (65);
u-boot supplied a stub for this in common/main.c:
/*
* Board-specific Platform code can reimplement show_boot_progress () if needed
*/
void inline __show_boot_progress (int val) {}
void show_boot_progress (int val) __attribute__((weak, alias("__show_boot_progress")));
To set a hardware breakpoint and examine the boot progress number, I overrode the above weak function in board/emcraft/stm32f7-som/board.c:
void show_boot_progress(int val) {
val = val;
}
Emcraft's u-boot config stm32f7-som.h does NOT turn on CONFIG_SHOW_BOOT_PROGRESS, so I inserted it and rebuilt u-boot.bin. The init functions discussed in the previous section do NOT trip checkpoints, so it's good that I single-stepped through those functions.
The first thing I noticed about the checkpoints is that the values do NOT increase strictly monotonically; these are just numbers to be referenced against documentation, and there was no promise of monotonicity in the first place.
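Since the order of the values matters more than their size, a variation on the override is to record them instead of stopping at each one. The sketch below keeps the last few checkpoints in a small ring buffer that can be dumped from the debugger after a hang. The buffer names and depth are hypothetical, and the volatile qualifiers are there so that -O2 does not discard the stores.

```c
#include <stdint.h>

#define BOOT_LOG_DEPTH 16u  /* hypothetical depth */

/* Inspect these from the debugger; volatile keeps the stores alive. */
volatile int boot_log[BOOT_LOG_DEPTH];
volatile uint32_t boot_log_count;

/* Overrides the weak stub: remember each checkpoint in arrival order. */
void show_boot_progress(int val)
{
    boot_log[boot_log_count % BOOT_LOG_DEPTH] = val;
    boot_log_count = boot_log_count + 1u;
}
```

After a run, boot_log_count gives the number of checkpoints hit, and the most recent one sits at index (boot_log_count - 1) % BOOT_LOG_DEPTH.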
Diffing the Emcraft's u-boot against vanilla 2.6.33 u-boot
One strategic place to start learning how u-boot runs on an MMU-less platform like Cortex M4/M7 is to understand Emcraft's port of u-boot--forked off from the mainline at v2.6.33. So I git cloned the mainline u-boot and checked out that version:
~band/uClinux/u-boot$ git checkout v2.6.33
I then created C makefile projects in Eclipse for the above branch and Emcraft's u-boot, and compared the 2 projects (highlight the 2 projects --> right click --> Compare --> Against each other).
By placing reset_cpu() in RAM, there is a possibility for self-upgrading u-boot (how?)
#ifdef CONFIG_ARMCORTEXM3_RAMCODE
__attribute__((section(".ramcode")))
__attribute__ ((long_call))
#endif
reset_cpu (ulong addr);
But perhaps self-upgrading u-boot is an advanced topic that can be deferred.
In addition to the u-boot image, SPIFILIB is added to u-boot.bin in Makefile:
ifeq ($(CONFIG_SYS_LPC18XX)$(CONFIG_SPIFI),yy)
ifeq ($(CONFIG_SPIFILIB_IN_ENVM),y)
SPIFILIB_DEP = cpu/arm_cortexm3/lpc18xx/spifilib/spifilib-envm.bin
else
SPIFILIB_DEP = cpu/arm_cortexm3/lpc18xx/spifilib/spifilib-dram.bin
endif
$(SPIFILIB_DEP): depend
$(MAKE) -C cpu/arm_cortexm3/lpc18xx/spifilib $(notdir $(SPIFILIB_DEP)) -f spifilib.mk
endif
$(obj)u-boot.bin: $(obj)u-boot $(SPIFILIB_DEP)
$(OBJCOPY) ${OBJCFLAGS} -O binary $< $@
But to get more out of this, I need to get my hands dirtier. Perhaps I should port the latest mainline U-Boot to the STM32F7-SOM board to learn more about U-Boot. Since this blog entry is getting long, I will start another one.