Nov 1, 2014

Understanding /dev/urandom

The Linux /dev/urandom pseudo file generates pseudorandom bytes, as you can see in this example, where 4 random bytes are strung together to look like a random 32-bit unsigned integer:

henry@Zotac64:~$ od -vAn -N4 -tx4 /dev/urandom
 9e4a1c93

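The od command dumps a file in octal and other formats: "-N4" reads only 4 bytes, "-tx4" formats them as one 4-byte hex number, "-An" suppresses the offset column, and "-v" turns off duplicate-line suppression.  strace shows that the system call used is read(), which grabs 4 bytes (the bytes differ from the run above because this is a separate invocation):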
...
open("/dev/urandom", O_RDONLY)          = 3
read(3, "zv_\22", 4)                    = 4
...
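
To use the same trick from a script, the od invocation can be wrapped in a small function.  This is a minimal sketch rather than anything from the session above: rand_u32 is a hypothetical helper name, and it swaps "-tx4" for "-tu4" so the value comes out in decimal.

```shell
# rand_u32 (hypothetical helper): print one 32-bit unsigned integer
# built from 4 bytes of /dev/urandom, in decimal.
rand_u32() {
    # -An: no offset column, -N4: read 4 bytes, -tu4: one unsigned 4-byte value
    od -vAn -N4 -tu4 /dev/urandom | tr -d ' \n'
    echo
}

rand_u32
```

The tr call only strips the padding od puts around the number, so the result can be used directly in shell arithmetic.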

How does it actually work under the hood?  Firstly, it is a mem device, as you can see under the /sys folder:

henry@Zotac64:~$ ls -o /sys/class/mem/urandom
... /sys/class/mem/urandom -> ../../devices/virtual/mem/urandom

It is a char mem device (major number 1), as you can see below:

henry@Zotac64:~$ ls -lh --time-style=+ /dev/*random
crw-rw-rw- 1 root root 1, 8  /dev/random
crw-rw-rw- 1 root root 1, 9  /dev/urandom
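
The same numbers can be pulled out programmatically.  A minimal sketch, assuming GNU (or BusyBox) stat, whose %t and %T format directives print the major and minor device numbers in hex; dev_numbers is a hypothetical helper name.

```shell
# dev_numbers PATH (hypothetical helper): print "major minor" in decimal
# for a character or block device node.
# The ( ) body runs in a subshell, so its variables stay local.
dev_numbers() (
    # %t / %T: major / minor device number in hex
    out=$(stat -c '%t %T' "$1") || exit 1
    set -- $out
    printf '%d %d\n' "0x$1" "0x$2"
)

dev_numbers /dev/random     # the listing above corresponds to: 1 8
dev_numbers /dev/urandom    # the listing above corresponds to: 1 9
```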

The memory devices are defined in <kernel>/drivers/char/mem.c:

static const struct memdev {
    const char *name;
    umode_t mode;
    const struct file_operations *fops;
    struct backing_dev_info *dev_info;
} devlist[] = {
     [3] = { "null", 0666, &null_fops, NULL },
     [5] = { "zero", 0666, &zero_fops, &zero_bdi },
     [7] = { "full", 0666, &full_fops, NULL },
     [8] = { "random", 0666, &random_fops, NULL },
     [9] = { "urandom", 0666, &urandom_fops, NULL },
#ifdef CONFIG_PRINTK
    [11] = { "kmsg", 0644, &kmsg_fops, NULL },
#endif
};

The random device driver source is found in <kernel>/drivers/char/random.c:

const struct file_operations urandom_fops = {
    .read           = urandom_read,
    .write          = random_write,
    .unlocked_ioctl = random_ioctl,
    .fasync         = random_fasync,
    .llseek         = noop_llseek,
};

Note that urandom_fops uses the same random_write handler as /dev/random, so writing to /dev/urandom is the same as writing to /dev/random; the two devices differ on the read side.  I could start diving into the code like I did for /dev/zero, but the urandom code is clearly more complicated than /dev/zero, and besides, I can use trace-cmd to show me the call graph.

Detour: Buildroot change for trace-cmd

Firstly, ftrace must be enabled in the kernel config:
  • CONFIG_FTRACE=y
  • CONFIG_FUNCTION_TRACER=y and CONFIG_DYNAMIC_FTRACE=y: need both
  • CONFIG_FTRACE_SYSCALLS=y: I want to see the read system call
  • CONFIG_PSTORE_FTRACE (persistent function trace) may be valuable when debugging a hang, reset, or panic, but it requires the CONFIG_PSTORE (persistent RAM buffer store) platform driver.  Zynq doesn't have such HW, so it looks like wishful thinking.
I added the above 3 CONFIGs into <kernel>/arch/arm/configs/zynq_xcomm_adv7511_nfs_defconfig, which I've been using for my Zedboard dorking starting with the NFS RFS post.
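
Before kicking off a long rebuild, it is cheap to confirm that the config file really carries the options.  A minimal sketch: check_kconfig is a hypothetical helper, and the path in the usage comment is simply the defconfig named above.

```shell
# check_kconfig FILE OPTION... (hypothetical helper): report whether each
# OPTION is set to "y" in FILE; returns non-zero if any is missing.
# The ( ) body runs in a subshell, so its variables stay local.
check_kconfig() (
    cfg=$1
    shift
    missing=0
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$cfg"; then
            echo "ok      $opt"
        else
            echo "MISSING $opt"
            missing=1
        fi
    done
    exit "$missing"
)

# Usage, run from the kernel tree:
# check_kconfig arch/arm/configs/zynq_xcomm_adv7511_nfs_defconfig \
#     CONFIG_FTRACE CONFIG_FUNCTION_TRACER CONFIG_DYNAMIC_FTRACE CONFIG_FTRACE_SYSCALLS
```

Running it against the generated .config is more reliable, since a minimal defconfig omits options that are at their default value.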

Back in Buildroot, trace-cmd itself depends on: BR2_LARGEFILE && BR2_TOOLCHAIN_HAS_THREADS && !BR2_avr32 && BR2_USE_MMU.  Large file support is enabled in the Buildroot Toolchain options.  I also enabled the trace-cmd package.  I had to blow away <BR2>/output/build and rebuild because of the toolchain option change.  After 3 hours, I got the new Buildroot RFS, including the kernel image.
  • I copy uImage and the DTB to the TFTP folder as discussed in this blog and this blog.
  • I expand out the new RFS to the NFS exports folder as discussed in this blog

trace-cmd on Zedboard

After the above detour, let's take trace-cmd out for a spin on the Zedboard:
trace-cmd record -p function od -vAn -N4 -tx4 /dev/urandom
...
CPU0 data recorded at offset=0x1b0000
    278528 bytes in size
CPU1 data recorded at offset=0x1f4000
    245760 bytes in size

trace-cmd record emits a binary file, trace.dat by default (overridable with the "-o" option).  kernelshark is a nice attempt at a graphical representation of this binary file, but I find it lacking compared to the Saleae logic analyzer UI, which I like quite a bit; it just doesn't feel ready for prime time.  Zooming in/out with the scroll wheel and auto snap to event are just a few of the areas where usability falls short.  Until kernelshark matures, trace-cmd report is a usable text-based alternative.

trace-cmd report | grep -v trace-cmd > urandom_read.txt

Grepping for "random" (command: cut -d " " -f 22,42- urandom_read.txt | grep random) shows:

...
57.740070:             add_interrupt_randomness
57.740733: get_random_int
57.741692:       add_interrupt_randomness
...
</dev/urandom is opened>
57.751332: random_ioctl <strace shows od did NOT call ioctl>
57.751369: urandom_read
57.751424:             add_interrupt_randomness
57.752511:       add_interrupt_randomness
57.753046:             add_interrupt_randomness
 add_device_randomness
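
The field numbers passed to cut depend on the exact column padding of trace-cmd report, so they are fragile.  A minimal sketch of an alternative, assuming each report line has the form "task-pid [cpu] timestamp: function: name" (my assumption about the function tracer's text output, not verified across trace-cmd versions); filter_random is a hypothetical helper.

```shell
# filter_random [FILE...] (hypothetical helper): print "timestamp: function"
# for every report line that mentions "random". Reads stdin if no FILE is given.
filter_random() {
    awk '/random/ {
        ts = ""
        # The timestamp is the first field shaped like 57.751369:
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+:$/) { ts = $i; break }
        # The function name is the last field on the line
        if (ts != "") print ts, $NF
    }' "$@"
}

# Usage:
# filter_random urandom_read.txt
```

Unlike the cut version, this drops the indentation, so use it for finding events rather than for reading the call nesting.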

Observations:
  • A lot of add_interrupt_randomness() calls.  That is not strange by itself (a running system takes a lot of interrupts); they show up in this trace only because the interrupts fired while the "od" task was running (drop the cut from the command above to see the task column for yourself)
  • get_random_int() and get_random_bytes() are called well ahead of the actual urandom_read() (more than 100 ms earlier)
  • random_ioctl() is called right before urandom_read()
Adding randomness in the IRQs marked with SA_SAMPLE_RANDOM in request_irq() is covered in the Interrupt Handling chapter of Linux Device Drivers, 3rd Edition (that flag was later renamed and then removed; newer kernels call add_interrupt_randomness() for every interrupt).  The interrupt randomness, to be distinguished from other sources of randomness such as device or disk, is based on the source IRQ number and the timing of the interrupt.  The urandom_read trace is here:

                   urandom_read
                      extract_entropy_user
                         xfer_secondary_pool
                         account
                            __wake_up
                               _raw_spin_lock_irqsave
                               __wake_up_common
                               _raw_spin_unlock_irqrestore
                            kill_fasync
                         extract_buf
                            _raw_spin_lock_irqsave
                            __mix_pool_bytes
                               _mix_pool_bytes
                            _raw_spin_unlock_irqrestore

Compare this to the simple trace for a /dev/zero read; most of the complexity is in extracting from the nonblocking entropy pool (vs. the blocking_pool for a /dev/random read).  The entropy pool is critical to obtaining numbers that are in fact random.  In fact, /dev/random calls wait_event_interruptible() (vs. the fast spinlock we see above) when the entropy pool is depleted.  I am not sure how fast I have to read from /dev/random to trigger this behavior, but for now, let's just press on with urandom.
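
One way to get a feel for depletion is to watch the kernel's entropy estimate around a read.  A minimal sketch, assuming procfs is mounted: /proc/sys/kernel/random/entropy_avail reports the estimate in bits; entropy_avail and the ENTROPY_FILE variable are hypothetical names used only here.

```shell
# Where the kernel exposes its entropy estimate (overridable for testing)
ENTROPY_FILE=${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}

# entropy_avail (hypothetical helper): print the current estimate, in bits.
entropy_avail() {
    cat "$ENTROPY_FILE"
}

if [ -r "$ENTROPY_FILE" ]; then
    echo "before: $(entropy_avail) bits"
    # Pull 64 bytes from the non-blocking device, then look again
    head -c 64 /dev/urandom > /dev/null
    echo "after:  $(entropy_avail) bits"
fi
```

Swapping /dev/urandom for /dev/random in the head line gives the blocking-device version of the experiment.  How much the number moves differs between kernel versions, so treat it as indicative only.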

Appendix: random seed management across reboot

Since the entropy pool is empty on system power-up, the recommendation in random.c is to save the random seed during shutdown, and load it back in during startup, with scripts like these:

Shutdown script

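echo "Saving random seed..."
# The seed file must live on storage that survives a reboot
random_seed=/var/run/random-seed
touch "$random_seed"
chmod 600 "$random_seed"
# Save 512 bytes of pool output for the next boot
dd if=/dev/urandom of="$random_seed" count=1 bs=512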

Startup script

echo "Initializing random number generator..."
random_seed=/var/run/random-seed
# Mix the saved seed back into the pool, if there is one
if [ -f "$random_seed" ]; then
    cat "$random_seed" >/dev/urandom
else
    touch "$random_seed"
fi
chmod 600 "$random_seed"
# Guard against ungraceful shutdown
dd if=/dev/urandom of="$random_seed" count=1 bs=512
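
The two scripts share the same save step, so it can be factored out.  A minimal sketch: save_seed and load_seed are hypothetical helpers that take the seed path as an argument, which also makes them easy to try on a scratch file, and save_seed sets umask first so a newly created file is never readable by others, even briefly.

```shell
# save_seed FILE (hypothetical helper): write 512 bytes from /dev/urandom
# to FILE with mode 600.
# The ( ) body runs in a subshell, so the umask change does not leak out.
save_seed() (
    umask 077                   # new files are created owner-only
    : > "$1" || exit 1          # create or truncate the seed file
    chmod 600 "$1"              # tighten a file that already existed
    dd if=/dev/urandom of="$1" count=1 bs=512 2>/dev/null
)

# load_seed FILE (hypothetical helper): mix a saved seed back into the pool.
load_seed() {
    [ -f "$1" ] && cat "$1" > /dev/urandom
}

# Usage (scratch file rather than the real seed):
# save_seed /tmp/random-seed.test
```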