The Far Side

1. Introduction

Linux/uClinux is something I still have fun exploring, and sometimes I try to find out how far one can push this little kernel. Currently, user programs for uClinux are cross-compiled, which is quite a laborious process. It would be nice if we could simply untar a source distribution and type

sh ./configure; make; make install
To do so, one needs:

This is a tall order by any standard. Let's take it step by step.

2. General: uClinux and flat binaries.

2.1 Executing a program on systems without an mmu

uClinux is a port of linux to systems lacking a memory management unit (mmu). The native executable format of uClinux is the 'flat binary'. The main advantage of flat binaries over other binary formats (a.out, coff, elf) is a smaller executable size. Irrespective of the binary format - a.out, coff, elf, flat - a user program consists of three 'segments': text (the executable code proper), data (initialized variables) and bss (uninitialized variables).

As bss is filled with zeroes at program start-up, the executable only needs to contain its size and address. Contrast this with the data segment, where the executable also needs to contain the initial values of the variables. Also, data and bss contain only global (and static) variables. Automatic variables local to functions are allocated on the stack as the function executes.
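To make the distinction concrete, here is a small C fragment; the variable and function names are made up for illustration. On a typical toolchain the initialized global ends up in data, the zero-filled array and the static local in bss, and the automatic local on the stack, though the exact placement is decided by the compiler and linker.

```c
#include <stddef.h>

int initialized_global = 42;   /* data: the initial value is stored in the executable */
int zeroed_global[256];        /* bss: only size/address recorded, zeroed at start-up */

/* 'calls' is static, so it also lives in bss and keeps its value between
   calls; 'local' is automatic and is recreated on the stack on every call. */
int count_calls(void)
{
    static int calls;          /* bss */
    int local = 0;             /* stack */
    local++;
    calls += local;
    return calls;
}

/* Returns 1 if every element of the bss array still holds its start-up zero. */
int bss_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof zeroed_global / sizeof zeroed_global[0]; i++)
        if (zeroed_global[i] != 0)
            return 0;
    return 1;
}
```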

Not having virtual memory has implications for program execution. On systems with an mmu, all programs can begin execution at (virtual) address 0; the mmu translates virtual address 0 to the actual load address of the program. On systems without an mmu this is not possible: the user program has to be adapted to reflect the (real) load address of the program. Two different uClinux ports - uClinux and uClinux/coldfire - offer two different solutions to this problem:

Comparing the two different flat binary formats, one gets the following:

Basically, both have their merits: uClinux tries to minimize memory use; uClinux/coldfire tries to maximize flexibility. The cisco port uses uClinux/coldfire-style binaries, so one can run programs of arbitrary size. The remainder of this document assumes a uClinux/coldfire environment.

2.2 Creating an executable for uClinux/coldfire

2.2.1 crt0.o

Let's assume we have compiled our program into object files. What do we need to link it with to create an executable?

crt0.o is a small assembler stub, which is linked with every program. crt0 is that part of the program to which the kernel transfers control upon program startup. crt0 is responsible for calling main(), and cleaning up after exit().

crt* stubs are standard in unix/linux systems. To see which crt files eg. your linux pc uses, just compile the following small program:

foo.c:
int main(void)
{
    return 0;
}

gcc -v foo.c

You will see a few /usr/lib/crt* files linked in.

The source of the uClinux/coldfire crt0.o is in user/arch/coldfire/crt0.S.

2.2.2 user.ld

Together with crt0.o and the c libraries, we have all objects needed to link our program. How do we link these objects together?

user.ld is the linker script which informs the linker of the layout of a flat binary. Again, a linker script is not something specific to uClinux - the linker always uses a linker script, even if you do not specify one.

To see what eg. your linux pc uses as its linker script, compile the above foo.c with "gcc -Wl,--verbose foo.c".

The uClinux/coldfire linker script user.ld is in user/arch/coldfire/user.ld. We now link our foo.c program using the user.ld linker script:

m68k-linux-gnu-ld -r -T user.ld -o foo.elf crt0.o foo.o libc.a libgcc.a

Note the -r linker flag: we want to keep the relocation table in the object. The program loader will need this table when starting up the program.
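To illustrate what the loader does with that table, here is a simplified sketch. It assumes, for illustration only, that the relocation table is a list of byte offsets into the loaded image, each pointing at a 32-bit big-endian word that was linked against address 0 (the ORIGIN in user.ld); the real flat format and the kernel loader may differ in detail. apply_relocations() and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Read and write a 32-bit big-endian word (the m68k byte order). */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical loader step: 'relocs' lists byte offsets of words in 'image'
   that hold addresses linked for ORIGIN 0; adding the real load address
   makes them valid. Returns 0 on success, -1 if an offset is out of range. */
int apply_relocations(uint8_t *image, size_t image_len,
                      const uint32_t *relocs, size_t nrelocs,
                      uint32_t load_addr)
{
    /* Validate every offset first so a bad table leaves the image untouched. */
    for (size_t i = 0; i < nrelocs; i++)
        if (relocs[i] > image_len || image_len - relocs[i] < 4)
            return -1;

    for (size_t i = 0; i < nrelocs; i++) {
        uint8_t *word = image + relocs[i];
        put_be32(word, get_be32(word) + load_addr);
    }
    return 0;
}
```

Keeping addresses relative to 0 in the executable is what lets the same flat binary run wherever the kernel happens to find free memory.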

2.2.3 elf2flt

Now we have a (hopefully) completely linked user program. The last step is converting the linker output - which is in elf format - to the more compact flat format:

elf2flt -o foo foo.elf

This can be seen as stripping the binary of all information which is not strictly necessary to run the program. It converts the foo.elf program to a flat binary 'foo' which can be executed on uClinux/coldfire.

3. Requirements for running 'large' programs.

3.1 kernel: BIGALLOCS

When the program is loaded into memory, a contiguous memory space needs to be reserved for it. This memory space needs to be large enough to contain the text (executable code proper), data (initialized variables) and bss (uninitialized variables) segments.

The kernel allocates memory in blocks whose size is a power of two. The largest block that can be allocated determines the maximum start-up size of a program.
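A sketch of that rounding in C, with hypothetical helper names: largest_block() finds the biggest power of two that fits in the free memory, and block_for() finds the block a program of a given start-up size would need. It ignores the 16-byte per-block overhead visible in the kmalloc table.

```c
#include <stdint.h>

/* Largest power of two that is <= avail (0 if avail is 0). */
uint32_t largest_block(uint32_t avail)
{
    uint32_t block = 1;
    if (avail == 0)
        return 0;
    while (block <= avail / 2)
        block <<= 1;
    return block;
}

/* Smallest power of two that can hold 'need' bytes,
   or 0 if that would not fit in 32 bits. */
uint32_t block_for(uint32_t need)
{
    uint32_t block = 1;
    if (need > 0x80000000u)
        return 0;
    while (block < need)
        block <<= 1;
    return block;
}
```

With 16 mb minus about 450 k free, largest_block() gives 8 mb; a program needing 1452116 bytes at start-up (the file size of cc1 shown in 3.4, used here only as an example figure) would need a 2 mb block.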

As an example, a cisco 2500 can have up to 16 mb ram. About 450 k is needed for the kernel itself, leaving about 15.5 mb for user programs. The largest power of two which fits in 15.5 mb is 8 mb. Let's patch the cisco kernel to allow for programs up to 8 mb in size.

First, make sure the kernel is compiled with the -DBIGALLOCS flag. For the cisco platform (which uses a 68ec030 processor) it is set in linux/arch/m68knommu/platform/68EC030/Rules.make:

CFLAGS := ... -DBIGALLOCS ...

Now in linux/mmnommu/kmalloc.c change:

#ifdef BIGALLOCS
524288 - 16,
1048576 - 16,
#endif
to:
#ifdef BIGALLOCS
524288 - 16,
1048576 - 16,
2097152 - 16,
4194304 - 16,
8388608 - 16,
#endif
and:
#ifdef BIGALLOCS
{NULL, NULL, 1, 0, 0, 0, 0, 7},
{NULL, NULL, 1, 0, 0, 0, 0, 8},
#endif
to:
#ifdef BIGALLOCS
{NULL, NULL, 1, 0, 0, 0, 0, 7},
{NULL, NULL, 1, 0, 0, 0, 0, 8},
{NULL, NULL, 1, 0, 0, 0, 0, 9},
{NULL, NULL, 1, 0, 0, 0, 0, 10},
{NULL, NULL, 1, 0, 0, 0, 0, 11},
#endif
and in linux/mmnommu/page_alloc.c, change:
#ifdef BIGALLOCS
#define NR_MEM_LISTS 9
#else
#define NR_MEM_LISTS 7
#endif
to:
#ifdef BIGALLOCS
#define NR_MEM_LISTS 12
#else
#define NR_MEM_LISTS 7
#endif
Recompile the kernel the usual way. Congratulations, you've just increased the maximum start-up size of user programs from 1mb to 8mb.
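The numbers in the two tables hang together if one assumes 4 KB pages and reads the last field of each descriptor as the page order: order n gives a block of 4 KB times 2^n, so adding orders 9 to 11 adds 2, 4 and 8 mb blocks, and NR_MEM_LISTS must cover orders 0 to 11, i.e. 12 lists. The sketch below only checks that arithmetic; the page size and the helper names are assumptions, not taken from the kernel source.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KB pages */

/* Usable size of a block of the given page order, mirroring the
   "2^n - 16" entries in the kmalloc size table. */
uint32_t block_usable_size(unsigned order)
{
    return ((uint32_t)1 << (order + SKETCH_PAGE_SHIFT)) - 16;
}

/* Number of free lists needed so that orders 0..max_order all exist;
   this is what NR_MEM_LISTS has to cover. */
unsigned nr_mem_lists_for(unsigned max_order)
{
    return max_order + 1;
}

/* Page order of the smallest block holding 'bytes' (bytes <= 2^31). */
unsigned order_for_bytes(uint32_t bytes)
{
    unsigned order = 0;
    while (((uint32_t)1 << (order + SKETCH_PAGE_SHIFT)) < bytes)
        order++;
    return order;
}
```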

3.2 user.ld

Now we need to inform the linker of the happy news. Remember user.ld? In arch/coldfire/user.ld, change:

MEMORY {
flatmem : ORIGIN = 0x0, LENGTH = 0x40000
}

to:

MEMORY {
flatmem : ORIGIN = 0x0, LENGTH = 0x800000
}

This informs the linker that user programs can get up to 8mb allocated at start-up. Note that we're only talking about the start-up size (text + data + bss); once running, programs can allocate additional memory as needed by calling malloc().
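The linker refuses to link a program whose sections placed in flatmem do not fit in LENGTH. A minimal C sketch of the same check, with hypothetical names; the section sizes in the usage below are made up.

```c
#include <stdint.h>

#define FLATMEM_OLD 0x40000u   /* 256 KB: original user.ld LENGTH */
#define FLATMEM_NEW 0x800000u  /* 8 MB: patched user.ld LENGTH */

/* Returns 1 if a program's start-up image (text + data + bss) fits in a
   flatmem region of 'length' bytes. Sums in 64 bits to avoid overflow. */
int fits_in_flatmem(uint32_t text, uint32_t data, uint32_t bss, uint32_t length)
{
    uint64_t total = (uint64_t)text + data + bss;
    return total <= length;
}
```

A program with, say, 1400000 bytes of text, 40000 of data and 12000 of bss fits in the patched region but not in the original 0x40000 one.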

3.3 Kernel: booting with nfs root.

When developing programs, you'll probably want to run your cisco with its root file system on nfs. This way, your files reside on the cross-compiling Linux pc.

As the kernel does not have to include a romdisk image, more memory is available; also it is very convenient to be able to update user programs without having to recompile the kernel.

To run nfs root, install patch 0.1.1-CB, available on the web site, and hardcode the ip addresses of the cisco and the nfs-exporting pc in the kernel. In src/linux/arch/m68knommu/kernel/setup.c, modify:

#ifdef CONFIG_ROOT_NFS
strcpy(command_line, "root=/dev/nfs "
"nfsroot=172.22.1.74:/opt/uClinux/nfs "
"nfsaddrs=172.22.1.7:172.22.1.74:172.22.1.1:255.255.255.0:c2500");
#endif
Replace 172.22.1.7 with the ip address of your cisco, 172.22.1.74 (both occurrences) with the ip address of the nfs-exporting pc, 172.22.1.1 with the address of your default router, and /opt/uClinux/nfs with the directory the pc is exporting via nfs. 255.255.255.0 is the netmask and c2500 the host name. Other parameters are documented in src/linux/Documentation/nfsroot.txt.
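Because the command line follows a fixed pattern, it can be assembled from its parts. The helper below is a hypothetical sketch, not part of the patch: it reproduces the string above and rejects empty components or components containing a space, since spaces separate kernel command-line parameters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: assembles the nfsroot command line used in setup.c.
   Returns the length snprintf() reports (>= size means 'buf' was too small),
   or -1 if a component is empty or contains a space. */
int build_nfsroot_cmdline(char *buf, size_t size,
                          const char *server_ip, const char *export_dir,
                          const char *client_ip, const char *gateway_ip,
                          const char *netmask, const char *hostname)
{
    const char *parts[] = { server_ip, export_dir, client_ip,
                            gateway_ip, netmask, hostname };
    for (size_t i = 0; i < sizeof parts / sizeof parts[0]; i++)
        if (parts[i][0] == '\0' || strchr(parts[i], ' ') != NULL)
            return -1;

    /* nfsaddrs = client:server:gateway:netmask:hostname */
    return snprintf(buf, size,
                    "root=/dev/nfs nfsroot=%s:%s nfsaddrs=%s:%s:%s:%s:%s",
                    server_ip, export_dir,
                    client_ip, server_ip, gateway_ip, netmask, hostname);
}
```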

3.4 Examples of large programs

As examples of large programs, here are the native compiler, assembler and linker:

[koen@w2k bin]$ ls -l cc cpp cc1 collect2 as ld
-rwxr--r-- 1 root root 408936 Dec 17 01:42 as
-rwxr--r-- 1 root root 61696 Dec 17 01:43 cc
-rwxr--r-- 1 root root 1452116 Dec 17 01:43 cc1
-rwxr--r-- 1 root root 51616 Dec 17 01:43 collect2
-rwxr--r-- 1 root root 88928 Dec 17 01:43 cpp
-rwxr--r-- 1 root root 327356 Dec 17 01:42 ld

cc1 - the c compiler code generation step - is larger than 1mb. Without the above changes, it cannot run.

4. Cross-compiling egcs and binutils

4.1 uClibc changes needed

Before attempting to compile egcs/binutils for uClinux/coldfire, make the following change to src/lib/libc/include/string.h. Change:

#define index strchr
#define rindex strrchr

to:

#ifndef UCLIBCBUG
#define index strchr
#define rindex strrchr
#else
#define index(x,y) strchr(x,y)
#define rindex(x,y) strrchr(x,y)
#endif

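The reason behind this is that binutils/egcs use a variable called "index", and with the original object-like macro that variable is also translated to strchr in every piece of code that includes string.h. Note that the function-like macros in the #else branch only take effect when UCLIBCBUG is defined, so compile egcs and binutils with -DUCLIBCBUG.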
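The #else branch works because a function-like macro is only expanded when its name is followed by '('. The C sketch below (the function names are hypothetical) uses index both as an ordinary variable and as a call; only the call is rewritten to strchr.

```c
#include <string.h>

/* Function-like variant, as in the #else branch above. The #undef guards
   against a host libc that already defines index as a macro. */
#undef index
#define index(x, y) strchr(x, y)

/* 'index' used as an ordinary variable, the way binutils/egcs do it.
   It is never followed by '(', so the preprocessor leaves it alone. */
int nth_or_minus_one(const int *a, int n, int index)
{
    return (index >= 0 && index < n) ? a[index] : -1;
}

/* 'index' used as a call: expands to strchr(s, c). */
const char *find_char(const char *s, int c)
{
    return index(s, c);
}
```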

4.2 cross-compiling and installing the compiler

Untar the egcs/binutils code into the user source tree, creating directories user/egcs and user/binutils. egcs/Makefile contains the following line:

LDLIBS += /opt/uClinux-cisco2500/tools/lib/gcc-lib/m68k-linux-gnu/egcs-2.91.66/libgcc.a

Verify that the file .../egcs-2.91.66/libgcc.a exists and defines __umoddi3 and __udivdi3:

/opt/uClinux-cisco2500/tools/bin/m68k-linux-gnu-nm /opt/uClinux-cisco2500/tools/lib/gcc-lib/m68k-linux-gnu/egcs-2.91.66/libgcc.a| grep umoddi3
_umoddi3.o:
00000000 T __umoddi3
/opt/uClinux-cisco2500/tools/bin/m68k-linux-gnu-nm /opt/uClinux-cisco2500/tools/lib/gcc-lib/m68k-linux-gnu/egcs-2.91.66/libgcc.a| grep udivdi3
_udivdi3.o:
00000000 T __udivdi3
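__udivdi3 and __umoddi3 are the libgcc helpers for unsigned 64-bit division and remainder. On a 32-bit target such as the 68030, gcc typically compiles code like the following into calls to these helpers rather than inline instructions; running m68k-linux-gnu-nm on the resulting object file should then show undefined (U) references to them. The function names are hypothetical.

```c
/* Unsigned long long division and remainder (b must be nonzero).
   On a 32-bit m68k target these usually become calls to __udivdi3
   and __umoddi3, which is why libgcc.a must provide them. */
unsigned long long u64_div(unsigned long long a, unsigned long long b)
{
    return a / b;
}

unsigned long long u64_mod(unsigned long long a, unsigned long long b)
{
    return a % b;
}
```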

__umoddi3 and __udivdi3 are routines for unsigned long long (64-bit) integer modulo and division, used internally by the compiler. Rebuild the user environment. In your cisco root disk, create the directories usr/bin, usr/src/linux, usr/include and usr/lib. Copy the following files from binutils:

addr2line elf2flt nm ranlib strip ar gasp objcopy size as ld objdump strings

and the following files from egcs:

cc  cc1  collect2  cpp

into the usr/bin directory of your cisco root disk.

In order to allow native compilation, one has to add include headers and libraries to the root disk.

At this point, you have populated the cisco root disk with include files, libraries, and the compiler/assembler/linker toolchain. Alternatively, grab the nfsroot tar from the download section - it contains a prebuilt nfs root disk.

Export your cisco root disk via nfs and boot your cisco with nfs root. Logged in on the cisco, create a small test program foo.c in /var/tmp:

/var/tmp> cat foo.c
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    printf("hello, world!\n");
    exit(0);
}

Also, create a small script "compile":

/var/tmp> cat compile
#!/bin/sh
setenv GCC_EXEC_PATH /usr/bin
cc -m68030 -Wa,-m68030 -msoft-float -fno-builtin -static -nostartfiles -nostdlib -Wl,-r,-T,/usr/local/nommu/user.ld foo.c /usr/lib/crt0.o /usr/lib/libc.a /usr/lib/libgcc.a /usr/lib/libmf.a

Now you are ready to compile your first program:

/var/tmp> ./compile
Shell invoked to run file: ./compile
Command: #!/bin/sh
Command: setenv GCC_EXEC_PATH /usr/bin
Command: cc -m68030 -Wa,-m68030 -msoft-float -fno-builtin -static -nostartfiles -nostdlib -Wl,-r,-T,/usr/local/nommu/user.ld foo.c /usr/lib/crt0.o /usr/lib/libc.a /usr/lib/libgcc.a /usr/lib/libmf.a
Execution Finished, Exiting
/var/tmp> # .. I run elf2flt -o foo a.out on the nfs-exporting pc ...
/var/tmp> ./foo
hello, world!
/var/tmp>

Note: at the time of writing, the port of elf2flt to cisco is broken, so I still run elf2flt on the nfs-exporting pc.

5. Porting issues

Some notes about porting programs to uClinux. The main differences from 'vanilla' linux are in memory allocation and process creation.

5.1. Memory allocation

5.1.1. sbrk()

sbrk(2) is a system call which - on systems with an mmu - moves the end of the data segment (the heap). On uClinux this call doesn't work: the data segment size is determined at link time and cannot be changed at run time. If you encounter sbrk(), chances are your program is trying to do its own memory allocation. Don't. Use the libc-provided malloc() instead.
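If a program's own allocator is too entangled to remove, one option, sketched here under the assumption that the program only needs a simple bump allocator, is to take its pool from malloc() once instead of growing the heap with sbrk(). struct arena and its functions are hypothetical names, not part of uClibc.

```c
#include <stdlib.h>

/* Hypothetical replacement for a home-grown sbrk()-based allocator:
   one pool comes from malloc() up front and is handed out in pieces.
   Nothing here asks the kernel to grow the data segment. */
struct arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

int arena_init(struct arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Returns 'n' bytes at an 8-byte offset from the pool start (malloc's
   result is suitably aligned), or NULL when the pool is exhausted. */
void *arena_alloc(struct arena *a, size_t n)
{
    size_t start = (a->used + 7) & ~(size_t)7;
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Releases the whole pool at once. */
void arena_free_all(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```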

5.1.2. alloca()

alloca allocates data on the process's stack. Its main feature is that memory allocated this way within a function is automatically reclaimed when the function returns.

On systems with mmu, stack size is not a problem - there is always virtual memory available. On uClinux, however, maximum stack size is determined at link time and cannot be changed at run time. The desired maximum stack size can be set using the -s flag of elf2flt, eg.:

elf2flt -o foo foo.elf -s 524288

reserves 512k for the stack. The default maximum stack size is 4k; if your program uses alloca or is heavily recursive please set the stack size to something higher. The native compiler, linker and assembler all use alloca. Their stack size is set to 512k, which ought to be sufficient for compiling most programs. YMMV.
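Where raising the stack size is not attractive, an alloca() call can often be replaced by malloc()/free(), trading automatic reclamation for an explicit free(). A sketch with a hypothetical count_words() function, which needs a writable scratch copy of its input:

```c
#include <stdlib.h>
#include <string.h>

/* Counts space-separated words in 's'. strtok() needs a writable copy;
   alloca()-style code would take it from the stack, this version takes
   it from malloc() so a long string cannot overflow the small, fixed
   uClinux stack. Returns -1 if the allocation fails. */
int count_words(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);         /* instead of: char *copy = alloca(n); */
    if (copy == NULL)
        return -1;
    memcpy(copy, s, n);

    int words = 0;
    for (char *w = strtok(copy, " "); w != NULL; w = strtok(NULL, " "))
        words++;

    free(copy);                     /* alloca memory is released on return; malloc memory is not */
    return words;
}
```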

5.2 Process creation

On standard unix/linux systems, fork() is the standard way to create other processes. uClinux does not implement fork(); instead it implements vfork(). Quoting the comp.unix.programmer faq <http://www.erlenstar.demon.co.uk/unix/ >

What's the difference between fork() and vfork()?

...

The basic difference between the two is that when a new process is created with vfork(), the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits, or calls execve(), at which point the parent process continues.

This means that the child process of a vfork() must be careful to avoid unexpectedly modifying variables of the parent process. In particular, the child process must not return from the function containing the vfork() call, and it must not call exit() (if it needs to exit, it should use _exit(); actually, this is also true for the child of a normal fork()).

Please beware. If your child process messes things up, your parent process will suffer as well.

As vfork()-ed processes share data and stack with their parent, writing vfork()-safe code is not obvious. The bash port uses a wrapper, nfork(), around vfork() which copies data, bss and stack before calling vfork(), and restores the parent's data, bss and stack after the child has finished executing. While not perfect, nfork() makes life easier; but in hindsight it really should be implemented as a system call rather than as user code.
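The safest vfork() pattern follows the FAQ's rules literally: the child calls only an exec function or _exit(). The sketch below (spawn_and_wait() is a hypothetical name) shows that pattern; it is not the bash port's nfork(), which additionally saves and restores data, bss and stack.

```c
#define _DEFAULT_SOURCE         /* glibc: expose vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs 'path' with 'argv' via vfork() and returns its exit status,
   or -1 on error. Returns 127 if the exec itself failed. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: borrows the parent's memory. Do not modify variables,
           do not return, do not call exit(); only exec or _exit. */
        execv(path, argv);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```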

6. Observations

A number of gnu tools were cross-compiled to uClinux. Results are mixed. Some programs - bison, flex - are surprisingly easy to port. Shells seem rather difficult - a tangle of output redirection, signal handlers, longjmps and process creation which is hard to debug. A full-featured fork() would make uClinux much more attractive.

The bash port is able to execute simple commands, but fails at executing shell scripts.

The compiler manages to compile at least "hello, world!" natively, but still has bugs in register allocation when compiling complex programs.

6.1 Memory requirements

The assembler and compiler certainly require at least 512k stack size; and the shell can use a fair amount of stack too when executing complex scripts. But most programs can do with much less. When compiling natively, 8 mb ram seems tight. 16 mb works fine. Anything less than 8 mb ram seems a bit too small to compile in right now.

6.2 nfs

While running nfs root, one sometimes gets "segmentation fault" while running a program. This seems nfs-related: if the program does not need to be fetched over the network but is already in the file system buffers on the target, the error does not occur. Also, it feels as though nfs might still have a memory leak somewhere.

When running nfs root, make sure the clocks of the target and the nfs-exporting system are more or less in sync. Make relies on files' timestamps to determine whether they need rebuilding from source. If running "make" on the target produces errors such as "file has modification time in the future" or "your build may be incomplete" your clocks are not in sync.
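make's complaint comes from comparing a file's modification time with the current time on the machine running make. A small C sketch of that comparison, with mtime_in_future() as a hypothetical name: if the target's clock lags the nfs server's, files stamped with the server's time look like they come from the future.

```c
#include <sys/stat.h>
#include <time.h>

/* Returns 1 if 'path' was modified later than 'now', 0 if not, and -1 if
   the file cannot be stat()ed. Passing the target's idea of the current
   time shows how a lagging clock triggers "modification time in the future". */
int mtime_in_future(const char *path, time_t now)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return st.st_mtime > now ? 1 : 0;
}
```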

6.3 uClibc

The c library provided, uClibc, is good at producing small user binaries, but is not full-featured. Other libraries exist - newlib, glibc - which provide more functionality, but add up to 200k to a program. Having a native glibc would make porting gnu tools easier, and still leave the option open of linking with uClibc if space is at a premium.

7. Opinion

With hindsight, it should have been obvious that a native compiler on uClinux was possible. PDP-11s, early Macintoshes, Amigas, Ataris, XT and AT PCs were commonly used without an MMU, and all had native compilers. The size of the software ported is a bit of a shock: the compiler easily gobbles 3-4 mb, two orders of magnitude more than standard uClinux programs. Sometimes the system decidedly feels shaky. Perhaps compiling natively is, at times, pushing uClinux a little beyond its limits.

8. Downloads

8.1. Open issues

8.1.1 bash

Interactive shells work fine; non-interactive ones don't. That is, this doesn't work:
> bash ./configure
but if you source the same shell script into an interactive shell, it works:
> bash
bash-2.03# . ./configure
Best guess is that shell input is clobbered in non-interactive shells; after executing a program the parser gets confused. Seems to be inside parse.y/y.tab.c, with_input_from.

8.1.2 elf2flt

The native elf2flt - part of binutils - is broken. Best guess is a big/little endian issue.

8.1.3 egcs

Compiling small programs works fine; compiling larger programs doesn't ("Internal compiler error: cannot find a spill register"). Best guess is that this is a subtle compiler bug, triggered by the way the compiler is compiled. The same error message popped up in linux-m68k a few years ago, too.

8.2. Sources

These sources are to be untarred in src/user. Compile by adding the directory names to the DIRS variable in src/user/Makefile, and rebuilding the user environment. Treat everything down here as beta. When using these programs, you are treading on fresh snow.

An nfs root disk with a populated /lib and /usr/include, and betas of gcc and make. These are binaries, compiled for 68ec030 (cisco). As root, untar with bzcat nfsroot-00.tar.bz2 | tar xvpf - in the nfs root directory of your cisco.

koen