Linux on an 8-bit micro?

UPDATE #4

Some fixes in GPIO,CPU,and LCD code, use SDL for display in PC build

UPDATE #3

Apr 16, 2012, 2am PDT: Small schematic fix (SD card's SCK & VDD were switched)

UPDATE #2

Apr 3, 2012, 1am PDT: new source code archive uploaded. Sped up emulator (6.5KHz->10KHz) using FPM mode on RAM & changed icache config, updated porting guide, included kernel image & new smaller ramdisk, new full image

UPDATE #1

Mar 29, 2012, 7pm PDT: new source code archive uploaded. It has a fixed Makefile and now includes a porting guide to help you port it to other boards/CPUs

Intro

It is common to see newbies asking in microcontroller forums if they can run Linux on their puny little 8-bit micro. The results are usually laughter. It is also common to see, in Linux forums, asked what the minimum specs for Linux are. The common answer is that it requires a 32-bit architecture and an MMU and at least a megabyte of ram to fit the kernel. This project aims to (and succeeds in) shatter(ing) these notions. The board you see on the right is based on an ATmega1284p. I've made one with an ATmega644a as well, with equal success. This board features no other processor and boots Linux 2.6.34. In fact, it can even bring up a full Ubuntu stack, including (if you have the time) X and gnome.

RAM

Yes, it is true that a full Linux install requires megabytes or RAM and a 32-bit CPU with an MMU. This project has all of that. First let's address the RAM. As you can see, there is an antique 30-pin SIMM memory module on the board. These were in use for 80286-based PCs. It is interfaced to the ATmega, and I wrote the code to access it as well as refresh it within spec (SDRAM requires constant refreshing to avoid losing data). How fast it is? The refresh interrupt happens every 62ms and takes up 1.5ms, thus eating under 3% of the CPU. RAM is accessed, for ease of programming, one byte at a time. This results in a maximum bandwidth of about 300 kilobytes per second.

Storage

With the RAM requirement put to rest, we have two to deal with. Storage is not too difficult a problem to solve. SD cards are quite easy to talk to using SPI, and my project does that. A 1GB SD card works fine, though 512Mb would be enough for this particular file system (Ubuntu Jaunty). The ATmega does have a hardware SPI module, but for whatever reason, it didn't quite work out, so I am bit-banging the interface. It is still plenty fast - about 200kilobytes per second. This also adds a nice touch to the project - it can be done on any microcontroller with enough pins - no hardware modules are used.

CPU

All that's left is that pesky 32-bit CPU & MMU requirement. Well the AVR has no MMU and is 8-bit. To conquer this obstacle, I wrote an ARM emulator. ARM is the architecture I am most familiar with, and it's simple enough that I could comfortably write an emulator for it. Why write one instead of porting one? Well, porting someone else's code is no fun, plus none of the emulators I saw out there were written in a way that would make them easy to port to an 8-bit device. One of the factors: AVR compiler insists on making ints 16-bit so something as simple as "(1 << 20)" will get you in trouble, producing zero. Instead you need to do "(1UL << 20)". Needless to say trawling someone else's unknown codebase looking for all places where ints are assumed and would fail would be a disaster. Plus I wanted a chance to write a nice modular ARM emulator. So I did.

Other features

The board's communication with the real world occurs over a serial port. Currently it is attached to a serial port on my PC running minicom, but it is fathomable to instead connect a keyboard and a character LCD to the board, making it entirely standalone. Two LEDs exist on the board as well. They signal SD card access. One for read, one for write. A button is onboard too. When pressed and held for a second it will spit out on the serial port the current effective speed of the emulated CPU. The AVR is clocked at 24MHz (a slight overclocking over its stock 20MHz)

How fast is it?

uARM is certainly no speed demon. It takes about 2 hours to boot to bash prompt ("init=/bin/bash" kernel command line). Then 4 more hours to boot up the entire Ubuntu ("exec init" and then login). Starting X takes a lot longer. The effective emulated CPU speed is about 6.5KHz, which is on par with what you'd expect emulating a 32-bit CPU & MMU on a measly 8-bit micro. Curiously enough, once booted, the system is somewhat usable. You can type a command and get a reply within a minute. That is to say that you can, in fact, use it. I used it to day to format an SD card, for example. This is definitely not the fastest, but I think it may be the cheapest, slowest, simplest to hand assemble, lowest part count, and lowest-end Linux PC. The board is hand-soldered using wires, there is not even a requirement for a printed circuit board.

Details on the emulator?

The emulator is pretty modular, allowing it to be extended at will to emulate other SoCs and hardware configurations. The emulated CPU is ARMv5TE. A while ago I started work on supporting ARMv6, but it was never finished (as can be seen in the code) due to not being needed. The emulated SoC is a PXA255. Due to the modularity of the design, you can replace SoC.c and make a whole new SoC with the same ARMv5TE core, or replace the core, or replace the peripherals at will. This is on purpose, as I meant this code to also be a reasonably clean demonstration on how ARM SoCs work. The code for CPU emulator itself is not too clean, since, well, it is a CPU emulator. It was written over 6 months of free time a few years back, and then put aside. It was resurrected recently specifically for this project. The emulator implements an i-cache to speed things up. This helps a lot on the AVR where internal memory can be accessed at over a 5 megabytes a second, unlike my external RAM. I never got around to implementing d-cache but it was in the todo list. The access to the block device is not emulated as an SD device. This turned out to be too slow. Instead there is a paravirtualized disk device (pvdisk, see pvDisk.tar.bz2, GPL license) that I wrote that uses an invalid opcode to call into the emulator and access the disk. The ramdisk in my image loads this pvdisk, and then chroots to /dev/pvd1. The ramdisk is included as "rd.img". The "machine type" I use is PalmTE2. Why? Because I am quite familiar with the hardware and it was the first pxa255 machine type I saw.

Hypercalls?

There are a few services you can request from the emulator by using a special opcode. In arm it is 0xF7BBBBBB, in thumb it is 0xBBBB. These are picked since they are in range that ARM guarantees to be undefined. The hypercall number is passed in R12, params are passed in R0..R3, return values are placed in R0. Calls:

0 = stop emulation
1 = print decimal number
2 = print char
3 = get ram size
4 = block device ops(R0 = op R1 = sector num). Note that these do not write the emulatoed RAM, they fill in a in-emulator buffer, which the emulated guest accesses using another hypercall, one word at a time. I meant to implement DMA, but never got around to it. Ops:
- 0 = getInfo (if sector num is zero, return num sectors; if sector num is 1, return sector size in bytes)
- 1 = sector read
- 2 = sector write
5 = block device buffer access (R0 = value in/value out, R1 = word number , R2 = 1 if write, 0 else)

Thumb support?

Thumb is fully supported. I cheat a bit, though, decoding each thumb instr to an equivalent ARM instr and executing that instead using the arm emulator function. It is not as fast as it could be otherwise, but it is simple and the code is small. A 256KB lookup table could be used to, but I felt that 256KB is too big for the microcontroller's flash. Some thumb instructions cannot be converted to ARM, they are handled correctly instead.

I want to build one!

For non-commercial purposes, you can definitely do that. The wiring is as follows. RAM DQ0..DQ7 -> AVR C0..C7. RAM A0..A7 -> AVR A0..A7. RAM A8..A11 -> AVR B0..B3. RAM nRAS nCAS nWE -> AVR D7 B4 B5. SD DI SCK DO -> AVR B6 B7 D6. LEDs read write -> AVR D2 D3 (LED's other legs grounded). Button -> AVR D4 (other leg grounded). The ram can be any 30-pin 16MB SIMM that can live with CAS-before-RAS refresh of 4K cycles every 64ms. The one I used (OWC) is available online for a few dollars. The schematic is shown here. Click it for a bigger view or click here.

Source code?

It is a bit of a mess, but it does work. Get it while it's hot: LINK. The license is simple: For non-commercial use, as long as you keep the licence file with the source and publish all your changes, we're cool. For commercial use, talk to me, and we'll agree on something. To build the emulator to try it on the PC type "make". To run use "./uARM DISK_IMAGE". To build optimized PC version use "make BUILD=opt". to build for AVR use "make BUILD=avr". It currently targets ATmega1284p. To target ATmega644, besides the makefile change, reduce the numbers in icache.h so that the icache is small enough to fit in the internal RAM in the 644. Included in the archive is the final hex file for the 1284p as well.

Boot process

To save code-space in the AVR, almost no boot code exists in the emulator. In fact, the "ROM" is a grand total of 50 bytes: 8 bytes to switch to thumb mode, and some thumb code to read the first sector of the SD card and jump to it in thumb mode (see embeddedBoot.c). The MBR of the SD card has another bootloader (written in thumb mode). This one looks at the MBR, finds the active partition and loads its contents to the end of RAM. It then jumps to that destination ram address + 512 (see mbrBoot.c). There lives the third and largest bootloader, ELLE (See ELLE.c). This one relocates the ramdisk, sets up ATAGS, and calls the kernel. All the binaries and sources are provided so that you can make your own images at will. The boot process should be reminiscent of PC boot. :) The included mkbootimg.sh tool can be used to make a working image for the boot partition. A complete working disk image? Here you go: LINK.

Video

The raw video is in a few segments, since I had to change camera batteries a few times while filming. I then spliced them together, to create a huge 3-and-a-half-hour-long video. The uncut version of the video? Here you go.. I then cut out the interesting parts, sped them up 3x (to fit into youtube video length limits) and made the video you see embedded here on the left. There is a clock in view in the video, showing time elapsed since start in realtime.