32-bit

17th October 2002. © Peter Naulls.

8th January 2003 - brought up to date.

No reproduction without permission. Information in this document is given in good faith, but may be subject to errors or omissions, for which I cannot be responsible.


 * There has been a great deal of confusion over 32-bit RISC OS in recent months, partly because of the subject's technical nature, and partly because of the spin put on it by various parties. This document is in the same spirit as my previous document, The RISC OS Browser Issue: to bring together information which is already available, cut through the issues, and make them clear and easily understood by everyone.


 * In order to make this document more palatable to non-programmers, certain technical issues have been simplified. I fully expect that programmers will be able to identify where this has been done, and appreciate the issue fully when it might concern them.  Just be aware it is not my intention to simplify matters to the point of not doing them proper justice.

How it came about
Once upon a time, when Acorn designed the original range of ARM chips, certain design decisions were made. One of these was to limit the range of addresses from which code can be executed to 64MB. This comes about because the ARM's program counter (PC), which holds the address of the currently executing instruction, is limited to 26 bits of the 32-bit register R15. (The ARM has 16 normally accessible registers, numbered R0 to R15.) The remaining bits of this register hold the processor status: the condition flags, the interrupt disable bits and the processor mode.
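To make the layout concrete, here is a small Python sketch (Python is used purely for illustration) that pulls apart a 26-bit-mode R15 value. It assumes the classic ARM2/ARM3 layout: condition flags in bits 31-28, interrupt disable bits in 27-26, the word-aligned PC in bits 25-2 and the mode in bits 1-0. The function name `decode_r15_26` and the example value are made up for illustration.

```python
# Field layout of R15 in 26-bit mode (ARM2/ARM3, and 26-bit mode on ARM6 to StrongARM):
#   bits 31-28  N, Z, C, V condition flags
#   bit  27     I (IRQ disable)
#   bit  26     F (FIQ disable)
#   bits 25-2   program counter (the bottom two address bits are always zero)
#   bits 1-0    processor mode: 0 = USR26, 1 = FIQ26, 2 = IRQ26, 3 = SVC26
PC_MASK = 0x03FFFFFC
MODE_NAMES = {0: "USR26", 1: "FIQ26", 2: "IRQ26", 3: "SVC26"}


def decode_r15_26(r15: int) -> dict:
    """Split a 26-bit-mode R15 value into its address and status fields."""
    return {
        "pc": r15 & PC_MASK,
        "N": bool((r15 >> 31) & 1),
        "Z": bool((r15 >> 30) & 1),
        "C": bool((r15 >> 29) & 1),
        "V": bool((r15 >> 28) & 1),
        "irq_disabled": bool((r15 >> 27) & 1),
        "fiq_disabled": bool((r15 >> 26) & 1),
        "mode": MODE_NAMES[r15 & 0x3],
    }


if __name__ == "__main__":
    # SVC26 mode, IRQs disabled, Z flag set, executing at &3800000
    r15 = (1 << 30) | (1 << 27) | 0x03800000 | 3
    print(decode_r15_26(r15))
    # The highest code address is &3FFFFFC, so code must lie in the bottom 64MB
    print((PC_MASK + 4) // (1024 * 1024), "MB")
```

Because the PC field stops at bit 25, no value of R15 can point code execution above &3FFFFFC, which is where the 64MB figure comes from.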

At this point you may be wondering "hang on, but the RiscPC can use more than 64MB?". And yes it can, for two reasons. First, the 64MB limitation only applies to addresses from which code is executed; loads and stores through the other registers can access data anywhere in the full 4GB address range. More importantly, all modern machines use a layer of memory mapping, so that any physical page of RAM can appear at any logical address at any given time. This is why applications always appear at address &8000, even though many applications are running at once.
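The effect of memory mapping can be modelled with a toy page table per task. This is a simplified sketch of the idea only, not a description of how RISC OS itself manages memory; it assumes 4KB pages, and the `Task` class and the physical page numbers are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4KB pages


class Task:
    """Hypothetical task with its own logical-to-physical page map."""

    def __init__(self, name: str, page_map: dict):
        self.name = name
        self.page_map = page_map  # logical page number -> physical page number

    def translate(self, logical: int) -> int:
        """Map a logical address to a physical one, as the MMU would."""
        page, offset = divmod(logical, PAGE_SIZE)
        if page not in self.page_map:
            raise LookupError(f"{self.name}: no page mapped at &{logical:X}")
        return self.page_map[page] * PAGE_SIZE + offset


# Both tasks see their code at logical &8000 (page 8), backed by different
# physical pages. The physical page numbers are invented for illustration.
editor = Task("Editor", {0x8: 0x10200})
browser = Task("Browser", {0x8: 0x10345})

if __name__ == "__main__":
    print(f"Editor  &8000 -> &{editor.translate(0x8000):X}")
    print(f"Browser &8000 -> &{browser.translate(0x8000):X}")
```

Each task's code sits at logical &8000, well inside the 64MB the PC can reach, while the physical RAM behind it can be anywhere.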

So the RiscPC and all previous RISC OS machines really are 32-bit machines in every sense of the word except the way their program counter works (and, as we'll see shortly, for ARM6 onwards this isn't the whole story). But we do refer to all current versions of RISC OS as 26-bit, as they require the program counter to act in this fashion. And although not strictly correct, it is sometimes useful to refer to current machines as 26-bit, to differentiate them from any new machines which use a 32-bit RISC OS.

So why 32-bit?
From ARM6 onwards, a 32-bit mode was added to ARM chips. In this mode, all[1] the bits of the PC are dedicated to the address of the currently executing instruction. RISC OS did not take advantage of this mode, deferring the conversion to 32-bit until the present time. However, the Unix ports developed for the RiscPC shortly afterwards, ARM Linux and NetBSD/arm32, did take advantage of it.

In this mode, the status bits held in the program counter in 26-bit mode live instead in a separate register, the current program status register (CPSR), which is accessed via two new instructions, MRS and MSR. These instructions work equally well in 26-bit and 32-bit modes; on earlier processors, such as the ARM3, they are NOPs and do nothing.
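The following sketch shows where the same status information lives in each layout, by rearranging the status bits of a 26-bit R15 value into CPSR format. It assumes the standard CPSR layout (flags in bits 31-28, I in bit 7, F in bit 6, mode in bits 4-0, with the 32-bit USR, FIQ, IRQ and SVC modes numbered &10 to &13). The function name is hypothetical, and later CPSR bits such as Thumb state are ignored.

```python
FLAGS_MASK = 0xF0000000  # N, Z, C, V: bits 31-28 in both layouts


def r15_status_to_cpsr(r15: int) -> int:
    """Rearrange the status bits of a 26-bit R15 value into CPSR layout.

    26-bit R15: I = bit 27, F = bit 26, mode = bits 1-0 (USR26..SVC26 = 0..3)
    CPSR:       I = bit 7,  F = bit 6,  mode = bits 4-0 (USR32..SVC32 = &10..&13)
    """
    cpsr = r15 & FLAGS_MASK            # condition flags stay where they are
    if r15 & (1 << 27):
        cpsr |= 1 << 7                 # IRQ disable
    if r15 & (1 << 26):
        cpsr |= 1 << 6                 # FIQ disable
    cpsr |= 0x10 | (r15 & 0x3)         # map to the equivalent 32-bit mode
    return cpsr


if __name__ == "__main__":
    # SVC26, IRQs disabled, Z set -> SVC32 with I and Z set
    print(f"&{r15_status_to_cpsr((1 << 30) | (1 << 27) | 0x8000 | 3):08X}")  # &40000093
```

Note that the address bits of R15 (&8000 here) play no part: in 32-bit mode the PC and the status are simply separate registers.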

So why do RISC OS machines need to move to 32-bit? The StrongARM processor used in RiscPCs supports 26-bit mode, as do many processors of the same generation, but the current generation of ARM chips, in particular XScale, ARM9 and others, do not. The manufacturers (ARM and Intel) have removed 26-bit compatibility in favour of other logic, such as Thumb support and features aimed at various embedded uses.

Therefore, if RISC OS is to move to newer processors and run natively on the hardware, it must move to a 32-bit mode (the alternative is emulation on other hardware such as x86, which will be much slower).


 * 1 Well, not all the bits. As in 26-bit modes, the bottom 2 bits are assumed to be zero, so all addresses lie on a multiple of 4.  The use of these 2 bits is beyond the scope of this article.

Moving to 32-bit
It's been suggested that a move to a 32-bit RISC OS would lose a large amount of the existing software base. For many reasons, this isn't true.

On a 32-bit RISC OS, such as RISC OS 5, the problem is that a given program may not immediately run, because of the differences in processor mode.

Whether it will or not depends upon the type of program it is:
 * BASIC and other interpreted languages: As long as these programs contain no assembler (and the large majority do not; a figure as high as 99% has been suggested), they will run unmodified, as they rely only upon the interpreter behaving correctly.


 * C programs and other compiled languages: A large number of RISC OS programs are written in C. These will also work correctly if recompiled for 32-bit using Castle's new 32-bit tools or a 32-bit GCC, as appropriate. The programs will remain backwards compatible with 26-bit versions of RISC OS, as we'll see shortly.


 * Programs written in assembler or programs producing ARM output: these programs will need manual (or semi-automated) modification before they will run on a 32-bit RISC OS. This modification is not difficult, but it can be tedious. Also, some C programs can contain ARM code, and will need the same changes.

At this point you might groan at the effort required to convert all these applications to 32-bit. But hold on there a moment - these are most certainly not the only ways existing applications can run on a 32-bit RISC OS. For the moment we'll stick with conversion since it's a logical place to start, and come to the other ways shortly.

APCS
APCS, or ARM Procedure Call Standard, is at the heart of the 32-bit conversion issue. APCS defines a number of things, most of which we won't go into here; the part that matters is how processor flags are preserved across function calls. If you wish to know more, refer to volume 4 of the Programmer's Reference Manual (PRM).

There are quite a number of APCS variants, even within the same processor mode. I'll simplify the matter somewhat by just talking about "APCS-26" and "APCS-32", which refer to the two variants we care about: the former is used by existing 26-bit RISC OS, and the latter by a 32-bit RISC OS.

There are a number of specifics to converting from APCS-26 to APCS-32, which are covered in the documentation with Castle's tools, and which I'll briefly summarise here. These are simplifications, but serve for illustrative purposes:


 * APCS-32 does not expect function calls to preserve processor flags. Doing so would be an onerous burden, and in many cases it isn't actually needed anyway. APCS-26 does require flags to be preserved, and relies on instructions which do not behave the same way in 32-bit mode to do it.


 * For C code, this generally means the compiler no longer restores flags on function return using LDM with the ^ suffix, or MOVS PC, LR, and instead uses LDM without the ^, or MOV PC, LR.


 * Instructions which modify the processor state through R15, such as TEQP, are out. Instead, code should read the state with MRS and modify it with MSR. The exception is code which must also work on the ARM3, which does not have these instructions. Such code is reasonably rare, and can easily be accommodated by checking for 32-bit mode, and taking advantage of MSR/MRS being NOPs on these processors.
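Why flag preservation is tied to 26-bit mode can be seen by modelling the two return instructions on a 26-bit R15. In 26-bit mode, BL leaves the caller's PSR in R14 alongside the return address, so `MOVS PC, R14` brings the caller's flags back, whereas `MOV PC, R14` loads only the address. In 32-bit mode R14 holds only an address, so there is nothing to restore the flags from. This sketch assumes a user-mode return, where only N, Z, C and V are restored; the function names are illustrative.

```python
PC_MASK = 0x03FFFFFC     # bits 25-2: return address
FLAGS_MASK = 0xF0000000  # bits 31-28: N, Z, C, V


def movs_pc_r14_user26(r15: int, r14: int) -> int:
    """26-bit 'MOVS PC, R14' in user mode: reload the PC and N, Z, C, V from R14.

    I, F and the mode bits are protected in user mode, so they keep their
    current values.
    """
    keep = r15 & ~(PC_MASK | FLAGS_MASK)
    return keep | (r14 & (PC_MASK | FLAGS_MASK))


def mov_pc_r14_26(r15: int, r14: int) -> int:
    """26-bit 'MOV PC, R14' (no S): only the PC field changes, flags stay as they are."""
    return (r15 & ~PC_MASK) | (r14 & PC_MASK)


if __name__ == "__main__":
    # The caller had Z set when it called; BL saved the return address and PSR in R14.
    r14 = (1 << 30) | 0x8004
    # The callee then cleared all the flags while running at &9000.
    r15_in_callee = 0x9000
    print(f"MOVS return: &{movs_pc_r14_user26(r15_in_callee, r14):08X}")  # &40008004, Z restored
    print(f"MOV return:  &{mov_pc_r14_26(r15_in_callee, r14):08X}")       # &00008004, Z lost
```

An APCS-32 caller simply does not rely on the flags surviving the call, so the plain MOV (or LDM without ^) form is sufficient.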

A crucial point I wish to make is that APCS-32 code will work correctly on 26-bit RISC OS and 32-bit RISC OS from ARM6 to XScale and also ARM3 if care is taken. This means once a program is 32-bit, there need be only one version. Having a profusion of versions of programs would certainly be confusing.

So converting a C program to APCS-32 is simple: recompile it, along with all the libraries it uses. For programs built with Acorn C/C++, recompiling with Castle's tools using the default options is exactly what you want.

Assembler Conversion
Manual assembler conversion can be rather tedious. Fortunately, Castle's tools provide some macros to help, and the assembler will produce warnings for non-32-bit code. It can be a little daunting at first, but with some practice it becomes quite easy. In future, I hope to produce a reference guide showing which instruction sequences to replace with what.

David Ruck has produced a tool, Armalyser, which can disassemble ARM code, whether it's an application, a module or another RISC OS binary format. Most importantly, it can output a format which lists non-32-bit instructions, and which can be fed directly back into an assembler once you have made your changes. This is extremely useful if a program's source or libraries cannot easily be obtained for recompilation, or for checking your existing applications and modules.
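As an illustration of the kind of check such a tool performs (this is not Armalyser's code or output format), the sketch below flags the 26-bit-only patterns described earlier: TEQP-style compares that write R15, flag-setting data processing instructions with PC as destination (such as `MOVS PC, LR`), and `LDM` with `^` and PC in the register list. It recognises only these encodings, and anything it flags still needs a human to look at it, because in privileged 32-bit code `MOVS PC, LR` and `LDM ... ^` are legitimate exception returns. The function names are hypothetical.

```python
COMPARE_OPS = {0x8: "TST", 0x9: "TEQ", 0xA: "CMP", 0xB: "CMN"}


def is_data_processing(insn: int) -> bool:
    """True if insn lies in the ARM data processing encoding space."""
    if insn & 0x0C000000:  # bits 27-26 must be 00
        return False
    # With I (bit 25) clear, bits 7 and 4 both set mean multiply, SWP or
    # halfword transfers rather than data processing.
    if not (insn & 0x02000000) and (insn & 0x90) == 0x90:
        return False
    return True


def check_26bit_only(insn: int):
    """Return a reason if insn depends on 26-bit PSR-in-R15 behaviour, else None."""
    s_bit = (insn >> 20) & 1
    rd = (insn >> 12) & 0xF
    if is_data_processing(insn) and s_bit and rd == 15:
        opcode = (insn >> 21) & 0xF
        if opcode in COMPARE_OPS:
            return f"{COMPARE_OPS[opcode]}P writes the PSR bits in R15"
        return "flag-setting write to PC (e.g. MOVS PC, LR)"
    # LDM: bits 27-25 = 100 and L (bit 20) = 1, with ^ (bit 22) and PC (bit 15)
    if (insn & 0x0E100000) == 0x08100000 and insn & (1 << 22) and insn & (1 << 15):
        return "LDM with ^ and PC in the list"
    return None


if __name__ == "__main__":
    samples = {
        0xE1B0F00E: "MOVS PC, LR",
        0xE1A0F00E: "MOV PC, LR",
        0xE33FF000: "TEQP PC, #0",
        0xE8FD8010: "LDMFD R13!, {R4, PC}^",
        0xE8BD8010: "LDMFD R13!, {R4, PC}",
        0xE10F0000: "MRS R0, CPSR",
        0xE321F0D3: "MSR CPSR_c, #&D3",
    }
    for word, text in samples.items():
        print(f"&{word:08X}  {text:<24} {check_26bit_only(word) or 'ok'}")
```

The S-bit test matters: `MSR CPSR_c, #&D3` also has 1111 in the Rd field, but with S clear it is the 32-bit-safe replacement, not a problem.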

Other methods
Converting a program to 32-bit isn't the only way to have it run on a 32-bit RISC OS. The methods below are by no means the full list, but they are the most obvious.

Emulation
In this case, we provide a program that looks at each instruction of the program to be executed, and gets the host processor (that is, the 32-bit processor the machine is running on) to perform some equivalent action, either directly or via traditional emulation means. There are quite a number of ways to implement this, but by and large they act similarly. The great advantage of emulation is that it is quite reliable, and reasonably easy to implement and get correct. The disadvantage is that it can be quite slow, with applications running substantially slower than a native 32-bit application.
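At its simplest, an emulator is a fetch, decode and execute loop. The sketch below interprets a tiny subset of ARM (unconditional `MOV` and `ADD` with an immediate or unshifted register operand, no flag setting, and no use of R15) and is only meant to show the shape of such a loop; a real ARM-on-ARM emulator would also have to handle flags, branches, memory and SWIs. All names are hypothetical.

```python
def rotated_immediate(insn: int) -> int:
    """Decode an ARM operand-2 immediate: 8 bits rotated right by 2 * bits 11-8."""
    imm8 = insn & 0xFF
    rot = ((insn >> 8) & 0xF) * 2
    return ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF


def emulate(program: list, max_steps: int = 10_000) -> list:
    """Interpret unconditional MOV/ADD instructions; return the final R0-R15."""
    regs = [0] * 16
    index = 0  # word index of the next instruction
    for _ in range(max_steps):
        if index >= len(program):
            break
        insn = program[index]                      # fetch
        index += 1
        # decode: only condition AL, data processing, S clear
        if (insn >> 28) != 0xE or insn & 0x0C100000:
            raise NotImplementedError(f"unsupported instruction &{insn:08X}")
        opcode = (insn >> 21) & 0xF
        rn = (insn >> 16) & 0xF
        rd = (insn >> 12) & 0xF
        if 15 in (rn, rd):
            raise NotImplementedError("R15 is not modelled")
        if insn & 0x02000000:
            op2 = rotated_immediate(insn)
        elif (insn & 0xFF0) == 0 and (insn & 0xF) != 15:  # plain register, no shift
            op2 = regs[insn & 0xF]
        else:
            raise NotImplementedError("operand form not modelled")
        if opcode == 0xD:                          # execute: MOV
            regs[rd] = op2
        elif opcode == 0x4:                        # execute: ADD
            regs[rd] = (regs[rn] + op2) & 0xFFFFFFFF
        else:
            raise NotImplementedError(f"opcode {opcode:#x} not modelled")
    return regs


if __name__ == "__main__":
    program = [
        0xE3A00005,  # MOV R0, #5
        0xE2801003,  # ADD R1, R0, #3
        0xE0802001,  # ADD R2, R0, R1
        0xE3A034FF,  # MOV R3, #&FF000000
    ]
    print(emulate(program)[:4])  # [5, 8, 13, 4278190080]
```

Every instruction is decoded again each time it is reached, which is where much of the slowdown comes from.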

I have written my own proposal on how this might work. It actually refers to emulation of an ARM3 processor on a RiscPC for the purposes of Archimedes emulation in ArcEm, but the problem is very similar and the ideas suggested are sound. Since it's quite short, it fails to address many of the specific problems, and is certainly far from being a complete specification. Nevertheless, you can read about ARM on ARM emulation.

JITing
Just-in-time compilation. This means the ARM instructions are dynamically compiled into a version that is suitable for running directly on the host machine. This has a significant speed advantage over emulation, potentially achieving a significant fraction of the speed of a natively run application. It can also deal with some dynamic code sequences that may not be easily handled by dynamic reassembly.
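The key structural difference from plain emulation is that translation happens once per block of code, and the result is cached and reused. A real JIT emits native machine code; in this hedged Python sketch, "translation" instead means pre-decoding each instruction into a Python closure, which is enough to show the translate-once, run-many pattern but is not itself a JIT. It supports only unconditional `MOV` and `ADD` with an immediate operand, treats the rest of the program as one block, and its names are hypothetical.

```python
def rotated_immediate(insn: int) -> int:
    """Decode an ARM operand-2 immediate: 8 bits rotated right by 2 * bits 11-8."""
    imm8 = insn & 0xFF
    rot = ((insn >> 8) & 0xF) * 2
    return ((imm8 >> rot) | (imm8 << (32 - rot))) & 0xFFFFFFFF


def translate_one(insn: int):
    """Decode insn once and return a host-side operation that performs it."""
    # Only AL-condition data processing with an immediate operand and S clear
    if (insn >> 28) != 0xE or (insn & 0x0E100000) != 0x02000000:
        raise NotImplementedError(f"unsupported instruction &{insn:08X}")
    opcode = (insn >> 21) & 0xF
    rn = (insn >> 16) & 0xF
    rd = (insn >> 12) & 0xF
    if 15 in (rn, rd):
        raise NotImplementedError("R15 is not modelled")
    imm = rotated_immediate(insn)
    if opcode == 0xD:                      # MOV Rd, #imm
        def op(regs):
            regs[rd] = imm
    elif opcode == 0x4:                    # ADD Rd, Rn, #imm
        def op(regs):
            regs[rd] = (regs[rn] + imm) & 0xFFFFFFFF
    else:
        raise NotImplementedError(f"opcode {opcode:#x} not handled")
    return op


class TranslationCache:
    """Translate a block of guest code the first time it runs, then reuse it."""

    def __init__(self, program: list):
        self.program = program
        self.cache = {}        # guest start index -> list of translated ops
        self.translations = 0  # how many blocks have been translated

    def run(self, start: int, regs: list) -> None:
        ops = self.cache.get(start)
        if ops is None:
            # Slow path: translate once (here, to the end of the program)
            ops = [translate_one(insn) for insn in self.program[start:]]
            self.cache[start] = ops
            self.translations += 1
        for op in ops:         # fast path: no fetch or decode, just execute
            op(regs)


if __name__ == "__main__":
    jit = TranslationCache([0xE2800001, 0xE2811002])  # ADD R0,R0,#1 ; ADD R1,R1,#2
    regs = [0] * 16
    for _ in range(3):
        jit.run(0, regs)
    print(regs[0], regs[1], jit.translations)  # 3 6 1
```

The block is decoded only on its first run; later runs go straight to the cached operations, which is why a JIT can approach native speed for code that runs repeatedly.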

The problem with JITs is that they are quite difficult to write, with the author needing a fairly advanced knowledge of compilation techniques.

Aemulor is an example of a JIT.

Jason Tribbeck has also pointed me at a project he was considering with some JIT aspects to it, called Cloe (Code Lookahead Optimal Emulator). There are 4 documents:


 * cloe1
 * cloe2
 * cloe3
 * cloe3b

Dynamic reassembly
This method is best suited to programs compiled from C, as they contain well-structured code which is easily converted automatically. In short, a program is disassembled, and then the APCS-26-only instructions are replaced with APCS-32 ones. There are also some issues with C runtime fixups, but these are much the same for all programs. Finally, the program is run.
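For the common C return sequences, the replacement step amounts to clearing a single bit in the instruction, following the simplified C-code rule given earlier (`MOVS PC, LR` becomes `MOV PC, LR`, and `LDM ... ^` loses its `^`). The sketch below shows only that step. It assumes the words passed in are already known to be code rather than data, and that callers do not rely on flags being preserved; it ignores the runtime fixups and harder instruction sequences, and the function names are hypothetical.

```python
def rewrite_apcs26_return(insn: int) -> int:
    """Rewrite common APCS-26 return instructions into APCS-32 equivalents.

    MOVS PC, R14                   -> MOV PC, R14          (clear S, bit 20)
    LDM<cond><mode> Rn, {..., PC}^ -> same LDM without ^   (clear S, bit 22)
    Any other instruction is returned unchanged.
    """
    # MOVS PC, R14 under any condition code (bits 31-28 ignored)
    if (insn & 0x0FFFFFFF) == 0x01B0F00E:
        return insn & ~(1 << 20)
    # LDM (bits 27-25 = 100, L = 1) with ^ (bit 22) set and PC (bit 15) in the list
    if (insn & 0x0E500000) == 0x08500000 and insn & (1 << 15):
        return insn & ~(1 << 22)
    return insn


def rewrite_code_area(words: list) -> list:
    """Apply the rewrite to every word; assumes the caller has separated code from data."""
    return [rewrite_apcs26_return(w) for w in words]


if __name__ == "__main__":
    before = [0xE8FD8010, 0xE1B0F00E, 0xE3A00005]  # LDMFD R13!,{R4,PC}^ ; MOVS PC,LR ; MOV R0,#5
    for old, new in zip(before, rewrite_code_area(before)):
        print(f"&{old:08X} -> &{new:08X}")
```

Applying this blindly to data (a literal pool that happens to look like `MOVS PC, LR`, say) would corrupt it, which is one reason such a scheme is only really safe for well-understood compiled code.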

Armalyser already does most of the work needed for this. This type of scheme is only really failsafe for pure C programs, and it could quite reasonably refuse to continue if it encountered any difficult instruction sequences. But the great advantage is that programs run at full speed.

You may also notice that this process could be done once, statically, and the new binary distributed, but this isn't always appropriate, nor as transparent. However, it may be exactly the ticket for the authors of some programs.

Conclusion
To recap, I will restate some of the points made in this document:


 * Most programs can be made to work without great difficulty on a 32-bit RISC OS via one of a number of methods - many of which are mentioned here.


 * Programs compiled to be 32-bit will also work correctly on previous versions of RISC OS.


 * You can view the conversion to 32-bit a little like the way the Y2K problem was dealt with: if it's tackled now, it won't be an issue when it matters.


 * If you're developing RISC OS applications right now, make them 32-bit, as that means less effort in the long run, and forward compatibility.


 * All my Unix Ports are fully 32-bit compatible.

As a reminder, the important bits when it comes to running programs on 32-bit RISC OS:
 * C programs need to be recompiled with a 32-bit compiler.
 * Pure BASIC programs (with no assembler) will work unmodified.
 * Programs containing ARM code will need changes made to work correctly.

Finally
Some thoughts of my own that have somewhat less basis in fact:


 * Emulation (or emulation-like) 26/32-bit technologies, such as the emulators mentioned above and MicroDigital's Omega, may actually hinder or slow the eventual conversion to 32-bit, as there will be no immediate need to convert applications.


 * Software vendors may well charge for 32-bit upgrades to their software. This is quite reasonable given the effort involved, and I expect the charges to be quite small. It will also give a helpful cash injection to the market and encourage developers.