=============================================================================
FPU Tutorial v 1.02 (06/05/2000) (c) 2000 Eli-Jean Leyssens
This tutorial can be downloaded from http://www.dse.nl/~topix
=============================================================================
This is an extremely short (or is it ;) tutorial on how to use the FPU,
Floating Point Unit. It shows you how to move values to and from the FPU and
how to use the data operations, like divide, square root etc.
First off, I assume you already know what floating point numbers are and
what single, double and extended precision means. If you don't know what they
are then you can probably still learn something from this tutorial and you
could probably incorporate some of the example code into your own programs.
However, I would certainly advise you to look around on the Internet for some
documents describing the general idea and workings of floating point numbers.
I've included some links at the end of this tutorial which could help you
on your way.
Secondly, note that to run the examples you'll either need RISC OS 4 or the
ExtBas module which extends the BASIC module to recognize and assemble FP
instructions. The ExtBas module is part of the archive:
ftp://mic2.hensa.ac.uk/local/riscos/programming/extbasdis.zip
---------------------------------------------
Floating Point Unit on RISC OS machines
---------------------------------------------
On some machines the FPU is present in hardware as a coprocessor, called an
FPA (Floating Point Accelerator), but most RISC OS machines only have the FPE
(Floating Point Emulator). Note that even with an FPA fitted some
instructions may still be emulated in software.
There can be slight variations in accuracy between FPA and FPE
implementations, but generally speaking programs do not need to know whether
an FPA is fitted or not. The main difference between FPA and FPE is speed of
execution.
This means that you can write code that uses FP instructions without having
to worry whether they'll be executed by dedicated hardware or emulated in
software. When no FPA is present an FP instruction will yield an "Undefined
instruction" exception. The Undefined instruction vector is called, which is
claimed by the FP Emulator, which then emulates the "undefined" instruction.
Execution is then continued after the emulated FP instruction, without any
registers being corrupted (except for the ones requested by the FP
instruction of course).
The FPU has 8 (eight) floating point registers, known as F0 to F7 and also
a status and a control register. In this tutorial we'll only look at the
"normal" floating point registers, from here on called FP registers, not at
the status or control registers.
The format in which numbers are stored in FP registers is not specified.
The different FP formats only become visible when transferring a number from
or to memory:
Name                           Size      Exponent  Fraction
Single Precision (S)            4 bytes   8 bits   23 bits
Double Precision (D)            8 bytes  11 bits   52 bits
Double Extended Precision (E)  12 bytes  15 bits   64 bits
Packed Decimal (P)             12 bytes   4 digits 19 digits
Expanded Packed Decimal (EP)   16 bytes   6 digits 24 digits
If you look closely at the table above you'll notice that the Packed
formats store the numbers as digits rather than bits. This is done by storing
1 digit per nibble (4 bits).
In almost all our examples we'll store numbers in memory at Single
Precision (S) and we won't even look into the Packed format as it's rather
silly ;) although it can of course be useful, especially when communicating
with humans as they better understand digits than bits :)
All basic floating point instructions operate as though the result were
computed to infinite precision and then rounded to the length, and in the
way, specified by the instruction. The rounding is selectable from:
- Round to nearest
- Round to +infinity (P)
- Round to -infinity (M)
- Round to zero (Z)
The default is "round to nearest"; in the event of a tie, this rounds to
"nearest even" as required by the IEEE.
>> NOTE! you should only use FP instructions in User Mode programs! <<
--------------------------
Moving data TO the FPU
--------------------------
Before we can tell the FPU to, for instance, divide two numbers we'll of
course need a way to tell it what these two numbers are. There are many ways
to load a number into an FP register; I'll only show the three most popular
ones here.
The first method is to load a value into a normal ARM register first and
then load the value of that register into any of the 8 FP registers. The
instruction used for the latter operation is FLT. Here's an example of how
you can load the value 123 into FP register f0:
mov r0, #123 ; First setup an ARM register with the value
flts f0, r0 ; Now transfer the value from the ARM
; register into the FP register.
The s in flts means that we want to use single precision. Also note that
instead of r0 we could have used any other general purpose ARM register and
instead of f0 we could have used any of the 8 FP registers. So,
mov r9, #123
flts f3, r9
would also have worked, although then of course f3 would have 123 loaded into
it and not f0.
It should be fairly obvious that by using FLT we can only load integers
into the FP registers. I mean, you can't load 123 and a half into r0 and
therefore you can't load 123.5 into f0 either. At least, not by using FLT.
Which brings us to the second most popular instruction for loading values
into FP registers, namely LDF. To use LDF though you'll first need to set up
a floating point value in memory. And I do mean floating point value. So,
just setting up an integer value using equd won't work. Luckily there's a
directive for defining a floating point value in memory as well, namely
EQUF. So, to load 123 and a half into f0 you could use:
.floataddress
equfs 123.5
.code
ldfs f0, floataddress
Easy huh? Note that once again the s in both equfs and ldfs stands for
single precision. For ldfs this is particularly important as the precision
must match the precision you specified at the equf command.
The third method shown here for loading values into FP registers also uses
the FLT instruction, but instead of loading the value from an ARM register
the value is encoded in the FP instruction. There are only a small number of
values that can be loaded in this way though. They are: 0, 1, 2, 3, 4, 5, 10
and 0.5.
flts f0, #3 ; Load 3 into f0
flts f1, #0.5 ; Load 0.5 into f1
These special values can be used in FP data operations as well as you'll
find out later on.
Now, before we look at how we can perform operations on the FP registers,
let's first look at ways to move data from the FP registers back to the ARM
registers or memory.
----------------------------
Moving data FROM the FPU
----------------------------
For copying data from the FPU we'll look only at the two most popular ways:
transferring a single FP register to a single ARM register or to memory.
To transfer the value from a FP register into a normal ARM register you can
use the FIX instruction. So to transfer the value of f3 into r9:
fix r9, f3
Note the absence of the precision identifier; fix doesn't take one. Also
note that ARM registers can only contain integers, so the number stored in r9
is the rounded value of f3. You can find out how to specify the rounding mode
further down in this document.
To save the value from f3 into memory:
stfs f3, floataddress
Yes, once again you need to specify the precision. Note that Double and
Extended precision floating point numbers take up more bytes than single
precision ones. So, if you defined floataddress with equfs then you should
not use stfd as that will overwrite more bytes than you reserved with equfs.
Right, now that we know how to move data to and from the FPU let's look at
some data operations.
---------------
Square root
---------------
One of the simplest operations is the Square root operation as it only
operates on one value. The instruction for it is SQT and it takes two
parameters. The first parameter indicates the FP register to store the
result in, the other indicates the FP register to take the Square root of.
sqts f0, f1 ; f0 = sqt( f1)
It's as simple as that. So, the "entire" code to calculate the square root
of an ARM register, by using the FPU for the calculation would be:
; r0 = number
flts f0, r0 ; f0 = r0
sqts f0, f0 ; f0 = square root of f0, single precision
fix r0, f0 ; r0 = f0 = sqt( r0)
The "sqroot" program included in this archive contains a working example.
----------------------
Divide and conquer
----------------------
Another handy operation is the divide operation. The instruction for it is
DVF and it takes three parameters, all indicating FP registers. The
parameters are, in order, the Quotient (the result), the Number to divide
and the Divisor.
dvfs f0, f1, f2 ; f0 = f1 / f2
So, the code to divide two ARM registers, by using the FPU for the
calculation would be:
; r0 = number
; r1 = divisor
flts f0, r0 ; f0 = r0
flts f1, r1 ; f1 = r1
dvfs f0, f0, f1 ; f0 = f0 / f1
fix r0, f0 ; r0 = f0 = f0 / f1 = r0 / r1
The "divide" program included in this archive contains a working example.
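One thing worth knowing about the routine above: fix without a rounding
suffix rounds to nearest, so the result is not the same as a truncating
integer divide. The Python sketch below only models the arithmetic. It is not
FPA code, the function names are made up for illustration, and it ignores the
rounding of the quotient to single precision that dvfs performs.

```python
import math

def fpu_divide_nearest(number, divisor):
    """Model flts/dvfs/fix: divide, then round to nearest (ties to even)."""
    return round(number / divisor)

def fpu_divide_truncated(number, divisor):
    """Model the same sequence ending in a round-to-zero fix."""
    return math.trunc(number / divisor)

# 7 / 2 = 3.5: round to nearest gives 4, round to zero gives 3
print(fpu_divide_nearest(7, 2), fpu_divide_truncated(7, 2))
```

If you want the truncating behaviour, the syntax in the instruction list
further down suggests using fixz r0, f0 in place of fix r0, f0.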
------------------
Wave "Bye-Bye"
------------------
For our last example of data operation instructions we'll look at the sine
wave. As the FPU's sine (and cosine) calculations are extremely slow you will
almost certainly only want to use them to build a look up table. So, that's
just what I'm going to show you.
The first thing you need to know about FPU's sine, cosine, tangent etc
functions is that they work with radians, not degrees. So, a full sine period
is 2*PI (radians) and not 360 (degrees).
Right then, let's say we want to build a sine lookup table with 256 values
describing a whole period. We'll set the amplitude at 127. In BASIC you would
probably do it somewhat like this:
Steps% = 256 : REM Number of steps to divide one period in
Amplitude% = 127 : REM Amplitude of the sine wave
DIM SineTable% Steps%*4 : REM 4 bytes per value as we're storing
REM words, not bytes
FOR x% = 0 TO Steps%-1
SineTable%!( x% * 4) = Amplitude% * SIN( x% * ( 2*PI / Steps%))
NEXT
If you have a hard time understanding this BASIC version then I can only
advise you to dust off some old calculus books before you proceed to the FPU
version ;)
The assembly version using FPU isn't much different from the above. I'm not
going to type it in here though, just look at the "sine" example program.
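If you don't have BASIC at hand, the same table can be sketched in Python,
which is handy for checking the values your own FPU routine produces. This is
a rendering of the BASIC loop above and not the "sine" example program. The
names are my own, and int() is used on the assumption that BASIC truncates
towards zero when it stores a real in an integer word.

```python
import math

STEPS = 256       # number of steps to divide one period in
AMPLITUDE = 127   # amplitude of the sine wave

# One entry per step; the angle is in radians, so a full period is 2*pi.
# int() truncates towards zero (assumed to match the BASIC version).
sine_table = [int(AMPLITUDE * math.sin(x * (2 * math.pi / STEPS)))
              for x in range(STEPS)]

# Quarter period is the peak, three quarters is the trough
print(sine_table[0], sine_table[64], sine_table[192])
```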
-----------------------------
Could you be more precise?
-----------------------------
Note that throughout these examples I've used single precision. This means
that only 23 bits will be used for the Fractional part of the floating point
number. However, due to the way floating point numbers work we effectively
get 24 significant bits. So, if you want to load/store numbers bigger than
&ffffff without losing information from the least significant bits then you
should use Double or Extended precision instead. Simply append a d or e
instead of an s after the floating point instruction. So, instead of flts,
you should use fltd or flte. Take a look at the "fltSfltD" example for
further clarification.
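The 24 bit limit is easy to see without an FPU. The Python sketch below
squeezes a number through the 4 byte single precision format and back, using
the standard struct module. The helper name to_single is made up for this
illustration; it models what happens to the value, not the FP instructions
themselves.

```python
import struct

def to_single(value):
    """Round a Python float (double precision) to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# &ffffff needs 24 bits and survives the round trip
print(to_single(float(0xFFFFFF)) == 0xFFFFFF)
# &1000001 needs 25 bits, so the least significant bit is lost
print(to_single(float(0x1000001)) == 0x1000001)
```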
------------------------------------
"No, it's rounder" (c) 2000 Nike
------------------------------------
As you have probably read in the part "Floating Point Unit on RISC OS
machines" there are several rounding modes. By default numbers are rounded to
nearest. Note that this rounding not only occurs when transferring values
from FP registers to ARM registers, but also when storing FP registers in
memory, but more importantly also internally in the FPU.
Assume we're loading the value of f3 into r9. Let's see what the results of
the different rounding modes are for 8 different values of f3.
Rounding          -4.5  -3.6  -3.5  -3.4   3.4   3.6   3.5   4.5
(Nearest)           -4    -4    -4    -3     3     4     4     4
P(lus infinity)     -4    -3    -3    -3     4     4     4     5
M(inus infinity)    -5    -4    -4    -4     3     3     3     4
Z(ero)              -4    -3    -3    -3     3     3     3     4
So, Nearest is also nearest to what you're used to in everyday life,
except that on a tie, that is x.5, it is rounded to the "nearest even". So,
that's why 4.5 is not rounded to 5 (odd), but to 4 (even).
Plus infinity means it's always rounded up to the "higher" value. So, -3.6
is rounded up to -3 as -3 is higher than -4.
Minus infinity means it's always rounded down to the "lower" value. So, 3.6
is rounded down to 3 as that's lower than 4.
Zero is simply discarding the part after the point :)
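If you want to play with the four modes without assembling anything, they map
neatly onto standard Python functions. The sketch below rebuilds the rounding
table for the same eight values. It is only a model of the behaviour described
here, and the dictionary and variable names are made up for the occasion.

```python
import math

ROUNDING_MODES = {
    "Nearest": round,              # Python's round() also sends ties to even
    "Plus infinity": math.ceil,    # always up to the higher value
    "Minus infinity": math.floor,  # always down to the lower value
    "Zero": math.trunc,            # discard the part after the point
}

values = [-4.5, -3.6, -3.5, -3.4, 3.4, 3.6, 3.5, 4.5]
table = {name: [mode(v) for v in values]
         for name, mode in ROUNDING_MODES.items()}

for name, row in table.items():
    print(f"{name:15}", row)
```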
-----------------------
FP Instruction List
-----------------------
This list is in no way complete! It doesn't include instructions for
handling the status or control registers, nor does it include instructions
for loading/storing multiple FP registers.
-- Register transfer --
Instruction syntax:
FLT{cond}prec{round} Fn, Rd
FLT{cond}prec{round} Fn, #Value
FIX{cond}{round} Rd, Fm
Don't get fooled by the d in FLT... Fn, Rd. The destination register is
always the first one, just like with most other ARM instructions. So, FLT Fn,
Rd stores the ARM register Rd in FP register Fn.
{cond} is the standard ARM instruction condition (eq, ne, gt etc)
prec is the precision ( S, D, E etc)
{round} is the rounding mode ( P, M, Z)
{cond} and {round} are of course optional and default to respectively
Always and Nearest
Value can be any of 0, 1, 2, 3, 4, 5, 10, 0.5
Instructions:
FLT Integer to Floating Point Fn := Rd
FIX Floating Point to Integer Rd := Fm
-- Data operations --
Instruction syntax:
unop{cond}prec{round} Fd, Fm
unop{cond}prec{round} Fd, #Value
binop{cond}prec{round} Fd, Fn, Fm
binop{cond}prec{round} Fd, Fn, #Value
unop, or unary operations, calculate with just one parameter
binop, or binary operations, calculate with two parameters
Value can be any of 0, 1, 2, 3, 4, 5, 10, 0.5
Instructions:
   ADF  Add                      Fd := Fn + Fm
   MUF  Multiply                 Fd := Fn * Fm
   SUF  Subtract                 Fd := Fn - Fm
   RSF  Reverse Subtract         Fd := Fm - Fn
   DVF  Divide                   Fd := Fn / Fm
   RDF  Reverse Divide           Fd := Fm / Fn
   POW  Power                    Fd := Fn to the power of Fm
   RPW  Reverse Power            Fd := Fm to the power of Fn
   RMF  Remainder                Fd := remainder of Fn / Fm
                                       Fn - Fm * integer value of ( Fn/Fm)
 * FML  Fast Multiply            Fd := Fn * Fm
 * FDV  Fast Divide              Fd := Fn / Fm
 * FRD  Fast Reverse Divide      Fd := Fm / Fn
   MVF  Move                     Fd := Fm
   MNF  Move Negated             Fd := -Fm
   ABS  Absolute value           Fd := ABS( Fm)
   RND  Round to integral value  Fd := integer value of Fm
   SQT  Square root              Fd := square root of Fm
   LOG  Logarithm to base 10     Fd := log Fm
   LGN  Logarithm to base e      Fd := ln Fm
   EXP  Exponent                 Fd := e to the power of Fm
   SIN  Sine                     Fd := sine of Fm
   COS  Cosine                   Fd := cosine of Fm
   TAN  Tangent                  Fd := tangent of Fm
** ASN  Arc Sine                 Fd := arcsine of Fm
   ACS  Arc Cosine               Fd := arccosine of Fm
   ATN  Arc Tangent              Fd := arctangent of Fm
* FML, FDV and FRD are only defined to work with single precision operands
and are not necessarily faster than MUF, DVF and RDF.
** Use ASN Fd, #1 to easily load Pi/2 into Fd.
Note that for all these unops and binops you can replace Fm by one of the
constants 0, 1, 2, 3, 4, 5, 10 and 0.5. This is also why there are Reverse
versions of some of the instructions: the constant can only take the place
of Fm, so RDF Fd, Fn, #1 is the way to compute 1 / Fn.
The rounding mode specified in the instruction is only applied in the
final stage. Any rounding done during the intermediate calculations always
uses the Nearest rounding mode.
This is especially noticeable for RMF:
Fn := 18
Fm := 5
Fd := Fn - Fm * integer value, rounded to Nearest, of ( Fn / Fm)
   := 18 - 5 * integer value, rounded to Nearest, of ( 18 / 5)
   := 18 - 5 * integer value, rounded to Nearest, of 3.6
   := 18 - 5 * 4    <- !!!
   := 18 - 20
   := -2            !!!
For positive Fn and Fm you could correct for this by adding Fm to the
remainder when the remainder is less than zero.
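This round-to-nearest remainder is the IEEE style remainder, which Python
(3.7 and later) offers as math.remainder. The sketch below uses it to model
the RMF result from the worked example and the correction just described. The
function names rmf and positive_remainder are made up for illustration, and
the correction is only meant for positive operands.

```python
import math

def rmf(fn, fm):
    """Model RMF: fn - fm * (fn / fm rounded to nearest)."""
    return math.remainder(fn, fm)

def positive_remainder(fn, fm):
    """Add fm when the remainder is negative (assumes fn and fm are positive)."""
    r = rmf(fn, fm)
    return r + fm if r < 0 else r

print(rmf(18, 5))                 # -2.0, as in the worked example
print(positive_remainder(18, 5))  # 3.0, the remainder you probably expected
```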
--------------
Link me up
--------------
Here are some links to documents you might find useful in respect to using
and coding for the FPU.
As mentioned at the start of this tutorial, you'll need something like the
ExtBas module to assemble FP instructions if you don't have RISC OS 4. This
module is part of the archive:
ftp://mic2.hensa.ac.uk/local/riscos/programming/extbasdis.zip
There is a whole chapter on the Floating Point Emulator in the RISC OS 3
PRMs (Programmer's Reference Manuals). It should probably have been called
Floating Point Unit instead and it's quite a good read:
Programmer's Reference Manual, Volume 4, Pages 4-163 to 4-184
Even more technical documentation can be found on the ARM Ltd site. The
documentation for the ARM7500FE contains three chapters on the FPA. The
documentation for the ARM7500FE has been split up into several files. Either
view the table of contents, or download only the file containing the FPA
documentation. Note that the documentation is in PDF format. There are PDF
readers out in the Public Domain though.
http://www.arm.com/Documentation/UserMans/PDF/ARM7500FEvB.html
http://www.arm.com/Documentation/UserMans/PDF/ARM7500FEvB_5.pdf
Last but not least, you can learn quite a bit from looking at other
people's code. Many entries in the CodeCraft competition(s) use FP
instructions and as one of the rules of the competition(s) is that full
sources must be included they might prove to be valuable examples. If you're
lost in the high number of entries then I can only say that at least my entry
called HappyRGB, which can be found in the 1K Entries section of the
CodeCraft#2 competition, has a lot of FP code.
http://surf.to/codecraft
http://www.cybercable.tm.fr/~brooby/code.htm
http://www.dse.nl/~topix -> Click the CodeCraft menu entry
-----------
Credits
-----------
Many thanks to Tony Haines for proof reading this tutorial and making some
excellent suggestions on how to improve it.
Much information was gathered from the Floating Point Emulator chapter of
Acorn's Programmer's Reference manual and ARM Ltd's ARM7500FE documentation.
You can find links to both in the "Link me up" chapter above.
-------------
Copyright
-------------
This tutorial and the accompanying example programs have all been written
by Eli-Jean Leyssens, aka Pervect of Topix. Eli-Jean Leyssens holds the
copyright to this tutorial. The accompanying example programs are to be
considered an integral part, and as such this text may only be copied
/together/ with the example programs. Equally, if you wish to copy the
example programs then you must also include this text.
You are freely permitted to use the example routines in your own programs.
An acknowledgement of any help obtained would be appreciated.
This tutorial, in whole or in part, may not be published in any magazine,
digital or hardcopy, or on any website without the written permission of the
copyright holder.
Download text version + example sources: FPE102.ZIP (10k)
Distributed via www.icebird.org with permission by Topix.
©2000 Icebird Acorn Produxions