A Bit of RISC
I recently picked up Hacker's Delight and having a lot of fun with it. It's a collection of bit-manipulation tricks collected by hackers over many years. You can flip open pretty much anywhere in the book and start learn something really cool.
There's something about seeing bit strings and assembly code in a book that really catches my attention, this one goes a bit further by even describing a complete RISC (Reduced Instruction Set Computer) we can implement to play around with the various tricks.
As an exercise and for fun, I'd like to employ some Lisp-fu here and implement a small VM specifically designed for mangling bits.
You can find most of the code from the book itself here.
1. Design
1.1. The Machine
We'll be sticking with a 32-bit word length as recommended in the Preface. We will also represent registers as integers whenever possible, instead of say a bit-vector. Without going into too much detail, it's more efficient to do bitwise ops on integers instead of bit-vectors in Lisp.
We need a minimum of 16 general purpose registers, typically of word length, with R0 reserved for a constant 0. To address 16 different registers we actually only need 4 bits - 5-bits if we needed 32, 6 for 64, etc.
Floating-point support and special purpose registers are not required.
1.2. Instructions
The Hacker's Delight RISC opcodes are described in two tables,
denoted basic RISC
and full RISC
respectively.
Most instructions take two source registers RA
and RB
with a
destination register RT
. The actual general-purpose registers are
labelled R0
(containg the constant 0) through R15
.
A 3-Address machine is assumed and some instructions take 16-bit
signed or unsigned immediate values - denoted I
and Iu
respectively.
Opcode Mnemonic | Operands | Description |
---|---|---|
add,sub,mul,div,divu,rem,remu | RT,RA,RB | RT <- (op RA RB) |
addi,muli | RT,RA,I | RT <- (op RA I), I is a 16-bit signed immediate-value |
addis | RT,RA,I | RT <- (+ RA (|| I 0x0000)) |
and,or,xor | RT,RA,RB | RT <- (op RA RB) |
andi,ori,xori | RT,RA,Iu | As above, expect the last operand is a 16-bit unsigned immediate-value |
beq,bne,blt,ble,bgt,bge | RT,target | Branch to target if (op RT) |
bt,bf | RT,target | Branch true/false, same as bne/beq resp |
cmpeq,cmpne,cmplt,cmple,cmpgt,cmpge,cmpltu,cmpleu,cmpgtu,cmpgeu | RT,RA,RB | RT <- (if (op RA RB) 1 0) |
cmpieq,cmpine,cmpilt,cmpile,cmpigt,cmpige | RT,RA,I | Like cmpeq except second comparand is a 16-bit signed immediate-value |
cmpiequ,cmpineu,cmpiltu,cmpileu,cmpigtu,cmpigeu | RT,RA,I | Like cmpltu except second comparand is a 16-bit unsigned immediate-value |
ldbu,ldh,ldhu,ldw | RT,d(RA) | Load an unsigned-byte, signed-halfword, unsigned-halfword, or word into RT from (+ RA d) where d is a 16-bit signed immediate-value |
mulhs,mulhu | RT,RA,RB | RT gets the high-order 32 bits of (* RA RB) |
not | RT,RA | RT <- bitwise one's-complement of RA |
shl,shr,shrs | RT,RA,RB | RT <- RA shifted left or right by rightmost six bits of RB; 0-fill except for shrs, which is sign-fill (shift amount treated modulo 64) |
shli,shri,shrsi | RT,RA,Iu | RT <- RA shifted left or right by 5-bit immediate field |
stb,sth,stw | RS,d(RA) | Store a byte,halfword,word from RS into memory at location (+ RA d) where d is a 16-bit signed immediate-value |
Opcode Mnemonic | Operands | Description |
---|---|---|
abs,nabs | RT,RA | RT <- (op RA) |
andc,eqv,nand,nor,orc | RT,RA,RB | RT <- (op RA RB) |
extr | RT,RA,I,L | extract bits I through I+L-1 of RA and place them right-adjusted in RT, with 0-fill |
extrs | RT,RA,I,L | Like extr, but sign-fill |
ins | RT,RA,I,L | Insert bits 0 through L-1 of RA into bits I through I+L-1 of RT |
nlz | RT,RA | RT gets count of leading 0's in RA (0 to 32) |
pop | RT,RA | RT gets the number of 1-bits in RA (0 to 32) |
ldb | RT,d(RA) | Load a signed byte into RT from memory at location (+ RA d) where d is a 16-bit signed immediate value |
moveq,movne,movlt,movle,movgt,movge | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB |
shlr,shrr | RT,RA,RB | RT <- RA rotate-shifted left or right by the rightmost 5-bits of RB |
shlri,shrri | RT,RA,Iu | RT <- RA rotate-shifted left or right by the 5-bit immediate field |
trpeq,trpne,trplt,trple,trpgt,trpge,trpltu,trpleu,trpgtu,trpgeu | RA,RB | Trap (interrupt) if (op RA RB) |
trpieq,trpine,trpilt,trpile,trpigt,trpige | RA,I | Trap if (op RA I) where I is a 16-bit signed immediate-value |
trpiequ,trpineu,trpiltu,trpileu,trpigtu,trpigeu | RA,Iu | Trap if (op RA Iu) where Iu is a 16-bit unsigned immediate-value |
There is also some extensions, which are like macros that usually expand to a single instruction.
Extended Mnemonic | Expansion | Description |
---|---|---|
b target | beq R0,target | Unconditional branch |
li RT,I | (addi,addis,ori) | Load immediate |
mov RT,RA | ori RT,RA,0 | Move register RA to RT |
neg RT,RA | sub RT,R0,RA | Negate (two's-complement) |
subi RT,RA,I | addi RT,RA,-I | Subtract immediate |
All of these instructions are available on x86,arm,riscv and the likes so no real surprises. We will implement the basic set in Lisp, mapping instructions to Lisp functions instead of falling back to the host architecture.
1.3. Execution Model
We'll build this machine in Lisp and use plenty intrinsics from SBCL. As a starting point I followed Paul Khuong's excellent blog post: SBCL: The ultimate assembly code breadboard.
Some things to keep in mind for our machine:
- every instruction requires at most two register reads and one register write - good for compilers
- every instruction counts as a single cycle
- we pay no attention to instruction-level parallelism
2. The HAKMEM VM
First we define a package for our VM:
(defpackage :hakmem (:use :cl :std))
Next, we define the word size and a type which doesn't conflict with
the SB-EXT:WORD
symbol.
(defconstant +word-size+ 32 "word size and register length of the hakmem VM.") (deftype hword () `(unsigned-byte ,+word-size+))
And use the word definition here for some register names we'll bind
later. *IP*
is an index into *STACK*
which is a simple vector of
words. *INSTRUCTIONS*
is a hash-table where we'll store our VM
instructions.
(declaim (hword ra rb rt) ;; arg0,arg1,return (hword r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15) ;; GPRs (array-index *ip*) ((simple-vector 8) *stack*)) (defconstant r0 0) (defparameter *stack* (coerce (make-array 8 :element-type t :adjustable nil) 'simple-vector)) (defparameter *ip* 0) (defparameter *scratch* nil) (defvar *instructions* (make-hash-table))
Next are a couple of utilities we'll use later.
(@ 0)
returns the instruction at the top of*STACK*
,(@ 1)
the one just below, etc.
(defun @ (i) (aref *stack* (mod (+ i *ip*) (length *stack*)))) ;; todo (defun reg3 (rt ra rb)) (defun reg2i (rt ra i)) (defun reg2ui (rt ra ui)) (defun get-inst (i) (gethash i *instructions*)) (defmacro .inst (i &body args) `(funcall (get-inst ',i) ,@args)) (defun hakmem-instruction-set (&optional (tbl *instructions*)) (hash-table-alist tbl))
The DEFINST
macro is used to define instructions which get added to
*INSTRUCTIONS*
. DEFPRIM
calls DEFINST
internally and simply
binds a name to a lisp function name of two arguments - the register
values of RA
and RB
and binds the result to register RT
.
(defmacro definst (name args &body body) ;; todo: compose a function based on regs+args+body `(let ((ra 0) (rb 0) (rt 0)) (declare (ignorable sc r0 ra rb rt)) (setf (gethash ',name *instructions*) (lambda ,args (progn ,@body))))) (defmacro defprim (name op) `(definst ,name () (setf rt (,op ra rb))))
Next we have our list of primitives:
(defprim add +) (defprim sub -) (defprim mul *) (defprim div /) ;; divu (defprim rem mod) ;; remu (defprim cmpeq =) (defprim cmpne /=) (defprim cmplt <) (defprim cmple <=) (defprim cmpgt >) (defprim cmpge >=) (defprim and logand) (defprim or logior) (defprim xor logxor)
And our instructions:
;; ltu leu gtu geu (definst addi (i) (setf rt (+ ra i))) (definst muli (i) (setf rt (* ra i)))
And voila, basic RISC:
(hakmem-instruction-set)
((MULI . #<FUNCTION (LAMBDA (I)) {120C4329AB}>) (ADDI . #<FUNCTION (LAMBDA (I)) {120C4329FB}>) (XOR . #<FUNCTION (LAMBDA ()) {120C4320BB}>) (OR . #<FUNCTION (LAMBDA ()) {120C4320EB}>) (AND . #<FUNCTION (LAMBDA ()) {120C43211B}>) (CMPGE . #<FUNCTION (LAMBDA ()) {120C43214B}>) (CMPGT . #<FUNCTION (LAMBDA ()) {120C43217B}>) (CMPLE . #<FUNCTION (LAMBDA ()) {120C4321AB}>) (CMPLT . #<FUNCTION (LAMBDA ()) {120C4321DB}>) (CMPNE . #<FUNCTION (LAMBDA ()) {120C43220B}>) (CMPEQ . #<FUNCTION (LAMBDA ()) {120C43223B}>) (REM . #<FUNCTION (LAMBDA ()) {120C43226B}>) (DIV . #<FUNCTION (LAMBDA ()) {120C4322BB}>) (MUL . #<FUNCTION (LAMBDA ()) {120C4322EB}>) (SUB . #<FUNCTION (LAMBDA ()) {120C43231B}>) (ADD . #<FUNCTION (LAMBDA ()) {120C43234B}>))