Introduction, or: Why I Don’t Like Your Assembler
As avid readers may have noticed, I like assembly programming. I’ve several times tried to branch into other consoles and systems like the Atari ST, Game Boy, Sega Genesis/Mega Drive, Amiga, etc.
All these endeavors all ended in frustration. Most of the time, for the same reason: I found the available tools to be inadequate to my needs. This is not to say that they’re bad. They’re well designed and for the most part, do what they advertise they do - assemble code for cross-platform development.
But I find myself longing for the simple assembler-linker combination cc65 offers. I think .include
and .org
instructions (and their cousins) are horrible; they make modularity harder to achieve, prevent you from taking advantage of concepts like relocatable code, re-entrant subroutine design, etc.
I think assembler-linker combinations are preferable to assemblers that obfuscate code by introducing assembler directives into the code that do not affect the functionality of the code at all - you don’t write your makefile into your C/C++ headers, so why is it okay to do so with assembly code?
The second problem is, that I like to automate my builds. Be it with make, CMake, or -best of all- tup, I want to set up my project once and know it will build (as long as my code is correct, that is). There’s no black sorcery involved in turning source files into linkable objects (allegedly). Which is to say, I want a linker. It is not my job as a programmer to correctly resolve labels, variable addresses, subroutine calls, return addresses, etc. That’s exactly why we have linkers.
The cc65 toolchain mostly solves this problem - alas, it only supports 6502 and 65816. But I need Z80 (and hopefully 68k) support. Fellow assembly programmers may not feel as strongly about the issues mentioned above. But I do. I will no longer suffer the tyranny of single namespaces and fixed addresses.
Now, if only there was a way for people to make their own assembler and linker (with Black Jack and hookers)…
Wyvern: The Modular Assembler
So, I’m going to write my assembler and linker. Since the market for new assemblers is kind of small, I’ll be arrogant enough to declare this to be a new type of assembler: the modular assembler!
What is a modular assembler? A modular assembler does not allow you to use .include
and .org
(I really dislike those). But kidding aside, the idea is pretty simple: Each source file represents a module (or translation unit, if you like). A module, in turn, creates its own environment.
An environment is a sequence of frames. Frames are bindings between names and values - translated to assembly, we’d call names labels (or symbols sometimes), and the values they’re associated with either constants or addresses (which essentially are the same thing; in the final step of assembling/linking, names/labels/symbols are replaced by the value they represent).
Now, what is the practical implication of this? The paragraph above is just a fancy way of saying that labels/symbols are unique to their respective frame, aka, you can reuse names within your assembly code. Let me illustrate that with an example.
Suppose you want to write a small math library like this (code modified and taken from Neil Parker):
.proc Multiply:
LDA #0 ; Initialize RESULT to 0
STA RESULT+2
LDX #16 ; There are 16 bits in NUM2
L1: LSR NUM2+1 ; Get low bit of NUM2
ROR NUM2
BCC L2 ; 0 or 1?
TAY ; If 1, add NUM1 (hi byte of RESULT is in A)
CLC
LDA NUM1
ADC RESULT+2
STA RESULT+2
TYA
ADC NUM1+1
L2: ROR A ; "Stairstep" shift
ROR RESULT+2
ROR RESULT+1
ROR RESULT
DEX
BNE L1
STA RESULT+3
.endproc
.proc Divide:
LDA #0 ; Initialize REM to 0
STA REM
STA REM+1
LDX #16 ; There are 16 bits in NUM1
L1: ASL NUM1 ; Shift hi bit of NUM1 into REM
ROL NUM1+1 ; (vacating the lo bit, which will be used for the quotient)
ROL REM
ROL REM+1
LDA REM
SEC ; Trial subtraction
SBC NUM2
TAY
LDA REM+1
SBC NUM2+1
BCC L2 ; Did subtraction succeed?
STA REM+1 ; If yes, save it
STY REM
INC NUM1 ; and record a 1 in the quotient
L2: DEX
BNE L1
.endproc
This code will assemble with no problem with cc65. Why do the duplicate labels L1
and L2
not cause any problems? Because they each form a frame. The instruction pair .proc
and .endproc
cause the assembler to create a new scope for the subroutine so that all labels used within it do not clash with those of the same name (and do not cause redefinition problems).
So, in Wyvern, the same code may look like this:
;;; File: math.asm
module Math ( Multiply, Divide ) where
.proc Multiply:
LDA #0 ; Initialize RESULT to 0
STA RESULT+2
LDX #16 ; There are 16 bits in NUM2
L1: LSR NUM2+1 ; Get low bit of NUM2
ROR NUM2
BCC L2 ; 0 or 1?
TAY ; If 1, add NUM1 (hi byte of RESULT is in A)
CLC
LDA NUM1
ADC RESULT+2
STA RESULT+2
TYA
ADC NUM1+1
L2: ROR A ; "Stairstep" shift
ROR RESULT+2
ROR RESULT+1
ROR RESULT
DEX
BNE L1
STA RESULT+3
RTS
.endproc
.proc Divide:
LDA #0 ; Initialize REM to 0
STA REM
STA REM+1
LDX #16 ; There are 16 bits in NUM1
L1: ASL NUM1 ; Shift hi bit of NUM1 into REM
ROL NUM1+1 ; (vacating the lo bit, which will be used for the quotient)
ROL REM
ROL REM+1
LDA REM
SEC ; Trial subtraction
SBC NUM2
TAY
LDA REM+1
SBC NUM2+1
BCC L2 ; Did subtraction succeed?
STA REM+1 ; If yes, save it
STY REM
INC NUM1 ; and record a 1 in the quotient
L2: DEX
BNE L1
RTS
.endproc
And to use this code in another file/module, you’d start that module like this:
;;; File: Graphics.asm
module Graphics where
import Math
import Memory
; some code...
; now use Multiple from Math
JSR Multiply
; or use the fully qualified name
JSR Math.Multiply
; more code...
Hence, the modular assembler.
I believe this is preferable to simply designing code with unique labels. The programmer can more clearly convey their intentions on what the code at hand is supposed to achieve. It makes it easier to extend and integrate existing code without fear of a “polluted namespace”.
The discerning Haskeller will notice a striking resemblance to Haskell modules. Coincidence? We’ll never know.
Why Haskell Racket
A functional language like Haskell is the best fit for a project like this. Parsing and interpreting is a major part of Haskell development - there are more than 200 libraries tagged parsing
that can be found. I will most definitely not rewrite it in Rust (Rust is not a real programming language).
I was going to use Haskell, but being a Schemer, I thought trying out a Lisp with a typing system would be fun. [Racket][racket] fits the bill here. It comes with an [expansive documentation][racketdoc]. I want to depend as little as possible on existing code so I think a Lisp dialect will be easier to follow. Despite what detractors say, Lisp is very readable and the syntax allows zero ambiguity.
Join me next time, when we discuss the three main parts of the assembler.