Paste number 4743: assembler supporting PUSH POP JMP CALL RET MOV and JUMPcc conditional jumps

Paste number 4743: assembler supporting PUSH POP JMP CALL RET MOV and JUMPcc conditional jumps
Pasted by: slava
When:9 years, 7 months ago
Share:Tweet this! | http://paste.lisp.org/+3NR
Channel:#concatenative
Paste contents:
Raw Source | XML | Display As
! A postfix assembler.
!
! x86 is a convoluted mess, so this code will be hard to
! understand unless you actually know the instruction set.
!
! The general format of an x86 instruction is:
!
! - 1-2 bytes: opcode. if the first byte is 0x0f, then opcode is
! 2 bytes.
! - 1 byte (optional): mod-r/m byte, specifying operands
! - 1/4 bytes (optional): displacement
! - 1 byte (optional): scale/index/displacement byte
! - 1/4 bytes (optional): immediate operand
!
! mod-r/m has three bit fields:
! - 0-2: r/m
! - 3-5: reg
! - 6-7: mod
!
! If the direction bit (bin mask 10) in the opcode is set, then
! the source is reg, the destination is r/m. Otherwise, it is
! the opposite. x86 does this because reg can only encode a
! direct register operand, while r/m can encode other addressing
! modes in conjunction with the mod field.
!
! The mod field has this encoding:
! - BIN: 00 indirect
! - BIN: 01 1-byte displacement is present after mod-r/m field
! - BIN: 10 4-byte displacement is present after mod-r/m field
! - BIN: 11 direct register operand
!
! To encode displacement only (eg, [ 1234 ] EAX MOV), the
! r/m field stores the code for the EBP register, mod is 00, and
! a 4-byte displacement field is given. Usually if mod is 00, no
! displacement field is present.

: byte? -128 127 between? ;

GENERIC: mod ( op -- mod )
GENERIC: register ( op -- reg )
GENERIC: displacement ( op -- )

( Register operands -- eg, ECX                                 )
: REGISTER:
    CREATE dup define-symbol
    scan-word "register" set-word-property ; parsing

REGISTER: EAX 0
REGISTER: ECX 1
REGISTER: EDX 2
REGISTER: EBX 3
REGISTER: ESP 4
REGISTER: EBP 5
REGISTER: ESI 6
REGISTER: EDI 7

PREDICATE: word register "register" word-property ;

M: register mod drop BIN: 11 ;
M: register register "register" word-property ;
M: register displacement drop ;

( Indirect register operands -- eg, [ ECX ]                    )
PREDICATE: list indirect
    dup length 1 = [ car register? ] [ drop f ] ifte ;

M: indirect mod drop BIN: 00 ;
M: indirect register
    car register dup BIN: 101 = [
        "x86 does not support [ EBP ]. Use [ EBP 0 ] instead."
        throw
    ] when ;
M: indirect displacement drop ;

( Displaced indirect register operands -- eg, [ EAX 4 ]        )
PREDICATE: list displaced
    dup length 2 = [
        2unlist integer? swap register? and
    ] [
        drop f
    ] ifte ;

M: displaced mod cdr car byte? BIN: 01 BIN: 10 ? ;
M: displaced register car register ;
M: displaced displacement
    cdr car dup byte? [ compile-byte ] [ compile-cell ] ifte ;

( Displacement-only operands -- eg, [ 1234 ]                   )
PREDICATE: list disp-only
    dup length 1 = [ car integer? ] [ drop f ] ifte ;

M: disp-only mod drop BIN: 00 ;
M: disp-only register
    #! x86 encodes displacement-only as [ EBP ].
    drop BIN: 101 ;
M: disp-only displacement
    car compile-cell ;

( Utilities                                                    )
UNION: operand register indirect displaced disp-only ;

: mod-r/m ( mod reg r/m -- )
    >r 3 shift swap 6 shift bitor r> bitor compile-byte ;

: 1-op/1-byte ( reg n -- )
    #! Some instructions encode their single operand as part of
    #! the opcode.
    >r register r> + compile-byte ;

: 1-op/mod-r/m ( op reg -- )
    >r dup mod over register r> swap mod-r/m displacement ;

: 2-op/mod-r/m ( src dst op -- ? )
    #! Sets the opcode's direction bit. It is set if the
    #! destination is a direct register operand.
    over register? [ BIN: 10 bitor ] [ >r swap r> ] ifte
    compile-byte register 1-op/mod-r/m ;

: fixup ( -- addr )
    #! After compiling a jump, this returns the address where
    #! the branch target can be written.
    compiled-offset 4 - ;

( Opcodes                                                      )
GENERIC: PUSH ( op -- )
M: register PUSH HEX: 50 1-op/1-byte ;
M: integer PUSH HEX: 68 compile-byte compile-cell ;
M: operand PUSH HEX: ff compile-byte BIN: 110 1-op/mod-r/m ;

GENERIC: POP ( op -- )
M: register POP HEX: 58 1-op/1-byte ;
M: operand POP HEX: 8f compile-byte BIN: 000 1-op/mod-r/m ;

GENERIC: JMP ( op -- )
M: integer JMP HEX: e9 compile-byte compile-cell ;
M: operand JMP HEX: ff compile-byte BIN: 100 1-op/mod-r/m ;

GENERIC: CALL ( op -- )
M: integer CALL HEX: e8 compile-byte compile-cell ;
M: operand CALL HEX: ff compile-byte BIN: 010 1-op/mod-r/m ;

: JUMPcc ( addr opcode -- )
    HEX: 0f compile-byte  compile-byte  compile-cell ;

: JO  HEX: 80 JUMPcc ;
: JNO HEX: 81 JUMPcc ;
: JB  HEX: 82 JUMPcc ;
: JAE HEX: 83 JUMPcc ;
: JE  HEX: 84 JUMPcc ;
: JNE HEX: 85 JUMPcc ;
: JBE HEX: 86 JUMPcc ;
: JA  HEX: 87 JUMPcc ;
: JS  HEX: 88 JUMPcc ;
: JNS HEX: 89 JUMPcc ;
: JP  HEX: 8a JUMPcc ;
: JNP HEX: 8b JUMPcc ;
: JL  HEX: 8c JUMPcc ;
: JGE HEX: 8d JUMPcc ;
: JLE HEX: 8e JUMPcc ;
: JG  HEX: 8f JUMPcc ;

: RET ( -- ) HEX: c3 compile-byte ;

! MOV where the src is immediate.
GENERIC: (MOV-I) ( src dst -- )
M: register (MOV-I) HEX: b8 1-op/1-byte  compile-cell ;
M: operand (MOV-I)
    HEX: c7 compile-byte  0 1-op/mod-r/m compile-cell ;

! MOV where the src is not immediate.
: (MOV) ( src dst -- ) HEX: 89 2-op/mod-r/m ;

: MOV ( src dst -- ) over integer? [ (MOV-I) ] [ (MOV) ] ifte ;

This paste has no annotations.

Colorize as:
Show Line Numbers

Lisppaste pastes can be made by anyone at any time. Imagine a fearsomely comprehensive disclaimer of liability. Now fear, comprehensively.