compiling to webassembly

webassembly is a compact binary instruction set with a js api to instantiate and interact with the program and data, but it lacks a way to create binaries from source in js directy. even it's text format needs external tools (e.g. wat2wasm) to create binaries.

we'll write a small compiler that runs directly on the client (e.g. on this page).

wasm encoding

the ast of wasm functions is composed by expressions and structural control flow. it uses local variables instead of registers and heap memory. the execution stack is not directly accessible and no pointers into it can be passed to other functions like in c.

wasm encodes the expressions in postfix notation with a stack model, but does not contain stack operations (like swap or duplicate). most operations are encoded with a single byte that take one or two arguments from the stack and push back the result, e.g. add is 0x6a for int32.

wasm has 4 base types for arithmetic operations (32 and 64 bit int and float) but has also heap access for smaller types.

the text format comes in two forms: nested and linear. e.g. x+y can be written as
(i32.add (local.get 0) (local.get 1))      or       local.get 0
                                                    local.get 1
						    i32.add
the binary encoding is 0x2000 0x2001 0x6a.  0x20 is local.get, followed by an immediate, the operation 0x6a is at the end.

we see that the linear format is closer to the binary encoding but sometimes a nested ast is more appropriate as an intermediate form.

functional ast

instead of representing the ast as a nested list, we describe it by nested function calls in javascript:
i32add(get(0), get(1))
each function writes it's operation to the output but ignores the input. since the arguments are evaluated before the function is entered, this does an automatic conversion to postfix. the method only works for host languages that have a fixed evaluation order, e.g. not for c.

the functions can be defined as:
let o=[]  //output array
let get   = x   =>o.push(0x20, x)
let i32add=(x,y)=>o.push(0x6a)     //x and y are evaluated but ignored

type inference

wasm is strictly typed. e.g. 0x6a expects two int32 on the stack. this is known and checked at compile time.

but there are similar operations for int64 and 32/64 bit floating point types, so instead of using 4 different notations for addition, we can use a single function that tracks the input type and selects the right opcode.

let add=(x,y)=>(o.push([0x6a,0x7c,0x92,0xa0][x]),x)
now each function receives type arguments, pushes it's operation to the output and returns the result type. the type is the value 0..4 which is used as an index to select the corresponding opcode.

with this method, the binary is written as a side effect while computing the node types.

these are the definitions for all arithmetic functions:
let o=[],v={},p=(...x)=>o.push(...x),
f=x=>(...y)=>(p(255&x>>8*y[0]),y[0]),
[ez,    eq,        ne,        lt,        lu,    gt,        gu,    le,        ge,        cz,    ct,    cx,   ]= 
[0x5045,0x615b5146,0x625c5247,0x635d5348,0x5449,0x645e554a,0x564b,0x655f574c,0x6660594e,0x7a67,0x7968,0x7b69].map(f),
[ad,        su,        mu,        di,        du,    mo,    rm,    an,    or,    xo,    sl,    sa,    sr,    rl,    rr,    dr,        se,        re        ]=
[0xa0927c6a,0xa1937d6b,0xa2947e6c,0xa3957f6d,0x806e,0x816f,0x8270,0x8371,0x8472,0x8573,0x8674,0x8775,0x8876,0x8977,0x8a78,0x1a1a1a1a,0x1b1b1b1b,0x0f0f0f0f].map(f),
[ ab, ng, ce, fl, tr, na, sq, mi, ma, cs]=[0x998b,0x9a8c,0x9b8d,0x9c8e,0x9d8f,0x9e90,0x9f91,0xa496,0xa597,0xa698].map(x=>f(x<<16))

control flow

the above method works for most arithmetic opcodes, but there is a problem with control flow: e.g. a while loop takes a conditional and a body but needs to emit some code before, between and after the arguments, which does not work, if the arguments are evaluated first and write directly to the output:
 loop=(condition,body)=>...       //this does not work.
 ifelse=(condition,x,y)=>...
we can fix this by deferring the arguments that have to be evaluated later:
 cn=_=>(p(4,64),_=>(p(5),_=>p(11)))
 lo=_=>(p(2,3),_=>(p(69,4,64),_=>p(14,0,11,11)))
cn (the conditional if-else) emits block code and returns a function for each step. it is called like this:
 cn(condition)(ifbody)(elsebody)
lo (while loop) is also a nested call of 3 functions but it needs to emit code before everything else, so the first call take no arguments:
 lo()(condition)(body)

module

the full module needs to be encoded in sections, with function signatures, function code, data blocks, export names, globals, etc.. this is done by the function wa() with the following code:
sp=(x,y)=>x.split(y?":":""),
ns=x=>[...eu(x.length),...sp(x).map(x=>x.charCodeAt(0))],
ty=x=>[0,127,126,125,124][1+x],
tp=x=>sp("i:j:e:f",1).indexOf(x),
fn=(x,...y)=>(x={n:x,s:((y=y[y.length-1])>=0?y:"")+":"+Object.values(v).filter(x=>x.a).map(x=>x.t).join(""),c:o,l:v},o=[],v={},x),
gl=(x,y)=>(x={n:x,t:y,c:o},o=[],x),
ws=(x,y)=>(y=[...eu(y.length),...y.flat()],p(x,...eu(y.length),...y)), //section
wx=x=>{},                                            //exports..
wa=(...x)=>{let r,d=x.filter(x=>x.length),           //data
 f=x.filter(x=>x.s),                                 //funcs
 F=f.filter(x=>x.c),                                 //funcs without imports
 s=f.map(x=>x.s).filter((x,i,a)=>i==a.indexOf(x)),   //signatures
 n=f.map(x=>x.n),                                    //names
 g=x.filter(x=>x.t),                                 //globals
 v=f=>(f=Object.values(f.l).filter(x=>!x.a).map(x=>[1,ty(x.t),...x.i]),[...eu(f.length),...f.flat()])
 o=[0,97,115,109,1,0,0,0]
 ws(1,s.map((x,r,a)=>([r,a]=sp(x,1),[96,a.length,...sp(a).map(x=>ty(Number(x))),r.length,...sp(r).map(x=>ty(Number(x)))])))
 ws(2,f.filter(x=>!x.c).map(x=>[1,97,...ns(x.n),0,...eu(s.indexOf(x.s))]))  //imports
 ws(3,F.map(x=>eu(s.indexOf(x.s)))) //signature index list
 p(5,3,1,0,1)                       //memory 1seg,unshared,1block
 ws(6,g.map(x=>[ty(x.t),1,...x.c]))
 ws(7,[[...ns("memory"),2,0],...F.map((x,i)=>[...ns(x.n),0,...eu(i)])]) //export memory&all functions
 ws(10,F.map((x,y)=>((y=[...v(x),...x.c,11]),[...eu(y.length),...y])))
 ws(11,d.map(x=>[0,65,...eu(x[0]),11,...ns(x[1])]))
 r=new Uint8Array(o);o=[];v={};return r}

compile on line in time:

you can edit the input code below. compiling also tries to instantiate the resulting binary as a webassembly module and reports any errors if that fails.
wa(fn("add",pa("x","i"),pa("y","i"),ad(va("x"),va("y"))))