ktye/k ktye.github.io/kdoc.htm + flp add ' ech both bin lin - neg sub / ovr/fix echright * fst mul \ scn/fix eachleft % sqr div / join decode ! til key mod \ split encode & wer min $[a;b;...] cond | rev max while[c;a;b;d;e;..] < asc les f:{x+y} [bl;o;ck] > dsc mor "abc" c = grp eql 01234567 1 2 3 i ~ not mtc :+-*%&|4 5 6. f, enl cat <>=~!,^#2a300 z^ srt cut _$?@. (1;2 3) L # cnt tak `a`b!5 6 D _ flr drp t,d t,t t,'t join $ str cst k!t key ? unq fnd in k?t group @ typ atx @[x;i;+;y] amend . val cal .[x;i;+;y] dmendabs sin cos exp logfindangleimag conjtypes:cisfzLDTvcdlx?n(uniform) ?-n(normal) ?z(bi)n?n(with) random -n?n(w/o)
pd.k compiles k/os to pdp11 assembly instructions.
the compiler translates an intermediate form of k's source to machine code.
the intermediate form is given as a k table: +`t`p`i`s!(T;P;I;S)
.
907_+`id`t`p`i`s!(!#T;T;P;I;S) id t p i s this is the k-table representation ----------------- of function tp(x) which returns the 907 fun 0 1 tp type of k objects. 908 arg 907 0N k 909 sym 908 0 x in go this is: 910 res 907 0N i 911 ast 907 0N func tp(x uint64) int32 { 912 ret 911 0N i return int32(x >> 59) 913 cst 912 0N i } 914 typ 913 0N k 915 shr 913 2 k 916 cst 915 0N k 917 typ 916 0N k 918 get 916 0N x 919 lit 915 56 kit's relatively easy to query the table, e.g. listing the most used literals of type i:
|^#'=I@&(S=`i)&T=`lit 0 |236 1 |173 8 |89 16 |83 4 |73 2 |45 15 |20 ..or calculate the ratio between function calls and definitions:
(#&T=`cal)%#&T=`fun 7to save code, this suggests that the calling convention should minimize instructions on the call side but can spend some more for the callee.
the pdp11 is a 16bit machine while k uses mainly 32bit integer operations but stores k values in 64bit integers.
there are 8 registers:
r0 r1 r2 r3 | accumulator for 64 bit values |
r4 | return address in microcode |
r5(fp) | stack frame for function calls |
r6(sp) | stack pointer |
r7(pc) | instruction pointer |
of which we use the first 4 as the stack top for both 32bit and 64bit integers.
since 32bit operations need multiple 16bit instructions, we combine most operations in microcode stored at a fixed address.
for the caller this needs 2 words jsr pc,#addr
.
the microcode stores the return address in r4, does it's operation and jumps back.
as opposed to function calls, microcode generally has a stack effect.
as an example, this is the code for adding both 32 and 64bit integers, assuming x is in r0..r4 and y the top 4 words of the stack: jsr pc,#addr
. the code at #addr
is:
mov (sp)+, r4 12604 add (sp)+, r0 62600 adc r1 5501 adc r2 5502 adc r3 5503 add (sp)+, r1 62601 adc r2 5502 adc r3 5503 add (sp)+, r2 62602 adc r3 5503 add (sp)+, r3 62603 jmp r4 104
with 16bit words we can address 64kb or 32 kilo words. this is not enough for both code and data. the pdp11/45 has an operation mode that splits code and data which effectively doubles addressable memory.
the code size of kos is 31626 words and just fits into the address space leaving another 64kb for data.
as a comparison, the same code compiles to 24478 bytes of wasm (37%).
to simplify the excercise, kos/2024 removed all floating point operations, which is indicated by thruck strough sections in the reference.
the stack lives in the data section. k reserves a bucket within it's memory allocator for the cpu stack, that is not touched by k directly.
the stack space is 1kb below 11776 growing downwards.
most stack operations use 4 words at a time, storing the first 4 registers. since calling push/pop microcode uses 2 words already and it's a common operation, they are included in other microcodes where needed. e.g. any leaf node like get-local or literal needs to push the registers first, and all dyadic functions pop the second argument from the stack updating the first argument in the registers.
the calling convention is arranged to use only two words for the caller and does most work within the function pro and epilogue.
in pd.k the function call is generated by:
cal:{(,/y),4727,(afun S x)} /y:evaluated child nodes, x:node index
it evaluates the child nodes (function arguments) first then jumps to the function address that is calculated at linkage.
function arguments are evaluated sequentially, used for debugging, comparing function calls and arguments between the pdp11 and the go version.
the callee finds the last function argument in the registers r0..r3, the others are on the stack in reverse order.
the callee makes space for local variables on the stack and stores the old frame pointer.
the new fp points to the stack below the args and local variables.
as the first argument is in the registers, it will be pushed by any leaf node and can then be referenced by -8(fp)
caller count stack push x y, z on r0..r3 (words) y x jsr pc,#f 2 ret addr 4727 -f callee mov fp,-(sp) 1 +-> old fp 10546 sub 4+8*#loc,sp 2 | gap(align) 162706 (4+8*#loc) | [locs] mov sp,fp 1 | ^-fp 10605 (body) | z (body pushes z) mov fp,sp 1 10506 add (4+8*#loc),sp 2 sp 62706 (4+8*#loc) mov (sp)+,fp 1 12605 mov (sp)+,r4 1 12604 add 8*-1+#args,sp 2 62706 (8*-1+#args) jmp r4 1 104 total: caller(2) callee(12) access: arg0 z -8(fp) argi y(1),x(2).. 8*i+nloc(fp) loci a(0),b(1).. 8*1+i(fp)
while functions always correct the stack pointer on exit, it is still necessary to keep the stack in balance. e.g. niladic functions need to push the registers before the call, otherwise the first leaf node might overwrite the callers last argument. similar for functions that don't return a result: it still needs one pop after the call, or the stack grow's within a loop.
indirect function calls calculate which function to call at runtime.
the compiler creates code that evaluates the arguments first, then calculates the function index and jump to 1004 where the function table starts.
cli:{$[1~#y;4727 -99999;,/1_y],(*y),4727 1004} /indirect call
the function table header calculates the jump position. it stores the index in r4 and pops the first argument to the registers. it also moves the return address to the top stack position where the callee expects it. then the index is multiplied by 4 and added to the instruction pointer.
1004: mov r0, r4 10004 /store function index mov 2(sp), r0 16600 2 /pop first argument mov 4(sp), r1 16601 4 mov 6(sp), r2 16602 6 mov 10(sp), r3 16603 10 mov (sp), 10(sp) 11666 10 /move return address add #10, sp 62706 10 /shrink stack asl r4 6304 /multiply index asl r4 6304 add r4, pc 60407 /index function table jmp #f0 104 .. /jump to function jmp #f1 104 ..
the intro is followed by jump instructions, one for each entry in the function table.
for the callee everything is in the same position as in a direct function call.
cnd:{(iff;ife)[3~#y].(x;y)} iff:{$[(T 1+x)?`bnd`bor;(*y),5700 1401 402;(-4_*y)],62707,(O@2*#y 1),(y 1),po} ife:{$[(T 1+x)?`bnd`bor;(*y),5700 1401 402;(-4_*y)],62707,(O@2*#b),(b:(y 1),62707,O@2*#y 2),(y 2),po}as an example this is a function with a simple conditional:
func f(x int32) int32 { if x > 3 { return 2*x } return x }the function body compiles to:
jsr pc, #2512 4727 2512 /get x jsr pc, #1116 4727 1116 /push also zeros r0..r3 mov #3, r0 12700 3 /literal 3 jsr pc, #2400 4727 2400 /compare x and y bgt 1046 3002 /branch if greater (jump over next instr) add #24, pc 62707 24 /otherwise: jump over if block jsr pc, #1116 4727 1116 /if block.. mov #2, r0 12700 2 jsr pc, #2512 4727 2512 jsr pc, #1760 4727 1760 add #10, pc 62707 10 /return (jump to epilog) jsr pc, #1142 4727 1142 /pop jsr pc, #2512 4727 2512 /get x
switch handles only sequential case expressions 0,1,2.., commonly used to switch over k types. non-matching cases are handled by the default block.
they are compiled by:swc:{b:(1_y),\-55555 0;brk@-2_,/(!0 (*y) /switch-expr in r0 22700,O@-1+#b /cmp #(n-1),r0 101402 62707,O@2*#*|b /jump over default *|b /default 70027 4 10100 60007 /mul #4,r0;mov r1,r0;add r0,pc (jump to case offset) ,/62707,/'O(4*|-1_!#b)+2*-2_0,+\#'b /branch offsets ,/-1_b /cases po)} /pop switch expr
every case gets an autmatic break statement appended. the default branch is handled first and needs a conditional. the remaining cases are implemented as a jump table.
break (and continue) placeholders are replaced by
brk:{j:(#x)-2+i:&x=-55555;x[i]:62707;x[1+i]:O 2*j;x} /replace -55555 0(break) with add #n,pc cnt:{j:2+i:&x=-44444;x[i]:162707;x[1+i]:O 2*j;x} /replace -44444 0(continue) with sub #n,pc
for:{(lop;slp)[I x].(x;y)} /slp:do-while; lop:while-do lop:{p:$[#*y;po;!0];(cnt brk $[#*y;(y 0),5700 1002 62707,(O@2*#b);!0],b:(p,,/|1_y),-44444 0),p} slp:{c:(*y),5700 1402 162707;b:po,cnt brk(-2_y 2),y 1; pu,b,c,O 2+2*(#b)+#c}
io routines include reading a line from the terminal and printing output.
they are:read: 12604 11600 10001 62706 10 105727 177560 100375 112721 177562 1375 160100 5400 5300 5001 104 write:12604 11601 62706 20 5700 1404 112127 177566 77003 5000 5001 104
io in pdp11 is memory mapped, that means moving data to/from a special memory location is all that needs to be done. tstb 177560 returns if input is available, moving a byte from 177562 reads a character.
these are implemented as hot loops, but the emulator stops and waits for input on the test instruction. the emulator continues when a newline is present.
the compiler generates instructions as 16bit octal integers disguised as normal k ints, which are 32 bit. the domain ranges from 0 to 177777 (65k) all unsigned, which means negative integers are outside the domain of the code.
we use negative integers as unresolved addresses which are fixed during the last step.
e.g. small negative numbers are function addresses, -1 is the first function (0 is main, never called).
other negative numbers have a fixed meaning and are replaced by built-in microcode, such as push:-99999, get-literal:-77000, memcpy:-55000, add:-50000, etc.
the emulator is written in javascript. it's not a general pdp11 emulator, but grew along with the compiler and just implements what is needed.
kos has been manually booted. the code was loaded at 1000 and executed until the repl was about to read.
at this time all memory content was downloaded to an image k.img that is now loaded every time the page loads, ready to evaluate k input from the terminal.
it has 3 speeds: one instruction per second, one per animation frame, and unrestricted, as fast as possible without animation.
after printing the result, it shows an instruction histogram.
the first two also make noise (or music, that depends). they set the frequency according to the current instruction in octal scaled by 0.334*22050/0177777.
special recognition has the clz microcode in the allocator to calculate the bucket size.