a simplified k - introduction to atw/c

expand (click on a line to show explanations) back to a/

a.c
#include"a.h"//fF[+-!#,@] atom/vector 1byteint 1bytetoken  no(memmanage parser tokens ..)
+-!#,@ are are the list of primitives that are implemented in simple/k
fF means for both monadic and dyadic cases
the implementation contains atoms and vectors
but only for one data size: all values are one integers, e.g. -128..127
tokens are limited to single digit numbers as well
a true implementation needs memory management (allocate/free/reuse..)
this is not included here. vectors are allocated but never freed.
k/simple includes a parser and tokenizer in a basic form.

#define r(n,e) _(u r=a(n);i(n,ri=e)r)
r is defined as a macro.
it returns a vector of length n and assigns the expression e to each element.
a creates the result array and ri=e assigned the e to the element r[i]
f(w,write(1,ax?(u)&x:x,ax?1:strlen(x)))
f(w,..) defines the monadic function w(x).
w uses the unix system call (see "man 2 write") and writes to stdout.
strlen is a libc function that clang recognizes as a builtin.
it returns the length of the string by counting until a 0 byte occurs in the data.
strings in c are terminated by the sential value 0. they do not contain a length.
but this is just a convention. we later define arrays, storing length and data.

w uses ax to check if the argument is an atom.
for atoms it calls write passing a pointer to the argument &x and the length 1.
for vectors it passes x which in this case is interpreted as a pointer as it is.
c b[12];
b is an array of type c with length 12. c definitions are type name.
the length comes last and is not part of the type.
b is only used locally as a buffer by the next function
but is defined as a global variable.
f(si,sprintf(b,"%d ",128>x?x:x-256);b)
si(x) prints/formats the number x into the buffer b including a space
used for printing vectors (or atoms where we don't see it).
the argument x is an unsigned integer but is used to format signed integers.
the max value for the domain is 127 and we get negative number with x-256.
si returns the buffer because the next function calls it as w(si(x)) and w expects
the buffer as an argument.

f(wi,w(si(x)))
wi writes an integer, by calling w(si(x))
f(w_,$(ax,wi(x))i(nx,wi(xi))w(10))
w_ writes a k value (atom or vector).
ax tests if x is an atom calling wi directly, otherwise it calls
wi in a loop for each element xi.
the conditonal form $(a,b)c is defined as a macro $
it is very similar to the ternary operator that c already has: a?b:c
the difference is that in a ternary operator a b and c must be expressions.
the operator itself is also an expression, it has a value.
but the loop i(nx,..) is a block and cannot be used
in a ternary expression.

F(err,w(f);w(58);w(x);w(10);96)
err(f,x) writes an error message. it is called by Qr Qz.
the arguments are the current function and the error string "rank" or "nyi".
the error message has the form (function):(error-type)(newline)
G(m,(u)memcpy((void*)x,(void*)y,f))
m(f,x,y) moves/copies data by calling the builtin memcopy.
it reorders the argument: the first arg is the number of elements to copy,
followed by destination and source pointer.
f(a,c*s=malloc(x+1);*s++=x;(u)s)
a(x) creates a vector. it allocates memory using the builtin malloc,
which expects a length (number of bytes).
it passes 1 element more than needed, stores the array length in the first
element and increases the pointer to the result c before returning it.
in this form c points to the data part of the vector and c[-1] is the length
usually accessed by nx
f(foo,Qz(1)0)
foo is a function that print the nyi error.
it is a placeholder within the function table for monadic primitives.
f(sub,ax?(c)-x:r(nx,-xi))
sub is arithmetic negate, a monadic function.
it negates atoms and uses r to negate all elements of a vector.
in (c)-x, (c) is a type cast. this casts the type from a longer integer to char.
both are unsigned, but the effect is that all but the lower byte remain 0 instead
of 1 because of sign extension.
f(ind,Qr(!ax)r(x,i))
ind is !x known as til x, iota, or index generation.
it uses Qr to check the rank of x and allows only atoms.
r creates a return array and assigns the loop variable i to each element.
f(cnt,Qr(ax)nx)
cnt is #x, also known as ⍴ or tally. is fails for atoms and returns the length of vectors using nx
f(cat,Qr(!ax)r(1,x))
cat is ,x the monadic form known as enlist.
it allows only atoms, creates a length-1 vector putting x inside.
F(Add,ax?af?f+x:Add(x,f):r(nx,(af?f:fi)+xi))
Add is the example for a scalar function that extends atoms to vectors for
all 4 combinations of atom/vector arguments.
it uses a nested ternary test for the arguments.
note that f is the first arg and x the second.
if x is an atom but f is a vector is switches the arguments and calls itself again.
the final expression r(nx,..) handles the part where x is a vector and iterates
over x. f is checked within the loop each time and either added directly or indexed
with the loop variable.
F(Sub,Add(f,sub(x)))
Sub is like Add but instead of implementing it again it negates the second argument
and calls Add.
F(Ind,Qr(!f)r(nx,xi%f))
Ind is f!x and takes the modulo.
f is checked to be nonzero. it can only be an atom,
since we have neither vector literals nor parens.
the right argument x is not checked and assumed to be a vector.
x is rank checked to be a vector (try 3!4 and see it crash).
F(Cnt,Qr(!af)r(f,ax?x:xi))
Cnt is take: f#x.
f is alywas atomic, x may be both.
the return array is filled in a loop using r.
F(Cat,f=af?cat(f):f;x=ax?cat(x):x;u r=a(nf+nx);m(nx,r+nf,x);m(nf,r,f))
Cat is catenation: f,x.
it rank checks both arguments and enlists each if it's an atom.
Cat returns a vector explicitly with u r=a(..) giving it the catenated length 
of both arguments.
F(At,Qr(af)ax?sf[x]:r(nx,sf[xi]))f(at,At(x,0))
At(f,x) is indexing. f must be a vector but x can be both atom or vector.
it uses sf[x] or sf[xi] to index into the array f.

watch carefully, the line also defines the monadic funcion at(x).
this is what we know as first, maybe as the value "at the location of x"
or the "head of the list". it indexes with a 0 value.
char*V=" +-!#,@";
V defines a character vector that is used by the parser.
we also see which primitives are included in the demo implementation.
u(*f[])()={0,foo,sub,ind,cnt,cat,at},(*F[])()={0,Add,Sub,Ind,Cnt,Cat,At},U[26];
this line defines two arrays of function pointers.
f holds our monadic function and F the dyadic counterparts.
the first function (index 0) is a placeholder. it is never called.
U is the storage space for variables. it holds 26 slots, one for each lower case letter.
but we cannot make use of them, because there is no assignment.
f(v,(strchr(V,x)?:V)-V)
v(x) is used by the parser for both: check if the x is a verb (a primitive function)
and convert the character x to a function value.
here we see why the function pointers start with a dummy value: a result of 0
is not a function.
the special case of the ternary expression a?:b is a shortcut for a?a:b
strchr returns a pointer to the character, not an index, so the start pointer V is
substracted to return the index. strchr returns 0 if x is not within V, so does the v.
f(n,10>x-48?x-48:U[x-97])
n(x) parses nouns, both single character numbers and lower case variable names.
the function converts the character values to numbers by substracting 48 (the ascii value for 0)
or it substracts 97(a) if the value is not within '0' to '9'.
note the special check 10>x-48. this works for unsigned comparison,
because negative numbers are large when converting to unsigned and it's shorter then the
equivalent (x>=48)&&x<48+10.

us(e,u i=*s++;v(i)?x(e(s),Q(x)f[v(i)](x)):x(n(i),*s?y(e(s+1),Q(y)F[v(*s)](x,y)):x))
e(x) is the core of the interpreter. e may be short for expression, execute or evaluate.
it parses an expression and exectues it directly, like apl.
usually k parses into a parse tree or byte code and execute in a separate step.
e takes an input string, tokenizes the first character and calls itself recursively.
it implements only the minimal monadic/dyadic infix syntax. no parens, adverbs, function, etc.
v(i) tests if the current character is a verb, if yes
it recursively parses the rest of the input. this is k/apl's left of right execution
parsing left-to-right not backwards as many feel apl is doing.
if the current character is not a verb, it is parsed as a noun n(i)
which is returned at the end of the input, or executed dyadically calling F[..] from the function table.
the dyadic case implies that a verb always follows a noun. two nouns in a row is not handled,
which usually implies dyadic @ sometimes called juxtaposition.
int main(){char s[99];while(1)if(w(32),s[read(0,s,99)-1]=0,*s)w_(e(s));}
main is the program's entry point. it defines s, a local buffer for the input characters
and goes directly into the read-eval-print-loop.
within the loop it prints k's cursor, a single space (ascii 32), reads a line, terminates the input with a 0
and w_ writes the output of the evaluated expression e.
a.h
typedef unsigned char c;typedef unsigned long u;u Q=96;
type definitions for c and u that are used throughout the code.
Q is used in the parser to indicate an error.

#define $(a,b) if(a)b;else
$ is used as $(a,b)c like a ternary expression but accepts blocks.
it is used by w_ with a loop block in the else clause.
#define ax (256>x)
ax tests if x is an atom. it is used directly as ax not like a call ax(x).
atoms are integers with a value within the range of one byte.
#define xi (nx>i?sx[i]:0)
xi indexes the implicit x with i. it also check the range using nx.
#define i(n,e) {int $n=n;int i=0;for(;i<$n;++i){e;}}
i(n,e) is the loop construct. loops are omnipresent in array languages
but they are hidden in the implementation.
n is the length and e an expression that is executed for every step.
#define _(e) ({e;})
_ puts e into a block/scope.
#define _u(f,e,x...) u f(x){return({e;});}
_u is the prototypes for other macros which defines a function that returns type u an unsigned integer.
the arguments are defined as the variadic macro x...
e.g. f uses it to define monadic functions and F for dyadic functions.
#define nx sx[-1]
nx returns the length of a vector. it's not a function but uses x explicitly.
the length is stored in the byte before the data section that x points to.
it's not allowed to use nx on atoms, it must be checked before using ax.
#define sx ((c*)x)
sx interprets x as a char pointer
#define x(a,e) _(u x=a;e)
x(a,e) assigns a to x in a local scope and executes the expression e
#define y(a,e) _(u y=a;e)
y(a,e) assigns a to y in a local scope and executes the expression e
#define f(g,e) _u(g,e,u x)
f(g,e) defines a monadic function with implicit argument x.
g is the function name, and the expression e is the function body.
an example is cnt.
#define F(g,e) _u(g,e,u f,u x)
F(g,e) defines a dyadic function with implicit arguments f and x.
g is the function name, and the expression e is the function body.
an example is Add.
#define G(g,e) _u(g,e,u f,u x,u y)
G(g,e) defines a triadic function with name g.
the arguments are f,x,y and the body e.
see e.g. m
#define us(f,e) _u(f,e,c*s)
us(f,e) defines a function which takes a char pointer as an argument and body e.
it is used only be e.
#define Q(e) if(Q==(e))return Q;
Q is used for error handling. it checks the argument and returns early if e is nonzero.
it returns Q which is not the same as the macro but the constant 96.
#define Qr(e) if(e){return err((u)__func__,(u)"rank");}
Qr returns a rank error if e is nonzero.
#define Qz(e) if(e){return err((u)__func__,(u)"nyi");}
Qz returns an nyi error (not yet implemented), if e is nonzero
#define sr x(r,sx)
sr is only used by the next macro ri
#define ri sr[i]
ri assigns r to x, treats it as a char vector and indexes the i's element.
it is used by r to assign a value to the vector at index i.

#define af x(f,ax)
af tests if f is an atom. f is the first argument for dyadic functions.
#define nf x(f,nx)
nf treats f as an array and returns it's length.
#define sf x(f,sx)
sf interprets the first function argument f as a vector,
such that it can be indexed. it is used by At.
#define fi x(f,xi)
fi accesses element i of the first argument f.
see Add

puzzle: find out why 1-2 works and returns -1 but 2-1 fails to produce the correct result.