S:" \t" /space
L:"({[;\n" /left
R:")}]" /right
A:"'/\\" /adverb
V:"+-*%&|<>=^!~,#_$?@." /verb
/class
C:(S,L;R;V;A;".",_,/65 97+\:!26
 "0123456789"),"\"`-:e\\/\n"
C:@[;;]/[-1+&256;0+C;!#C]@0+
q:*'1_"\n"\:t
T:t[;0]?/:1_'t:1_'(&"\n"=t)^t
w:{q@0{T[x;y]}\C x}
s:"-^"@10>q?
 ;)+'a0q`-:e\/n
;;)+'a0q`-+a'/;
);)+'a0q`++a'';
+;)+'a0q`-+a'';
';)+'a0q`-:a'';
a;)+'bbq`++b'';
0;)+'11q`+:e'';
qrrrrrrtrrrrsrr
`;)+'b0q`++b'';
-;)+'a1q`-+a'';
/ccccccccccccc;
cccccccccccccc;
:;)+'a0q`-+a'';
b;)+'bbq`++b'';
1;)+'11q`++e'';
e;)+'11q`1+1'';
rrrrrrrtrrrrsrr
srrrrrrrrrrrrrr
t;)+'a0q`++a'';
phi:(1+%5)%two:200e-2 /golden ratio
This is an example of a tokenizer with a state transition matrix.
The left side defines the code. It sets up a lookup table C that maps each input character to its token class.
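As a rough illustration of the class lookup, here is a Python sketch (Python is a choice made for this sketch; the page itself uses k). It fills a 256-entry table with -1 and assigns class indices group by group, assuming that later groups overwrite earlier ones, so that characters such as `-`, `e` and `/` end up in classes of their own. The names `GROUPS`, `CLASS` and `build_class_table` are hypothetical.

```python
import string

# Character groups, named as in the k source.
S = " \t"                     # space
L = "({[;\n"                  # left
R = ")}]"                     # right
A = "'/\\"                    # adverb
V = "+-*%&|<>=^!~,#_$?@."     # verb

# One entry per class; the single characters at the end each get their own class.
GROUPS = [S + L, R, V, A,
          "." + string.ascii_uppercase + string.ascii_lowercase,
          "0123456789"] + list("\"`-:e\\/\n")

def build_class_table(groups):
    """Map a character code (0..255) to its class index, or -1 if unmapped."""
    table = [-1] * 256
    for cls, chars in enumerate(groups):
        for ch in chars:
            table[ord(ch)] = cls   # assumption: later groups overwrite earlier ones
    return table

CLASS = build_class_table(GROUPS)
print([CLASS[ord(c)] for c in "x1+e"])   # [4, 5, 2, 10]
```

With 14 groups there are 14 classes, which matches the 14 transition entries per row of the matrix.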
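The center column is the state transition matrix encoded in a string. Its first row lists one representative character per token class; every following row starts with a state name, followed by the next state for each class. The string is assigned to the variable t before running the program.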
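The decoding of the matrix string can be sketched in Python as follows. This is a transliteration under assumptions, not the page's own code: the rows are written out as a list instead of one newline-separated string, and `ROWS` and `parse_matrix` are hypothetical names. The results are called `q` and `T` to mirror the k variables.

```python
# The transition matrix from the page, one string per row.
# Row 0 is the header (class characters); every other row is a state name
# followed by the next state for each of the 14 classes.
ROWS = [
    r" ;)+'a0q`-:e\/n",
    r";;)+'a0q`-+a'/;",
    r");)+'a0q`++a'';",
    r"+;)+'a0q`-+a'';",
    r"';)+'a0q`-:a'';",
    r"a;)+'bbq`++b'';",
    r"0;)+'11q`+:e'';",
    r"qrrrrrrtrrrrsrr",
    r"`;)+'b0q`++b'';",
    r"-;)+'a1q`-+a'';",
    r"/ccccccccccccc;",
    r"cccccccccccccc;",
    r":;)+'a0q`-+a'';",
    r"b;)+'bbq`++b'';",
    r"1;)+'11q`++e'';",
    r"e;)+'11q`1+1'';",
    r"rrrrrrrtrrrrsrr",
    r"srrrrrrrrrrrrrr",
    r"t;)+'a0q`++a'';",
]

def parse_matrix(rows):
    """Return (state names, table) where table[state][cls] is the next state index."""
    body = rows[1:]                                   # drop the header row
    names = "".join(row[0] for row in body)           # first column: state names
    table = [[names.index(ch) for ch in row[1:]] for row in body]
    return names, table

q, T = parse_matrix(ROWS)
print(q)          # ;)+'a0q`-/c:b1erst
print(T[0][4])    # 4: from the start state, a letter leads to state "a"
```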
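The expression 0{T[x;y]}\C x translates the input string x into character classes, feeds them through the matrix one at a time starting from state 0, and collects all states.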
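The scan itself can be sketched in Python with `itertools.accumulate`. To stay short, the example uses a hypothetical two-class machine rather than the matrix from the page; `classify` and `scan_states` are made-up names. It also assumes that the seed state is not part of the scan result, which is why the first element is dropped.

```python
from itertools import accumulate

# Hypothetical machine, not the matrix from the page:
# class 0 = letter, class 1 = anything else.
# state 0 = other character, state 1 = first letter of a word, state 2 = later letter.
T = [
    [1, 0],  # from state 0
    [2, 0],  # from state 1
    [2, 0],  # from state 2
]

def classify(ch):
    """Map a character to its class index."""
    return 0 if ch.isalpha() else 1

def scan_states(text, table, start=0):
    """Run the state machine over text and return the state after each character."""
    classes = [classify(ch) for ch in text]
    states = accumulate(classes, lambda state, cls: table[state][cls], initial=start)
    return list(states)[1:]  # drop the seed (assumption about the k scan)

print(scan_states("ab c", T))  # [1, 2, 0, 1]
```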
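Finally, the string is split into tokens according to the states: a state with index below 10 starts a new token, and a state with index >= 10 continues the current one.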
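A minimal Python sketch of the splitting step, assuming the reading suggested by `10>q?` in the code: states with an index below 10 mark the first character of a token, and states with an index of 10 or more extend the current token. The function name `split_tokens` and the `limit` parameter are hypothetical.

```python
def split_tokens(text, states, limit=10):
    """Group the characters of text into tokens using one state index per character."""
    tokens = []
    for ch, state in zip(text, states):
        if state < limit or not tokens:
            tokens.append(ch)      # state below the limit: a new token starts here
        else:
            tokens[-1] += ch       # continuation state: extend the current token
    return tokens

# States for "a+b1" read off the matrix by hand: a, +, a, b, i.e. indices 4, 2, 4, 12.
print(split_tokens("a+b1", [4, 2, 4, 12]))  # ['a', '+', 'b1']
```

Here the digit after `b` lands in state b (index 12), so it is appended to the identifier instead of starting a number.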
The code, the transition matrix and the input string can be edited. Press run to recalculate and next to step through the input.