This is the TNSL specification, a document defining the TNSL language and meta-language, its usage, and where its compiler may be ported.

Document version (semver): 0.0.1
Main Author: Kyle Gunger
License: Apache 2.0

----------------------------------

Contents:

    Preamble

    Part 1 -- The Language
        1.1: .tnsl Files
        1.2: Blocks
        1.3: Statements
        1.4: Types
        1.5: Operators
        1.6: Borrow checker
        1.7: Anonymous blocks
        1.8: Raw and Asm

    Part 2 -- Related Features
        2.1: Style guide
        2.2: The pre-processor
        2.3: Compiler options
        2.4: Included tools

    Part 3 -- The TNSL Calling ABI
        3.1:

    Part 4 -- Standard libs
        4.1: Bare-metal
        4.2: libts
        4.3: Cross call libc

    Appendix A: Reserved Characters and their Uses
    Appendix B: Multi-Character Operators
    Appendix C: Memory control (and speed) with each type of struct

----------------------------------

Preamble

The past few years have seen an explosion of languages trying to break into the space C has held for so long, with low to moderate success. TNSL's primary goal is not to "fix" C into some everyman's language. The hope is that TNSL will be relatively nice to program in for *programmers*, and that this quality will make it rewarding to learn for those new to the field. TNSL's goal is more to demystify some of the hard edges which make C difficult to *completely* grasp for new programmers.

By making the specification and documentation open and free, we hope that new and old programmers can contribute and easily experience a nice language, so that once a novice goes out into the real world of TNSL, it won't be too far removed from the documentation of the language at its core. We also hope that TNSL will be competent enough to be loved by veterans of the field as well, and make life slightly easier for those weirdos who still love to program without safety.

Though TNSL does not completely resemble those who have inspired it, you will find that many features present in languages such as C, Golang, Rust, Java, and C++ still live on in these halls.
Welcome to TNSL

----------------------------------

Part 1: The Language

----------------------------------

1.1: .tnsl files

Files containing high-level TNSL code use the .tnsl extension. Each file may contain 0 or more of the following:

    1) Comments
    2) Pre-processor statements
    3) Variable definitions
    4) Named code blocks
    5) Method blocks
    6) Module blocks

Code blocks and method blocks are the only blocks which may contain statements (1.3) and logical blocks.

----------------------------------

1.2: Blocks

TNSL files consist primarily of blocks, some with user-defined names and some with language-defined keywords. Standard blocks (and their modifiers) are listed here.

Each block in TNSL starts with a forward slash ( / ) followed by the character identifying the kind of block. The block ends with the reverse of these characters, or at special multi-character boundaries.

1.2.1: The Comment Block

"/#" starts the comment block, and "#/" ends the comment block. Any text within the block will be ignored by the pre-processor and compiler.

1.2.2: The Pre-Processor Block

"/:" starts the pre-processor block, which is usually followed by a pre-processor keyword. Every line in the pre-processor block acts as if the keyword specified at the start of the block was placed at the beginning of the line.

E.g.

    /: import
        'math'
        "physics"
    :/

represents the following two pre-processor statements:

    : import 'math'
    : import "physics"

":/" ends the block.

1.2.3: The Code Block

"/;" starts the code block. The start of the code block may be followed immediately by one of the following qualifiers:

    - "loop" to represent a looping block.
    - "if" to represent the start of an if chain.
    - "if else" to represent the start of a secondary if case.
    - "else" to represent the fallback block of an if chain.
    - "match" to represent the beginning of a match (or switch) block.
    - "case" (inside match only) to represent the start of a case block.
    - "default" to represent the catchall case of a match block.
    - "method" to represent a set of methods for use on a type.
    - "override" to replace an extended method with a new one.
    - "operator" (inside method only) to represent an operator overload.
    - "interface" to create a set of methods that a struct must implement if it extends the interface.
    - "module" to represent a set of related functions, types, methods, and other modules. (In other languages this is sometimes referred to as a namespace.)
    - A non-keyword consisting of unreserved characters to "name" a block as a function or method.

If none of the above are put after the start of a block, then the block is considered anonymous. This is discussed more in 1.7 (Anonymous blocks).

A block may also be followed by parentheses "()" which denote a list of inputs, and/or brackets "[]" which denote a list of outputs. These are discussed more in 1.4 (Types).

If a loop block is followed by a set of parentheses and the last statement inside them is an expression resolving to a boolean, this expression is evaluated as the exit condition for the loop.

If a loop block is followed by a set of brackets, each expression inside them is evaluated before the loop jumps back to the top (and also before the exit condition is evaluated, if one exists).

An if or if else block must have a set of parentheses whose final statement is an expression resolving to a boolean. This expression is evaluated to decide whether the branch is taken. (If the condition is true the branch is taken; if false, the branch is skipped.)

A match block must have a set of parentheses whose final statement is an expression which resolves to a value of a type whose data is stored on the stack. Each case block must have a constant value directly following the "case" keyword. That branch of code is taken only if the parent match block's expression resolves to a value matching the case block. A default block may be used to create a catchall case (for when no other case was taken).
The method block represents a set of functions related to a type, and must be followed by parentheses which contain a pointer to that type.

The operator block is immediately qualified by a reserved character representing an operator, then followed by a set of parentheses representing a pointer to the type the operator can be used on.

The override block may be used when overriding an extended method.

The interface block acts similarly to the method block, but any methods actually implemented in it must only reference other methods. It may contain "operator" and "override" blocks like the method block. It may extend other interfaces, but not types or structs.

The module keyword must be followed immediately by a name for the module using only un-reserved characters.

A named or anonymous block may be followed by parentheses indicating parameters, and/or brackets indicating return type(s). This is discussed more in 1.4 and 1.7.

";/" ends the code block.

1.2.4: Block redefinition and shortcuts

Due to the two-character nature of block beginnings and endings, some simple shortcuts have been devised to mitigate annoyance at re-defining or swapping between block types.

";;" closes a code block and opens a new code block.

e.g.

    /; if
    ;; else
    ;/

"::" closes a pre-processor block and opens a new one.
";#" closes a code block and opens a comment block.
"#;" closes a comment block and opens a code block.
":#" closes a pre-processor block and opens a comment block.
"#:" closes a comment block and opens a pre-processor block.

----------------------------------

1.3: Statements

Statements make up the actual "doing" part of TNSL.

1.3.1: Comments

Comments start with the "#" character and end at the end of the line. The compiler will ignore all characters (even other reserved characters) after the # on that line.

If I may be candid for a moment: comments ( # ) should not matter inside a comment block, although a tokenizer bug currently causes some weird behavior there.
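Four of the six shortcuts from 1.2.4 switch into or out of a comment block, so it can help to see them spelled out in full. The following is a minimal Python sketch (Python is used only because it is easy to run; it is not TNSL), assuming each shortcut is simply the closing marker of one block followed by the opening marker of the next. The names SHORTCUTS and expand_shortcut are hypothetical and do not come from any TNSL tool.

```python
# Hypothetical table, not part of any TNSL tool: each two-character shortcut
# from 1.2.4 mapped to the (close marker, open marker) pair it stands for.
SHORTCUTS = {
    ";;": (";/", "/;"),  # close code block, open code block
    "::": (":/", "/:"),  # close pre-processor block, open pre-processor block
    ";#": (";/", "/#"),  # close code block, open comment block
    "#;": ("#/", "/;"),  # close comment block, open code block
    ":#": (":/", "/#"),  # close pre-processor block, open comment block
    "#:": ("#/", "/:"),  # close comment block, open pre-processor block
}


def expand_shortcut(token):
    """Return the long form of a shortcut token, or the token unchanged."""
    if token in SHORTCUTS:
        close_marker, open_marker = SHORTCUTS[token]
        return close_marker + open_marker
    return token


if __name__ == "__main__":
    for shortcut in SHORTCUTS:
        print(shortcut, "->", expand_shortcut(shortcut))
```

Under this reading, each shortcut keeps the first character of the closing marker and the last character of the opening marker, dropping the two slashes in between.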
1.3.2: Pre-processor statements

Pre-processor statements almost make up a meta-programming language (not solidified yet) which influences how code is interpreted by the compiler. Pre-processor statements start with a ":" character and end at the next line.

The pre-processor is discussed more in 2.2.

1.3.3: Code statements

Code statements begin with the ";" character and end at the next ":", ";", or block. Code statements come in four forms:

    1) Definition (variable, struct, etc.)
    2) Expression (assignment, increment, etc.)
    3) Call (function call, block call, etc.)
    4) Keyword (using a keyword to perform a task)

These forms can all mix to form the final statement.

1.3.4: Keywords

    - "struct" to define a new struct type
    - "extends" to inherit methods and members from another struct or interface
    - "is" to check if a type extends (or equates to) another type
    - "continue" to continue through a loop
    - "break" to end a loop preemptively
    - "label" to mark a point to jump to in the code
    - "goto" to jump back or forward to the label
    - "const" an unchangeable value (used in variable definition)
    - "static" a variable that is kept between block calls, even if it falls out of scope (used in variable definition)
    - "volatile" a value that may change at any time. The compiler will not optimize the value even if optimization is enabled. (used in variable definition)
    - "self" only for use in methods. References the object the method was called on without the need for pointer de-referencing.
    - "super" only for use in methods on structs or interfaces which inherit from other structs/interfaces. References the parent struct's methods if overridden.
    - "return" stops the current named (or anonymous) code block to give a value to its caller

1.3.5: Expressions

Expressions represent pieces of code which return values.
These take several forms:

    1) A literal value
    2) A call to a function or method which returns a value
    3) A reference to a variable
    4) An operator combining two or more of the above

More in depth:

1) Literal values

    Literal numbers start with a decimal number, and can contain up to one non-number element:

        0         - valid
        0.2341    - valid
        120000    - valid
        .34567    - invalid
        0asd...kj - invalid

    Special bases:

        0b1001 - valid (binary)
        0o0127 - valid (octal)
        0xABCD - valid (hex)
        0BZZZZ - valid (base 64)

    These rules will be amended and are mostly for ease of parsing in the first version of the language.

    Literal boolean values are written as such:

        true
        false

    Literal string values use "", and \ as an escape character:

        "hello, world!" - valid
        "\""            - valid
        "\\"            - valid
        "\"             - invalid
        "               - invalid

    Literal characters use '' and either an escape character or a single character/code point between them:

        ' '      - valid
        '\u2000' - valid
        '\\'     - valid
        '\''     - valid

    invalid:

        '\u200220202asdasdaw qwe '
        '\\asd asd '
        'ab'
        '\'
        '

2) Call with a return

    Calling a function is as simple as naming a block of code and then calling it with parentheses:

        # Define
        /; add
        ;/

        # Call (not expression)
        ;add()

    What makes a call an expression is whether it outputs any types:

        # Define
        /; get_five [int]
            ;return 5
        ;/

        # Call (expression)
        ;get_five()

3) A reference to a variable

    Fairly straightforward: an initialized variable on its own is a value, and thus an expression.

        ;int x = 0
        ;x

4) Combining two or more of the above

        ;x + get_five()

1.3.6: Ways to define variables

When defining variables, first a variable type must be specified, then a variable name. Finally, the variable may optionally be initialized when defined.

    #integer type, uninitialized
    ;int x

    #integer type, initialized by literal
    ;int y = 42

    #integer type, initialized by expression
    ;int z = y + get_five()

    ;x = z*2

----------------------------------

1.4: Types

TNSL's type system consists of built-in types, user interfaces, and user structs.
The built-in types are always on the stack, while user types can be on the stack or the heap.

1.4.1: Built-in types

Built-in types are meant to be portable so that TNSL can be ported to whatever system one may want. There are five levels:

    REQUIRED - If the machine does not have at least this, we will not officially support it
    SHOULD   - Consumer computer systems have had this for a while, so a system really should have it
    LIKELY   - Many consumer electronics now have this, so it would be likely to be included
    NICE     - Future proofing
    UNLIKELY - These are just weird things I thought up in the shower

Where a given architecture falls on this spectrum (along with its market penetration) affects how likely we are to support it. It also affects how standards compliant a device is for our language.

Types and their sizes:

    smallest (REQUIRED to be standards compliant):
        bool - true or false (smallest addressable space on the system)

    size 8 (REQUIRED to be standards compliant):
        achar - represents an ascii value
        int8  - -128 to 127
        uint8 - 0 to 255

    variable (depends on system) (REQUIRED):
        generic (void) pointer - "~void" can represent an arbitrary memory address
            part of heap representation
        pointer (for each supported type) - part of heap representation

    size 16 (SHOULD to be standards compliant):
        int16  - 16 bit signed int
        uint16 - 16 bit unsigned int

    variable (SHOULD):
        type  - a type which represents other types.
        uchar - represents a unicode code point
        {}    - an array type

    size 32 (SHOULD):
        int32   - 32 bit signed int
        uint32  - 32 bit unsigned int
        float32 - 32 bit floating point (single precision)

    size 64 (LIKELY):
        int64   - 64 bit signed int
        uint64  - 64 bit unsigned int
        float64 - 64 bit floating point

    vect (NICE):
        vector, simd, etc. Not really sure how these work yet. I'll get back to you

    size 128 (NICE):
        int128
        uint128
        float128

1.4.2: User defined types (stack)

Any structs defined without using pointers are automatically allocated on the stack.
Structs are normally aligned to byte boundaries.

User defined type ids are always >= 64.

Defining a struct can be done as follows:

    ;struct () { }

e.g.

    ;struct Vector2 {int32 x, y}

Creating a variable from the struct can be done in two ways:

    # one, use the assignment operator
    ;Vector2 vec = Vector2{0, 1}

    # two, use the list syntax
    ;Vector2 vec{0, 1}

    # also, feel free to set the members in the list brackets (does not have to be in order)
    ;Vector2 vec{x = 0, y = 1}

    # re-assignment must always use the following syntax
    ;vec =

1.4.3: User defined methods

Any type may be expanded by user defined methods on that type.

Method block example using operator:

    /; method Vector2

        /; operator + (~Vector2 v)
            ;self.x += `v.x
            ;self.y += `v.y
        ;/

        /; dot (~Vector2 v) [int32]
            ;return self.x * `v.x + self.y * `v.y
        ;/

    ;/

Said methods may then be called like so:

    /; some_method
        ;Vector2 vec1 {0, 2}
        ;Vector2 vec2 {1, 4}

        # Call
        ;vec1.dot(~vec2)
        # represents 8
    ;/

1.4.4: Interfaces

Interfaces exist as a block of mostly un-implemented methods which can be implemented by other interfaces but, ultimately, by structs. Interfaces are types in the sense that objects can be derived from them, and type equivalence is possible, but there are never any pure instances of interfaces.

To define an interface, create a block with some methods. Methods not marked with "override" are considered implemented by the interface, and structs do not have to override them.

    /; interface Vector

        /; override length_sq [int32]
        ;/

        /; override dimension [int8]
        ;/

    ;/

Interfaces can be used by the extends keyword, just like structs.

    ;struct Vector2 extends Vector {int32 x, y}

    # Now, not implementing will throw an error on compile
    /; method ~Vector2

        /; override length_sq [int32]
            ;return self.x*self.x + self.y*self.y
        ;/

        # return type matches the int8 declared by the interface
        /; override dimension [int8]
            ;return 2
        ;/

    ;/

----------------------------------

1.5: Operators

1.5.1: List of operators

At the moment, read Appendix A and Appendix B for a list of operators.
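Appendix B describes each negated ("!"-prefixed) operator as the negation of its base operator. The following Python sketch (Python is used only because it is easy to run; it is not TNSL) checks that reading on a few operands. It assumes boolean operands for &, |, and ^, and says nothing about how TNSL treats these operators on wider integers. The names BASE_OPS and apply_negated are hypothetical and are not part of TNSL or its tooling.

```python
import operator

# Hypothetical table of base operators; the spellings follow Appendix A and B.
BASE_OPS = {
    "&": operator.and_,          # assumed to be applied to booleans here
    "|": operator.or_,
    "^": operator.xor,
    "==": operator.eq,
    "&&": lambda a, b: a and b,
    "||": lambda a, b: a or b,
    ">": operator.gt,
    "<": operator.lt,
}


def apply_negated(op, a, b):
    """Evaluate `a !op b` under the assumed rule `!(a op b)`."""
    return not BASE_OPS[op](a, b)


if __name__ == "__main__":
    # Truth table for !& (NAND) on booleans.
    for a in (False, True):
        for b in (False, True):
            print(a, "!&", b, "=", apply_negated("&", a, b))
```

Read this way, ">==" (listed as the same as "!<") behaves like a conventional greater-than-or-equal comparison, and "<==" like less-than-or-equal.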
1.5.2: List of reserved characters

    ; : ' " , . < > ~ ` ! # % ^ & * ( ) { } [ ] - = +

----------------------------------

1.6: Borrow checker

IDK man, maybe later

----------------------------------

1.7: Anonymous blocks

This chapter covers anonymous code blocks, where they can be used, and how functions are first-class in TNSL.

1.7.1: Anonymous

In TNSL, like many other languages, we have closures (or lambda expressions if you prefer). They have the same type as code blocks (which we haven't really talked about yet), and can be passed around as variables as well as called. The type that functions take on depends on their return value.

Block variables can be written as

    ;void(inputs)[outputs] block

i.e.

    ;void(int32)[int32] block
    # represents a block which returns an int32 and takes an int32 as a parameter

Anonymous blocks can be written only as scope, or with inputs and outputs for function calls.

    /; call_func (void(int32)[int32] to_call) [int32]
        ;return to_call(5)
    ;/

    /; provide_anon () [int32]
        ;return call_func(/; (int32 a) [int32]
            ;return a + 1
        ;/)
    ;/

In fact, all functions are special types of expressions which return themselves (a set of other statements).

----------------------------------

1.8: Raw and Asm

Sometimes, the programmer needs to implement an exact or unique set of instructions as lines of assembly to achieve their goal. Raw and Asm allow for that.

1.8.1: raw

The raw keyword tells the compiler to leave the entire block (and its contents) to the programmer. When a block is specified as raw, the compiler disables the borrow checker on the block and everything inside it. It also strips any calling convention from the block if it is a function, reducing the open and close of the block to a simple call and ret (or equivalent). If the block is inline, it does even less, simply carrying forward the instructions passed inside of the block.
Note that any memory allocated inside the block will not be automatically de-allocated as is normal for variables, so programmers must take care to make sure there are no memory leaks in the code. Thus, all memory leaks can be traced to a raw block.

Any and every block may be raw. This includes the main function.

1.8.2: asm

The asm keyword may only be used in blocks marked as raw. Not all the kinks are worked out yet, but the gist is this example:

    ;asm ""

You can also use variables in it as long as they are in scope.

    ;asm "mov ax, [some_var]"

This, paired with knowing the calling convention (i.e. where TNSL stores variables before and after a function call), allows you complete control of the code and execution order if you so wish.

----------------------------------

Part 2: Related Features

2.1: Style guide

As a basic rule, users of the language *should* be using the following style guide when writing TNSL programs. Some of the following definitions are arbitrary, but style guides are more for consistency than code quality. If working on a large project, a differing style guide may be written to suit the programmers working on that specific project.

The style guide is not meant as a way to keep people from programming in TNSL, and is not enforceable, but should be followed nonetheless, if for no other reason than consistency in the code base.

All TNSL specifications are **heavily** encouraged to follow the style guide for ease of reading.

2.1.1: Comments

Keep comments in the code itself minimal unless a particular implementation is fairly obscure. Doc comments for functions should explain what the function is/does, not how it does it. Doc comment blocks should start with "/##" and end at the function or method, written with "#;".

e.g.

    /##
        main is the entry point for the program
    #; main
    ;/

2.1.2: Variable names

Variable names should be as descriptive as they need to be.
It is encouraged to make parameters more descriptive so others can see what they might need in order to call a function. Otherwise single letters, snake_case, and other common cases are just fine as long as it is fairly clear what is going on.

Constants should be in UPPER_CASE.

2.1.3: Function and module names

It is recommended to use snake_case.

----------------------------------

2.2: The pre-processor

----------------------------------

Part 3: The TNSL Calling ABI

Honestly I'm not that versed in assembly, I need to read up >_<

----------------------------------

Part 4: Standard libs

4.1: Bare-metal

4.2: libts

libts is an effort to create a TNSL standard library and is too large for this document. Please read the related spec text documents.

*(discussed more in libts.txt)

4.3: Cross Call libc

----------------------------------

Appendix A: Reserved Characters and their Uses

    ( - Starting condition/expression mark open
    ) - Starting condition/expression mark close
    [ - Ending condition/expression mark open
    ] - Ending condition/expression mark close
    { - Array/Set mark open
    } - Array/Set mark close

    : - Pre-processor Statement/Directive
    ; - Code Statement
    # - Comment Statement

    , - Separates arguments or in-line statements
    = - Assignment operator
    . - Get operator (from struct or module)

    & - Bit-wise and operator
    | - Bit-wise or operator
    ^ - Bit-wise xor operator

    > - Greater than boolean operator
    < - Less than boolean operator
    ! - Not prefix for boolean expression

    + - Addition/concat operator
    - - Subtraction operator
    * - Multiplication operator
    / - Division operator
    % - Modulo operator

    ~ - Address of (and define pointer type) operator
    ` - Pointer de-reference operator

----------------------------------

Appendix B: Multi-Character Operators

    /; - Code block mark open
    ;/ - Code block mark close
    /: - Pre-processor block mark open
    :/ - Pre-processor block mark close
    /# - Comment block mark open
    #/ - Comment block mark close

    ;; - Redefine code block (acts as a shortcut for ;//;)
    :: - Redefine pre-processor block (acts as a shortcut for ://:)
    ;# - Switch from code block to comment block (;//#)
    :# - Shortcut (://#)
    #; - Shortcut (#//;)
    #: - Shortcut (#//:)

    == - Boolean equals
    && - Boolean and
    || - Boolean or

    << - Bit-wise l-shift
    >> - Bit-wise r-shift

    ++ - Increment
    -- - Decrement

Augmented assignment operators, where (a op= b) = (a = a op b)

    &=
    |=
    ^=
    +=
    -=
    *=
    /=
    %=
    ~=
    `=

Augmented boolean operators, where (a !op b) = !(a op b)

    !&  - NAND
    !|  - NOR
    !^  - XAND (XNOR)
    !== - Boolean not equals
    !&&
    !||
    !>
    !<
    >== - Same as !<
    <== - Same as !>

----------------------------------

Appendix C: Memory control (and speed) with each type of struct

Each kind of user-definable type, struct, or interface grants its own level of memory control. These (and their ramifications) are listed here from low to high.

---

High level, low control structs (dynamic structs) are created when using the parameters for structs/types. They allow variable length which can house different information at the cost of speed, memory, and control. These are the only type of structs which can house other dynamic structs. Dynamic structs can only be passed by reference because their size is undefined at compilation.

---

Medium level, medium control structs (type structs) are created normally through the struct keyword without parameters. These structs are fixed length, but the compiler encodes extra info into them.
This means they get method resolution and override checks, which may reduce the speed of the application.

---

Low level, high control structs (raw structs) are created using the "raw" keyword before the "struct" keyword. There are no frills, and method resolution is not performed for these structs. These structs may not extend or be extended. They may, however, implement interfaces. They perform as a "what you see is what you get" kind of memory model. They may not use parameters, and all internal types must be of consistent length (no dynamic structs or dynamic type identifiers).

---

To summarize: all these structs can encode the same info, but as you get lower to the system, you gain speed and control while losing higher level functions provided by the language. This shouldn't matter much to most programmers unless they are doing embedded development, systems programming, or firmware programming, but it is still a consideration to make for time-sensitive applications.