This is the TNSL specification, a document related to the definition of the TNSL language,
meta-language, it's usage, and where it's compiler may be ported.

Document version (semver): 0.0.1
Main Author: Kyle Gunger

License: Apache 2.0

----------------------------------

Contents:

Preamble

Part 1 -- The Language
	1.1: .tnsl Files
	1.2: Blocks
	1.3: Statements
	1.4: Types
	1.5: Operators
	1.6: Borrow checker
	1.7: Anonymous blocks
	1.8: Raw and Asm

Part 2 -- Related Features
	2.1: Style guide
	2.2: The pre-processor
	2.3: Compiler options
	2.4: Included tools

Part 3 -- The TNSL Calling ABI
	3.1: 

Part 4 -- Standard libs
	4.1: Bare-metal
	4.2: libts
	4.3: Cross call libc

Appendix A: Reserved Characters and their Uses
Appendix B: Multi-Character Operators

----------------------------------

Preamble

The past few years have seen an explosion of languages trying to break into
the space C has held for so long, with low to moderate success.  TNSL's
primary goal is not "fix" C into some every man's language, the hope is that
TNSL will be relatively nice to program in for *programmers*, and that this
quality will be rewarding to learn for those new to the field.  TNSL's goal
is more to de-mistify some of the hard edges which make C difficult to
*completely* grasp for new programmers.  By making the specification and
documentation open and free, we hope that new and old programmers can
contribute and easily experience a nice language, so once a novice goes out
into the real-world of TNSL, it won't be too far removed from the documentation
of the language at its core.  We also hope that TNSL will be compitant enough
to be loved by veterans of the field as well, and make life slightly easier for
those wierdos who still love to program without safty.

Though TNSL does not completely resemble those who have inspired it, you will
find that many features present in languages such as C, Golang, Rust, Java,
and C++ still live on in these halls.

Welcome to TNSL


----------------------------------

Part 1: The Language

----------------------------------

Section 1: .tnsl files

The high-level files containing code in the TNSL language contain the .tnsl extension.
Each file may contain 0 or more of the following:

	1) Comments

	2) Pre-processor statements

	3) Variable definitions

	4) Named code blocks

	5) Method blocks

	6) Module blocks

Code blocks and method blocks are the only blocks which may contain statements (1.3) and logical blocks.

----------------------------------

Section 2: Blocks

TNSL files will consist primarily of blocks both with user-defined names and language-defined keywords.
Standard blocks (and their modifiers) are listed here.

Each block in TNSL starts with a forward slash ( / ) followed by the character of the block.
The block ends with the reverse of these characters, or at special multi-character boundaries.


	1.2.1: The Comment Block
		"/#" starts the comment block, and "#/" ends the comment block.
		Any text within the block will be ignored by the pre-processor and compiler.


	1.2.2: The Pre-Processor Block
		"/:" starts the pre-processor block, which is usually followed by a pre-processor keyword.

		Every line in the pre-processor block acts as if the keyword specified at the start of the block was placed
		at the beginning of the line.

		E.g.

		/: import
			'math'
			"physics"
		:/

		represents the following two pre-processor statements:
		
		: import 'math'
		: import "physics"

		":/" ends the block.


	1.2.3: The Code Block
		"/;" starts the code block.

		The code block may be followed immediately by the following qualifiers:

		- "loop" to represent a looping block.

		- "if" to represent the start of an if chain.

		- "if else" to represent the start of a secondary if case.

		- "else" to represent the fallback block of an if chain.

		- "match" to represent the beginning of a match (or switch) block.

			- "case" (inside match only) to represent the start of a case block.

			- "default" catchall case

		- "method" to represent a set of methods for use on a type.

			- "override" to replace an extended method with a new one.

			- "operator" (inside method only) to represent an operator overload.
		
		- "interface" to create a set of methods that a struct must impliment if it extends the interface.

		- "module" to represent a set of related functions, types, methods, and other modules.
		(in other languages this is sometimes referred to as a namespace)
		
		- A non-keyword consisting of unreserved characters to "name" a block as a function or method.


		If none of the above are put after the start of a block then the block is considered anonymous.
		This is discussed more in 1.8 (Anonymous Blocks)


		A block may also be followed by parentheses "()" which denote a list of inputs, and/or
		brackets "[]" which denote a list of outputs.  These are discussed more in 1.5 (Types).


		If a loop block is followed by a set of parentheses and the last statement is an expression
		resolving in a boolean expression, this expression is evaluated as the exit condition for the loop.

		If a loop block is followed by a set of brackets, each expression is evaluated before the loop
		jumps back to the top (and also before the exit condition is evaluated if one exists).


		An if or if else block must have a set of parentheses whose final statement is an expression
		resolving in a boolean expression.  This expression evaluated to decide if the branch is taken or not.
		(if the condition is true the branch is taken, if false, the branch is skipped)

		A match block must have a set of parentheses whose final statement is an expression which
		resolves in a type whose data is stored on the stack.

		Each case statement must have a constant value directly following.  This branch of code is taken
		only if the parent match block's expression resolves in a value matching the case block.

		Default block may be used to create a catchall case (for if no other) case was used.


		The method block represents a set of functions related to a type, and must be followed by parentheses
		which contain a pointer to that type.

		The operator block is immediately qualified by a reserved character representing an operator, then
		followed by a set of parentheses representing a pointer to the type the operator can be used on.

		The override block may be used if overriding an extended method.


		The interface block acts similar to the method block, but any methods actually implimented must only
		reference other methods.  May contain "operator" and "override" like the method block.  May extended
		other interfaces, but not types or structs.


		The module keyword must be followed immediately by a name for the module using only un-reserved characters.


		A named or anonymous block may be followed by parentheses indicating parameters, and/or brackets
		indicating return type(s). This is discussed more in 1.5 and 1.8

		";/" ends the code block.


	1.2.4: Block redefinition and shortcuts
		Due to the two-character nature of block beginnings and endings, some simple shortcuts have been devised
		to mitigate annoyance at re-defining or swapping between block types.

		";;" closes a code block and opens a new code block.


		e.g.

		/; if

		;; else

		;/


		"::" closes a pre-processor block and opens a new one.

		";#" closes a code block and opens a comment block.

		"#;" closes a comment block and opens a code block.

		":#" closes a pre-processor block and opens a comment block.

		"#:" closes a comment block and opens a pre-processor block.

----------------------------------

1.3: Statements

Statements make up the actual "doing" part of TNSL

	1.3.1: Comments
		Comments start with the "#" character and end at the end of the line.
		The compiler will ignore all characters (even other reserved characters) after the # on that line.

		If I may be candid for a moment:
		Comments ( # ) should not matter in a comment block, although this is currently a tokenizer bug resulting in some weird behavior.


	1.3.2: Pre-processor statements
		Pre-processor statements almost make up a meta-programming language (not solidified yet)
		which influences how code is interpreted by the compiler.

		Pre-processor statements start with a ":" character and end at the next line

		The pre-processor is discussed more in 2.2


	1.3.3: Code statements
		Code statements begin with the ";" character and end at the next ":", ";", or block.

		Code statements come in four forms:

			1) Definition (variable, struct, etc.)
			2) Expression (assignment, increment, etc.)
			3) Call (function call, block call, etc.)
			4) Keyword (using a keyword to perform a task)

			These forms can all mix to form the final statement.

	
	1.3.4: Keywords

		- "struct" to define a new struct type

		- "extends" to inherit methods and members from another struct or interface

		- "is" to check if a type extends (or equates) another type

		- "continue" to continue through a loop	

		- "break" to end a loop preemptively

		- "label" to mark a point to jump to in the code

		- "goto" to jump back or forward to the label

		- "const" an unchangable value
		(used in variable definition)

		- "static" a variable that is kept between block calls, even if it falls out of scope
		(used in variable definition)

		- "volatile" a value that may change at any time.
		The compiler will not optimize the value even if optimization is enabled.
		(used in variable definition)

		- "self" only for use in methods. References the object the method was called on
		without the need for pointer differentiation.

		- "super" only for use in methods on structs or interfaces which inherit from other structs/interfaces.
		references the parent struct's methods if overridden.

		- "return" stops the current named (or anonymous) code block to give a value to it's caller


	1.3.5: Expressions
		Expressions represent pieces of code which return values.
		These take several forms:

		1) A literal value
		2) A call to a function or method which returns a value
		3) A reference to a variable
		4) An operator combining two or more of the above

		More in depth:

		1) Literal values

		Literal numbers start with a decimal number, and can contain up to one non-number element:
			0 - valid
			0.2341 - valid
			120000 - valid

			.34567 - invalid
			0asd...kj - invalid

			Special bases:

			0b1001 - valid (binary)
			0o0127 - valid (octal)
			0xABCD - valid (hex)
			0BZZZZ - valid (base 64)

		These rules will be ammended and are mostly for ease of parsing in the first version of the language.

		Literal boolean values are put as such:
			true
			false

		Literal string values use "", and \ as an escape character
			"hello, world!" - valid
			"\""            - valid
			"\\"            - valid

			"\"             - invalid
			"               - invalid

		Literal characters use '' and either an escape character or a single character/code point between them
			' '      - valid
			'\u2000' - valid
			'\\'     - valid
			'\''     - valid

			invalid:
			'\u200220202asdasdaw qwe '
			'\\asd asd '
			'ab'
			'\'
			'

		2) Call with a return

		calling a function is as simple as naming a block of code and then calling it with parentheses

		# Define
		/; add
		;/

		# Call (not expression)
		;add()

		what makes a call an expression is if it outputs any types

		# Define
		/; get_five [int]
			;return 5
		;/

		# Call (expression)
		;get_five()


		3) A reference to a variable
		
		Fairly straight-forward, an initialized variable on it's own is a value, and thus, an expression.

		;int x = 0

		;x


		4) Combining two or more of the above

		;x + get_five()


	1.3.6: Ways to define variables

		When defining variables, first a variable type must be specified, then a variable name.
		Finally, it may be initialized when defined, or not.

		#integer type, uninitialized
		;int x

		#integer type, initialized by literal
		;int y = 42

		#integer type, initialized by expression
		;int z = y + get_five()

		;x = z*2

----------------------------------

1.4: Types

TNSL's type system consists of built-in types, user interfaces, and user structs.
The built-in types are always on the stack, while user types can be on the stack or the heap.

	1.4.1: Built-in types
		Built-in types are meant to be portable so that tnsl can be backported to whatever system one may want.

		There are four levels:
		REQUIRED - If the machine does not have at least this, we will not officially support it
		SHOULD - Consumer computer systems have had this for a while, so a system really should have it
		LIKELY - Many consumer electronics now have this, so it would be likely to be included
		NICE - Future proofing
		UNLIKELY - These are just wierd things i thought up in the shower

		Where a given arch falls on this spectrum (as well as market permiation) relates to how likely we are to support it.
		It also relates to how standards compliant a device is for our language.
		

		Types and their sizes:

		smallest (REQUIRED to be standards compliant):
		bool - true or false (smallest addressable space on the system)

		size 8 (REQUIRED to be standards compliant):
		achar - represents an ascii value
		int8 - -128 to 127
		uint8 - 0 to 255

		variable (depends on system) (REQUIRED):
		generic (void) pointer - "~void" can represent an arbitrary memory address part of heap representation
		pointer (for each supported type) - part of heap representation

		size 16 (SHOULD to be standards compliant):
		int16 - 16 bit signed int
		uint16 - 16 bit unsigned int

		variable (SHOULD):
		type - a type which represents other types.
		uchar - represents a unicode code point
		{}<type> - an array type

		size 32 (SHOULD):
		int32 - 32 bit signed int
		uint32 - 32 bit unsigned int
		float32 - 32 bit floating point (single percision)

		size 64 (LIKELY):
		int64 - 64 bit signed int
		uint64 - 64 bit unsigned int
		float64 - 64 bit floating point
		
		vect (NICE):
		vector, simd, etc. not really sure how these work yet.  I'll get back to you

		size 128 (NICE):
		int128
		uint128
		float128

	
	1.4.2: User defined types (stack)
		Any structs defined not using pointers are automatically allocated on the stack.

		Structs are normally alligned to byte boundaries

		User defined type ids are always >= 64

		Defining a struct can be done as follows:

		;struct <struct name> (<list of inputs (makes this a dynamic type)>) { <list of members (may use inputs)> }

		e.g.

		;struct Vector2 {int32 x, y}


		Creating a variable from the struct can be done in two ways:

		# one, use the assignment operator
		;Vector2 vec = Vector2{0, 1}

		# two, use the list syntax
		;Vector2 vec{0, 1}

		# also, feel free to set the members in the list brackets (does not have to be in order)
		;Vector2 vec{x = 0, y = 1}

		# re-assignment must always use the following syntax
		;vec = <expression returning variable type>
	

	1.4.3: User defined methods
		Any type may be expanded by user defined methods on that type.

		method block example using operator

		/; method Vector2

			/; operator + (~Vector2 v)
				;self.x += `v.x
				;self.y += `v.y
			;/

			/; dot (~Vector2 v) [int32]
				;return self.x * `v.x + self.y * `v.y
			;/

		;/

		said mathods may then be called like so

		/; some_method
			;Vector2 vec1 {0, 2}
			;Vector2 vec2 {1, 4}

			# Call
			;vec1.dot(~vec2) # represents 8
		;/

	
	1.4.4: Interfaces
		Interfaces exist as a block of mostly un-implimented methods which
		can be implimented by other interfaces, but ultimatly, structs.

		Interfaces are types in the sence that objects can be derived from them,
		and type equivalance is possible, but there are never any pure instances
		of interfaces.

		To define an interface, create a block with some methods.
		Methods not marked with "override" are considered implimented by the interface,
		and structs do not have to override them.

		/;interface Vector

			/; override length_sq [int32] ;/

			/; override dimension [int8] ;/

		;/

		Interfaces can be used by the extends keyword, just like structs.

		; struct Vector2 extends Vector
		{
			x, y int32
		}

		# Now, not implimenting will throw an error on compile
		/; method ~Vector2

			/; override length_sq [int32]
				;return self.x*self.x + self.y*self.y
			;/

			/; override dimension [int32]
				;return 2
			;/
		;/

----------------------------------

1.5: Operators

	1.5.1: List of operators
		At the moment, read Appendix A and Appendix B for a list of operators.

	1.5.2: List of reserved characters
		; : ' " , . < > ~ ` ! # % ^ & * ( ) { } [ ] - = +

----------------------------------

1.6: Borrow checker

IDK man, maybe later

----------------------------------

1.7: Anonymous blocks

This chapter covers anonymous code blocks, where they can be used,
and how functions are first-class in TNSL

	1.7.1: Anonymous
		In TNSL, like many other languages, we have closures (or lambda expressions if you perfer).
		They have the same type as code blocks (which we havn't really talked about yet),
		and can be passed around as variables as well as called.

		The type that functions take on depends on their return value.

		Block variables can be written as

		;void(inputs)[outputs] block

		i.e.

		;void(int32)[int32] block
		# represents a block which returns a int32 and takes a int32 as a parameter

		Anonymous blocks can be written only as scope, or with inputs and outputs for function calls.

		/; call_func (void(int32)[int32] to_call) [int32]
			;return to_call(5)
		;/

		/; provide_anon () [int32]
			;return call_func(/; (int32 a) [int32]
				;return a + 1
			;/)
		;/

		In fact, all functions are special types of expressions which return themselves (a set of other statements).

----------------------------------

1.8: Raw and Asm

Sometimes, the programmer needs to impliment an exact or unique set of instructions
as lines of assembly to achieve their goal.  Raw and Asm allow for that.

	1.8.1: raw
		The raw keyword tells the compiler to leave the entire block (and its contents)
		to the programmer.

		When specifying a raw block, the compiler disables the borrow checker on the block and
		everything inside it.  It also strips any calling convention from the block if it is a function,
		reducing the open and close of the block to a simple call and ret (or equiv).

		If the block is inline, it does even less.  Simply carrying forward the instructions passed inside of the block.

		Note that any memory allocated inside the block will not be automatically de-allocated as is normal for variables,
		so programmers must take care to make sure there are no memory leaks in the code.  Thus, all memoty leaks can be
		traced to a raw block.  Any and every block may be raw.  This includes the main function.

	1.8.2: asm
		The asm keyword may only be used in blocks marked as raw.

		not all the kinks are worked out yet, but the jist is this example:

		;asm "<string of assembly>"

		you can also use variables in it as long as they are in scope.

		;asm "mov ax, [some_var]"

		This paired with knowing the calling convention (i.e. where TNSL stores variables before and after a function call)
		allows you complete control of the code and execution order if you so wish.


----------------------------------

Part 2: Related Features

2.1: Style guide
	As a basic rule, users of the language *should* be using the following style guide when
	writing TNSL programming.  Some of the following definitions are arbitrary, but style guides
	are more for consistancy than code quality.

	If working on a large project, a differing style guide may be written to the compliment of the
	programmers working on the specific project.

	The style guide is not meant as a way to keep people from programming in TNSL, and is not enforceable,
	but should be followed nonetheless for no other reason than consistency in the code base.

	All TNSL specification are **heavily** encouraged to follow the style guide for ease of reading.


	2.1.1: Comments
		Minimal comments in the code itself unless a particular implementation is fairly obtuse.

		Doc comments for functions should explain what the function is/does, not how it does it.

		Doc comment blocks should start with "/##" and end at the function or method written with "#;"

		e.g.

		/## main is the entry point for the program
		
		#; main
		
		;/
	

	2.1.2: Variable names
		Variable names should be as descriptive as they need.
		It is encouraged to make parameters more descriptive so others can see
		what they might need to call a function.

		Otherwise single letters, snake_case, and other common cases are just fine
		as long as it is fairly clear what is going on.

		constants should be in UPPER_CASE

	2.1.3: Function and module names
		It is recommended to use snake_case

----------------------------------

2.2: The pre-processor


----------------------------------

Part 3: The TNSL Calling ABI

Honestly I'm not that versed in assembly, I need to read up >_<


----------------------------------

Part 4: Standard libs

4.1: Bare-metal


4.2: libts
	libts is an effort to create a TNSL standard library and is too large for this document.
	Please read related spec text documents. *(discussed more in libts.txt)


4.3: Cross Call libc


----------------------------------

Appendix A: Reserved Characters and their Uses

( - Starting condition/expression mark open

) - Starting condition/expression mark close

[ - Ending condition/expression mark open

] - Ending condition/expression mark close

{ - Array/Set mark open

} - Array/Set mark close

: - Pre-processor Statement/Directive

; - Code Statement

# - Comment Statement

, - Separates arguments or in-line statements

= - Assignment operator

. - Get operator (from struct or module)

& - Bit-wise and operator

| - Bit-wise or operator

^ - Bit-wise xor operator

> - Greater than boolean operator

< - Less than boolean operator

! - Not prefix for boolean expression

+ - Addition/concat operator

- - Subtraction operator

* - Multiplication operator

/ - Division operator

% - Modulo operator

~ - Address of (and define pointer type) operator

` - Pointer de-reference operator


----------------------------------

Appendix B: Multi-Character Operators

/; - Code block mark open

;/ - Code block mark close

/: - Pre-processor block mark open

:/ - Pre-processor block mark close

/# - Comment block mark open

#/ - Comment block mark close

;; - Redefine code block (acts as a shortcut for ;//;)

:: - Redefine Pre-processor block (acts as shortcut for ://:)

;# - Switch from code block to comment block (;//#)

:# - Shortcut (://#)

#; - Shortcut (#//;)

#: - Shortcut (#//:)

== - Boolean equals

&& - Boolean and

|| - Boolean or

<< - Bit-wise l-shift

>> - Bit-wise r-shift

++ - Increment

-- - De-Increment


Augmented assignment operators (a = a <op> b) = (a <op>= b)
&=

|=

^=

+=

-=

*=

/=

%=

~=

`=


Augmented boolean operators (a !<op> b) = !(a <op> b)
!& - NAND

!| - NOR

!^ - XAND

!== - Boolean equals

!&&

!||

!>

!<

>== - Same as !<

<== - Same as !>

----------------------------------

Appendix C: Memory control (and speed) with each type of struct

Each type of user-definable type or struct or interface grants
it's own level of memory control.  These (and their ramifications) are
listed here from low to high.

---

High level, low control structs (dynamic structs) are created when using
the parameters for structs/types.  They allow variable length which can
house different information at the cost of speed, memory, and control.

These are the only type of structs which can house other dynamic structs.
Dynamic structs can only be passes by reference due to undefined size at
compilation.

---

Medium level, medium control structs (type structs) are created normaly
through the struct keyword without parameters.  These structs are fixed
length, but the compiler encodes extra info into them.  This means they
get method resolution and override checks which may reduce speed of the
application.

---

Low level, high control structs (raw structs) are created using the "raw"
keyword before the "struct" keyword.  There are no frills, and method
resolution is not performed for these structs.  These structs may not
extend or be extended.  They may, however, implement interfaces.  They
perform as a "what you see is what you get" kind of memory model.  They
may not use parameters, and all internal types must be consistant length
(no dynamic structs or dynamic type identifiers).

---

To summerize:
All these structs can encode the same info, but as you get lower to
the system, you get the bonus of speed and control while losing higher
level functions provided by the language.

This shouldn't matter much to most programmers unless they are doing
embedded development, systems programming, or firmware programming,
but it is still a consideration to make for time-sensitive applications.