Argile Language Reference

# An Argile program is a list of zero or more calls to definitions, separated by a semicolon (;) or a newline (except inside an explicit sub-call), and with identical indentation.

example:
call_1;call_2
call_3 ;; call_4

# A definition has a return type, a syntax, a scope, and can be either :

# A call is a list of one or more call elements (possibly separated by blanks) that must match the syntax of a definition, or it can be a single constant literal.

# An indentation is the column number of the first character in the line that is neither a space nor a tab character, and that marks the beginning of a call (i.e. it is not inside an explicit sub-call, and the previous line does not end with a backslash '\').

# A blank is one or more of the following :

# An escaped newline is the backslash character (\) immediately followed by a newline.

# A comment is any sequence of characters preceded by '(:' and followed by ':)' ; comments can be nested.

examples:
(: inside a comment :)
(: inside a (:nested comment:) :)
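
The nesting rule can be modelled with a depth counter. The following Python sketch is only an illustration and is not taken from the Argile compiler; the function name strip_comments is hypothetical, and the sketch assumes that text literals need no special treatment.

```python
def strip_comments(source: str) -> str:
    """Remove (: ... :) comments, honouring nesting.

    Assumption: text literals are not treated specially, so a "(:" found
    inside double quotes would also open a comment in this sketch.
    """
    out = []
    depth = 0  # number of comments currently open
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "(:":
            depth += 1
            i += 2
        elif pair == ":)" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(source[i])  # keep characters outside comments
            i += 1
    if depth != 0:
        raise ValueError("unterminated comment")
    return "".join(out)
```

Characters are kept only while the depth is zero, so an inner ':)' closes the innermost comment and the outer comment continues.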

# A call element is either :

examples:
foo + (bar) 42

# A word is one or more of the following characters:

  • a lower case ASCII letter (from 'a' to 'z')
  • an upper case ASCII letter (from 'A' to 'Z')
  • an underscore character (_)
  • an ASCII digit character (from '0' to '9'), except if it is the first character of the word.
  • a non-ASCII character, except for those specified in the ARGILE_OPERATORS environment variable

regular expression: [a-zA-Z_\x80-\xff][0-9a-zA-Z_\x80-\xff]*

examples:
foobar
This_is_also_a_single_word_123
non_ASCII_à
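
As an illustration, the regular expression above can be applied to the bytes of a candidate word. This Python sketch assumes the source is encoded in UTF-8 and ignores the ARGILE_OPERATORS exception; the helper name is_word is hypothetical.

```python
import re

# Byte-oriented pattern copied from the reference: every byte >= 0x80
# is accepted as a word character (ARGILE_OPERATORS is not modelled here).
WORD = re.compile(rb"[a-zA-Z_\x80-\xff][0-9a-zA-Z_\x80-\xff]*")

def is_word(text: str) -> bool:
    """Return True when the whole text is a single word (UTF-8 assumed)."""
    return WORD.fullmatch(text.encode("utf-8")) is not None
```

With this sketch, is_word accepts the three examples above and rejects a candidate such as 1abc, because a digit cannot be the first character.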

# An operator is one character amongst :

  • these ASCII characters: !#$%&'*+,-./<=>?@[]\^`|~
  • the backslash character (\) except at the end of a line
  • characters in the ARGILE_OPERATOR environment variable

regular expression: [!#$%&\'*+,\-./<=>?@[\]\\^`|~]

# A sub-call is either :

  • the empty sub-call:

    an open parenthesis, followed by zero or more blanks, followed by a closing parenthesis: ().

    it can only match definitions associated with the empty syntax (::)

  • an explicit sub-call: a call within parentheses
  • an implicit sub-call: a call without surrounding parentheses (they are guessed by the compiler)

examples:
use std
let x=1, y=2
print (x) (y)   (:explicit sub-calls of variables x and y:)
print x y       (:implicit sub-calls of variables x and y:)

# A constant literal is either :

# A relative integer literal is zero or one minus sign (-) followed by either :

examples:
0 1 -24 0x7f 0X2A 0644 0b01101101

# A decimal integer literal is one or more digits amongst 0123456789, except when it matches an octal integer literal ; it is interpreted in base 10.

regular expression: [0-9]+

examples:
123  -456

# A hexadecimal integer literal is the prefix 0x or 0X followed by one or more characters amongst 0123456789abcdefABCDEF ; it is interpreted in base 16.

regular expression: 0[xX][0-9a-fA-F]+

examples:
0xfF  0x0   0X7d

# An octal integer literal is the zero digit (0) followed by one or more digits amongst 01234567 ; it is interpreted in base 8.

regular expression: 0[0-7]+

examples:
07   010   0644

# A binary integer literal is the prefix 0b or 0B followed by one or more digits amongst 01 ; it is interpreted in base 2.

regular expression: 0[bB][01]+

examples:
0b110110011100   0B1
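
The four integer forms can be told apart by trying their regular expressions in a fixed order, with octal tried before decimal so that the "except when it matches an octal integer literal" rule holds. The Python sketch below is an illustration under that reading; the name parse_integer_literal is hypothetical.

```python
import re

# (kind, pattern, base); octal is listed before decimal on purpose.
_PATTERNS = [
    ("hexadecimal", re.compile(r"0[xX]([0-9a-fA-F]+)"), 16),
    ("binary", re.compile(r"0[bB]([01]+)"), 2),
    ("octal", re.compile(r"0([0-7]+)"), 8),
    ("decimal", re.compile(r"([0-9]+)"), 10),
]

def parse_integer_literal(text: str):
    """Return (kind, value) for a relative integer literal."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    for kind, pattern, base in _PATTERNS:
        match = pattern.fullmatch(digits)
        if match:
            value = int(match.group(1), base)
            return kind, -value if negative else value
    raise ValueError("not an integer literal: %r" % text)
```

Under these rules a lone 0 is decimal, since the octal form needs at least one digit after the leading zero, and 08 is decimal because 8 is not an octal digit.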

# A real literal is zero or one minus sign (-), then a decimal integer literal, then a dot (.), then a decimal integer literal, optionally followed by an exponent of ten, which is the letter 'e' or 'E', optionally followed by a plus (+) or minus (-) sign, and then a decimal integer literal.

regular expression: -?[0-9]+\.[0-9]+([eE][-+]?[0-9]+)?

examples:
0.0   -123.456e-3
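
A short Python sketch of the rule above: the regular expression is copied from this reference, while the helper name parse_real_literal and the conversion through Python's float are assumptions made for illustration.

```python
import re

# Pattern copied from the reference: digits are required on both sides of the dot.
REAL = re.compile(r"-?[0-9]+\.[0-9]+([eE][-+]?[0-9]+)?")

def parse_real_literal(text: str) -> float:
    """Validate a real literal against the pattern, then convert it."""
    if REAL.fullmatch(text) is None:
        raise ValueError("not a real literal: %r" % text)
    return float(text)
```

Forms such as 1. or 1e5 are rejected by this sketch because the pattern requires a dot with at least one digit on each side.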

# A text literal is, within two double quotes ("), a list of zero or more of the following :

  • any character except the backslash (\) and the double quote (")
  • \" to insert the double quote character
  • \\ to insert the backslash character
  • \ followed by 1, 2 or 3 character(s) amongst 01234567 to insert a character with an octal value
  • \x followed by 2 characters amongst 0123456789abcdefABCDEF to insert a character with a hexadecimal value
  • \a to insert the ASCII bell
  • \b to insert the ASCII backspace
  • \t to insert the ASCII horizontal tab
  • \n to insert the ASCII line feed
  • \v to insert the ASCII vertical tab
  • \f to insert the ASCII form feed
  • \r to insert the ASCII carriage return
examples:
""   "some text literal"   "newline:\n"   "\0\xff"

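The escape rules above can be sketched as a small decoder. This Python illustration is not the compiler's implementation: the name decode_text_literal is hypothetical, the result is assumed to be a byte string with non-escaped characters encoded in UTF-8, and octal values above 255 are rejected because the reference does not say how they are handled.

```python
# Single-character escapes and the byte each one inserts.
_SIMPLE = {
    '"': 0x22, "\\": 0x5C, "a": 0x07, "b": 0x08, "t": 0x09,
    "n": 0x0A, "v": 0x0B, "f": 0x0C, "r": 0x0D,
}
_OCTAL = "01234567"
_HEX = "0123456789abcdefABCDEF"

def decode_text_literal(literal: str) -> bytes:
    """Decode a text literal, including its surrounding double quotes."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("a text literal must be enclosed in double quotes")
    body = literal[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped double quote")
        if ch != "\\":
            out += ch.encode("utf-8")  # assumption: UTF-8 output
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i]
        if esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            i += 1
        elif esc in _OCTAL:
            j = i
            while j < len(body) and j - i < 3 and body[j] in _OCTAL:
                j += 1  # take 1, 2 or 3 octal digits
            value = int(body[i:j], 8)
            if value > 0xFF:
                raise ValueError("octal escape out of range (assumption)")
            out.append(value)
            i = j
        elif esc == "x":
            digits = body[i + 1:i + 3]
            if len(digits) != 2 or any(d not in _HEX for d in digits):
                raise ValueError("\\x needs exactly 2 hexadecimal digits")
            out.append(int(digits, 16))
            i += 3
        else:
            raise ValueError("unknown escape: \\" + esc)
    return bytes(out)
```

For instance, the last example above decodes to two bytes in this sketch, 0x00 followed by 0xff.
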
# A code block literal is an open brace ({) followed by a list of zero or more calls separated by semicolons (;) or newlines, and with identical indentation, followed by a closing brace (}).

examples:
{}

call_1 {call_2 ; call_3}

call_1 {
  call_2
  call_3
}

A code block literal can also be without braces, and deduced from indentation according to these rules:

  • The identical indentation:

    If the indentation of the current call C1 is equal to the indentation of the previous call C0, then C1 is following C0 in the same code block.

    example:
      C0
      C1   (: identical indentation :)
    
    (: equivalent to: :)
    
      C0 ; C1
    
  • The forward indentation:

    If the indentation of the current call C1 is greater than the indentation of the previous call C0, then C1 is the first call of a new code block literal which is a call element of C0 (except for the first call of an explicit code block literal).

    example:
      C0
        C1   (: forward indentation :)
        C2
    
    (: equivalent to: :)
    
      C0 { C1 ; C2 }
    
  • The back indentation:

    If the indentation of the current call C1 is less than the indentation of the previous call C0, but is identical to the indentation of an earlier call Cx, then all implicitly open blocks are closed, the calls they belong to are ended, and C1 follows Cx in the same code block.

    example:
    Cz
    	Cx
    		C_a
    		C_b
    			C_c
    						C0
    	C1	(: back indentation :)
    
    (: equivalent to: :)
    
    Cz { Cx { C_a ; C_b { C_c { C0 }}} ; C1 }
    
  • The half-back indentation:

    If the indentation of the current call C1 is less than the indentation of the previous call C0, but is between the indentations of two previous calls Cx and Cy, then all implicitly open blocks are closed and the calls they belong to are ended, except for Cx: C1 is the continuation of Cx, and they are part of the same call.

    example:
    Cx
    	Cy
    		C0
        C1    (: half-back indentation :)
    Cz
    
    (: equivalent to: :)
    
    Cx { Cy { C0 }} C1 ; Cz
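
The four indentation rules can be modelled with a stack holding the indentation of every block that is currently open. The Python sketch below is an illustration only, not the compiler's algorithm. The function name to_explicit_blocks is hypothetical, indentations are given as plain column numbers, and cases the rules leave open (for example a line indented further than a half-back continuation) are not addressed.

```python
def to_explicit_blocks(calls):
    """Rewrite indented calls into the equivalent explicit-brace form.

    calls: list of (indentation, text) pairs, one per line starting a call.
    """
    tokens = []
    stack = []  # indentation of each block that is currently open
    for indent, text in calls:
        if not stack:
            stack.append(indent)           # first call of the program
            tokens.append(text)
        elif indent == stack[-1]:
            tokens += [";", text]          # identical indentation
        elif indent > stack[-1]:
            tokens += ["{", text]          # forward indentation
            stack.append(indent)
        else:
            # Close every implicit block that is deeper than this line.
            while len(stack) > 1 and stack[-1] > indent:
                stack.pop()
                tokens.append("}")
            if stack[-1] == indent:
                tokens += [";", text]      # back indentation
            elif stack[-1] < indent:
                tokens.append(text)        # half-back: continue the call
            else:
                raise ValueError("indentation is left of the first call")
    tokens += ["}"] * (len(stack) - 1)     # close blocks still open at the end
    return " ".join(tokens)
```

Calling it on the half-back example, with Cx and Cz at column 0, Cy at 8, C0 at 16 and C1 at 4, yields "Cx { Cy { C0 } } C1 ; Cz".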
    

# A syntax literal is a list of zero or more syntax elements (possibly separated by blanks) surrounded by two colon characters (:).

A non-empty syntax literal that could be matched by the empty sub-call (due to syntax options and repeated lists) is invalid.

example:
:a (simple) syntax literal:

# A syntax element is either :

# A syntax word is a word that is meant to be matched by an identical word call element.

examples:
foo_bar   _    AbcDEfG42 

# A syntax operator is like an operator, except that the following operators may need to be escaped with a backslash:

  • the less-than sign (<) always needs a backslash (\<) to not be confused with the beginning of a parameter
  • the greater-than sign (>) may need a backslash (\>) to not be confused with the end of a parameter
  • the open square bracket ([) needs a backslash (\[) to not be confused with the beginning of a repeated list or option
  • the closing square bracket (]) may need a backslash (\]) to not be confused with the end of a repeated list or option
  • the vertical bar (|) may need a backslash (\|) to not be confused with the separator of an enumeration
  • the dot (.) may need a backslash (\.) to not be confused with the '...' separator of a repeated list
  • the backslash (\) always needs a backslash (\\) in a syntax literal to be matched by a single backslash operator call element

It is meant to be matched by an identical operator call element.

examples:
: \<   =   > :

(: equivalent to: :)

:\<=>:   (: these 3 syntax operators may be matched by (<=>) :)

# A syntax option is an open parenthesis (() followed by a list of one or more syntax elements followed by a closing parenthesis ()).

It is also possible to replace the parentheses () with square brackets [] if its content does not contain 3 consecutive unspaced dots (...).

Its content is meant to be optionally matched.

example:
:an (option (with another [nested] option)):

(: may be matched by the following calls: :)

an ; an option ;
an option with another option;
an option with another nested option;

# A syntax enumeration is an open brace ({) followed by a vertical-bar-separated (|) list of two or more lists of one or more syntax elements, followed by a closing brace (}).

Only one of the lists of syntax elements it contains is meant to be matched.

example:
: do { A B | C D | E } :

(: may be matched by these calls: :)

do A B
do C D
do E

# A syntax repeated list is, in order :

  • an open square bracket ([)
  • a list of one or more syntax elements
  • 3 consecutive unspaced dots (...)
  • either :
    • 'minimum,maximum'
    • 'minimum,' (with the comma ','), implying maximum=infinity
    • 'minimum' (without comma), implying maximum=minimum
    • nothing, implying minimum=0 and maximum=infinity
    where minimum and maximum are decimal integers
  • a closing square bracket (])

It can also be the special empty syntax list [...] if placed at the end of a syntax, in which case it is like [<anything>...], except that it is meant to generate C-like variadic functions (which use a flat list of function arguments) instead of Argile-like variadism (which uses arrays).

Its content is meant to be matched N times successively, where N is within boundaries: minimum <= N <= maximum.

examples:
:[y ... 2,4]:       (: may be matched only by these: :)

y y   ;   y y y   ;   y y y y



: x [y...] :        (: may be matched by these: :)

x
x y
x y y y y y y y y y y y y y


: eprintf <text format> [...] :     (: for a C-variadic function :)
eprintf "%i %s %f\n" -11 "foobar" 3.1415926
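
The four ways of writing the bounds of a repeated list can be summarised by a small parser. This Python sketch is illustrative only: the names parse_bounds and count_allowed are hypothetical, None stands for an infinite maximum, and blanks around the numbers are assumed to be allowed.

```python
def _decimal(text: str) -> int:
    """Parse a decimal integer made of digits only."""
    text = text.strip()
    if not text.isdigit():
        raise ValueError("expected a decimal integer: %r" % text)
    return int(text)

def parse_bounds(spec: str):
    """Return (minimum, maximum) for the text found after the '...'."""
    spec = spec.strip()
    if spec == "":
        return 0, None                      # nothing: 0 to infinity
    if "," not in spec:
        n = _decimal(spec)
        return n, n                         # 'minimum': maximum = minimum
    low, high = spec.split(",", 1)
    minimum = _decimal(low)
    maximum = _decimal(high) if high.strip() else None  # 'minimum,': infinity
    return minimum, maximum

def count_allowed(n: int, spec: str) -> bool:
    """Check minimum <= n <= maximum for a repetition count n."""
    minimum, maximum = parse_bounds(spec)
    return n >= minimum and (maximum is None or n <= maximum)
```

For the first example above, parse_bounds("2,4") gives (2, 4), so only 2, 3 or 4 repetitions of y are accepted.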

# A syntax parameter is a less-than sign (<), followed by a list of one or more parameter specifications separated by commas (,), then a greater-than sign (>).

When there is more than one parameter specification, it is strictly equivalent to a succession of syntax parameters with one specification each.

A syntax parameter with a single specification is meant to be matched by either :

# A parameter specification is, in order :

Whether the last word is part of the type or is the syntax of the parameter, is determined by trying to compile the type without the last word first, and if it fails, then it is tried with the last word appended.

examples:
:<real x> <real> (<text="empty">) [<int y = -1>...]:
:<any a, any b = (nil), any c>:
:<any a> <any b = (nil)> <any c>:  (: is equivalent to the previous one :)

# A type is either :

A type also has a reference version and a raw version.

# A basic type is a built-in type that is either :

  • the nothing type
  • the anything type
  • the type type
  • the word type
  • the integer type
  • the natural type
  • the real type
  • the text type
  • the syntax type
  • the code type

A basic type definition can be made by doing a binding.

example:
bind :nothing: to std/nothing

# A reference type is a type that corresponds to a value that can be read from as well as written to.

The type of a variable is a reference type by default.

Modifying the value of a function parameter with a reference type will change the value for the caller as well; if the parameter is not a reference, then it is passed by value.

It is possible to get the reference version of a type by calling a binding to std/typeref

example:
use std
.: mult3 <int & i> :.    (: expects a reference to an integer :)
   i = i * 3
let x = 2
print x    (: will print 2 :)
mult3 x
print x    (: will print 6 :)

# A raw type is intended for classes and means "by value" instead of "by address" (which is the default). In C, this corresponds to a plain structure (not by pointer).

It is possible to get the raw version of a type by calling a binding to std/typeraw

example:
use std
class K { int i }
let K@ raw_k          (: some data are locally allocated :)
let K  addr_k = raw_k (: this is an address to the same data :)

# A class is a type that corresponds to a structured block of data. Its structure is a list of fields, where each field has a name (a word) and a type.

If a parent type is specified, then an implicit field of the parent type will be placed at the beginning of the data block, as well as an implicit type caster to the parent type.

A class may be defined with a call to a binding to std/class.

The code parameter of std/class is interpreted in a special way: calls inside are meant to only match :<type> <word> (/<nat field_size>): to define class fields.

There are no real methods; instead, there are simply functions that take a class type as a parameter.

example:
use std

class Person
  text firstName
  text lastName
  int age

bind :function <syntax> (returns <type>) <code>: to std/funcdef

function :<Person>.setAge <int x>: returns int
  Person.age = x
  Person.age

# A union is a type that corresponds to a list of several possible types. The size of a union is the maximal size of its sub-types.

A union can be defined by calling a binding to std/union or a binding to std/class with the union binding option.

example:
use std

union number is real or int or nat

let the real r = 0.0
let the number n
r = n
n = (r as number)
r = (n the real)
(n the real &) = r

# An enumeration is a type that corresponds to a list of possible words. For each possible word, an integer value is associated, as well as a definition, with a syntax composed of the single word.

An enumeration can be defined with a call to a binding to std/enum

example:
use std

enum :week day: is monday, tuesday, wednesday, thursday, friday, saturday, sunday=6.

let the week day D = sunday
print D as int  (: will print 6 :)
print saturday as int (: will probably print 5 :)

# A constant C type is defined with a prefix and optionally a suffix (both are text literals)

A C type can be defined with a call to a binding to std/ctype

example:
use std
bind :C-type <text> <word> (<text>): to std/ctype  (: also defined in std.argl :)
C-type "unsigned char" table "[256]"

# A parametric type is defined as a macro with a type return type. Only the first and second calls of its body are used (if present): the first call generates the C type prefix, while the second generates the C type suffix.

A parametric type may match itself as well as a version of itself with no parameter specified.

examples:
use std
bind :macro <syntax> (-> <type>) <code>: to std/funcdef+macro

macro : { pointer | <type pointed> * } : -> type     (: a parametric type :)
  Cgen (pointed.prefix) "*"
  Cgen (pointed.suffix)
macro : * <pointer p> : -> p.pointed &  {Cgen p}      (: dereferencing :)
macro : & <any & a> : -> (type of a)* {Cgen "&"a}     (: address of :)

let int x = 3
let (int*) px = &x
let int y = *(px)


(: another parametric type: a function pointer :)
macro :function [<type params>...] (-> <type ret>): -> type {
  Cgen (ret.prefix) "(*"
  Cgen ")(" (@params) ")" (ret.suffix)
}

(: calling such a function: :)
macro : <function f> \[] : -> (f.ret)
  Cgen "(*"f")()"

macro : <function f> \[ [<any params> ... 1,] ] : -> (f.ret)
  Cgen "(*"f")("@params")"

let the (function real real -> real) funcptr
funcptr = .:<real a,real b>:. ->real {a+b}
print funcptr[1.2 3.4]    (: prints 4.6 :)

# A variable is a definition that holds a modifiable value.

If its syntax has parameters, they are ignored.

A variable can be defined by calling a binding to std/vardef.

examples:
use std
let int i
let real j, k, l, m
let :a variable with anything type:
let :auto-typed: = "some text.."
let it be auto-typed
print it            (: prints "some text.." :)

# A function is a definition that, when called, executes a piece of code then results in a return value, the same way a C function works.

The type of the return value is the return type of the function.

The last call of the body code must have a return type that matches the return type of the function.

A return value can also be given with a call to a binding to std/return.

A function can be defined by calling a binding to std/funcdef.

examples:
use std

.: identity <int x> :. -> int {x}

bind :function <syntax> -> <type>  <code>: to std/funcdef

function : f <real x> : -> real {
  return 0.0 if x == 0.0
  1.0 / x
}

# A macro is a definition that, when called, executes a piece of code then results in a return value, like a function, but "inline": it directly generates its body code in which the calls to its parameters are replaced by their value, instead of a C function call.

The type of the return value is the return type of the macro.

The last call of the body code must have a return type that matches the return type of the macro.

All parameters in the syntax of the macro are defined as local variables visible only in the macro definition; in particular, the return type of a macro can call its parameters.

A macro can be defined by calling a binding to std/funcdef with the macro binding option.

examples:
use std

=:PI Macro:= -> real {3.1415926535897932384626433832795028841971693993751}
printf "%.16f\n" PI Macro  (: prints 3.1415926535897931 :)

bind : macro <syntax, type, code> : to std/funcdef+macro

macro :<num N> PI: (real)
  use math
  N * M_PI

printf "%.16f\n" (2 PI)   (: prints 6.2831853071795862 :)

# A binding is a definition that corresponds to a built-in function inside a module (currently only the pseudo module std is available).

A binding can be defined by calling a binding to std/bind.

Also, there is always an implicit binding to std/bind.

example:
bind :type of <any>: to std/typeof

# An implicit definition is a definition that is made by the compiler before reading program sources.

There exist only 2 implicit definitions :

  • :use [{<word>|<text>}, ...] {<word>|<text>}: is implicitly bound to std/use and is used to import definitions from external Argile source files, by searching in certain directories.
  • :bind <syntax> to <word module>/<word bind>: is implicitly bound to std/bind and is used to make bindings in the std pseudo module

# A returned type matches an expected type when either :

  • returned is exactly the same as expected
  • returned is a union and expected is exactly the same as one of its possible variant types
  • expected is a union and returned is exactly the same as one of its possible variant types
  • returned and expected refer to the same parametric type, but expected has no parameter specified
  • there exists a path between expected and returned in the graph of implicit type casters (including classes hierarchies)

But a returned type does not match an expected type when either :

  • expected is by reference, but returned is not
  • expected is by raw class, but returned isn't a class
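
Part of these matching rules can be sketched with a breadth-first search over implicit type casters. The Python code below is a simplified model and not the compiler's logic. The names matches and cast_path_exists are hypothetical, types are plain strings, the cast direction (from returned towards expected) is an assumption, and parametric types, references and raw classes are not modelled.

```python
from collections import deque

def cast_path_exists(casters, returned, expected):
    """Breadth-first search in the graph of implicit type casters.

    casters maps a type name to the type names it can be cast into.
    """
    seen = {returned}
    queue = deque([returned])
    while queue:
        current = queue.popleft()
        if current == expected:
            return True
        for target in casters.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return False

def matches(returned, expected, unions, casters):
    """Simplified check; unions maps a union name to its variant types."""
    if returned == expected:
        return True
    if expected in unions.get(returned, ()):   # returned is a union
        return True
    if returned in unions.get(expected, ()):   # expected is a union
        return True
    return cast_path_exists(casters, returned, expected)
```

With unions = {"number": {"real", "int", "nat"}} and casters = {"text": ["int"]}, a returned text matches an expected int in this model, but a returned int does not match an expected text.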

# An implicit type caster is an ability of a type to be automatically cast into another type. By default, a type does not have any implicit caster.

An implicit caster can be given to a type by calling a binding to std/autocast.

example:
use std
autocast text -> int {atoi text}
let int x
x = "42"   (: same as x=atoi("42") :)

# Implicit sub-calls are searched in a call by testing, in order of priority:

# A scope is the domain of visibility of a set of definitions: it defines where these definitions can be called.

Concretely, the scope of a definition is the code in which it is defined, as well as its sub-codes.

example:
use std

if true {

   let :some variable: = "some text"

   print some variable		(: this is valid:)

   do {
     print some variable	(: this is also valid :)
   } while false

}
else
   print some variable		(: compilation fails here :)

There is also the main code of each Argile source, which is also called the global scope (although it is local to each file). When a file is imported using a binding to std/use, its main definitions are included in a new scope placed between the scope of the call to the std/use binding and the previous (upper) scope.

When a call is being compiled, visible definitions are searched in an order according to these priorities:

  • closest scope where the definition is made first
  • then closest call making the definition before the current one
  • then closest call making the definition after the current one

In the following example, definitions are tried in this order: def_1 first, then def_2, def_3, ... , def_12 (where def_n refers to both a definition and the call making it)

def_10;
def_9;
{
  def_6;
  def_5;
  {
    def_2;
    def_1;
    a call being compiled;
    def_3;
    def_4;
  }
  def_7;
  def_8;
}
def_11;
def_12;
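
The search order of this example can be reproduced with a short recursive walk. In the Python sketch below, which is only an illustration, a code block is modelled as a nested list; the names lookup_order, PROGRAM and TARGET are hypothetical.

```python
TARGET = "a call being compiled"

# The example above, with each code block written as a nested list.
PROGRAM = [
    "def_10", "def_9",
    ["def_6", "def_5",
     ["def_2", "def_1", TARGET, "def_3", "def_4"],
     "def_7", "def_8"],
    "def_11", "def_12",
]

def lookup_order(block, target):
    """Return definitions in the order they are tried, or None if the
    target call is not inside this block."""
    for i, item in enumerate(block):
        if isinstance(item, list):
            inner = lookup_order(item, target)
            if inner is None:
                continue                    # target is not in this sub-block
        elif item == target:
            inner = []
        else:
            continue
        # Same scope: closest calls before, then closest calls after.
        before = [d for d in reversed(block[:i]) if isinstance(d, str)]
        after = [d for d in block[i + 1:] if isinstance(d, str)]
        return inner + before + after       # inner scopes come first
    return None
```

Here lookup_order(PROGRAM, TARGET) returns def_1 through def_12 in increasing order, which is the order stated above. Sibling blocks that do not contain the call are skipped, since their definitions are out of scope.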

Moreover, each code block is compiled in two passes:

  • in the first pass, only definitions that may contribute to making new definitions are tried
  • in the second pass, all definitions are tried

Therefore, it is possible to make a definition after it is called.


Copyright © 2009,2010,2011,2014 The Argile authors (see details here).

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included here.