ARGON
Check-in [971887d31c]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updated CHROME, with thoughts from reading Design Concepts in Programming Languages and reading about Idris.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 971887d31ce54b193603171e3895d7a055dfef8b
User & Date: alaric 2017-07-11 20:59:55
Context
2017-07-11
21:01
Fixed wiki markup links check-in: c1d292a038 user: alaric tags: trunk
20:59
Updated CHROME, with thoughts from reading Design Concepts in Programming Languages and reading about Idris. check-in: 971887d31c user: alaric tags: trunk
2017-06-02
21:57
Expounded on Iodine Object Push, and wondered about special support for it within MERCURY. check-in: 6ad45702f4 user: alaric tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to intro/chrome.wiki.

1
2
3
4
5
6
7
8
9
10
11
12
CHROME is the high-level programming language that the bulk of ARGON
itself, and user applications written above ARGON, will be written
in. [./hydrogen.wiki|HYDROGEN] already provides a programming
language, but it's terribly low level, and "unsafe"; HYDROGEN has an
unprecedented level of access to the low-level aspects of its own
implementation. In HYDROGEN one can write code which writes random
numbers to random locations in memory until the system falls
over. Therefore, HYDROGEN code cannot be accepted from untrusted
sources. Apart from low-level libraries, things involved in
bootstrapping, and the occasional bit of hand-coded stuff that
requires particularly high performance, as much HYDROGEN code as
possible should be generated by higher-level language compilers - such



|
|







1
2
3
4
5
6
7
8
9
10
11
12
CHROME is the high-level programming language that the bulk of ARGON
itself, and user applications written above ARGON, will be written
in. [./hydrogen.wiki|HYDROGEN] already provides a programming
language, but it's terribly low level, and "unsafe"; HYDROGEN has a
significant level of access to the low-level aspects of its own
implementation. In HYDROGEN one can write code which writes random
numbers to random locations in memory until the system falls
over. Therefore, HYDROGEN code cannot be accepted from untrusted
sources. Apart from low-level libraries, things involved in
bootstrapping, and the occasional bit of hand-coded stuff that
requires particularly high performance, as much HYDROGEN code as
possible should be generated by higher-level language compilers - such
36
37
38
39
40
41
42





43




44
45
46
47
48
49
50
51
52
53
54
55
56
57
58




















59
60















61



62
63
64




































65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
As a language, it's a purely functional language with a strong LISPy
flavour, representing code in the [./iron.wiki|IRON] data model rather
than s-expressions. The IRON data model also serves as the fundamental
data model of CHROME itself; CHROME programs manipulate IRON
values. Functions are first-class and bindings are lexically scoped,
so functions close over their arguments.






Rather than using monads, mutation is modelled using uniqueness typing.





[http://www.snell-pym.org.uk/archives/2009/08/18/generic-functions/|Generic
functions] are used to provide polymorphism, rather than class- or
prototype-based object orientation.

Hygienic macros (with a module system based on the
[./carbon.wiki|CARBON] namespaces used for IRON symbols) are provided
for syntactic extensibility. However, they are "second-class values";
they are bound in the lexical environment just like functions and
other values, but they're not fully first-class as there is a
constraint that macros in code to be compiled <i>must be statically
deducible</i>, in that every reference to them must be through an
expression whose value can be found at compile time. Attempting to
apply a macro at run time is an error.





















Bounded-space tail calls and first-class partial continuations provide
for flow control.



















Dynamically-scoped parameters are a boon in Scheme, so we'll have
them, too.





































In summary, it looks a lot like Scheme written in IRON syntax, without
mutation and with generic functions in the core; and with macros as
second-class values.

Clearly, I need to write a lot more detail on this. Details matter
tremendously in programming language design, but while major
innovation happens at the large-scale of a language design, the
details are usually a matter of "not making mistakes" rather than
"being clever" - so relatively tedious!

<h1>Implementation</h1>

The CHROME implementation appears (to the rest of the system) to be an
interpreter, accepting CHROME expressions encoded in IRON and
returning the IRON value they evaluate to; however, closures are
represented as executable code, so evaluating the definition of a
function results in its compilation, something like how FORTH (and,
therefore, HYDROGEN) interprets source code to produce compiled
code. All the ARGON system components other than HYDROGEN (written in
a platform-specific language), [./helium.wiki|HELIUM] and IRON
(written in HYDROGEN), and the CHROME boostrap are written in CHROME -
even CHROME itself; the "CHROME bootstrap" is a very simple and naive







>
>
>
>
>
|
>
>
>
>

|
|
|











>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

>
>
>



>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
|
|











|







36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
As a language, it's a purely functional language with a strong LISPy
flavour, representing code in the [./iron.wiki|IRON] data model rather
than s-expressions. The IRON data model also serves as the fundamental
data model of CHROME itself; CHROME programs manipulate IRON
values. Functions are first-class and bindings are lexically scoped,
so functions close over their arguments.

Laziness can come with a performance cost, but the Haskell folks have
done lots of good work on strictness analysis. In exchange, you can
write conditional constructs without needing macros, streams, infinite
data structures, some performance <em>gains</em>, and easier semantics
for recursive expressions.

Uniqueness typing has [many many
advantages](http://home.pipeline.com/~hbaker1/Use1Var.html), so we'll
use that instead of nasty monads. But we can do monads too, when
they're useful.

Haskell-style typeclasses and generic functions are used to provide
polymorphism, rather than class- or prototype-based object
orientation.

Hygienic macros (with a module system based on the
[./carbon.wiki|CARBON] namespaces used for IRON symbols) are provided
for syntactic extensibility. However, they are "second-class values";
they are bound in the lexical environment just like functions and
other values, but they're not fully first-class as there is a
constraint that macros in code to be compiled <i>must be statically
deducible</i>, in that every reference to them must be through an
expression whose value can be found at compile time. Attempting to
apply a macro at run time is an error.

Macros are arbitrary code, given access to the input expression as a
"syntax object", decorating the original source code with useful
metadata such as the position in the original input source code
expression (so error locations can be mapped back) and the lexical
environments in effect. Lexical environments are first-class objects;
in fact, they're CARBON knowledge bases containing name->value
bindings, type information, and other arbitrary declarations
(documentation, typeclasses, etc).

But for common cases, we'll implement <tt>syntax-rules</tt> style
rewriting macros as a library in terms of the core's low-level macros.

The language defines a range of primitives, for all the usual
arithmetic and other core functions that can't practically be
implemented otherwise. These primitives are what core language symbols
are bound to. They are a subtype of functions, in that they can be
applied directly, but unlike functions in general, they allow equality
comparison. A macro can tell if an application is a primitive, and
which one it is.

Bounded-space tail calls and first-class partial continuations provide
for flow control, but I need to think more about how that will
interact with the uniqueness types. A continuation may contain many
closed-over unique values, so the continuation itself <em>must</em> by
a unique object. That's fine for implementing threading with
continuations, but what about <tt>amb</tt>, exceptions, and so on?
Perhaps we need a means for capturing a non-unique continuation that
only works if the environment we close over in the continuation
contains no unique values. Code that handles mutation by passing a
chain of unique values through it, but which contains try/throw/catch
exceptions, would need the <tt>throw</tt> to embed ALL unique values
currently in scope (but not in scope at the <tt>catch</tt>) in the
exception object to avoid leaving them in the abandoned continuation;
so invoking a continuation (and thus abandoning the current
continuation) would be forbidden if the current continuation closed
over unique values (and was therefore unique). You'll need to
explicitly destroy them, or pass them to the new continuation.

There's no need for <tt>unwind-protect</tt> or <tt>dynamic-wind</tt>
in a pure language, thankfully.

Dynamically-scoped parameters are a boon in Scheme, so we'll have
them, too.

Static typing is important, as it can help to detect a wide range of
programming errors. However, it can't become a straitjacket. It's
useful in Scheme to be able to <tt>(read)</tt> arbitrary
s-expressions. Haskell's typeclass generics go a long way, but it
should be perfectly possible to have some bindings that we know
nothing static about - which are entirely dynamically typed -
alongside bindings with very narrowly defined static types, which can
be implemented fully unboxed in the compiled code, with everything
specialised, and type violations detected at compile time. So let's
have a type hierarchy that's a pointed complete partial order, giving
us an "Any" type we can use in those cases.

Dependant types are awesome, too. Let's have those while we're at
it. We're now wel into "Type system is Turing complete" territory, so
we'd better have function totality as part of the type system to keep
our type expressions halting.

[Idris](https://www.idris-lang.org/) does a pretty epic job of making
such a type system work in practice, but I do find the Haskelly syntax
for types a bit obtuse at times; it's too pattern-matchy for me. The
way constraints are written in "design by contract" languages (eg,
Eiffel) are much more readable, to my eyes at least - and as far as I
can see, identical in expressiveness. Design by Contract and dependent
types are much the same idea, just expressed in different ways, so
let's bring them together; we can statically check the constraints
that are amenable to that, and leave anything else to run time
checking. It just becomes another aspect of gradual typing.

Currying is sometimes handy, but (I find) often conflates what's going
on; so let's not put currying in the core syntax. We'll still have
single-argument lambdas, but we'll pass tuples to them, thereby making
it practical to get at the tuple as a first-class value to implement
variadic functions. The tuples will normally be unique values, created
afresh, so the system will have little difficulty in keeping them off
the heap.

In summary, it looks a lot like a Scheme/Clean/Idris hybrid written in
IRON syntax, without mutation and with generic functions in the core;
and with macros as second-class values.

Clearly, I need to write a lot more detail on this. Details matter
tremendously in programming language design, but while major
innovation happens at the large-scale of a language design, the
details are usually a matter of "not making mistakes" rather than
"being clever" - so relatively tedious!

<h1>Implementation</h1>

The CHROME implementation appears (to the rest of the system) to be an
interpreter, accepting CHROME expressions encoded in IRON and
returning the IRON value they evaluate to. However, closures are
represented as executable code, so evaluating the definition of a
function results in its compilation, something like how FORTH (and,
therefore, HYDROGEN) interprets source code to produce compiled
code. All the ARGON system components other than HYDROGEN (written in
a platform-specific language), [./helium.wiki|HELIUM] and IRON
(written in HYDROGEN), and the CHROME boostrap are written in CHROME -
even CHROME itself; the "CHROME bootstrap" is a very simple and naive
100
101
102
103
104
105
106













































allowing them to directly access hardware interfaces. User code will
not be given this option - any access to hardware facilities beyond
the standard library will have to be provided indirectly through
system components that expose a "safe" access-controlled interface.

I'm leaning towards the CPS transformation as an implementation model,
[http://lambda-the-ultimate.org/node/2406|as documented elsewhere].




















































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
allowing them to directly access hardware interfaces. User code will
not be given this option - any access to hardware facilities beyond
the standard library will have to be provided indirectly through
system components that expose a "safe" access-controlled interface.

I'm leaning towards the CPS transformation as an implementation model,
[http://lambda-the-ultimate.org/node/2406|as documented elsewhere].

The structure of the implementation is a pipeline:

<ol>

<li>Input IRON source code is converted into syntax objects,
node-for-node. Every node is annotated with the path to it from the
root in the original source, and is bound to the initial default
lexical environment, which is filled with bindings from modules the
source code imports.</li>

<li>Macro-expansion occurs, by walking the source tree from the top
down, performing partial evaluation; names are looked up in the
environment, and the values they refer to are substituted for them, so
environments are eliminated as we move down the expression. When an
application turns out to be a macro application, the macro is applied
to the body syntax object, and the result (with a possibly modified
environment, if the macro chose to do so) replaces the subexpression
for further expansion. Macros may recursively invoke the expander on
their arguments, in order to manipulate the post-expansion code; if
the input to the expander had no free variables then this will be
fully reduced to the kernel language, but otherwise, it may only be
partially reduced (and the expansion of the result of the macro will
ensure that it is fully reduced). Where a macro creates new syntax
objects afresh, they will inherit the environment and location path
from the macro application syntax object; where it substitutes values
into a template defined in the macro body, then the nodes in that
template will keep their environment and location path from the macro
definition; and nodes passed through from the macro call will, of
course, preserve their metadata from the input. Therefore, every node
in the output has useful metadata.</li>

<li>The result of macro-expansion is an expression in the minimal
kernel language, which is then ripe for type inference, and uniqueness
and totality verification. The partial evaluation process will have
simplified the code to a great extent, by performing applications of
known functions to known arguments and the like, and the code is at
this point an arbitrary DAG rather than a tree: shared subexpressions
may exist.</li>

<li>The result is an expression in the kernel language with full type
information and trust that the type information is correct, which the
implementation can then interpret or compile as it sees fit.</li>

</ol>