Overview
Comment: Updated CHROME, with thoughts from reading Design Concepts in Programming Languages and reading about Idris.
SHA1: 971887d31ce54b193603171e3895d7a0
User & Date: alaric 2017-07-11 20:59:55
Context
2017-07-11
  21:01 Fixed wiki markup links (check-in: c1d292a038, user: alaric, tags: trunk)
  20:59 Updated CHROME, with thoughts from reading Design Concepts in Programming Languages and reading about Idris. (check-in: 971887d31c, user: alaric, tags: trunk)
2017-06-02
  21:57 Expounded on Iodine Object Push, and wondered about special support for it within MERCURY. (check-in: 6ad45702f4, user: alaric, tags: trunk)
Changes
Changes to intro/chrome.wiki.
CHROME is the high-level programming language that the bulk of ARGON itself, and user applications written above ARGON, will be written in. [./hydrogen.wiki|HYDROGEN] already provides a programming language, but it's terribly low level, and "unsafe"; HYDROGEN has a significant level of access to the low-level aspects of its own implementation. In HYDROGEN one can write code which writes random numbers to random locations in memory until the system falls over. Therefore, HYDROGEN code cannot be accepted from untrusted sources. Apart from low-level libraries, things involved in bootstrapping, and the occasional bit of hand-coded stuff that requires particularly high performance, as much HYDROGEN code as possible should be generated by higher-level language compilers - such
︙
As a language, it's a purely functional language with a strong LISPy flavour, representing code in the [./iron.wiki|IRON] data model rather than s-expressions. The IRON data model also serves as the fundamental data model of CHROME itself; CHROME programs manipulate IRON values. Functions are first-class and bindings are lexically scoped, so functions close over their arguments.

Laziness can come with a performance cost, but the Haskell folks have done lots of good work on strictness analysis. In exchange, you get conditional constructs that don't need macros, streams, infinite data structures, some performance <em>gains</em>, and easier semantics for recursive expressions.

Uniqueness typing has [http://home.pipeline.com/~hbaker1/Use1Var.html|many, many advantages], so we'll use that instead of nasty monads. But we can do monads too, when they're useful.

Haskell-style typeclasses and generic functions are used to provide polymorphism, rather than class- or prototype-based object orientation.

Hygienic macros (with a module system based on the [./carbon.wiki|CARBON] namespaces used for IRON symbols) are provided for syntactic extensibility. However, they are "second-class values": they are bound in the lexical environment just like functions and other values, but they're not fully first-class, as macros in code to be compiled <i>must be statically deducible</i> - every reference to them must be through an expression whose value can be found at compile time. Attempting to apply a macro at run time is an error.

Macros are arbitrary code, given access to the input expression as a "syntax object", which decorates the original source code with useful metadata such as the position in the original input source expression (so error locations can be mapped back) and the lexical environments in effect. Lexical environments are first-class objects; in fact, they're CARBON knowledge bases containing name->value bindings, type information, and other arbitrary declarations (documentation, typeclasses, etc). But for common cases, we'll implement <tt>syntax-rules</tt>-style rewriting macros as a library, in terms of the core's low-level macros.
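To make the shape of this concrete, here's a minimal sketch in Haskell (CHROME doesn't exist yet, so every name and type here is invented for illustration) of what a syntax object carrying that metadata might look like:

<verbatim>
-- Hypothetical sketch of CHROME syntax objects, written in Haskell.
-- "Iron" stands in for the IRON data model; all names are invented.
import Data.Map (Map)
import qualified Data.Map as Map

data Iron                      -- a stand-in for IRON values
  = Symbol String
  | IntVal Integer
  | Node [Iron]
  deriving (Show, Eq)

-- The path from the root of the original source expression to a node,
-- so errors found after macro-expansion can be mapped back to source.
type SourcePath = [Int]

-- The lexical environment is itself first-class: a knowledge base of
-- name->value bindings and other declarations (types, docs, ...).
data Binding
  = Value Iron
  | Macro (Syntax -> Syntax)   -- a macro is ordinary code over syntax

newtype Env = Env (Map String Binding)

-- Every node carries its provenance and environment as metadata.
data Syntax = Syntax
  { stxNode :: Iron
  , stxPath :: SourcePath
  , stxEnv  :: Env
  }

main :: IO ()
main = print (stxPath (Syntax (Symbol "lambda") [0, 2] (Env Map.empty)))
</verbatim>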
The language defines a range of primitives, for all the usual arithmetic and other core functions that can't practically be implemented otherwise. These primitives are what core language symbols are bound to. They are a subtype of functions, in that they can be applied directly; but unlike functions in general, they allow equality comparison. A macro can tell if an application is a primitive, and which one it is.

Bounded-space tail calls and first-class partial continuations provide for flow control, but I need to think more about how that will interact with the uniqueness types. A continuation may contain many closed-over unique values, so the continuation itself <em>must</em> be a unique object. That's fine for implementing threading with continuations, but what about <tt>amb</tt>, exceptions, and so on? Perhaps we need a means for capturing a non-unique continuation that only works if the environment we close over in the continuation contains no unique values. Code that handles mutation by passing a chain of unique values through it, but which contains try/throw/catch exceptions, would need the <tt>throw</tt> to embed ALL unique values currently in scope (but not in scope at the <tt>catch</tt>) in the exception object, to avoid leaving them in the abandoned continuation; so invoking a continuation (and thus abandoning the current continuation) would be forbidden if the current continuation closed over unique values (and was therefore unique). You'd need to explicitly destroy them, or pass them to the new continuation. There's no need for <tt>unwind-protect</tt> or <tt>dynamic-wind</tt> in a pure language, thankfully.

Dynamically-scoped parameters are a boon in Scheme, so we'll have them, too.

Static typing is important, as it can help to detect a wide range of programming errors. However, it can't become a straitjacket. It's useful in Scheme to be able to <tt>(read)</tt> arbitrary s-expressions. Haskell's typeclass generics go a long way, but it should be perfectly possible to have some bindings that we know nothing static about - which are entirely dynamically typed - alongside bindings with very narrowly defined static types, which can be implemented fully unboxed in the compiled code, with everything specialised and type violations detected at compile time. So let's have a type hierarchy that's a pointed complete partial order, giving us an "Any" type we can use in those cases.

Dependent types are awesome, too. Let's have those while we're at it. We're now well into "the type system is Turing complete" territory, so we'd better have function totality as part of the type system, to keep our type expressions halting. [https://www.idris-lang.org/|Idris] does a pretty epic job of making such a type system work in practice, but I do find the Haskelly syntax for types a bit obtuse at times; it's too pattern-matchy for me. The way constraints are written in "design by contract" languages (e.g., Eiffel) is much more readable, to my eyes at least - and as far as I can see, identical in expressiveness. Design by Contract and dependent types are much the same idea, just expressed in different ways, so let's bring them together; we can statically check the constraints that are amenable to that, and leave anything else to run-time checking. It just becomes another aspect of gradual typing.
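As a rough illustration of that gradual contract checking (again in Haskell, with every name invented for the purpose), a constraint can be modelled as a predicate attached to a type; a compiler would discharge statically whatever it can prove, and anything left over compiles into a run-time check like this one:

<verbatim>
-- Hypothetical sketch: a design-by-contract constraint that falls
-- back to a run-time check when it can't be discharged statically.

-- A value of type a known to satisfy some predicate. In real code,
-- the constructor would not be exported, so the only way to obtain
-- a Checked value is to pass the check.
newtype Checked a = Checked a

-- The contract itself: a human-readable name plus the predicate.
data Contract a = Contract
  { contractName :: String
  , contractPred :: a -> Bool
  }

-- The run-time end of gradual checking: verify the predicate or fail
-- with a useful message. A clever compiler would elide this wherever
-- the predicate is provable at compile time.
check :: Contract a -> a -> Either String (Checked a)
check c x
  | contractPred c x = Right (Checked x)
  | otherwise        = Left ("contract violated: " ++ contractName c)

-- Example: the divisor in an integer division must be non-zero.
nonZero :: Contract Int
nonZero = Contract "nonZero" (/= 0)

safeDiv :: Int -> Checked Int -> Int
safeDiv n (Checked d) = n `div` d

main :: IO ()
main = case check nonZero 4 of
  Right d  -> print (safeDiv 12 d)   -- prints 3
  Left err -> putStrLn err
</verbatim>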
Currying is sometimes handy, but (I find) it often obscures what's going on; so let's not put currying in the core syntax. We'll still have single-argument lambdas, but we'll pass tuples to them, thereby making it practical to get at the tuple as a first-class value to implement variadic functions. The tuples will normally be unique values, created afresh, so the system will have little difficulty in keeping them off the heap.

In summary, it looks a lot like a Scheme/Clean/Idris hybrid written in IRON syntax, without mutation, with generic functions in the core, and with macros as second-class values.

Clearly, I need to write a lot more detail on this. Details matter tremendously in programming language design; but while major innovation happens at the large scale of a language design, the details are usually a matter of "not making mistakes" rather than "being clever" - so they're relatively tedious!

<h1>Implementation</h1>

The CHROME implementation appears (to the rest of the system) to be an interpreter, accepting CHROME expressions encoded in IRON and returning the IRON value they evaluate to. However, closures are represented as executable code, so evaluating the definition of a function results in its compilation, something like how FORTH (and, therefore, HYDROGEN) interprets source code to produce compiled code.

All the ARGON system components other than HYDROGEN (written in a platform-specific language), [./helium.wiki|HELIUM] and IRON (written in HYDROGEN), and the CHROME bootstrap are written in CHROME - even CHROME itself; the "CHROME bootstrap" is a very simple and naive
︙
allowing them to directly access hardware interfaces. User code will not be given this option - any access to hardware facilities beyond the standard library will have to be provided indirectly through system components that expose a "safe" access-controlled interface.

I'm leaning towards the CPS transformation as an implementation model, [http://lambda-the-ultimate.org/node/2406|as documented elsewhere].

The structure of the implementation is a pipeline:

<ol>
<li>Input IRON source code is converted into syntax objects, node-for-node. Every node is annotated with the path to it from the root in the original source, and is bound to the initial default lexical environment, which is filled with bindings from modules the source code imports.</li>

<li>Macro-expansion occurs by walking the source tree from the top down, performing partial evaluation; names are looked up in the environment, and the values they refer to are substituted for them, so environments are eliminated as we move down the expression. When an application turns out to be a macro application, the macro is applied to the body syntax object, and the result (with a possibly modified environment, if the macro chose to modify it) replaces the subexpression for further expansion. Macros may recursively invoke the expander on their arguments, in order to manipulate the post-expansion code; if the input to the expander had no free variables then it will be fully reduced to the kernel language, but otherwise it may only be partially reduced (and the expansion of the result of the macro will ensure that it is fully reduced). Where a macro creates new syntax objects afresh, they inherit the environment and location path from the macro application syntax object; where it substitutes values into a template defined in the macro body, the nodes in that template keep their environment and location path from the macro definition; and nodes passed through from the macro call, of course, preserve their metadata from the input. Therefore, every node in the output has useful metadata. (A toy model of this step is sketched after this list.)</li>

<li>The result of macro-expansion is an expression in the minimal kernel language, which is then ripe for type inference, and uniqueness and totality verification. The partial evaluation process will have simplified the code to a great extent, by performing applications of known functions to known arguments and the like, and the code is at this point an arbitrary DAG rather than a tree: shared subexpressions may exist.</li>

<li>The result is an expression in the kernel language with full type information, and trust that the type information is correct, which the implementation can then interpret or compile as it sees fit.</li>
</ol>
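To make step 2 concrete, here's a toy partial-evaluating expander in Haskell (every name here is hypothetical; real CHROME would work over syntax objects rather than this bare expression type): names whose values are known are substituted away, applications of known macros run at expansion time and their output is re-expanded, and everything else is left for later stages:

<verbatim>
-- Toy sketch of the macro-expansion/partial-evaluation step.
import Data.Map (Map)
import qualified Data.Map as Map

data Expr
  = Var String        -- free or bound name
  | Lit Integer
  | App Expr [Expr]   -- application of a function or macro
  | Lam [String] Expr
  deriving Show

data Denotation
  = Known Expr                 -- a value known at expansion time
  | MacroDef ([Expr] -> Expr)  -- a macro: arbitrary code over syntax

type Env = Map String Denotation

-- Walk the tree top-down: substitute known names, apply macros to
-- their (unexpanded) argument expressions, and recur. Unknown names
-- survive as free variables, so the result may be only partially
-- reduced.
expand :: Env -> Expr -> Expr
expand env e = case e of
  Var n -> case Map.lookup n env of
             Just (Known v) -> v
             _              -> Var n
  Lit i -> Lit i
  App f args -> case f of
    Var n | Just (MacroDef m) <- Map.lookup n env
          -> expand env (m args)   -- macro output is re-expanded
    _     -> App (expand env f) (map (expand env) args)
  Lam ps body ->
    -- parameters shadow any outer bindings of the same names
    Lam ps (expand (foldr Map.delete env ps) body)

-- Example macro: (when c body) rewrites to (if c body 0).
whenMacro :: [Expr] -> Expr
whenMacro [c, body] = App (Var "if") [c, body, Lit 0]
whenMacro _         = error "when: expected two arguments"

main :: IO ()
main = print (expand (Map.fromList [("when", MacroDef whenMacro)])
                     (App (Var "when") [Var "p", Lit 42]))
</verbatim>

Running this prints the reduced kernel expression App (Var "if") [Var "p",Lit 42,Lit 0]: the <tt>when</tt> macro has been eliminated, while the free variable <tt>p</tt> survives for later stages, which is the "partially reduced" case described in step 2 above.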