ARGON
Check-in [f22fe263a5]

Overview
Comment: More elaboration of CARBON
SHA1: f22fe263a5231c28bb39d474ebcd02a47ccb649e
User & Date: alaric 2012-12-06 20:23:12
Context
2013-06-27 15:42 Thoughts on content-addressible storage check-in: 9ed1933bce user: alaric tags: trunk
2012-12-06 20:23 More elaboration of CARBON check-in: f22fe263a5 user: alaric tags: trunk
2012-12-06 13:29 Put in links to in-progress intro pages, so I remember they exist. Elaborated somewhat on carbon and iodine. check-in: 5b2e596027 user: alaric tags: trunk

Changes

Changes to intro/carbon.wiki.


Perhaps the most interesting thing about them is that they can "chain"
onto other knowledge bases (of any type), which will be consulted
alongside the main knowledge base to satisfy queries. In the event of
any conflict, the main knowledge base has priority, and the chained
knowledge bases are consulted in the priority order in which they were
configured.

Conflicts are detected by consulting the metadata attached to tuple
type symbols, which provides rules about which other tuples conflict
with tuples using that type symbol.

TODO: Re-read that book on updating logical databases and explain how
to handle this!

<h1>TUNGSTEN sections: Persistent KBs</h1>

Persistent storage of entity state in TUNGSTEN is handled by CARBON,
which uses the low-level B-Tree storage management of TUNGSTEN to
present a number of knowledge bases, each corresponding to a "section"
of the entity's TUNGSTEN storage, in close cooperation with
................................................................................

Note that the entity does not know its own CARBON name, as it might
not have one or might have many, so when opening a TUNGSTEN section
KB, an initial value for the default namespace needs to be supplied.

<h1>Remote KBs via [./mercury.wiki|MERCURY]</h1>



TODO: Note that as the CARBON protocol opens up TUNGSTEN knowledge base
sections, it needs to know an initial default namespace - so the
CARBON name being used to access the entity must be supplied in
requests. If there is none (eg, we are just working from a raw EID),
then that EID is mapped into a CARBON symbol of the form
<tt><nowiki>/argon/eid/VERSION NUMBER/CLUSTER ID IN
BASE64[/CLUSTER-LOCAL PART OF ENTITY ID IN BASE64[/PERSONA
CLASS/PERSONA PARAMETERS IN BASE64]]</nowiki></tt>. <tt>/argon/eid</tt>
offers a gateway that resolves these names back to the encoded entity
IDs; if no cluster-local part is provided, then an empty string is
used, producing the EID of the cluster entity corresponding to the
cluster.
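
Purely to illustrate the shape of these names, here is a sketch of how
a raw EID might be turned into such a symbol; the encoding helper and
the way the optional parts are handled are guesses, not the actual
scheme:

<verbatim>
import base64

def b64(data: bytes) -> str:
    # Illustrative encoding only; padding and alphabet choices are not
    # specified in this note.
    return base64.urlsafe_b64encode(data).decode().rstrip("=")

def eid_to_carbon_symbol(version, cluster_id, local_part=None,
                         persona_class=None, persona_params=None):
    """Build a /argon/eid/... symbol from the parts of a raw EID; the
    bracketed parts of the format are optional, so they are only
    appended when present."""
    parts = ["/argon/eid", str(version), b64(cluster_id)]
    if local_part is not None:
        parts.append(b64(local_part))
        if persona_class is not None:
            parts.append(persona_class)
            parts.append(b64(persona_params or b""))
    return "/".join(parts)

# With no cluster-local part, the name denotes the cluster entity.
print(eid_to_carbon_symbol(1, b"\x01\x02\x03\x04"))
# -> /argon/eid/1/AQIDBA
</verbatim>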

<h2>Dynamic services</h2>

TODO: How rules can be written whose body is not another CARBON query,
but instead a pointer to a request that is sent via MERCURY back to the
origin entity, actually causing entity code to run. This is necessary
for the interesting cases where we're not just publishing information
via CARBON, but instead exposing an actual service that generates or
otherwise obtains information on demand, ranging from computational
services, access to continuously-changing information sources such as
sensors, processes that require access to secret information to
generate a result, access to existing information systems (such as
gateways to the Internet or "legacy systems"), and so on.

Note that this is only for READING data. Anything that causes changes
to the world needs to be a separate MERCURY action via an EID.
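
A minimal sketch of the idea, in which mercury_request() and the rule
representation are invented stand-ins for whatever the real MERCURY
interface turns out to be:

<verbatim>
def mercury_request(origin_eid, query):
    # Placeholder: would send the query back to the origin entity via
    # MERCURY and cause actual entity code to run there. Here we just
    # fake a continuously-changing sensor reading.
    return [("sensor:temperature", query["room"], 21.5)]

def make_dynamic_rule(origin_eid, head_pattern):
    """Return a rule whose body is not a CARBON query: if a query
    matches head_pattern, it is satisfied by calling back to the origin
    entity. Strictly read-only; anything that changes the world must be
    a separate MERCURY action via an EID."""
    def rule(query):
        if head_pattern(query):
            return mercury_request(origin_eid, query)
        return []
    return rule

temperature_rule = make_dynamic_rule(
    origin_eid="eid:building-controller",
    head_pattern=lambda q: q.get("predicate") == "sensor:temperature")

print(temperature_rule({"predicate": "sensor:temperature", "room": "lab"}))
</verbatim>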

<h2>Direct publishing from TUNGSTEN</h2>

TODO: Direct access via MERCURY to the published TUNGSTEN sections,
including support for ACLs and personae (use the persona
class).

Sketch the framework of support for a CDN by configuring the
cluster to forward the published TUNGSTEN sections of a configured
list of its entities out to nominated caching servers, distributed
close to spots of anticipated demand.
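
A guess at the shape of such cluster configuration and the forwarding
step; every entity name, endpoint and field name here is invented for
the sketch:

<verbatim>
# Which entities' published TUNGSTEN sections get pushed to which
# nominated caching servers. Purely illustrative configuration.
cdn_config = {
    "forwarded_entities": ["eid:public-website", "eid:software-archive"],
    "caching_servers": [
        {"endpoint": "cache-eu.example", "region": "eu"},
        {"endpoint": "cache-us.example", "region": "us"},
    ],
}

def send_section(endpoint, eid, section):
    # Placeholder for the actual MERCURY transfer to a caching server.
    print(f"pushing {eid} to {endpoint} ({len(section)} tuples)")

def push_updates(published_sections, config):
    """When a published section changes, forward the new version to
    every nominated caching server."""
    for eid in config["forwarded_entities"]:
        for server in config["caching_servers"]:
            send_section(server["endpoint"], eid, published_sections[eid])
</verbatim>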

<h2>Caching</h2>

TODO: Talk about cache-control metadata on CARBON results in the
protocol, how they're generated from metadata in TUNGSTEN or
explicitly via dynamic services, and how they can be cached by the
client entity (or a shared proxy?) to reduce load on the server and
decrease latency/bandwidth usage for the client.
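
One possible shape for the client side of this, assuming a simple
TTL-style annotation on each response; the max_age and validator fields
are assumptions, not protocol fields that exist yet:

<verbatim>
import time

class CarbonResponse:
    # Hypothetical wrapper around a query result plus its cache-control
    # metadata, as generated from TUNGSTEN or by a dynamic service.
    def __init__(self, tuples, max_age, validator):
        self.tuples = tuples
        self.max_age = max_age        # seconds the result may be reused
        self.validator = validator    # opaque token for revalidation
        self.fetched_at = time.time()

    def fresh(self):
        return time.time() - self.fetched_at < self.max_age

class ClientCache:
    """Cache held by the client entity (or a shared proxy), keyed by
    (server entity, query)."""
    def __init__(self):
        self.entries = {}

    def lookup(self, key, fetch):
        entry = self.entries.get(key)
        if entry is not None and entry.fresh():
            return entry.tuples    # served locally: no load on the server
        entry = fetch()            # go back to the server (or a peer)
        self.entries[key] = entry
        return entry.tuples
</verbatim>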

<h2>Peer-to-peer sharing</h2>

TODO: Talk about how multiple concurrent downloads of the same
knowledge packet (perhaps detected because it's from the same TUNGSTEN
static section, perhaps detected by some hashing scheme) can be
spotted, so that subsequent requests for it are told to fetch
already-downloaded blocks from peers, sharing the distribution load in
the manner of BitTorrent.

To support this, downloads of large CARBON responses are automatically
split into blocks by byte range, and a MERCURY connection used to
stream them down. However, the protocol on that connection allows for
the server to suggest that a block be fetched from another MERCURY
endpoint (actually, a list of them is included, to be tried in some
order until success is obtained), along with the hash the block should
have if it's not been tampered with. The client can still explicitly
request the block from the server, though, if it can't fetch it from a
peer. So the server serves as both a tracker and a seeder of last
resort, in BitTorrent parlance.

This is also the mechanism by which clients can be directed to CDN
servers that have been set up.
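
A sketch of the client-side block-fetching logic implied by the above,
with plain callables standing in for MERCURY transfers:

<verbatim>
import hashlib

def fetch_with_peers(expected_hash, peer_fetchers, origin_fetcher):
    """Try each suggested peer endpoint in order; accept a block only
    if its hash matches what the origin said it should be, otherwise
    fall back to the origin itself, the seeder of last resort."""
    for fetch in peer_fetchers:
        try:
            block = fetch()
        except IOError:
            continue               # unreachable peer: try the next one
        if hashlib.sha256(block).hexdigest() == expected_hash:
            return block
        # Hash mismatch (tampering or corruption): ignore this peer.
    return origin_fetcher()

# Example with in-memory stand-ins for two peers and the origin.
data = b"one block of a large CARBON response"
tampering_peer = lambda: b"tampered"
honest_peer = lambda: data
origin = lambda: data
print(fetch_with_peers(hashlib.sha256(data).hexdigest(),
                       [tampering_peer, honest_peer], origin))
</verbatim>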

<h2>Why's it so complex?</h2>

The CARBON-over-MERCURY protocol is fairly complex, and here's my
justification for why.

On one end of the spectrum, I want it to be as fast as DNS for the
common case of following the series of links that let one resolve a
symbol into information about it. The basic request for information
about a name is a simple MERCURY protocol operation; as long as the
request and response fit into an MTU, it can be handled as a single
UDP packet in each direction, just like DNS. And bigger responses can
be handled by performing an [./iridium.wiki|IRIDIUM] connection
handshake and then streaming the results.

However, in the common case of public published data without any ACLs,
those responses can be lifted direct from disk (or in-memory disk
cache!) on any node in the cluster without needing to fire up a
[./lithium.wiki|LITHIUM] handler for the entity being asked. And those
responses can be cached in the client cluster, meaning that any other
requests from within the same cluster can be satisfied from the
cache. And for very large responses, all the clusters that need it at
the same time can cooperate in a peer-to-peer broadcast network to
distribute it efficiently.

And where high demand is anticipated, you can pay the expense of
setting up CDN servers around the world, which the latest static data
is published to, and which clients are transparently directed to using
the peer-to-peer protocol; they're basically configurable extra
seeders, which new versions of the data are automatically sent to.

And in the less common case where data isn't published in advance, it
can gateway back to the parent entity to compute the data on the fly,
transparently to the end user.

We're trying to cover a lot of cases here under a single unified
interface. So it's a bit complex, but I think that's justified by the
complexity it removes from elsewhere in the system.

<h1>The Directory</h1>

A single global root EID, run by a non-profit foundation with a
suitable governance structure to prevent it ever being monopolised.

TODO: Explain the means by which a knowledge base can assign EIDs to
objects, and how this can be used to recursively seek out published
information about the object identified by an IRON symbol. Starting
from the root EID, we ask it for tuples involving objects identified by
symbols which are prefixes of the target symbol (a special kind of
query baked into the CARBON-over-MERCURY protocol); at the higher
levels of the tree these will usually just be pointers to EIDs
associated with parents of the target object. These can be recursively
queried, gathering any information about the target symbol that comes
up in the process, until we run out of parent EIDs to ask.

Therefore, any symbol maps to a chain of entities, containing at least
the root directory entity, and usually containing subsequent child
directory entities. The symbol may itself map to an entity, in which
case that entity can be asked what information it has about
"itself". If not, then the nearest parent entity is the authoritative
source of information about that object.
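
A toy sketch of that resolution walk, with directory entities modelled
as plain dictionaries; none of this is the real CARBON data model, it
only shows the prefix-following recursion:

<verbatim>
def resolve(symbol, root_eid, entities):
    """Start at the root EID and follow delegations whose prefix covers
    the target symbol, gathering any facts about it along the way,
    until there are no more parent EIDs left to ask."""
    gathered, queue, seen = [], [root_eid], set()
    while queue:
        eid = queue.pop(0)
        if eid in seen:
            continue
        seen.add(eid)
        entity = entities[eid]
        gathered.extend(entity.get("facts", {}).get(symbol, []))
        for prefix, child_eid in entity.get("delegations", {}).items():
            if symbol.startswith(prefix):
                queue.append(child_eid)
    return gathered

entities = {
    "eid:root": {"delegations": {"/argon": "eid:argon-foundation"}},
    "eid:argon-foundation": {
        "facts": {"/argon/chrome": [("doc:title", "CHROME language")]}},
}
print(resolve("/argon/chrome", "eid:root", entities))
</verbatim>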

Therefore, any entity can create an arbitrarily large subtree of
objects within itself, using its own global name as a prefix, without
needing to actually create entities; they can be purely informational
objects, containing information but without any identity as an
entity. Or the entity can attach EIDs to them that are actually just
personae of its own EID; this is particularly useful for gateways to
external systems, which can map the external information structure in
a CARBON directory tree of objects, each of which appears as an entity
acting as a gateway to behaviour that is mapped to the remote
system. Or an entity may create actual entities as offspring of itself
and then add them to a directory it exports, making them independent
while still being children of itself in the CARBON tree.

<dl>

<dt><tt>/argon</tt></dt>

<dd>Where ARGON system software is published from. This is delegated
to a non-profit foundation (which may or may not be the same one as
................................................................................
Also talk about the subsequent ability to demand that a local copy of
any given namespace subtree be kept in the cluster at all times, in
effect overriding its original name so that it points to a snapshot
stored within the cluster. Note that CHROME modules list
their dependencies, which can be used to recursively local-copy
them. This is used to ensure that critical resources are available
"offline", and to configure the cluster to use a specific version of
something rather than "the latest". This effectively overrides the
remote cache-control headers with a local directive.

Such local copies do not change when upstream changes occur, but an
administrator can view a list of newly available things and opt to
upgrade, or downgrade when older versions are still available. This is
like "installing software". A mechanism to keep the current version
archived away somewhere for later downgrading, or even to fetch the
current remote version direct into the archive to try later or
offline, would be desirable. This suggests a storage model where a
given CARBON prefix maps to a dictionary mapping "source CARBON
name:version identifier" pairs to CARBON knowledge bundles, with an
ability to select one element from the dictionary as "current", and a
requirement for a "version" property that can be expressed with a
suitable CARBON tuple, to know what version is in a given bundle.
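
A sketch of that storage model; bundle contents and version identifiers
are placeholders:

<verbatim>
class LocalCopies:
    """Per-prefix archive of pinned knowledge bundles, keyed by
    (source CARBON name, version identifier), with one entry per prefix
    selected as 'current'."""
    def __init__(self):
        self.archive = {}   # prefix -> {(source_name, version): bundle}
        self.current = {}   # prefix -> (source_name, version)

    def store(self, prefix, source_name, version, bundle):
        self.archive.setdefault(prefix, {})[(source_name, version)] = bundle

    def select(self, prefix, source_name, version):
        # Upgrade or downgrade by repointing 'current' at an archived
        # bundle; older versions remain available for later downgrades.
        key = (source_name, version)
        if key not in self.archive.get(prefix, {}):
            raise KeyError(key)
        self.current[prefix] = key

    def lookup(self, prefix):
        return self.archive[prefix][self.current[prefix]]

copies = LocalCopies()
copies.store("/argon/chrome", "/argon/chrome", "1.2", {"version": "1.2"})
copies.select("/argon/chrome", "/argon/chrome", "1.2")
print(copies.lookup("/argon/chrome"))
</verbatim>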

The local copies are stored in one or more nominated WOLFRAM
distributed caches, with an infinite replication factor and no expiry
timestamp or drop priority so they are kept until otherwise
specified. By default, they go to the cluster cache, so are replicated
to every node.
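
The cache-entry parameters implied by that paragraph might look
something like this; the field names are purely illustrative:

<verbatim>
from dataclasses import dataclass
from typing import Optional

@dataclass
class PinnedEntry:
    # How a pinned local copy might be placed into a WOLFRAM
    # distributed cache, per the paragraph above. None is used here to
    # mean "unbounded" / "never".
    key: str
    replication_factor: Optional[int] = None   # None: replicate everywhere
    expiry_timestamp: Optional[float] = None   # None: never expires
    drop_priority: Optional[int] = None        # None: never dropped

# Default: goes to the cluster cache, replicated to every node, kept
# until an administrator says otherwise.
pinned = PinnedEntry(key="/argon/chrome")
</verbatim>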