Ease of administration is quite fundamental to the design of ARGON. Running a secure installation of a computer system shouldn't have to be hard, and ARGON is designed so that you administer the cluster as a whole, with nodes only needing attention for matters relating purely to that node.

This page focusses on running an ARGON cluster; as discussed elsewhere it's possible to have single-node ARGON setups too, but they are largely a degenerate case of cluster management; they're a cluster and a node rolled into one.

The administrator's view of a cluster

Volumes

Within a cluster, entities are grouped into storage "volumes". The cluster's configuration controls which volumes are stored on which nodes, how that storage is handled (full replication of every entity to every node, or sharded replication where every entity is replicated to a configurable number of nodes), and which nodes should perform that entity's computations. However, this is an implementation detail, which different clusters might handle differently. The only way it's exposed outside of the cluster is that the information required to contact an entity over the network needs to identify the nodes that can perform computations for that entity (in order to handle the request). And that's down to the volume containing that entity. So, despite volumes being an implementation detail of a cluster, the details of the cluster hosting an entity is not public knowledge - only details of the particular volume. In principle, volumes could be migrated between clusters.

Every volume has a "volume entity" that represents it in the ARGON world. However, it can be any entity whatsoever - there is no special "volume entity protocol" it needs to provide. It cannot be deleted, and it needs to exist purely because the procedure for creating a new entity involves specifying an existing entity to create the new entity "alongside", because the new entity is created in the same volume as the existing one. So every volume must have an entity in it to act as the initial point to create other entities from.

The Cluster

Every cluster has a cluster entity that represents the entire cluster. This entity is a repository for metadata about the cluster itself that needs to be known to every node in the cluster. It is provided as part of the ARGON system, and it also happens to be the volume entity for a "cluster volume" that is replicated to every single node in the cluster. It contains the list of volumes in the cluster, including the volume configuration for each:

A UUID
A human-readable name
A special role flag ("None", "Cluster", "Cluster Security", "Node") for volumes of special significance.
Which nodes the volume is stored on
The replication configuration (full replication, or sharded, specifying how many replicas to make of each entity)
Which nodes should perform computation for entities in that volume (with priorities, so there can be fallbacks in case of failure or overload). If any of these nodes do not store the volume's data, they still get a copy of the volume's private key so they can handle requests.
Which of those nodes' IP addresses should be published to the world as contact points for handling requests to entities within the volume).
The volume's public key for day to day use
The volume's master public key, only for signing new public keys
The volume's classification (which, for a node volume, also functions as the clearance for the corresponding node; and for the cluster security volume, is the set of security labels available in the cluster).

There will also be some other cluster-wide configuration, as mentioned elsewhere on this site.

There is no need for a list of nodes, as every node has a corresponding node volume, and there's a list of all volumes. Just look for volumes with the role flag set to "Node" and you have a list of nodes and the public keys you can use to communicate with them securely.

Although it is possible, if the ACL set on the cluster entity permits you, to create other entities in the cluster volume, I'm not sure if you'd want to. They will be replicated to every node in the cluster, but for every use case I can imagine you'd be better off creating a task-specific volume, even if you replicate it to every node in the cluster anyway. Additionally, it's possible for there to be ARGON nodes with no local TUNGSTEN storage; if any of these are in the cluster then attempts to create entities in the cluster volume will fail as they have nowhere to store them.

That cluster-wide configuration is cryptographically signed, using the public key of the "Cluster Security" volume (which is in the list, so every node can check it). The cluster entity will only accept an updated configuration if it is signed by the Cluster Security volume key (the CURRENT one, not the new one being supplied...).

The "Cluster Security" volume is only replicated to trusted nodes in the cluster, because it contains secrets. Its volume entity is the special "Cluster Security Entity" (CSE), provided as part of the ARGON system, which contains the following:

Two private keys for every volume, corresponding to their two public keys in the cluster entity's volume list
Symmetric keys for every volume (used for internal communications)

It provides a cluster administration protocol to allow cluster admins to alter the configuration, both the shared part stored in the cluster entity and the secret bits in the CSE. If thse shared part changes, the CSE submits this to the cluster entity and, if the cryptographic signature validates, replicates it across the cluster.

Again, it would be possible to create other nodes in the cluster security volume, but I can't think of a good reason to do so.

Nodes

Each physical node gets an entity, too, and it's the volume entity of a volume unique to that node. This is why the cluster's configuration didn't include a list of nodes, just a list of volumes - the volumes with role "Node" represent the nodes. The node entity is created by the system when a new node is installed.

The storage of this entity contains the node's hardware configuration, and the ARGON kernel/boot code to run on that entity. Because the volume is replicated purely to that one node, the node has a local copy of its software and configuration so it can boot itself and then (using its local copy of the fully-replicated cluster entity in the cluster volume) communicate with other nodes in the cluster.

The node entity does not contain references to software modules to implement its interfaces (although it may publish a reference to a user interface module to enable humans to interact with it), because all requests to the node entity are directly handled by the NITROGEN component of the ARGON kernel on that node.

If the node has user-interface hardware (display screens, keyboards, sound hardware, pointing devices, etc) then, as well as the ARGON kernel, it can have a suitable implementation of NEON, the user interface framework. This will be loaded as a "kernel module" to drive the UI hardware and provide the ARGON user interface.

The node entity will also provide SNMP-like instrumentation of node health and resource utilisation, for centralised hardware maintenance, and APIs to perform administrative operations like reboots or shutdowns.

It will provide interfaces to local resources accessible to that node, such as things advertised to it on the network.

And finally, it will provide APIs to access low-level hardware interfaces on the node, such as system busses and memory mapped I/O. Although these are MERCURY APIs, many of them may only be accessible to entities running on the same physical node, and they may cause exciting effects beyond what is normally possible through a MERCURY API - such as registering arbitrary handlers within the calling entity as LITHIUM entrypoints for HELIUM hardware interrupt handlers, creating magic bytevector objects that are backed on memory-mapped hardware, and so on.

The node entity is also the starting point for creating other entities in the node's volume, if the node has TUNGSTEN storage, leading to them being uniquely stored on that node. This might be useful to the owner of a node such as a laptop, to be able to create local stuff, but it's more importantly used for driver entities for hardware attached to that node...

Devices

A physical resource such as a printer would be represented within ARGON as an entity providing a "print" protocol. This protocol would allow querying of the physical capabilities of the device, and submitting a print job using the standard "object push" protocol which lets people send things to it. Each job in the queue should be accessible as an entity itself, presumably by the printer entity returning its own entity ID with the job queue entry ID in the persona field; this virtual entity publishes job information, and provides an interface to cancel the job (through the standard administrative interface to delete an entity), pause it, resume it, and so on.

This printer entity would contain state pertaining to the current queue, and reference a software module to provide the printing API, and configuration to specify a hardware driver software module. It would be created inside the node volume of the node that printer hardware is attached to (unless it's a networked printer, in which case it can be created anywhere) by an administrator. But if it's attached via USB or some other interface, then the administrator would need to add an entry to the node entity's ACL enabling the printer entity to access the USB subsystem via the node entity's low-level hardware APIs.

If the node is capable of auto-detecting the device, it might create this entity all by itself if it doesn't already exist.

Note that this driver model only applies to hardware outside of "user interface" equipment, which is handled by NEON; although NEON will probably provide some additional protocols on the node entity itself (being part of the kernel, it can Just Do That (tm)) for administrators to see if that node's NEON implementation currently has a logged-in user, to pop a message up on the screen ("This system will shut down for maintenance in ten minutes"), etc. However, a node with NEON-type hardware might be configured not to run NEON at all, but instead to make that hardware available via driver entities - perhaps if a sound output device isn't being used as part of a user interface, but is driving a loudspeaker for public-address purposes.

The view of a junior systems administrator

There's a bunch of servers, desktop machines, laptops, mobile devices and embedded control computers, called a cluster. Your job is to sit at your nice ARGON workstation and pass the time until a little alert pops up that something's gone wrong, whereupon you switch to the cluster status display that the cluster entity presents in its user interface to sufficiently trusted users (a nice map of the network topology joining the systems of the cluster together) and see what's flashing red. You acknowledge the alert, then go and look at the node or network link that's in trouble, and repair or replace it. Sometimes one of the nodes is just plain broken, in which case you tell the cluster to forget about it since it's not coming back. Sometimes you get new computers to add to the cluster, in which case you have to ask the cluster security entity to create a new node. This adds the node to the cluster's configuration (and creates a keypair for the node's volume), and then lets you download an installation image on a USB key (with the node volume's private key, a copy of the cluster entity, and the new node entity's state). You go and boot the new node from it. After answering a few questions about network setup (unless you're using DHCP) and confirming that it's OK to reformat the entire disk, it sits there for a while talking to the other nodes to get up to date, then the new node shows up as live on your cluster status display.

The other source of work is users. Sometimes, new users have to be created. You browse to the "Users" container entity in your cluster (you might have more than one, if you have a large and complex user base) and choose "Create New" then select the entity template "User", thereby creating a fresh new user agent entity whose ACL gives you administrative rights. Give them a name and a password, then head over to the access control database and assign them to some groups. And sometimes you delete users, too.

As usual, most of your time will be spent dealing with broken equipment and clueless users. Hopefully, ARGON just gets out of your way and automates everything it can.

A senior systems administrator

You have a nice cluster, with a few junior sysadmins to keep everything up and running. And you have users, who want to store lots of entities in your nice cluster.

Your job is to decide how to do it.

You create high-level administrative subdivisions called volumes within the cluster, by browsing to the cluster entity and telling it to do so. There's a "Cluster" volume already waiting for you when the system is installed, containing an entity for the cluster itself. And there's a volume correspoding to each node, representing entities which are only available on that one node, which contains the node's own administrative entity. And there's a Cluster Security volume for trusted storage of the cluster's core cryptographic secrets.

You create volumes with names like "Management Users", "Shop Floor Users", "Customers", and the like. You grant permission to create entities in these volumes to your underlings. You create a CARBON directory in the cluster volume and start to put links to useful entities that your users will need into it, which become available as starting points to users browsing from machines in your cluster. You tell it which nodes are on the same LAN as each other, and then how those LANs are joined, so it can more efficiently route its group communications. You tell it which nodes are trusted with sensitive volumes, and which aren't. You monitor system load across the cluster, and alter the settings on volumes to trade off availability, performance, and resource utilisation, and order new nodes whenever you need more storage or processing capacity.

As usual, most of your time will be spent dealing with internal politics, budget battles and clueless suppliers. Hopefully, ARGON just gets out of your way and automates everything it can.

What's going on under the hood

Many of the underlying mechanisms here have been explained above and in the Devloper's view, but a few things of special interest to administrators deserve elaboration.

Access control lists

As mentioned in the User's view and the developer's view, MERCURY requests via the network can be unauthenticated, authenticated by the originating entity ID, or authenticated via an arbitrary "pseudonym", which is a public key with some CARBON knowledge attached. Users can choose which to use, and code running in an entity's security context gets to choose too.

The MERCURY and CARBON handler configurations in the $MERCURY and $CARBON slices in the entity's state (or the corresponding slices for persona schemes) contain access control lists for MERCURY and CARBON access to the entity.

These lists specify who can access the MERCURY API (or particular operation with an API) or a piece of CARBON knowledge, and the entries in the list can be:

"*" for any request, including anonymous ones.
A specific entity ID.
A specific entity name (resolved to an ID via CARBON, and cached for a while).
A pseudonym's public key.
The name of another access control list published in CARBON (which will be fetched and cached for a while).

The latter option allows for the creation of central lists, which resources can access in their ACLs, allowing administrators to add an entity or a pseudonum to the central list to grant them access to a wide variety of services. These lists effectively represent "roles" in a role-based access control system, albeit with storage inverted (the role references the user, rather than vice versa) in a manner more suited to distributed management.

An ACL can also contain blacklist entries; requests matching a blacklist entry are rejected, even if another entry would allow them.

An ACL entry can also be annotated with other configuration. Currently, the only defined option is a proof-of-work cost which must be submitted by requests from outside the cluster in order to access the endpoint.

Network layout configuration

An optional part of the cluster configuration is a map of the network layout, nominating groups of nodes and attaching some metadata to those groups relating to their network connectivity. For instance, a group of nodes might be marked as being connected with a trusted network (up to a given security clearance), in which case internal communications between those nodes can reduce or omit encryption and authentication checks. Groups might also be labelled as having economically preferential connectivity, meaning that communication between nodes in those groups is cheaper than between nodes in different groups, and the TUNGSTEN replication protocol will take that into account to attempt to minimise inter-group traffic at the cost of increased intra-group traffic. A group might specify expected bandwidth and packet loss rates within that group, to provide better starting defaults for the IRIDIUM link analysis protocol.

Entity administration

There is a hardcoded MERCURY API exposed by every entity (except the virtual entities created by minting an entity ID with a persona field). Even if the entity's state is completely empty and there isn't even a $MERCURY slice, this API is available, and its behaviour is provided by the ARGON kernel - the entity's state cannot reference code to handle it.

An entity's state can provide an ACL to restrict access to the admin API in $MERCURY, but the volume configuration in the cluster entity contains a volume admin ACL for each volume, which can also grant access. So somebody given access to all or part of the admin API in the volume's master admin ACL gains that access to every entity in the volume. Also, every entity is granted full access to its own admin API as part of the entity's security context.

The admin API provides the following abilities:

Delete an entity (unless it's the volume entity).
Directly view and edit the entity's state in TUNGSTEN.
View and edit the subset of the entity's state that is the ACLs for the entity in the $MERCURY and $CARBON slices (this is provided as a convenience, and to enable this ability to be granted in an ACL without granting access to the entire state).
Put the entity into debug mode, and then trace/pause/introspect execution of code within that entity's context.
Create a new entity in the same volume as this entity.

The latter ability is the only way to create new entities (apart from volume entities, which are created by the system when a volume is created); to create entities in a volume, you need to be granted this ability on an existing entity in the volume.

When you invoke the create-an-entity handle, you need to provide an initial state for the entity. Normally, this would at least contain an ACL giving you full access to the admin API so you can then do something with it.

Volume administration

To create a new volume, you need to ask the Cluster Security Entity to create it for you. If it accepts your request (ACLs permitting), it will mint a new keypair for the volume and add it to the list, and roll out a new cluster-wide configuration to all nodes. Your request needs to contain the basic configuration for the volume, including the volume's admin ACL that grants access to the admin interface of every node in the volume. This creates an empty entity to be the volume entity, and returns its ID. You can then use the admin ACL powers you will have granted yourself to set up that volume entity.

Although any arbitrary entity can be the volume entity, and it cannot be delete via the entity admin API, there is one slightly special thing about the volume entity. Much like the entity admin API, it has a second magically hardcoded MERCURY API implemented by the kernel, the volume admin API. As with the entity admin API, the entity's own state can do no more than provide an ACL for the volume admin API, but there is no volume-configuration-based ACL override for this one; the volume administrator can use their volume-configuration-based admin powers to edit the ACL of the volume entity to let themselves have the volume admin API.

The volume admin API provides the following abilities:

Delete the volume
List the IDs of all entities in the volume, the amount of storage space they're consuming, which nodes they're replicated to, and the ID of the entity that originally created them (which may be missing for the cluster and cluster security entities, which are created during cluster creation, as is the node entity for the first node in the cluster).

Mandatory access control

As explored in more detail in the Security page, the cluster configuration assigns every node a hierarchy of security classifications. Every node is assigned one or more security classifications, indicating that it is trusted to handle entity data at those classifications. Every volume is also assigned a set of classifications, indicating that only resources trusted with all of those classifications may process entities within that volume.