Overview

Comment: More elaboration of CARBON
SHA1: f22fe263a5231c28bb39d474ebcd02a4
User & Date: alaric 2012-12-06 20:23:12
Changes
Changes to intro/carbon.wiki.
︙
Perhaps the most interesting thing about them is that they can "chain" onto other knowledge bases (of any type), which will be consulted to satisfy queries along with the main knowledge base. In the event of any conflict, the main knowledge base has priority, and the chained knowledge bases are listed in priority order when configured into the knowledge base.

Conflicts are detected by consulting the metadata attached to tuple type symbols, which provide rules about what other tuples conflict with tuples using that type symbol. TODO: Re-read that book on updating logical databases and explain how to handle this!

<h1>TUNGSTEN sections: Persistent KBs</h1>

Persistent storage of entity state in TUNGSTEN is handled by CARBON, which uses the low-level B-Tree storage management of TUNGSTEN to present a number of knowledge bases, each corresponding to a "section" of the entity's TUNGSTEN storage, in close cooperation with
︙
Note that the entity does not know its own CARBON name, as it might not have one or might have many, so when opening a TUNGSTEN section KB, an initial value for the default namespace needs to be supplied.

<h1>Remote KBs via [./mercury.wiki|MERCURY]</h1>

TODO: Note that as the CARBON protocol opens up TUNGSTEN knowledge base sections, it needs to know an initial default namespace - so the CARBON name being used to access the entity must be supplied in requests. If there is none (e.g., we are just working from a raw EID), then that EID is mapped into a CARBON symbol of the form <tt><nowiki>/argon/eid/VERSION NUMBER/CLUSTER ID IN BASE64[/CLUSTER-LOCAL PART OF ENTITY ID IN BASE64[/PERSONA CLASS/PERSONA PARAMETERS IN BASE64]]</nowiki></tt>.
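To make the shape of these names concrete, here is a rough Python sketch of the mapping; the URL-safe, unpadded base64 variant and the function names are my assumptions, not part of any defined CARBON encoding.

```python
# Hypothetical sketch of mapping a raw EID into a symbol under /argon/eid,
# following the bracketed format described above. Not a real CARBON API.
import base64

def b64(data: bytes) -> str:
    # Assumption: URL-safe base64 without padding, to keep symbols clean.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def eid_to_symbol(version, cluster_id, local_part=None,
                  persona_class=None, persona_params=None):
    parts = ["", "argon", "eid", str(version), b64(cluster_id)]
    if local_part is not None:
        parts.append(b64(local_part))
        if persona_class is not None:
            # Optional persona suffix: class name plus encoded parameters.
            parts.append(persona_class)
            parts.append(b64(persona_params or b""))
    return "/".join(parts)

# A bare cluster ID names the cluster entity itself.
print(eid_to_symbol(1, b"\x01\x02\x03"))           # /argon/eid/1/AQID
print(eid_to_symbol(1, b"\x01\x02\x03", b"\x04"))  # /argon/eid/1/AQID/BA
```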
<tt>/argon/eid</tt> offers a gateway that resolves these names back to the encoded entity IDs; if no cluster-local part is provided, then an empty string is used, producing the EID of the cluster entity corresponding to the cluster.

<h2>Dynamic services</h2>

How rules can be written whose body is not another CARBON query, but instead a pointer to a request that is sent via MERCURY back to the origin entity, actually causing entity code to run. This is necessary for the interesting cases where we're not just publishing information via CARBON, but instead exposing an actual service that generates or otherwise obtains information on demand, ranging from computational services, access to continuously-changing information sources such as sensors, processes that require access to secret information to generate a result, access to existing information systems (such as gateways to the Internet or "legacy systems"), and so on.

Note that this is only for READING data. Anything that causes changes to the world needs to be a separate MERCURY action via an EID.

<h2>Direct publishing from TUNGSTEN</h2>

TODO: Direct access via MERCURY to the published TUNGSTEN sections, including support for ACLs and personae (use the persona class). Sketch the framework of support for a CDN by configuring the cluster to forward the published TUNGSTEN sections of a configured list of its entities out to nominated caching servers, distributed close to spots of anticipated demand.

<h2>Caching</h2>

TODO: Talk about cache-control metadata on CARBON results in the protocol, how they're generated from metadata in TUNGSTEN or explicitly via dynamic services, and how they can be cached by the client entity (or a shared proxy?) to reduce load on the server and decrease latency/bandwidth usage for the client.
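As a sketch of how such cache-control metadata might be honoured on the client side, here is a minimal TTL-based scheme; the field names and the TTL model are illustrative assumptions, not a defined part of the protocol.

```python
# Hedged sketch: each CARBON response carries a time-to-live (derived from
# TUNGSTEN metadata or set by a dynamic service), and the client reuses
# unexpired answers instead of re-querying the server.
import time

class CachingClient:
    def __init__(self, send_query):
        self.send_query = send_query      # stand-in for a MERCURY round trip
        self.cache = {}                   # query -> (answer, expiry time)

    def query(self, q, now=None):
        now = time.time() if now is None else now
        if q in self.cache:
            answer, expiry = self.cache[q]
            if now < expiry:
                return answer             # served locally: no server load
        answer, ttl = self.send_query(q)  # response carries its own TTL
        self.cache[q] = (answer, now + ttl)
        return answer

calls = []
def server(q):
    calls.append(q)
    return ("blue", 60.0)                 # answer valid for 60 seconds

c = CachingClient(server)
c.query("colour-of-sky", now=0.0)
c.query("colour-of-sky", now=30.0)        # within TTL: no second round trip
print(len(calls))  # 1
```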
<h2>Peer-to-peer sharing</h2>

TODO: Talk about how multiple concurrent downloads of the same knowledge packet (perhaps detected because it's from the same TUNGSTEN static section, perhaps detected by some hashing scheme) can be detected, and subsequent requests for it told to fetch already-fetched blocks from peers who have already downloaded it, sharing the distribution load in the manner of BitTorrent.

To support this, downloads of large CARBON responses are automatically split into blocks by byte range, and a MERCURY connection used to stream them down. However, the protocol on that connection allows the server to suggest that a block be fetched from another MERCURY endpoint (actually, a list of them is included, to be tried in some order until success is obtained), along with the hash the block should have if it's not been tampered with. The client can still explicitly request the block from the server, though, if it can't fetch it from a peer. So the server serves as both a tracker and a seeder of last resort, in BitTorrent parlance.

This is also the mechanism by which clients can be directed to CDN servers that have been set up.

<h2>Why's it so complex?</h2>

The CARBON-over-MERCURY protocol is fairly complex, and here's my justification for why.

On one end of the spectrum, I want it to be as fast as DNS for the common case of following the series of links that let one resolve a symbol into information about it. The basic request for information about a name is a simple MERCURY protocol operation; as long as the request and response fit into an MTU, it can be handled as a single UDP packet in each direction, just like DNS. And bigger responses can be handled by performing an [./iridium.wiki|IRIDIUM] connection handshake and then streaming the results.

However, in the common case of public published data without any ACLs, those responses can be lifted direct from disk (or in-memory disk cache!)
on any node in the cluster without needing to fire up a [./lithium.wiki|LITHIUM] handler for the entity being asked. And those responses can be cached in the client cluster, meaning that any other requests from within the same cluster can be satisfied from the cache. And for very large responses, all the clusters that need it at the same time can cooperate in a peer-to-peer broadcast network to distribute it efficiently. And where high demand is anticipated, you can pay the expense of setting up CDN servers around the world, which the latest static data is published to, and which clients are transparently directed to using the peer-to-peer protocol; they're basically configurable extra seeders, which new versions of the data are automatically sent to.

And in the less-common case where data isn't published in advance, it can gateway back to the parent entity to compute data on the fly, transparently to the end-user.

We're trying to cover a lot of cases here under a single unified interface. So it's a bit complex, but I think that's justified in the complexity it removes from elsewhere in the system.

<h1>The Directory</h1>

A single global root EID, run by a non-profit foundation with a suitable governance structure to prevent it ever being monopolised.

TODO: Explain the means by which a knowledge base can assign EIDs to objects, and how this can be used to recursively seek out published information about the object identified by an IRON symbol, by starting from the root EID and asking it for tuples involving objects identified by symbols which are prefixes of a target symbol (a special kind of query baked into the CARBON-over-MERCURY protocol). At the higher levels of the tree these will usually just be pointers to EIDs associated with parents of the target object; these can be recursively queried, gathering any information about the target symbol that comes up in the process, until we run out of parent EIDs to ask.
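The recursive walk described in that TODO can be sketched roughly as follows; the nested dictionaries stand in for CARBON-over-MERCURY prefix queries against directory entities, and every name here is an illustrative assumption.

```python
# Hedged sketch of resolving an IRON symbol: query each directory entity for
# successively longer prefixes of the target, following delegation pointers
# and gathering any facts found along the way.

def resolve(symbol, root):
    parts = symbol.strip("/").split("/")
    facts, entity = [], root
    for i in range(1, len(parts) + 1):
        prefix = "/" + "/".join(parts[:i])
        entry = entity.get(prefix, {})
        facts.extend(entry.get("facts", []))
        child = entry.get("delegate")   # EID of a child directory entity
        if child is None:
            break                        # nearest parent is authoritative
        entity = child
    return facts

# Toy two-level directory: the root delegates /argon to a child entity.
leaf = {"/argon/iron": {"facts": ["IRON is the knowledge language"]}}
root = {"/argon": {"facts": ["system namespace"], "delegate": leaf}}
print(resolve("/argon/iron", root))
```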
Therefore, any symbol maps to a chain of entities, containing at least the root directory entity, and usually containing subsequent child directory entities. The symbol may itself map to an entity, in which case that entity can be asked what information it has about "itself". If not, then the nearest parent entity is the authoritative source of information about that object.

Therefore, any entity can create an arbitrarily large subtree of objects within itself, using its own global name as a prefix, without needing to actually create entities; they can be purely informational objects, containing information but without any identity as an entity. Or the entity can attach EIDs to them that are actually just personae of its own EID; this is particularly useful for gateways to external systems, which can map the external information structure into a CARBON directory tree of objects, each of which appears as an entity acting as a gateway to behaviour that is mapped to the remote system. Or an entity may create actual entities as offspring of itself and then add them to a directory it exports, making them independent while still being children of itself in the CARBON tree.

<dl>
<dt><tt>/argon</tt></dt>
<dd>Where ARGON system software is published from. This is delegated to a non-profit foundation (which may or may not be the same one as
︙
Also talk about the subsequent ability to demand that a local copy of any given namespace subtree be kept in the cluster at all times, in effect overriding its original name but pointing to a snapshot stored within the cluster. Note that CHROME modules list their dependencies, which can be used to recursively local-copy them. This is used to ensure that critical resources are available "offline", and to configure the cluster to use a specific version of something rather than "the latest". This effectively overrides the remote cache-control headers with a local directive.

Such local copies do not change when upstream changes occur, but an administrator can view a list of newly available things and opt to upgrade, or downgrade when older versions are still available. This is like "installing software". A mechanism to keep the current version archived away somewhere for later downgrading, or even to fetch the current remote version direct into the archive to try later or offline, would be desirable.
This suggests a storage model where a given CARBON prefix maps to a dictionary mapping "source CARBON name:version identifier" pairs to CARBON knowledge bundles, with an ability to select one element from the dictionary as "current", and a requirement for a "version" property that can be expressed with a suitable CARBON tuple, to know what version is in a given bundle.

The local copies are stored in one or more nominated WOLFRAM distributed caches, with an infinite replication factor and no expiry timestamp or drop priority, so they are kept until otherwise specified. By default, they go to the cluster cache, so are replicated to every node.
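A minimal sketch of that storage model, assuming illustrative names throughout (this is not a WOLFRAM or CARBON API):

```python
# Hypothetical model: a local CARBON prefix holds a dictionary of
# (source name, version) -> knowledge bundle, with one entry selected as
# "current". Archiving a new version never changes what is served; only an
# explicit select() (the "installing software" step) does.

class LocalCopy:
    def __init__(self):
        self.bundles = {}      # (source_name, version) -> bundle payload
        self.current = None    # key of the bundle served locally

    def archive(self, source_name, version, bundle):
        # Fetch into the archive (e.g. to try later or offline).
        self.bundles[(source_name, version)] = bundle

    def select(self, source_name, version):
        # Explicit upgrade or downgrade by an administrator.
        self.current = (source_name, version)

    def serve(self):
        return self.bundles[self.current]

copy = LocalCopy()
copy.archive("/argon/chrome", "1.0", "bundle-v1")
copy.archive("/argon/chrome", "1.1", "bundle-v2")  # fetched, not yet used
copy.select("/argon/chrome", "1.0")
print(copy.serve())  # still serves 1.0 until an administrator upgrades
```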