Ticket UUID: 1d491d7ec289ee532db6e58344e09531a819faf1
Title: Archive sync facility
Status: Open
Type: Feature_Request
Severity: Cosmetic
Priority: 2_Medium
Subsystem: Archival_Frontend
Resolution: Open
Last Modified: 2016-09-25 18:48:29
Version Found In:
I plan to keep mirrors of my personal archive in various places, in multiple Ugarit vaults.

Inevitably, these mirrors will diverge as each one is updated on its own!

So it would be useful to have a tool where I can do:

ugarit archive-sync src-ugarit.conf dest-ugarit.conf

It'll look for all objects in src but not in dest, or in both but with different metadata, and copy them (or just the metadata) from src to dest.


ugarit archive-sync --merge src-ugarit.conf dest-ugarit.conf

...which will copy objects in src but not in dest to dest, and vice versa. For objects in both but with different metadata, it will bring up an editor with the metadata for merging, and apply the merged result to both.

User Comments:
alaric added on 2015-06-16 13:00:09:

Here's a plan for how to implement this.

To start with, we need a function that takes two vaults and "synchronises" a given hash from source to destination, by ensuring that it is fully present in the destination. This is easily done; if the hash exists in the destination, we are done. Otherwise, read the object from the source, dispatch on its type to extract all referenced hashes from it, and recursively synchronise them from source to dest; then copy the object from source to dest.
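
The recursive step above can be sketched roughly as follows. This is Python rather than Ugarit's own Scheme, and it models a vault as a plain dict from hash to object; the names `Obj`, `Vault` and `sync_hash`, and the idea that each object carries a ready-made list of referenced hashes, are assumptions for illustration only. A real implementation would dispatch on the object type to extract the references.

```python
from typing import Dict, List, NamedTuple


class Obj(NamedTuple):
    type: str         # hypothetical type tag, e.g. "file" or "dir"
    payload: bytes    # opaque object contents
    refs: List[str]   # hashes this object refers to (assumed pre-extracted)


# Toy stand-in for a vault: hash -> object. Not Ugarit's real API.
Vault = Dict[str, Obj]


def sync_hash(src: Vault, dest: Vault, h: str) -> int:
    """Ensure h and everything it references is present in dest.

    Returns the number of objects copied.
    """
    if h in dest:
        # Already present, so by construction its children are too.
        return 0
    obj = src[h]
    copied = 0
    # Copy the referenced objects first, then the object itself.
    for ref in obj.refs:
        copied += sync_hash(src, dest, ref)
    dest[h] = obj
    return copied + 1
```

Copying children before their parent means an interrupted run never leaves a parent in the destination without its children, which is what makes the "hash already exists, so we are done" shortcut safe to rely on.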

This can be exposed via the API and CLI as a "tag mirror" function, that does the following, given two vaults and a tag name on each:

  1. Read tag from source
  2. Sanity-check tag on dest. It should either not exist, or point to a hash that is in the history of the source chain, as otherwise we'll be orphaning the current contents of that tag!
  3. Sync the object pointed to by the source tag to the destination vault
  4. Update the destination tag to point to the new hash
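
A rough sketch of those four steps, again in Python with a toy in-memory vault. The `Vault` class, the `previous` field used to walk a tag's history, and the function names are all assumptions made for this example; they are not Ugarit's actual data model or API.

```python
from typing import Dict, Optional


class Vault:
    """Toy vault: objects map hash -> {"previous": hash or None, "refs": [...]}."""

    def __init__(self) -> None:
        self.objects: Dict[str, dict] = {}
        self.tags: Dict[str, str] = {}


def sync_hash(src: Vault, dest: Vault, h: str) -> None:
    """Copy h and everything reachable from it into dest, children first."""
    if h in dest.objects:
        return
    obj = src.objects[h]
    links = list(obj["refs"])
    if obj["previous"] is not None:
        links.append(obj["previous"])
    for ref in links:
        sync_hash(src, dest, ref)
    dest.objects[h] = obj


def in_history(vault: Vault, head: str, wanted: str) -> bool:
    """True if wanted appears in the chain of 'previous' links from head."""
    h: Optional[str] = head
    while h is not None:
        if h == wanted:
            return True
        h = vault.objects[h]["previous"]
    return False


def mirror_tag(src: Vault, src_tag: str, dest: Vault, dest_tag: str) -> str:
    head = src.tags[src_tag]                        # 1. read tag from source
    current = dest.tags.get(dest_tag)               # 2. sanity-check dest tag
    if current is not None and not in_history(src, head, current):
        raise ValueError("destination tag has diverged; refusing to orphan it")
    sync_hash(src, dest, head)                      # 3. sync the object
    dest.tags[dest_tag] = head                      # 4. update the dest tag
    return head
```

The tag is only updated after the sync has completed, so a failure part-way through leaves the destination tag pointing at its old, still complete, contents.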

This is useful for, say, mirroring a given snapshot tag to an offsite vault, or distributing central updates to an archive tag out to a number of local vaults.

But given that functionality, we can now write an "archive inter-vault merge" function, as an API and a CLI command. Given source/destination vaults and tag names, we can:

  1. Check that the tag exists on the source and is an archive tag, and that on the dest it either does not exist or is also an archive tag.
  2. Read entire current metadata from source tag, into a hashtable on the key.
  3. Read entire current metadata from destination tag, removing identical objects from the hashtable. For objects that exist in both but have different metadata, invoke a callback to find the metadata to use (or #f to skip the object, removing it from the hashtable). Objects from the destination not found in the hashtable can be ignored. The callback may abort by never calling its continuation, so wrap anything of importance around it in dynamic-wind.
  4. The hashtable is now a manifest of things to transfer. Synchronise all content hashes, then import the hashtable as a single import manifest.
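
Steps 2 and 3 amount to building a transfer manifest, which might look something like this in Python. The dict-of-dicts metadata layout, the `Resolver` signature and the name `build_manifest` are assumptions for the sketch; `None` plays the role of `#f`. In Python a callback would abort by raising an exception, so `try`/`finally` would stand in for `dynamic-wind` around anything that needs cleanup.

```python
from typing import Callable, Dict, Optional

Metadata = Dict[str, str]
# Hypothetical callback: (key, source metadata, dest metadata) -> metadata or None
Resolver = Callable[[str, Metadata, Metadata], Optional[Metadata]]


def build_manifest(
    src_meta: Dict[str, Metadata],
    dest_meta: Dict[str, Metadata],
    resolve: Resolver,
) -> Dict[str, Metadata]:
    """Return the entries that need importing into the destination."""
    manifest = dict(src_meta)              # step 2: everything in the source
    for key, dest_md in dest_meta.items():  # step 3: walk the destination
        if key not in manifest:
            continue                       # destination-only object: ignore
        src_md = manifest[key]
        if src_md == dest_md:
            del manifest[key]              # identical: nothing to transfer
            continue
        chosen = resolve(key, src_md, dest_md)
        if chosen is None:
            del manifest[key]              # callback asked to skip this object
        else:
            manifest[key] = chosen
    return manifest
```

What remains in the returned dict is the manifest of step 4: the content hashes to synchronise and the metadata to import in one go.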

The CLI can provide the following options for the conflict resolution callback:

  • Source metadata wins
  • Destination metadata wins
  • Newest metadata wins (based on the mtime of the import the metadata came from)
  • Concatenate the metadata together into one, except for properties identical in both.
  • Create a text file containing the concatenated metadata (with many comments explaining what came from where and what duplicates were removed), and run an arbitrary shell command on it. If the file comes back with valid metadata, use it. If it comes back just reading "skip", skip that object by returning #f. If it comes back just reading "abort", abort the operation. If it comes back with anything else, re-run the command with the result, plus an error message explaining the problem.
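
The first four options are simple enough to sketch as plain functions. This Python version assumes metadata is a list of (property, value) pairs and that each side also carries the mtime of the import it came from; the `Versioned` record, the function names, and the tie-break in favour of the source are all illustrative assumptions. The editor-based option is left out because it depends on the metadata file format.

```python
from typing import List, NamedTuple, Tuple

Props = List[Tuple[str, str]]


class Versioned(NamedTuple):
    props: Props          # metadata as (property, value) pairs
    import_mtime: float   # mtime of the import this metadata came from


def source_wins(key: str, src: Versioned, dest: Versioned) -> Props:
    return src.props


def dest_wins(key: str, src: Versioned, dest: Versioned) -> Props:
    return dest.props


def newest_wins(key: str, src: Versioned, dest: Versioned) -> Props:
    # Ties go to the source; that choice is an assumption of this sketch.
    if src.import_mtime >= dest.import_mtime:
        return src.props
    return dest.props


def concatenate(key: str, src: Versioned, dest: Versioned) -> Props:
    # Keep all source pairs, then add destination pairs not already present,
    # so properties identical on both sides appear only once.
    merged = list(src.props)
    for pair in dest.props:
        if pair not in merged:
            merged.append(pair)
    return merged
```

Because every resolver has the same shape, the CLI option could simply select which function is passed as the conflict resolution callback.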

This will give us a way to synchronise changes to an archive into another archive, which can later be extended to offer a bi-directional synchronisation.

It also gives us, as a side-effect, a way to mirror a tag of either kind from one vault to another; a lower-level operation, but one that can be used on archives and snapshots, and which ensures hash consistency.

alaric added on 2016-09-25 18:48:29:
Also, we should support having a filter on the source archive (using the existing search interface) - only matching archive entries get synced across!