Ticket Hash: | dae5e21ffcd3d517e021d8b855fb86ff7d9a271a | |||
Title: | Archival mode | |||
Status: | Closed | Type: | Feature_Request | |
Severity: | UNSPECIFIED | Priority: | 2_Medium | |
Subsystem: | Archival_Frontend | Resolution: | Open | |
Last Modified: | 2014-11-02 01:49:53 | |||
Version Found In: | ||||
Description: | ||||
Currently, I have been implementing Ugarit's backup facility through its "snapshot" mode, but it's meant to be a backup *and archival* system.
Whereas snapshot mode takes a filesystem tree and adds it to a chain of snapshots of the same tree rooted at a tag, archival mode takes a filesystem tree and inserts it into a differently-structured thing called a library, also rooted at a tag.
A library is implemented as a chain of snapshot-like blocks, each of which refers to the previous library in the chain, has a small amount of metadata, and points to a contents block. However, the contents is an s-expression stream of metadata entries. Each metadata entry has a hash (pointing to the root block of the archived filesystem tree, which may often be a raw file rather than a directory), then an alist mapping metadata keys to values. The metadata for a given archived filesystem tree may be superseded by later libraries in the chain, in which case the earlier metadata is ignored.
The library metadata should be cached by the front-end, in an SQLite database, all keyed on the tag name. The hash of the latest library is stored in the cache, so that whenever the archive is opened, it can be compared to the current state of the library tag and the chain followed (processing updates as we go) until the previous point is found, thereby only importing the latest changes.
The metadata of a given filesystem tree in the library is the metadata attached to it by the library entry, plus any metadata attached to the top-level library block itself, which is inherited by all metadata entries created in that library.
The default virtual filesystem presented by the One that comes to mind is to specify a number of metadata keys. The virtual filesystem then has a directory level per metadata key, within which all filesystem trees with the given set of values, matching a global filter restriction, are found.
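The chain-following and superseding rules described above can be sketched roughly as follows. This is only an illustration in Python, not Ugarit code: `LibraryBlock`, `sync_cache`, the dict-based `store` and the dict standing in for the SQLite cache are all hypothetical names. It also assumes that entry-level metadata wins over inherited library-level metadata when both define the same key, which the description does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LibraryBlock:
    """Hypothetical stand-in for one snapshot-like block in a library chain."""
    previous: Optional[str]  # hash of the previous library block, or None
    metadata: dict           # library-level metadata, inherited by entries
    entries: list            # [(tree_hash, {key: value}), ...]


def sync_cache(store, tag_hash, cache):
    """Import library blocks newer than the cached position.

    `store` maps block hashes to LibraryBlock objects; `cache` is a plain
    dict standing in for the SQLite cache. Returns the number of blocks read.
    """
    objects = cache.setdefault("objects", {})
    seen = set()   # trees already updated by a newer library in this sync
    imported = 0
    current = tag_hash
    # Walk from the newest library back to the point reached last time.
    while current is not None and current != cache.get("latest"):
        block = store[current]
        # Within one block, a later entry for the same tree wins.
        for tree_hash, alist in dict(block.entries).items():
            if tree_hash in seen:
                continue  # superseded by a later library: ignore
            seen.add(tree_hash)
            # Assumption: entry metadata overrides inherited metadata.
            objects[tree_hash] = {**block.metadata, **alist}
        current = block.previous
        imported += 1
    cache["latest"] = tag_hash
    return imported
```

Because the walk goes newest-first, the first metadata seen for a tree is the one that counts, and anything older is skipped, which is one way to read "processing updates as we go".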
By setting a global restriction of
alaric added on 2012-04-16 13:20:06 UTC:
Now, when we import a file into the archive by snapshotting it and then introducing a metadata record about it into an archive delta, we must check to see if the file already exists in that archive, so as to not overwrite previous rich metadata with naff initial auto-generated metadata. However, it might be nice to read in the previous metadata and append a new "archived from" entry specifying the hostname, location, and time. As the metadata is an alist, it will be easy to do this as long as "archived from" is a single property, so we can tie together the hostname/location/time triple as a single item.
alaric added on 2012-05-04 10:35:07 UTC:
So the metadata alist of a file might include one or more of:
(archived-from "2011-04-32 22:45:01" "anger" "/home/alaric/projects/foo" "backup" "tar.gz")
One would hope that the extension would be the same (modulo case?) for all the archives, but we can never be sure :-) When building search queries, it would therefore be nice to be able to say (second archived-from) to extract the second field from archived-from - or maybe even to define a table of aliases so we can say archived-from-hostname. | ||||
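As a rough illustration of the two ideas in these comments (keeping existing metadata on re-import while appending a new archived-from entry, and pulling a single field out of archived-from in a query), here is a small Python sketch. It is not Ugarit's implementation: `record_import`, `field_values` and `ALIASES` are hypothetical names, and the alist is modelled as a list of tuples whose first element is the key.

```python
def record_import(existing, generated, origin_fields):
    """Return the metadata alist to store for an imported file.

    `existing` is the alist already in the archive (or None if the file is
    new), `generated` is the auto-generated alist, and `origin_fields` is
    the tuple of archived-from fields (time, hostname, location, ...).
    """
    origin = ("archived-from", *origin_fields)
    # Prefer existing metadata so rich entries are never overwritten.
    base = generated if existing is None else existing
    return list(base) + [origin]


def field_values(alist, key, position):
    """Return field number `position` (1-based) of every `key` entry."""
    # entry[0] is the key itself, so field n lives at index n.
    return [entry[position] for entry in alist if entry[0] == key]


# Hypothetical alias table: alias name -> (property, field position).
ALIASES = {"archived-from-hostname": ("archived-from", 2)}


def alias_values(alist, alias):
    """Resolve an alias such as archived-from-hostname against an alist."""
    key, position = ALIASES[alias]
    return field_values(alist, key, position)
```

Since each archived-from entry is a single item in the alist, importing the same file from several places simply accumulates several entries, and a query over the second field returns one hostname per entry.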
User Comments: | ||||
alaric added on 2014-11-02 01:49:53:
The basics are now there. From the command line, you can import a manifest of objects, search for objects matching a query string, list available properties of objects matching a query string, list available values of a property for objects matching a query string, stream a chosen object to stdout (if it's a file), or extract a chosen object to the filesystem. Next steps are [9c3ac71f94] for a generic property-based explorer in the VFS, [fff691ada2] for customised views, and [33fd928177] for a manifest generator. Outside of archival mode, but enhancing its utility tremendously, are a 9p/fuse/puffs client to allow mounting a vault as a read-only filesystem, and replicated storage [f1f2ce8cdc] - with archival mode, the vault starts to become the primary storage for data, rather than just a backup, and so internal vault replication for resilience becomes all the more important. Future work of note includes an archive tagging GUI [7b6588068f], a public gallery viewer for images [5b07f64457], and support for storing emails in an archive [ea1b7f9ad7]. |