Installation
Install Chicken Scheme using their installation instructions.
Ugarit can then be installed by typing (as root):
chicken-install ugarit
See the chicken-install manual for details if you have any trouble, or wish to install into your home directory.
Setting up a vault
Firstly, you need to know the vault identifier for the place you'll be storing your vaults. This depends on your backend. The vault identifier is actually the command line used to invoke the backend for a particular vault; communication with the vault is via standard input and output, which makes it easy to tunnel via ssh.
Local filesystem backends
These backends use the local filesystem to store the vaults. Of course, the "local filesystem" on a given server might be an NFS mount or mounted from a storage-area network.
Logfile backend
The logfile backend works much like the original Venti system. It's append-only - you won't be able to delete old snapshots from a logfile vault, even when I implement deletion. It stores the vault in two sets of files; one is a log of data blocks, split at a specified maximum size, and the other is the metadata: an sqlite database used to track the location of blocks in the log files, the contents of tags, and a count of the logs so a filename can be chosen for a new one.
To set up a new logfile vault, just choose where to put the two parts. It would be nice to put the metadata file on a different physical disk to the logs directory, to reduce seeking. If you only have one disk, you can put the metadata file in the log directory ("metadata" is a good name).
You can then refer to it using the following vault identifier:
"backend-fs splitlog ...log directory... ...metadata file..."
SQLite backend
The sqlite backend works a bit like a Fossil repository; the storage is implemented as a single file, which is actually an SQLite database containing blocks as blobs, along with tags and configuration data in their own tables.
It supports unlinking objects, and having everything in a single file is convenient; but storing everything in one randomly-accessed file is slightly riskier than the simple structure of an append-only log: it is less tolerant of corruption, which can easily render the entire storage unusable. Also, that one file can get very large.
SQLite has internal limits on the size of a database, but they're quite large - you'll probably hit a size limit at about 140 terabytes.
To set up an SQLite storage, just choose a place to put the file. I usually use an extension of .vault; note that SQLite will create additional temporary files alongside it with additional extensions, too.
Then refer to it with the following vault identifier:
"backend-sqlite ...path to vault file..."
Filesystem backend
The filesystem backend creates vaults by storing each block or tag in its own file, in a directory. To keep the objects-per-directory count down, it'll split the files into subdirectories. Because of this, it uses a stupendous number of inodes (more than the filesystem being backed up). Only use it if you don't mind that; splitlog is much more efficient.
To set up a new filesystem-backend vault, just create an empty directory that Ugarit will have write access to when it runs. It will probably run as root in order to be able to access the contents of files that aren't world-readable (although that's up to you), so unless you access your storage via ssh or sudo to run the backend under another user, be careful of NFS mounts that have maproot=nobody set!
You can then refer to it using the following vault identifier:
"backend-fs fs ...path to directory..."
Proxying backends
These backends wrap another vault identifier which the actual storage task is delegated to, but add some value along the way.
SSH tunnelling
It's easy to access a vault stored on a remote server. The caveat is that the backend then needs to be installed on the remote server! Since vaults are accessed by running the supplied command and then talking to it via stdin and stdout, the vault identifier need only be:
"ssh ...hostname... '...remote vault identifier...'"
Cache backend
The cache backend is used to cache a list of which blocks exist in the proxied backend, so that it can rapidly answer queries about the existence of a block, even when the proxied backend is on the end of a high-latency link (e.g., the Internet). This should speed up snapshots, as existing files are identified by asking the backend if the vault already has them.
The cache backend works by storing the cache in a local sqlite file. Given a place for it to store that file, usage is simple:
"backend-cache ...path to cachefile... '...proxied vault identifier...'"
The cache file will be automatically created if it doesn't already exist, so make sure there's write access to the containing directory.
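For instance, keeping the cache in /var/ugarit/block-cache (an illustrative path) in front of a vault reached over ssh would look something like this (the remote identifier is elided here, as its exact quoting depends on your setup):
"backend-cache /var/ugarit/block-cache 'ssh backup@backuphost ...remote vault identifier...'"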
- WARNING - WARNING - WARNING - WARNING - WARNING - WARNING -
If you use a cache on a vault shared between servers, make sure that you either:
- Never delete things from the vault
or
- Make sure all access to the vault is via the same cache
If a block is deleted from a vault, and a cache on that vault is not aware of the deletion (as it did not go "through" the caching proxy), then the cache will record that the block exists in the vault when it does not. This will mean that if a snapshot is made through the cache that would use that block, then it will be assumed that the block already exists in the vault when it does not. Therefore, the block will not be uploaded, and a dangling reference will result!
Some setups which *are* safe:
- A single server using a vault via a cache, not sharing it with anyone else.
- A pool of servers using a vault via the same cache.
- A pool of servers using a vault via one or more caches, and maybe some not via the cache, where nothing is ever deleted from the vault.
- A pool of servers using a vault via one cache, and maybe some not via the cache, where deletions are only performed on servers using the cache, so the cache is always aware.
Writing a ugarit.conf
ugarit.conf should look something like this:
(storage <vault identifier>)
(hash tiger "<salt>")
[double-check]
[(compression [deflate|lzma])]
[(encryption aes <key>)]
[(cache "<path>")|(file-cache "<path>")]
[(rule ...)]
Hashing
The hash line chooses a hash algorithm. Currently Tiger-192 (tiger), SHA-256 (sha256), SHA-384 (sha384) and SHA-512 (sha512) are supported;
if you omit the line then Tiger will still be used, but it will be a
simple hash of the block with the block type appended, which reveals
to attackers what blocks you have (as the hash is of the unencrypted
block, and the hash is not encrypted). This is useful for development
and testing or for use with trusted vaults, but not advised for use
with vaults that attackers may snoop at. Providing a salt string
produces a hash function that hashes the block, the type of block, and
the salt string, producing hashes that attackers who can snoop the
vault cannot use to find known blocks (see the "Security model"
section below for more details).
I would recommend that you create a salt string from a secure entropy source, such as:
dd if=/dev/random bs=1 count=64 | base64 -w 0
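The base64 string that produces can be pasted straight into the hash line of your ugarit.conf, for example:
(hash tiger "...salt from the command above...")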
Whichever hash function you use, you will need to install the required Chicken egg with one of the following commands:
chicken-install -s tiger-hash   # for tiger
chicken-install -s sha2         # for the SHA hashes
Compression
lzma is the recommended compression option for low-bandwidth backends or when space is tight, but it's very slow to compress; deflate or no compression at all are better for fast local vaults. To have no compression at all, just remove the (compression ...) line entirely. Likewise, to use compression, you need to install a Chicken egg:
chicken-install -s z3     # for deflate
chicken-install -s lzma   # for lzma
WARNING: The lzma egg is currently rather difficult to install, and needs rewriting to fix this problem.
Encryption
Likewise, the (encryption ...) line may be omitted to have no encryption; the only currently supported algorithm is aes (in CBC mode) with a key given in hex, as a passphrase (hashed to get a key), or a passphrase read from the terminal on every run. The key may be 16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a hex key, just supply it as a string, like so:
(encryption aes "00112233445566778899AABBCCDDEEFF")
...for 128-bit AES,
(encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")
...for 192-bit AES, or
(encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")
...for 256-bit AES.
Alternatively, you can provide a passphrase, and specify how large a key you want it turned into, like so:
(encryption aes (24|32 "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))
I would recommend that you generate a long passphrase from a secure entropy source, such as:
dd if=/dev/random bs=1 count=64 | base64 -w 0
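As with the hash salt, the output can be used directly as the passphrase, for example:
(encryption aes (32 "...passphrase from the command above..."))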
Finally, the extra-paranoid can request that Ugarit prompt for a passphrase on every run and hash it into a key of the specified length, like so:
(encryption aes (24|32 prompt))
(note the lack of quotes around prompt, distinguishing it from a passphrase)
Please read the Security model documentation for details on the implications of different encryption setups.
Again, as it is an optional feature, to use encryption, you must install the appropriate Chicken egg:
chicken-install -s aes
Caching
Ugarit can use a local cache to speed up various operations. If a path to a file is provided through the cache or file-cache directives, then a file will be created at that location and used as a cache. If not, then a default path of ~/.ugarit-cache will be used instead.
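For example, either of the following directives (the path is illustrative) enables the cache; the difference between the two forms is explained below:
(cache "/var/ugarit/cache")
(file-cache "/var/ugarit/cache")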
WARNING: If you use multiple different vaults from the same UNIX account, and the same tag names are used in those different vaults, and you use the default cache path (or explicitly specify cache paths that point to the same file), you will get a somewhat confused cache. The effects of this will be annoying (searches finding things that then can't be fetched) rather than damaging, but it's still best avoided!
The cache is used to cache snapshot records and archive import records. This is used by operations that extract snapshot history and archive objects; snapshots are stored in a linked list of snapshot objects, each referring to the previous snapshot. Therefore, reading the history of a snapshot tag requires reading many objects from the storage, which can be time-consuming for a remote storage! Similarly, archives are represented as a linked list of imports, and searching for an object in the archive can involve traversing the chain of imports until a match is found (and then searching on until the end to see if any further matches can be found!). The cache is even more important for archive imports, as it not only keeps a local copy of all the import information, it also records the "current" metadata of every object in the archive (so that we don't need to search through superseded previous versions of the metadata of an object when looking for something), and uses B-tree indexes to enable fast searching of the cached metadata.
If you configure the cache path with file-cache rather than just cache, then as well as the snapshot/archive metadata caching, you will also enable file hash caching. This significantly speeds up subsequent snapshots of a filesystem
tree. The file cache maps filenames to (mtime,size,hash) tuples; as it
scans the filesystem, if it finds a file in the cache and the mtime
and size have not changed, it will assume it is already stored under
the specified hash. This saves it from having to read the entire file
to hash it and then check if the hash is present in the vault. In
other words, if only a few files have changed since the last snapshot,
then snapshotting a directory tree becomes an O(N) operation, where N
is the number of files, rather than an O(M) operation, where M is the
total size of files involved.
WARNING: If you use a file cache, and a file is cached in it but then
subsequently deleted from the vault, Ugarit will fail to re-upload it
at the next snapshot. If you are using a file cache and you go
deleting things from your vault (should that be implemented in
future), you'll want to flush the cache afterwards. We might implement
automatic removal of deleted files from the local cache, but file
caches on other Ugarit installations that use the same vault will not
be aware of the deletion.
Other options
double-check, if present, causes Ugarit to perform extra internal consistency checks during backups, which will detect bugs but may slow things down.
Example
(storage "ssh ugarit@spiderman 'backend-fs splitlog /mnt/ugarit-data /mnt/ugarit-metadata/metadata'")
(hash tiger "i3HO7JeLCSa6Wa55uqTRqp4jppUYbXoxme7YpcHPnuoA+11ez9iOIA6B6eBIhZ0MbdLvvFZZWnRgJAzY8K2JBQ")
(encryption aes (32 "FN9m34J4bbD3vhPqh6+4BjjXDSPYpuyskJX73T1t60PP0rPdC3AxlrjVn4YDyaFSbx5WRAn4JBr7SBn2PLyxJw"))
(compression lzma)
(file-cache "/var/ugarit/cache")
Be careful to put a set of parentheses around each configuration entry. White space isn't significant, so feel free to indent things and wrap them over lines if you want.
Keep copies of this file safe - you'll need it to do extractions! Print a copy out and lock it in your fire safe! Ok, currently, you might be able to recreate it if you remember where you put the storage, but encryption keys and hash salts are harder to remember...