Ugarit command-line reference

Your first backup

Think of a tag to identify the filesystem you're backing up. If it's /home on the server gandalf, you might call it gandalf-home. If it's the entire filesystem of the server bilbo, you might just call it bilbo.

Then from your shell, run (as root):

# ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path to root of filesystem>

For example, if we have a ugarit.conf in the current directory:

# ugarit snapshot ugarit.conf -c localhost-etc /etc

Specify the -c flag if you want to store ctimes in the vault. Since it's impossible to restore ctimes when extracting from a vault, doing this is useful only for informational purposes, so it's not done by default. Similarly, atimes aren't stored in the vault unless you specify -a. Taking a snapshot reads every file and so changes its atime, which means that with -a specified every directory in your filesystem will be uploaded again on every snapshot, because the atimes recorded in it will have been changed by the previous snapshot. Ugarit will happily restore atimes if they are found in a vault; storing them is optional simply because uploading them is costly and rarely useful.
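
If you drive snapshots from a script rather than typing them, the command line can be assembled as a list of arguments. The following is a minimal Python sketch and not part of Ugarit: the function name snapshot_argv and its keyword arguments are made up for illustration, and all it does is put the arguments in the order given by the synopsis above.

```python
def snapshot_argv(conf, tag, path, store_ctimes=False, store_atimes=False):
    """Build the argument list for:
    ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path to root of filesystem>
    """
    argv = ["ugarit", "snapshot", conf]
    if store_ctimes:
        argv.append("-c")  # ctimes are stored for information only
    if store_atimes:
        argv.append("-a")  # costly: every directory is re-uploaded each time
    argv += [tag, path]
    return argv
```

The resulting list can be handed to subprocess.run (with check=True, and running as root) to take the snapshot; keeping it as a list avoids any shell quoting problems with unusual paths.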

Exploring the vault

Now you have a backup, you can explore the contents of the vault. This need not be done as root, as long as you can read ugarit.conf; however, if you want to extract files, run it as root so the uids and gids can be set.

$ ugarit explore ugarit.conf

This will put you into an interactive shell exploring a virtual filesystem. The root directory contains an entry for every tag; if you type ls you should see your tag listed. Within that tag, you'll find a list of snapshots, in descending date order, with a special entry current for the most recent snapshot. Within a snapshot, you'll find the root directory of your snapshot under contents and the details of the snapshot itself in properties.sexpr, and you can cd into subdirectories, and so on:

> ls
localhost-etc/ <tag>
> cd localhost-etc
/localhost-etc> ls
current/ <snapshot>
2015-06-12 22:49:34/ <snapshot>
2015-06-12 22:49:25/ <snapshot>
/localhost-etc> cd current
/localhost-etc/current> ls
log.sexpr <file>
properties.sexpr <inline>
contents/ <dir>
/localhost-etc/current> cat properties.sexpr
((previous . "a140e6dbe0a7a38f8b8c381323997c23e51a39e2593afb61")
 (mtime . 1434102574.0)
 (contents . "34eccf1f5141187e4209cfa354fdea749a0c3c1c4682ec86")
 (stats (blocks-stored . 12)
  (bytes-stored . 16889)
  (blocks-skipped . 50)
  (bytes-skipped . 6567341)
  (file-cache-hits . 0)
  (file-cache-bytes . 0))
 (log . "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a")
 (hostname . "ahe")
 (source-path . "/etc")
 (notes)
 (files . 112)
 (size . 6563588))
/localhost-etc/current> cd contents
/localhost-etc/current/contents> ls
zoneinfo <symlink>
vconsole.conf <symlink>
udev/ <dir>
tmpfiles.d/ <dir>
systemd/ <dir>
sysctl.d/ <dir>
sudoers.tmp~ <file>
sudoers <file>
subuid <file>
subgid <file>
static <symlink>
ssl/ <dir>
ssh/ <dir>
shells <symlink>
shadow- <file>
shadow <file>
services <symlink>
samba/ <dir>
rpc <symlink>
resolvconf.conf <symlink>
resolv.conf <file>
-- Press q then enter to stop or enter for more...
q
/localhost-etc/current/contents> ls -ll resolv.conf
-rw-r--r--     0     0 [2015-05-23 23:22:41] 78B/-: resolv.conf
key: #f
contents: "e33ea1394cd2a67fe6caab9af99f66a4a1cc50e8929d3550"
size: 78
ctime: 1432419761.0

As well as exploring around, you can also extract files or directories (or entire snapshots) by using the get command. Ugarit will do its best to restore the metadata of files, subject to the rights of the user you run it as.

Type help to get help in the interactive shell.

The interactive shell supports command-line editing, history and tab completion for your convenience.

Extracting things directly

As well as using the interactive explore mode, it is also possible to directly extract something from the vault, given a path.

For example, given a vault with a tag named Test whose most recent snapshot contains a file README.txt at its root, you could extract that file with the following command:

$ ugarit extract ugarit.conf /Test/current/contents/README.txt
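
The path given to extract follows the same layout as the explore-mode virtual filesystem: the tag, then a snapshot name (or current), then contents, then the path inside the snapshot. As a small Python sketch (the helper name snapshot_path is hypothetical, and using a dated snapshot name in place of current is an assumption based on the explore listing rather than something confirmed here), such paths can be put together like this:

```python
import posixpath


def snapshot_path(tag, relative_path, snapshot="current"):
    """Return the vault path of a file inside a snapshot's contents directory."""
    return posixpath.join("/", tag, snapshot, "contents",
                          relative_path.lstrip("/"))
```

For instance, snapshot_path("Test", "README.txt") gives the path used in the command above.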

Forking tags

As mentioned above, you can fork a tag, creating two tags that refer to the same snapshot and its history but that can then have their own subsequent history of snapshots applied to each independently, with the following command:

$ ugarit fork <ugarit.conf> <existing tag> <new tag>

Merging tags

And you can also merge two or more tags into one. It's possible to merge a bunch of tags to make an entirely new tag, or you can merge a tag into an existing tag, by having the "output" tag also be one of the "input" tags.

The command to do this is:

$ ugarit merge <ugarit.conf> <output tag> <input tags...>

For instance, to import your classical music collection into your main musical collection, you might do:

$ ugarit merge ugarit.conf my-music my-music classical-music

Or if you want to create a new all-music archive from the archives bobs-music and petes-music, you might do:

$ ugarit merge ugarit.conf all-music bobs-music petes-music

Archive operations

Importing

To import some files into an archive, you must create a manifest file listing them, and their metadata. The manifest can also list metadata for the import as a whole, perhaps naming the source of the files, or the reason for importing them.

The metadata for a file (or an import) is a series of named properties. The value of a property can be any Scheme value, written in Scheme syntax (with strings double-quoted unless they are to be interpreted as symbols), but strings and numbers are the most useful types.

You can use whatever names you like for properties in metadata, but there are some that the system applies automatically, and an informal standard of sorts, which is documented in docs/archive-schema.wiki.

You can produce a manifest file by hand, or use the Ugarit Manifest Maker to produce one for you. You do this by installing it like so:

$ chicken-install ugarit-manifest-maker

Then run it, giving it any number of file and directory names on the command line. When given directories, it will recursively scan them to find all the files contained therein and put them in the manifest. It will not put the directories themselves in the manifest, although it is perfectly legal for you to do so when writing a manifest by hand. This is because the manifest maker can't do much useful analysis on a directory to suggest default metadata for it (so there isn't much point in using it for that), and it's far more useful for it to make it easy for you to import a large number of files individually by referencing the directory containing them.

The manifest is sent to standard output, so you need to redirect it to a file, like so:

$ ugarit-manifest-maker ~/music > music.manifest

You can specify command-line options, as well. -e PATTERN or --exclude=PATTERN introduces a glob pattern for files to exclude from the manifest, and -D KEY=VALUE or --define=KEY=VALUE provides a property to be added to every file in the manifest (as opposed to an import property, that is part of the metadata of the overall import). Note that VALUE must be double-quoted if it's a string, as per Scheme value syntax.

One might use this like so:

$ ugarit-manifest-maker -e '*.txt' -D rating=5 ~/favourite-music > music.manifest
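
When the manifest maker is launched from a script instead of a shell, passing the arguments as a list means the exclude pattern reaches it verbatim, with no risk of the shell expanding the glob first. Here is a minimal Python sketch of that idea; the function manifest_maker_argv is hypothetical, and it uses only the -e and -D options described above.

```python
import os.path


def manifest_maker_argv(paths, excludes=(), defines=None):
    """Build an argument list for ugarit-manifest-maker.

    `defines` maps property names to values already written in Scheme
    syntax, e.g. {"rating": "5"} or {"dc:subject": '"Trip-Hop"'}.
    """
    argv = ["ugarit-manifest-maker"]
    for pattern in excludes:
        argv += ["-e", pattern]  # passed verbatim, no shell globbing
    for key, value in (defines or {}).items():
        argv += ["-D", f"{key}={value}"]
    # Without a shell, "~" is not expanded for us, so do it here.
    argv += [os.path.expanduser(path) for path in paths]
    return argv
```

The list can be given to subprocess.run with stdout pointed at an open file, which plays the role of the redirection to music.manifest.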

The manifest maker simplifies the writing of manifests for files, by listing the files in manifest format along with useful metadata extracted from the filename and the file itself. For supported file types (currently, MP3 and OGG music files), it will even look inside the file to extract metadata.

The manifest file it generates will contain lots of comments mentioning things it couldn't automatically analyse (such as unknown OGG/ID3 tags, or unknown types of files); and for metadata properties it thinks might be relevant but can't automatically provide, it suggests them with an empty property declaration, commented out. The idea is that, after generating a manifest, you read it by hand in a text editor to attempt to improve it.

The format of a manifest file

Manifest files have a relatively simple format. They are based on Scheme s-expressions, so can contain comments. From any semicolon (not in a string or otherwise quoted) to the end of the line is a comment, and #; in front of something comments out that something.

Import metadata properties are specified like so:

(KEY = VALUE)

...where, as usual, VALUE must be double-quoted if it's a string.

Files to import, with their metadata, are specified like so:

(object "PATH OF FILE TO IMPORT"
  (KEY = VALUE)
  (KEY = VALUE)...
)

The closing parenthesis need not be on a line of its own; it's conventionally placed after the closing parenthesis of the final property.

Ugarit, when importing the files in the manifest, will add the following properties if they are not already specified:

import-path: The path the file was imported from
dc:format: A guess at the file's MIME type, based on the extension
mtime: The file's modification time (as the number of seconds since the UNIX epoch)
ctime: The file's change time (as the number of seconds since the UNIX epoch)
filename: The name of the file, stripped of any directory components, and including the extension.

The following properties are placed in the import metadata, automatically:

hostname: The hostname the import was performed on.
manifest-path: The path to the manifest file used for the import.
mtime: The time (in seconds since the UNIX epoch) at which the import was committed.
stats: A Scheme alist of statistics about the import (number of files/blocks uploaded, etc).
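
Since the mtime and ctime properties are plain counts of seconds since the UNIX epoch, converting them to and from calendar dates takes only the standard library. A small Python sketch follows; the function names are my own, and working in UTC is an arbitrary choice made here for the sake of a reproducible example.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert an mtime/ctime value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def utc_to_epoch(moment):
    """Convert a timezone-aware datetime back to seconds since the epoch."""
    return moment.timestamp()
```

The second function is handy when writing an mtime by hand into a manifest.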

So, to wrap that all up, here's a sample import manifest file:

(notes = "A bunch of old CDs I've finally ripped")

(object "/home/alaric/newrip/track01.mp3"
  (filename = "track01.mp3")
  (dc:format = "audio/mpeg")

  (dc:publisher = "Go! Beat Records")
  (dc:created = "1994")
  (dc:contributor = "Portishead")
  (dc:subject = "Trip-Hop")
  (superset:size = 1)
  (superset:index = 1)
  (set:title = "Dummy")
  (set:size = 11)
  (set:index = 1)
  (dc:creator = "Portishead")
  (dc:title = "Wandering Star")

  (mtime = 1428962299.0)
  (ctime = 1428962299.0)
  (file-size = 4703055))

;;... and so on, for ten more MP3s on this CD, then several other CDs...
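
If you would rather generate a manifest from your own catalogue data than write it by hand, the format is simple enough to emit directly. The following Python sketch is an illustration only: the function names are invented, it handles just strings and numbers (the types described above as the most useful), and the string escaping (a backslash before double quotes and backslashes) is my assumption about what the Scheme reader expects.

```python
def scheme_value(value):
    """Write a Python string or number in Scheme syntax."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    raise TypeError(f"unsupported property value: {value!r}")


def render_property(key, value):
    return f"({key} = {scheme_value(value)})"


def render_object(path, properties):
    """Render one (object "PATH" (KEY = VALUE)...) entry."""
    lines = [f"(object {scheme_value(path)}"]
    lines += [f"  {render_property(k, v)}" for k, v in properties.items()]
    # Conventionally the closing parenthesis follows the final property.
    return "\n".join(lines) + ")"


def render_manifest(import_properties, objects):
    """Render import properties followed by (path, properties) objects."""
    parts = [render_property(k, v) for k, v in import_properties.items()]
    parts += [render_object(path, props) for path, props in objects]
    return "\n\n".join(parts) + "\n"
```

Calling render_manifest with {"notes": "..."} and a list of (path, properties) pairs produces text in the same shape as the sample above, ready to be written to a file and passed to the import command.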

Actually importing a manifest

Well, when you finally have a manifest file, importing it is easy:

$ ugarit import <ugarit.conf> <archive tag> <manifest path>

How do I change the metadata of an already-imported file?

That's easy; the "current" metadata of a file is the metadata of its most recent import. Just import the file again, in a new manifest, with new metadata, and it will overwrite the old. However, the old metadata is still preserved in the archive's history; tags forked from the archive tag before the second import will still see the original state of the archive, by design.

Exploring

Archives are visible in the explore interface. For instance, an import of some music I did looks like this:

> ls
localhost-etc/ <tag>
archive-tag/ <tag>
> cd archive-tag
/archive-tag> ls
history/ <archive-history>
/archive-tag> cd history
/archive-tag/history> ls
2015-06-12 22:53:13/ <import>
/archive-tag/history> cd 2015-06-12 22:53:13
/archive-tag/history/2015-06-12 22:53:13> ls
log.sexpr <file>
properties.sexpr <inline>
manifest/ <import-manifest>
/archive-tag/history/2015-06-12 22:53:13> cat properties.sexpr
((stats (blocks-stored . 2046)
        (bytes-stored . 1815317503)
        (blocks-skipped . 9)
        (bytes-skipped . 8388608)
        (file-cache-hits . 0)
        (file-cache-bytes . 0))
 (log . "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a")
 (mtime . 1434135993.0)
 (contents . "fcdd5b996914fdcac1e8a6cfbc67663e08f6eaf0cc952e21")
 (hostname . "ahe")
 (notes . "A bunch of music, imported as a demo")
 (manifest-path . "/home/alaric/tmp/test.manifest"))
/archive-tag/history/2015-06-12 22:53:13> cd manifest
/archive-tag/history/2015-06-12 22:53:13/manifest> ls
1d4269099189234eefeb80b95370eaf280730cf4d591004d:03 The Lemon Song.mp3 <file>
7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3 <file>
64092fa12c2800dda474b41e5ebe8c948f39a59ee91c120b:09 How Many More Times.mp3 <file>
1d79148d1e1e8947c50b44cf2d5690588787af328e82eeef:2-07 Going to California.mp3 <file>
e3685148d0d12213074a9fdb94a00e05282aeabe77fa60d5:1-01 You Shook Me.mp3 <file>
d73904f371af8d7ca2af1076881230f2dc1c2cf82416880a:03 Strangers.mp3 <file>
9c5a0efb7d397180a1e8d42356d8f04c6c26a83d3b05d34a:09 Uptight.mp3 <file>
01a069aec2e731e18fcdd4ecb0e424f346a2f0e16910f5e9:07 Numb.mp3 <file>
7ea1ab7fbd525c40e21d6dd25130e8c70289ad56c09375b0:08 She.mp3 <file>
009dacd8f3185b7caeb47050002e584ab86d08cf9e9aceec:1-03 Communication Breakdown.mp3 <file>
26d264d629e22709f664ed891741f690900d45cd4fd44326:1-03 Dazed and Confused.mp3 <file>
d879761195faf08e4e95a5a2398ea6eefb79920710bfeab6:1-10 Band Introduction _ How Many More Times.mp3 <file>
83244601db42677d110fc8522c6a3cbbc1f22966a779f876:06 All My Love.mp3 <file>
5eebee9a2ad79d04e4f69e9e2a92c4e0a8d5f21e670f89da:07 Tangerine.mp3 <file>
dd6f1203b5973ecd00d2c0cee18087030490230727591746:2-08 That's the Way.mp3 <file>
c0acea15aa27a6dd1bcaff1c13d4f3d741a40a46abeca3fc:04 The Crunge.mp3 <file>
ea7727ad07c6c82e5c9c7218ee1b059cd78264c131c1438d:1-02 I Can't Quit You Baby.mp3 <file>
10fda5f46b8f505ca965bcaf12252eedf5ab44514236f892:14 F.O.D..mp3 <file>
a99ca9af5a83bde1c676c388dc273051defa88756df26e95:1-03 Good Times Bad Times.mp3 <file>
b5d7cfe9808c7fc0dedbd656d44e4c56159cbd3c2ed963bb:1-15 Stairway to Heaven.mp3 <file>
79c87e3c49ffdac175c95aae071f63d3a9efdf2ddb84998c:08.Batmilk.ogg <file>
-- Press q then enter to stop or enter for more...
q
/archive-tag/history/2015-06-12 22:53:13/manifest> ls -ll 7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3
-r--------     -     - [2015-04-13 21:46:39] -/-: 7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3
key: #f
contents: "7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382"
import-path: "/home/alaric/archive/sorted-music/Led Zeppelin/Led Zeppelin/04 Dazed and Confused.mp3"
filename: "04 Dazed and Confused.mp3"
dc:format: "audio/mpeg"
dc:publisher: "Atlantic"
dc:subject: "Classic Rock"
dc:title: "Dazed and Confused"
dc:creator: "Led Zeppelin"
dc:created: "1982"
dc:contributor: "Led Zeppelin"
set:title: "Led Zeppelin"
set:index: 4
set:size: 9
superset:index: 1
superset:size: 1
ctime: 1428957999.0
file-size: 15448903

Searching

However, the explore interface to an archive is far from pleasant. You need to go to the correct import, and find your file by name, and then identify it with a big long name composed of its hash and the original filename to find its properties and extract.

I hope to add property-based searching to explore mode in future (which is why you need to go into a history directory within the archive directory, as other ways of exploring the archive will appear alongside). This will be particularly useful when the explore-mode virtual filesystem is mounted over 9P!

However, even that interface, being constrained to look like a filesystem, will be limited. The ugarit command-line tool provides a very powerful search interface that exposes the full power of the archive metadata.

Metadata filters

Files (and directories) in an archive can be searched for using "metadata filters", which are descriptions of what you're looking for that the computer can understand. They are represented as Scheme s-expressions, and can be made up of the following components:

#t: This filter matches everything. It's not very useful.
#f: This filter matches nothing. It's not very useful.
(and FILTER FILTER...): This filter matches files for which all of the inner filters match.
(or FILTER FILTER...): This filter matches files for which any of the inner filters match.
(not FILTER): This filter matches files which do not match the inner filter.
(= ($ PROP) VALUE): This filter matches files which have the given PROPerty equal to that VALUE in their metadata.
(= key HASH): This filter matches the file with the given hash.
(= ($import PROP) VALUE): This filter matches files which have the given PROPerty equal to that VALUE in the metadata of the import that last imported them.
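
Because filters are just text in s-expression form, a script can compose them from smaller pieces instead of pasting strings together by hand. Below is a minimal Python sketch covering the and, or, not, ($ PROP) and ($import PROP) forms listed above. The helper names are hypothetical, and the quoting of string values (a backslash before double quotes and backslashes) is an assumption about Scheme string syntax, not something taken from Ugarit itself.

```python
def literal(value):
    """Write a string or number as a Scheme literal."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    raise TypeError(f"unsupported filter value: {value!r}")


def prop_eq(prop, value):
    """Match files whose own metadata has prop equal to value."""
    return f"(= ($ {prop}) {literal(value)})"


def import_prop_eq(prop, value):
    """Match files whose last import has prop equal to value."""
    return f"(= ($import {prop}) {literal(value)})"


def all_of(*filters):
    return "(and " + " ".join(filters) + ")"


def any_of(*filters):
    return "(or " + " ".join(filters) + ")"


def negate(inner):
    return f"(not {inner})"
```

The resulting string is passed as a single argument wherever a command takes a filter.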

Searching an archive

For a start, you can search for files matching a given metadata filter in a given archive. This is done with:

$ ugarit search <ugarit.conf> <archive tag> <filter>

For instance, let's look for music by Led Zeppelin:

$ ugarit search ugarit.conf music '(or
   (= ($ dc:creator) "Led Zeppelin")
   (= ($ dc:contributor) "Led Zeppelin"))'

The result looks like the explore-mode view of an archive manifest, listing each file's hash followed by its filename:

7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382:04 Dazed and Confused.mp3
834a1619a59835e0c27b22801e3c829b40be583dadd19770:2-08 No Quarter.mp3
9e8bc4954838bd9c671f275eb48595089257185750d63894:1-12 I Can't Quit You Baby.mp3
6742b3bebcdd9cae5ec5403c585935403fa74d16ed076cf2:02 Friends (1).mp3
07d161f4bd684e283f7f2cf26e0b732157a8e95ef66939c3:05 Carouselambra.mp3
[...]
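
Should you want to pick this output apart in a script, each line splits at its first colon. A short Python sketch, assuming (as in every example shown here) that the hash is plain hexadecimal and so never contains a colon itself; the function name is my own:

```python
def parse_search_line(line):
    """Split one 'HASH:FILENAME' line into a (hash, filename) pair."""
    key, separator, filename = line.rstrip("\n").partition(":")
    if not separator:
        raise ValueError(f"not a search result line: {line!r}")
    return key, filename
```

Splitting only at the first colon keeps filenames that themselves contain colons intact.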

What of all our lovely metadata? The search command accepts an optional output format as its final argument, and you can view the metadata by adding the word "verbose" to the end of the command line:

$ ugarit search ugarit.conf music '(or
   (= ($ dc:creator) "Led Zeppelin")
   (= ($ dc:contributor) "Led Zeppelin"))' verbose

Now the output looks like:

object a444ff6ef807b080b536155f58d246d633cab4a0eabef5bf
        (ctime = 1428958660.0)
        (dc:contributor = "Led Zeppelin")
        (dc:created = "2008")
        (dc:creator = "Led Zeppelin")
[... all the usual file properties omitted ...]
        import a43f7a7268ee8b18381c20d7573add5dbf8781f81377279c
                (stats = ((blocks-stored . 2046) (bytes-stored . 1815317503) (blocks-skipped . 9) (bytes-skipped . 8388608) (file-cache-hits . 0) (file-cache-bytes . 0)))
                (log = "b2a920f962c12848352f33cf32941e5313bcc5f209219c1a")
[... all the usual import properties omitted ...]
object b4cadf48b2c07ccf0303fc4064b292cb222980b0d4223641
        (ctime = 1428958673.0)
        (dc:contributor = "Led Zeppelin")
        (dc:created = "2008")
        (dc:creator = "Led Zeppelin")
        (dc:creator = "Jimmy Page/John Paul Jones/Robert Plant")
[...and so on...]

As you can see, it lists the hash of each file, its metadata, the hash of the import that last imported it, and the metadata of that import.

That's quite verbose, so you'd probably be wanting to take that as input to another program to do something nicer with it. But it's laid out for human reading, not for machine parsing. Thankfully, we have other formats for that, alist and alist-with-imports.

Try this:

$ ugarit search ugarit.conf music '(or
   (= ($ dc:creator) "Led Zeppelin")
   (= ($ dc:contributor) "Led Zeppelin"))' alist

This outputs one Scheme s-expression list per match, the first element of which is the hash as a string, the rest of which is an alist of properties:

("7cb253a4886b3e0051ea8cc0e78fb3a0160307a2c37c8382"
 (ctime . 1428957999.0)
 (dc:contributor . "Led Zeppelin")
 (dc:created . "1982")
 (dc:creator . "Led Zeppelin")
[... elided file properties ...]
 (superset:index . 1)
 (superset:size . 1))
("77c960d09eb21ed72e434ddcde0bd3781a4f3d6ee7a6eb66"
 (ctime . 1428958981.0)
 (dc:contributor . "Led Zeppelin")
[...]

To include the details of the import alongside each match, use the alist-with-imports format:

$ ugarit search ugarit.conf music '(or
   (= ($ dc:creator) "Led Zeppelin")
   (= ($ dc:contributor) "Led Zeppelin"))' alist-with-imports

This outputs one s-expression list per match, with four elements. The first is the file's hash as a string, the second is an alist of file properties, the third is the import's hash, and the last is an alist containing the import's properties. It looks like:

("64fa08a0080aee6ef501c408fd44dfcc634cfcafd8006fc4"
 ((ctime . 1428958683.0)
  (dc:contributor . "Led Zeppelin")
  (dc:created . "2008")
  (dc:creator . "Led Zeppelin")
[... elided file properties ...]
  (superset:index . 1)
  (superset:size . 1))
 "a43f7a7268ee8b18381c20d7573add5dbf8781f81377279c"
 ((stats (blocks-stored . 2046)
         (bytes-stored . 1815317503)
[... elided manifest properties ...]
  (manifest-path . "test.manifest")))
("4cd56f916a63399b252976e842dcae0b87f058b5a60c93a4"
 ((ctime . 1428958437.0)
  (dc:contributor . "Led Zeppelin")
[...]
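
Scheme programs can read these formats directly, but they are regular enough to be read from other languages too. As an illustration, here is a small Python sketch that parses the plain alist format into (hash, properties) pairs. It is a deliberately minimal s-expression reader written for this example, not anything shipped with Ugarit: it understands only parentheses, double-quoted strings, integers, decimal numbers and bare symbols, its string unescaping is simplified, and all the names in it are my own. Because a property can appear more than once for a file, each property maps to a list of values.

```python
import re

_TOKEN = re.compile(
    r'\s*(?:(?P<open>\()|(?P<close>\))'
    r'|"(?P<string>(?:[^"\\]|\\.)*)"'
    r'|(?P<atom>[^\s()"]+))'
)
_INT = re.compile(r'[-+]?\d+')
_FLOAT = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+)(?:[eE][-+]?\d+)?')


class Symbol(str):
    """A bare Scheme symbol, kept distinct from a quoted string."""


def tokenize(text):
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        yield match.lastgroup, match.group(match.lastgroup)


def parse_atom(text):
    if _INT.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)
    return Symbol(text)


def read_all(text):
    """Parse every top-level s-expression into nested Python lists."""
    stack = [[]]
    for kind, value in tokenize(text):
        if kind == "open":
            stack.append([])
        elif kind == "close":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif kind == "string":
            # Simplified: a backslash makes the next character literal.
            stack[-1].append(re.sub(r'\\(.)', r'\1', value))
        else:
            stack[-1].append(parse_atom(value))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def alist_to_dict(entries):
    """Turn ((key . value) ...) into {key: [value, ...]}."""
    result = {}
    for entry in entries:
        key, rest = entry[0], entry[1:]
        dotted = (len(rest) == 2 and isinstance(rest[0], Symbol)
                  and rest[0] == ".")
        value = rest[1] if dotted else rest  # (k . v) versus (k item ...)
        result.setdefault(str(key), []).append(value)
    return result


def parse_alist_output(text):
    """Return a list of (hash, properties) pairs, one per match."""
    return [(record[0], alist_to_dict(record[1:]))
            for record in read_all(text)]
```

Feed it the complete output of a search run in alist format; the elision markers used in the listings in this document are not part of the real output and would not parse as properties.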

And finally, you might just want to get the hashes of matching files (which are particularly useful for extraction operations, which we'll come to next). To do this, specify a format of "keys", which outputs one line per match, containing just the hash:

$ ugarit search ugarit.conf music '(or
   (= ($ dc:creator) "Led Zeppelin")
   (= ($ dc:contributor) "Led Zeppelin"))' keys

ce6f6484337de772de9313038cb25d1b16e28028136cc291
6af5c664cbfa1acb22a377e97aee35d94c0fc003d239dd0c
92e91e79b384478b5aab31bf1b2ff9e25e7e2c4b48575185
6ddb9a41d4968468a904f05ecf7e0e73d2c7c7ad76bc394b
a074dddcef67cd93d92c6ffce845894aa56594674023f6e1
4f65f735bbb00a6fda4bc887b370b3160f55e5e07ec37ffa
97cc8b8ba70c39387fc08ef62311b751aea4340d636eb421
72358dbe3eb60da42eadcf6de325b2a6686f4e17ea41fa60
[...]

However, to write filter expressions, you need to know what properties you have available to search on. You might remember, or go for standard properties, or look at existing files in verbose mode to find some; but you can also just ask Ugarit what properties it has in an archive, like so:

$ ugarit search-props <ugarit.conf> <archive tag>

You can even ask what properties are available for files matching an existing filter:

$ ugarit search-props <ugarit.conf> <archive tag> <filter>

This is useful if you're interested in further narrowing down a filter, and so only care about properties that files already matching that filter have.

For a bunch of music files imported with the Ugarit Manifest Maker, you can expect to see something like this:

ctime
dc:contributor
dc:created
dc:creator
dc:format
dc:publisher
dc:subject
dc:title
file-size
filename
import-path
mtime
set:index
set:size
set:title
superset:index
superset:size

Now you know what properties to search, next you'll be wanting to know what values to look for. Again, Ugarit has a command to query the available values of any given property:

$ ugarit search-values <ugarit.conf> <archive tag> <property>

And you can limit that just to files matching a given filter:

$ ugarit search-values <ugarit.conf> <archive tag> <filter> <property>

The resulting list of values is ordered by popularity, so the most widely-used values will be listed first. Let's see what genres of music are in the sample of music files I imported:

$ ugarit search-values test.conf archive-tag dc:subject

The result is:

Classic Rock
Alternative & Punk
Electronic
Trip-Hop

Ok, let's now use a filter to find out what artists (dc:creator) I have that made Trip-Hop music (what even IS that?):

$ ugarit search-values test.conf archive-tag \
    '(= ($ dc:subject) "Trip-Hop")' \
    dc:creator

The result is:

Portishead

Ah, OK, now I know what "Trip-Hop" is.

Extracting

All this searching is lovely, but what it gets us, in the end, is a bunch of file hashes. Perhaps we might want to actually play some music, or look at a photo, or something. To do that, we need to extract from the archive.

We've already seen the contents of an archive in the explore mode virtual filesystem, so we could go into the archive history, find the import, go into the manifest, pick the file out there, and use get to extract it, but that would be yucky. Thankfully, we have a command-line interface to get things from archives, in one of two ways.

Firstly, we can extract a file (or a directory tree) from an archive, out into the local filesystem:

$ ugarit archive-extract <ugarit.conf> <archive tag> <hash> <target>

The "target" is the name to give it in the local filesystem. We could pull out that Led Zeppelin song from our search results above, like so:

$ ugarit archive-extract test.conf archive-tag \
    ce6f6484337de772de9313038cb25d1b16e28028136cc291 foo.mp3

We now have a foo.mp3 file in the current directory.
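
The keys output format of the search command pairs naturally with archive-extract when you want to pull out every match at once. The Python sketch below turns captured keys output into one archive-extract command per hash. The function name is hypothetical, and naming each target after its hash plus a fixed suffix is just one arbitrary choice to keep the names unique.

```python
def extract_commands(conf, tag, keys_output, suffix=".mp3"):
    """Build one `ugarit archive-extract` argument list per hash line."""
    commands = []
    for line in keys_output.splitlines():
        key = line.strip()
        if not key:
            continue  # ignore blank lines
        target = key + suffix  # arbitrary but unique target name
        commands.append(["ugarit", "archive-extract", conf, tag, key, target])
    return commands
```

Each list can then be run in turn with subprocess.run.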

However, sometimes it would be nicer to have it streamed to standard output, which can be done like so:

$ ugarit archive-stream <ugarit.conf> <archive tag> <hash>

This lets us write a command such as:

$ ugarit archive-stream test.conf archive-tag \
    ce6f6484337de772de9313038cb25d1b16e28028136cc291 | mpg123 -

...to play it in real time.