Ugarit Archive Metadata Schema
Any symbol can be used as an archive metadata property name, but here are some suggested ones, defined for the sake of interoperability.
Where possible, we have used the Dublin Core vocabulary, as it's a good fit for the kinds of things archive mode is designed for. Properties imported from Dublin Core are identified with a dc: prefix.
Some of these properties are automatically applied by the import process. However, if these properties are specified in the import manifest file, then the specified value from the manifest overrides the default.
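A minimal sketch of this override rule follows, written in Python rather than Ugarit's own Scheme. It assumes properties are held as an ordered list of (name, value) pairs, so that a name such as dc:subject can appear more than once, and that a manifest entry replaces every default instance of the same name. Both that representation and the helper name merge_properties are hypothetical illustrations, not Ugarit's API.

```python
from typing import Any

# Hypothetical representation: an ordered list of (name, value) pairs,
# so that a property such as dc:subject can appear more than once.
Property = tuple[str, Any]


def merge_properties(defaults: list[Property], manifest: list[Property]) -> list[Property]:
    """Apply the manifest-overrides-defaults rule (illustrative sketch)."""
    # Every property name the manifest mentions replaces all default
    # instances of that name (an assumption about multi-valued properties).
    overridden = {name for name, _ in manifest}
    kept = [(name, value) for name, value in defaults if name not in overridden]
    return kept + list(manifest)


defaults = [("filename", "IMG_0001.JPG"), ("dc:format", "image/jpeg")]
manifest = [("filename", "beach.jpg"), ("dc:subject", "beach"), ("dc:subject", "sunset")]
print(merge_properties(defaults, manifest))
# [('dc:format', 'image/jpeg'), ('filename', 'beach.jpg'), ('dc:subject', 'beach'), ('dc:subject', 'sunset')]
```

Defaults that the manifest does not mention (here dc:format) survive unchanged, while the manifest's filename wins.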
Import properties
These are properties applied to an import object, rather than to an individual object in an archive.
Internal
These properties are all provided by the system itself, and must not be specified in an import manifest.
previous
(hash) - The hash of a previous import. If there is no instance of this property, then this is the first import in a sequence. If there is more than one instance, then this is a merge.
contents
(hash) - The hash of the imported archive manifest. This is probably not of much interest beyond the Ugarit internals.
mtime
(number) - The UNIX timestamp of the import.
log
(hash) - The hash of the import log file.
stats
(alist) - An alist of import statistics.
manifest-path
- The path of the manifest file that was used for the import.
hostname
- The hostname on which the import was performed.
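The rules for previous and mtime can be read mechanically. The Python sketch below assumes an import's properties arrive as a list of (name, value) pairs; the function names and the label "successor" (for an import with exactly one predecessor) are hypothetical, and "hash-A"/"hash-B" stand in for real hashes.

```python
from datetime import datetime, timezone
from typing import Optional


def import_kind(props: list[tuple[str, object]]) -> str:
    """Classify an import by how many 'previous' properties it carries."""
    previous = [value for name, value in props if name == "previous"]
    if not previous:
        return "first"       # no predecessor: first import in a sequence
    if len(previous) == 1:
        return "successor"   # hypothetical label: continues one earlier import
    return "merge"           # several predecessors: a merge


def import_time(props: list[tuple[str, object]]) -> Optional[datetime]:
    """Turn the import's mtime (a UNIX timestamp) into an aware UTC datetime."""
    for name, value in props:
        if name == "mtime":
            return datetime.fromtimestamp(value, tz=timezone.utc)
    return None


props = [("previous", "hash-A"), ("previous", "hash-B"), ("mtime", 1300000000)]
print(import_kind(props))  # merge
print(import_time(props))
```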
Core object properties
These object properties apply usefully to almost anything in an archive.
import-path
- The path the file was imported from, as taken from the import manifest file. (DEFAULT: The path from the manifest file)
filename
- The name of the file, including the extension (if applicable), but not any directory path. This is usually the name the file had when it was imported (e.g., the last component of import-path), but the two may differ if the file was imported from a temporary file name while the system knows its "proper" filename. (DEFAULT: The import path, minus any directory path)
dc:format
- The MIME type of the file. (DEFAULT: A MIME type guessed from the file extension)
file-size
- The size of the file in bytes. If it's a directory, then this is the sum of the sizes of the files within it, not including any directory metadata.
mtime
(number) - The mtime of the file when it was imported, as a UNIX timestamp.
ctime
- The ctime of the file when it was imported, as a UNIX timestamp.
dc:title
- The title of the object. This should be a proper human-readable title, not just a filename, where possible.
dc:description
- A longer description of the object.
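The DEFAULT values above can be approximated with the Python standard library. This is a sketch, not Ugarit's import code: the function names are hypothetical, the MIME guess comes from Python's mimetypes table (Ugarit may use a different one), and file sizes are assumed to be in bytes.

```python
import mimetypes
import os


def default_core_properties(import_path: str) -> list[tuple[str, object]]:
    """Derive default core properties for a file about to be imported."""
    props: list[tuple[str, object]] = [
        ("import-path", import_path),
        # filename: the import path minus any directory path.
        ("filename", os.path.basename(import_path)),
    ]
    # dc:format: a MIME type guessed from the file extension, if one is known.
    mime, _encoding = mimetypes.guess_type(import_path)
    if mime is not None:
        props.append(("dc:format", mime))
    return props


def file_size(path: str) -> int:
    """Size in bytes; for a directory, the sum of the files it contains."""
    if not os.path.isdir(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Only file contents are counted, not directory entries themselves.
            total += os.path.getsize(os.path.join(root, name))
    return total


print(default_core_properties("docs/notes.txt"))
```

Summing only regular files under os.walk mirrors the rule that directory metadata is excluded from file-size.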
Object properties for music
Music files should put the song title in dc:title.
dc:creator
- The creator of the piece, generally the artist name.
dc:contributor
- Some other contributor to the piece, other than the artist.
dc:publisher
- The name of the publisher.
dc:created
- The creation date, in YYYY-MM-DD form.
dc:subject
- The name of the genre.
set:title
- The title of the album.
set:index
- Track number within the album.
set:size
- Track count within the album.
superset:index
- For multi-disk albums, the disk number.
superset:size
- For multi-disk albums, the number of disks.
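To show how the music properties fit together, here is a hypothetical Python helper that builds them for one track. It assumes track and disk numbers are stored as integers and genres as separate dc:subject entries; the schema itself does not fix those value types, and all names and sample values are illustrative.

```python
from datetime import date
from typing import Iterable, Optional


def music_properties(title: str, artist: str, album: str, track: int,
                     track_count: int, released: date,
                     disc: Optional[int] = None, disc_count: Optional[int] = None,
                     genres: Iterable[str] = ()) -> list[tuple[str, object]]:
    """Build the suggested music properties for one track (illustrative only)."""
    props: list[tuple[str, object]] = [
        ("dc:title", title),                   # song title
        ("dc:creator", artist),                # generally the artist name
        ("dc:created", released.isoformat()),  # date.isoformat() gives YYYY-MM-DD
        ("set:title", album),                  # album title
        ("set:index", track),                  # track number within the album
        ("set:size", track_count),             # track count within the album
    ]
    props += [("dc:subject", genre) for genre in genres]  # one entry per genre
    if disc is not None:                       # only for multi-disk albums
        props += [("superset:index", disc), ("superset:size", disc_count)]
    return props


for prop in music_properties("Example Song", "Example Artist", "Example Album",
                             3, 12, date(1999, 3, 1), disc=2, disc_count=2,
                             genres=["Jazz"]):
    print(prop)
```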
Object properties for photographs
Use dc:description for a description of the photo.
dc:creator
- The name of the photographer.
dc:subject
- Something in the photograph (names of photographed people or things, or more general keywords)
dc:spatial
- The name of the place the photo was taken, or coordinates as a geo: URL.
dc:temporal
- The name of the event the photograph was from.
dc:created
- The creation timestamp of the photo, in YYYY-MM-DD format,
optionally with a 24-hour UTC HH:MM:SS time.
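A sketch of producing dc:created and dc:spatial values for a photo, in Python. The schema does not say what separates the date from the optional time, so the single space used here is an assumption; the geo: form follows RFC 5870's plain "geo:latitude,longitude" syntax. Function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def photo_created(taken: datetime, with_time: bool = True) -> str:
    """Format a capture time as YYYY-MM-DD, optionally followed by UTC HH:MM:SS."""
    if taken.tzinfo is None:
        # A naive time's offset from UTC is unknown, so refuse to guess.
        raise ValueError("capture time must be timezone-aware")
    utc = taken.astimezone(timezone.utc)
    # The space separator is an assumption; the schema does not specify one.
    return utc.strftime("%Y-%m-%d %H:%M:%S" if with_time else "%Y-%m-%d")


def geo_url(latitude: float, longitude: float) -> str:
    """Coordinates as a geo: URI (RFC 5870), suitable for dc:spatial."""
    return f"geo:{latitude},{longitude}"


taken = datetime(2012, 7, 14, 18, 30, tzinfo=timezone(timedelta(hours=2)))
print(photo_created(taken))       # 2012-07-14 16:30:00
print(geo_url(51.4779, -0.0015))  # geo:51.4779,-0.0015
```

Converting to UTC before formatting keeps the time consistent with the schema's "UTC HH:MM:SS" wording, even when the camera recorded local time with an offset.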
Object properties for PDF/PS/ebooks
Use dc:title for the title of the work.
dc:creator
- The name of the author.
dc:subject
- A subject or keyword.
dc:created
- The creation date in YYYY-MM-DD format.
dc:publisher
- The name of the publisher.
dc:identifier
- An ISBN, ISSN, or similar identifier, in URN format (e.g. urn:isbn:0451450523).
dc:source
- The original URL the thing was downloaded from.
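For ISBNs, a dc:identifier value in the urn:isbn: namespace (RFC 3187) can be produced by stripping hyphens and spaces and checking the check digit. The Python sketch below is a hypothetical helper covering ISBN-10 and ISBN-13 only, not ISSNs; removing the hyphens matches the example above but is a choice, not a requirement of the schema.

```python
def isbn_urn(isbn: str) -> str:
    """Normalise an ISBN-10 or ISBN-13 and return it as a urn:isbn: URN."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 10 and digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X"):
        # ISBN-10 check: weights 10 down to 1, total divisible by 11 ("X" = 10).
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(digits))
        valid = total % 11 == 0
    elif len(digits) == 13 and digits.isdigit():
        # ISBN-13 check: alternating weights 1 and 3, total divisible by 10.
        total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(digits))
        valid = total % 10 == 0
    else:
        valid = False
    if not valid:
        raise ValueError(f"not a valid ISBN: {isbn!r}")
    return "urn:isbn:" + digits


print(isbn_urn("0-451-45052-3"))  # urn:isbn:0451450523
```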
Other useful Dublin Core properties
dc:alternative
- An alternative title.
dc:extent
- Size, duration, etc. Not the size of the file in bytes, but the duration of a recording, the size of an image in pixels, etc.
dc:language
- The language of the object, e.g. en, en-GB, jbo, etc.
dc:license
- A description of the license the file is under.
dc:accessRights
- A space-separated list of names of groups that should be allowed to access the object, under some means of publishing all or part of an archive. The group name public should refer to unrestricted access.
Please contribute!
The above are the conventions I have started to settle on for the kinds of things I use Ugarit archives for. If you use Ugarit for something else, please drop me a line, and I'll be glad to help you choose a good schema and publish the results here for others to share!