Magic Pipes: Artifact [e6d707a493]

Artifact e6d707a493c11bf75c41bc7fb8bb2c00e0eceb0a:

File README.wiki — part of check-in [d2367e4f41] at 2013-08-16 15:01:34 on branch trunk — Better examples all round (user: alaric size: 28819)
Magic Pipes is a suite of tools to construct powerful Unix shell
pipelines that operate on structured data.

Conventional shell pipeline tools - grep, sed, awk, and friends - work
on lines, and have rather crude support for handling structure within
the lines. This makes them fine for dealing with line-oriented data
with simple structure within, but dealing with complex structure
within the lines quickly descends into a hell of fragile separator
handling; the shell pipelines become more minutae than meat.

Magic Pipes aims to fix that. The Magic Pipes tools read from standard
input and write to standard output so they can be combined into
pipelines, but rather than dealing with a line at a time, they deal
with an s-expression at a time.

The tools fall into a few groups; input tools generate s-expressions
from other data formats (CSV files, for instance, or traditional
line-oriented data) or sources (directory or process listings). Output
tools convert s-expressions into other formats. Data processing tools
map s-expression to s-expressions in useful ways. Interface tools
transform s-expressions by consulting external data sources, such as
database files or Web services.

The original inspiration was my blog post at
[http://www.snell-pym.org.uk/archives/2009/06/25/magic-pipes/], but
the spec has been refined since then.

<h1>General usage</h1>

In general, Magic Pipes tools read s-expressions from standard input
and write them to standard output (although many do only one or the
other of those, rather than both). They process the s-expressions one
at a time, based on the command-line arguments, which will often
include snippets of Scheme source code that evaluate to procedures
which are applied to input s-expressions or intermediate results.

Supplied Scheme code has access to a rich standard environment,
including:

  *  [http://www.schemers.org/Documents/Standards/R5RS/|R5RS Scheme]
  *  [http://wiki.call-cc.org/man/4/Extensions%20to%20the%20standard|Core Chicken extensions]
  *  [http://api.call-cc.org/doc/data-structures|The Chicken data-structures unit]
  *  [http://api.call-cc.org/doc/srfi-1|SRFI-1] (list utilities)
  *  [http://api.call-cc.org/doc/srfi-12|SRFI-13] (string utilities)
  *  [http://api.call-cc.org/doc/srfi-69|SRFI-69] (hash tables)
  *  [http://api.call-cc.org/doc/alist-lib|alist-lib]
  *  <tt>(mplog <i>format</i> <i>args...</i>)</tt> uses Chicken's
     [http://api.call-cc.org/doc/extras/printf|printf] formatting
     system to output strings to the standard error port.

In addition, any Chicken standard module or installed egg can easily
be used by referencing the module name, and utility code can be
included from files. The following command-line arguments can be used
with any Magic Pipes tool to affect the environment in which supplied
Scheme code is run, and to execute code before normal
processing:

  *  <tt>-u <i>module</i></tt> Use the specified unit.
  *  <tt>-d <i>expr</></tt> Evaluate the supplied expression.
  *  <tt>-i <i>filename</i></tt> Evaluate the contents of the supplied file.

The options take effect in the order they are supplied. Code executed
as a result of them has access to the standard input, output and error
ports, so they can read some of the input before normal processing
occurs, and likewise generate output before normal processing. Any
definitions imported or created due to the use of the above options
are available to the code in subsequent options from the above, and
are available to user code executed during normal processing.

Finally, it is possible to run code after normal processing:

  *  <tt>-e <i>expr</i></tt> Evaluate the supplied expression.

As you might expect, multiple expressions are executed in the order
supplied, can access any definitions generated or imported by earlier
code, and have access to standard output and error; standard input
will have all been used up by this point, so attempting to read from
it is an error.

Generally, user code executed during normal processing only has access
to the standard error port; attempts to read from standard input or
write to standard output are errors.

Data s-expressions read to and from standard input and output are read
using a special reader that disables non-standard or unsafe read
syntax. Also, [http://srfi.schemers.org/srfi-38/srfi-38.html|srfi-38]
syntax is read and written to express shared structure.

<h1>The Tools</h1>

<h2>Data processing</h2>

<h3><tt>mpmap <i>procedure-expr</i></tt></h3>

Reads s-expressions from standard input, and applies the procedure
expression <i>procedure-expr</i> to each of them. The results are then
written to standard output. The procedure expression may return an
arbitrary number of values; all non-undefined values are output in
order (separated by a single space), and a newline is written after
the values from each application of the procedure expression. It is
legal for the procedure expression to return zero values.

<h4>Examples</h4>

<h5>Basic operation</h5>

<pre>$ echo '"Alaric" "was" "here"' | mpmap string-reverse
"ciralA"
"saw"
"ereh"</pre>

<h5>Returning multiple values</h5>
<pre>$ echo "1 2 3" | mpmap '(lambda (x) (values x (+ x 1)))'
1 2
2 3
3 4</pre>

Note how the two values returned for each input s-expr are output on
the same line, for clarity.

<h5>Defining utilities with <tt>-d</tt></h5>
<pre>$ echo "1 2 3" | mpmap -d '(define (foo x) (values x (+ x 1)))' 'foo'
1 2
2 3
3 4</pre>

<h5>Using procedures from Chicken modules</h5>
<pre>$ echo "1 2 3" | mpmap -u srfi-4 'make-u8vector'
#u8(112)
#u8(74 61)
#u8(136 51 131)</pre>

<h5>Omitting undefined values</h5>
<pre>$ echo "#t #f #t" | mpmap '(lambda (x) (values x (if x "Yes" (void)) x))'
#t "Yes" #t
#f #f
#t "Yes" #t</pre>

<h5>Omitting entire inputs</h5>

<pre>$ echo "1 2 3 4 5 6" | mpmap '(lambda (x) (if (even? x) x (values)))'

2

4

6
</pre>

When we produce zero values with <tt>values</tt>, only a blank line is
output.

<h3><tt>mpfilter <i>predicate-expr</i></h3>

Read s-expressions from standard input, applies the procedure-expr to
them, and writes the s-expressions to standard output if the predicate
returned a true value. The predicate may write to standard error, but
standard input and output are inaccessible.

<h4>Examples</h4>

<pre>$ echo "1 2 3 4 5 6" | mpfilter 'even?'
2
4
6</pre>

<h3><tt><nowiki>mpfold [-n <i>integer</i>] [-a <i>expr</i>] [-t] [-o <i>expr</i>] <i>expr</i> [<i>expr</i>]</nowiki></tt></h3>

The first expression evaluates to a two-argument procedure, the second
(the initial accumulator) to any value; an unspecified value is the
default if none is specified. (-o) specifies a single-argument output
procedure; the identity function is the default.

Applies the procedure to each s-expression from the input
in turn, with the current accumulator as the second argument. At the
end, outputs the result of applying the output procedure to the final
accumulator.

If (-n) is specified, then rather than just the single "current input
s-expression", the first expression's procedure is passed the
specified number of previous s-expressions as well, in extra
arguments. (-n) defaults to 0. The first call to the procedure will
be performed when sufficient s-expressions have been read (eg, (-n)+1
s-expressions) unless (-a) is specified, when the procedure will be
called for every s-expression, with the result of the expression
passed to (-a) used as a "default" for slots that cannot yet be filled.

If (-t) is specified, then the current, and the specified number of
previous, sexprs are passed as a single list, rather than as separate
arguments to the procedure. The procedure is called for every input
s-expression, with the list being truncated if sufficient previous
values are not available; (-a) is ignored.

current-input-port is banned, but current-error-port and
current-output-port are accessible.

<h4>Examples</h4>

<h5>Basic operation</h5>

<pre>$ echo "1 2 3" | mpfold + 0
6</pre>

<h5>Using <tt>-o</tt></h5>

<pre>$ echo "1 2 3" | mpfold -o 'number->string' + 0
"6"</pre>

<pre>$ echo "1 2 3" | mpfold -o '(lambda (x) (* x 2))' + 0
12</pre>

<h5>Using <tt>mplog</tt> to demonstrate how the fold-proc is
called</h5>

<pre>$ echo "1 2 3 4 5 6" | mpfold '(lambda x (mplog "~S" x) 0)' 0
(1 0)
(2 0)
(3 0)
(4 0)
(5 0)
(6 0)
0</pre>

The fold-proc is called with two arguments - the current input s-expr
and the accumulator value.

<h5>Adding context with <tt>-n</tt></h5>

<pre>$ echo "1 2 3 4 5 6" | mpfold -n 2 '(lambda x (mplog "~S" x) 0)' 0
(3 2 1 0)
(4 3 2 0)
(5 4 3 0)
(6 5 4 0)
0</pre>

Note how the fold-proc is not called for <tt>1</tt> or <tt>2</tt>; the
first value passed to it is <tt>3</tt> with <tt>2</tt> and <tt>1</tt>
as the two requested context items. The fold-proc is now called with
four arguments; the current item, the previous two items, and the
accumulator value.

<h5>Using <tt>-a</tt> as the default context</h5>

<pre>$ echo "1 2 3 4 5 6" | mpfold -n 2 -a '#f' '(lambda x (mplog "~S" x) 0)' 0
(1 #f #f 0)
(2 1 #f 0)
(3 2 1 0)
(4 3 2 0)
(5 4 3 0)
(6 5 4 0)
0
</pre>

Note how the fold-proc is now called for all the input sexprs, with
<tt>#f</tt> filling in for "missing context".

<h5>Using <tt>-t</tt> to pass a list into the fold-proc</h5>

<pre>$ echo "1 2 3 4 5 6" | mpfold -n 2 -t '(lambda x (mplog "~S" x) 0)' 0
((1) 0)
((2 1) 0)
((3 2 1) 0)
((4 3 2) 0)
((5 4 3) 0)
((6 5 4) 0)
0
</pre>

The fold-proc now receives two arguments again; the first is a list
containing the current input sexpr and up to 2 (as requested by
<tt>-n</tt>) previous ones. We are invoked for every input
s-expression, and the list is simply shorter if there have not been
two previous sexprs yet.

<h5>Moving average with final mean</h5>

<pre>$ echo "3 7 2 4 6 4 1" | mpfold -n 2 \
       -o '(lambda (x) `(mean ,(/ (cdr x) (car x))))' \
       '(lambda (x h1 h2 acc) (write `(moving-mean ,(/ (+ x h1 h2) 3))) (newline) (if acc (cons (+ (car acc) 1) (+ (cdr acc) x)) (cons 3 (+ x h1 h2))))' '#f'
(moving-mean 4)
(moving-mean 4.33333333333333)
(moving-mean 4)
(moving-mean 4.66666666666667)
(moving-mean 3.66666666666667)
(mean 3.85714285714286)</pre>

Computing the final mean correctly when using <tt>-n</tt> without
<tt>-a</tt> is a little complicated.

Here's a simpler way:

<pre>$ echo "3 7 2 4 6 4 1" | mpfold -n 2 -a '#f' \
       -o '(lambda (x) `(mean ,(/ (cdr x) (car x))))' \
       '(lambda (x h1 h2 acc) (when (and h1 h2) (write `(moving-mean ,(/ (+ x h1 h2) 3))) (newline)) (cons (+ (car acc) 1) (+ (cdr acc) x)))' '(0 . 0)'
(moving-mean 4)
(moving-mean 4.33333333333333)
(moving-mean 4)
(moving-mean 4.66666666666667)
(moving-mean 3.66666666666667)
(mean 3.85714285714286)</pre>

<h3>mpsort (FIXME)</h3>
<h3>mpgroup (FIXME)</h3>
<h3>mptree (FIXME)</h3>
<h3>mprandom (FIXME)</h3>
<h3>mpshuffle (FIXME)</h3>
<h3>mphead (FIXME)</h3>
<h3>mptail (FIXME)</h3>
<h3>mpsxpath (FIXME)</h3>

<h2>Input</h2>

<h3>mpps (FIXME)</h3>
<h3>mpls (FIXME)</h3>
<h3><nowiki><tt>mpre {<i>sre</i>|-p <i>pcre</i>] [-o <i>output-proc</i>]</tt></nowiki></h3>

The argument, if present, must be a valid SRE; or, if (-p) is used, a
POSIX regexp. If not present, it defaults to "(seq bos (* any) eos)"

Reads in lines of text from stdin and converts them to s-expressions
by applying the regular expression. Lines that do not match the regexp
are ignored.

If (-o) is specified, then the expression must be a single-argument
procedure which is applied to each irregex match object to generate
the output s-expression. If not, then a default is used which has the
following behaviour:

If the regexp has no captures, then the entire matching string is
returned.

If it has only numbered submatches, then a list of the submatches is
returned.

If it has named (and maybe also numbered) submatches, then an alist of
them is returned, with names used where available and numbers where not.

<h4>Examples</h4>

<h5>Default output, numbered submatches</h5>

<pre>$ printf 'Hello world\nHello [world]\nDear [bob], I love you\n' |\
        mpre '(seq ($ (* any)) "[" ($ (* any)) "]" ($ (* any)))'
("Hello " "world" "")
("Dear " "bob" ", I love you")</pre>

<h5>Default output, named submatches</h5>

<pre>$ printf 'Hello world\nHello [world]\nDear [bob], I love you\n' |\
        mpre '(seq (=> pre (* any)) "[" (=> bracketed (* any)) "]" (=> post (* any)))'
((pre . "Hello ") (bracketed . "world") (post . ""))
((pre . "Dear ") (bracketed . "bob") (post . ", I love you"))</pre>

<h5>Parsing output of ls (as if we didn't have <tt>mpls</tt>)</h5>

<pre>$ ls -l *.scm | \
       mpre '(seq (=> perms (= 10 any)) (* space) (=> linkcount integer) (* space) (=> user symbol) (* space) (=> group symbol) (* space) (=> size integer) (* space) (=> modtime symbol (* space) integer (* space) integer ":" integer) (* space) (=> name (* any)))'
((perms . "-rw-r--r--") (linkcount . "1") (user . "alaric") (group . "users") (size . "7397") (modtime . "Aug 16 14:30") (name . "magic-pipes.scm"))
((perms . "-rw-r--r--") (linkcount . "1") (user . "alaric") (group . "users") (size . "5340") (modtime . "Aug 16 14:54") (name . "mpfold.scm"))
((perms . "-rw-r--r--") (linkcount . "1") (user . "alaric") (group . "users") (size . "2844") (modtime . "Aug 16 15:36") (name . "mpre.scm"))</pre>

<h3>mpjson-read (FIXME)</h3>
<h3>mpxml-read (FIXME)</h3>
<h3>mpcsv-read (FIXME)</h3>

<h2>Output</h2>

<h3>mpforeach (FIXME)</h3>
<h3>mpjson-write (FIXME)</h3>
<h3>mpxml-write (FIXME)</h3>
<h3>mpcsv-write (FIXME)</h3>
<h3>mpprintf (FIXME)</h3>

<h2>Interface</h2>

<h3>mplookup (FIXME)</h3>
<h3>mplookup-set (FIXME)</h3>
<h3>mplookup-delete (FIXME)</h3>
<h3>mplookup-dump (FIXME)</h3>

<h2>Parallelism</h2>

<h3>mpfork (FIXME)</h3>

<h1>Not yet implemented things</h1>

<h2>mpmerge, mpjoin, mpcogroup?</h2>

Do we need these more advanced operators from the database world, or
can they be done in other ways?

mpmerge would need to accept a list of file names and read from them
all (possibly including standard input as well), comparing
already-sorted input elements using a supplied comparison expression
similar to mpsort, and output the results in merged order.

mpcogroup would also accept a list of input file names (possibly
including standard input) and, for each, an expression mapping an
s-expression to a join key value. For each distinct join key value in
the entire input, it would output a list starting with the join key
value, followed by a (possibly empty) list of matching s-expressions
from each input file in order.

mpjoin would work much like mpcogroup, except that the output would
consist of the cross product of each group. Each s-expression in the
output would be a list with the join key value followed by one element
per input file, containing an s-expression from each file that
produced the same join expression.

mpcogroup/mpjoin might build up a hash table internally, then if it
reaches a certain limiting size, write it to a temporary sqlite file
and then continue writing into that until it's time to generate output.

<h2>Safe reader</h2>

<h2>File objects in runtime module</h2>

These should include a set of tools for manipulating "directory entry
alists" as created by mpls, but which will crop up elsewhere in
pipelines fed from an upstream mpls. These alists always have the
file name in the first element, followed by zero or more extra things
about the file; a set of procedures accept such an alist, or a raw
file name, as the first argument and return information about the
file. If the required information is in the alist it is returned
as-is, otherwise it is computed or otherwise obtained and the alist
mutated to insert the information (which is why the filename, which
never changes, is always first).

There is an accessor for the file's name (which may be a relative
path), the absolute path, the directory name, the file name without
the path, the file name without path or extension, and the extension;
and for everything returned by stat(2). There is also a procedure
that maps a file information alist (or just a raw string path) to a
file information alist in order to canonicalise raw paths out, and a
similar procedure which always returns an alist with an absolute path.

It is perfectly valid to add "custom" keys to the alist; Magic Pipes
will always prefix its own with "mp", to avoid namespace issues.

A few utilities such as "(file-older <file> <age>)" can be built on top of basic
tools such as "file-creation-time", too. To help with inputting ages
(which are going to be in seconds), provide "(days <N>) -> (* 86400
N)" and similar for minutes, seconds, weeks, etc.

<h2>Useful UNIX information procedures in runtime library</h2>

uid->username (see posix unit)
username->uid
gid->groupname
groupname->gid
ip->hostnames (see hostinfo egg)
hostname->ips
get-environment-variable (alised to $)

<pre>

** mptree <id-expr> <parent-expr> <children-expr> [<output-expr>]

Reads input s-expressions and organises them into a tree. For each
s-expression, the single-argument procedures that the first three
expressions evaluated to are called, yielding an identifier for the
s-expression, an identifier for its parent (or #f if it cannot be
obtained), and a list of identifiers of its children (or '() or #f if
they cannot be obtained). Using what information becomes available,
parent/child relationships are found between the s-expressions,
forming one or more trees. If conflicts arise (multiple parents for
the same s-expression), an error is signalled and processing stops. If
no errors occur, then a set (hopefully singleton) of roots
(s-expressions with no parents) is found, each at the head of a nice
tree.

If the output expression is supplied, then it is applied to each tree
in turn (in some arbitrary order). The trees are represented by "node"
record instances, which have the following accessors:

  *  <tt>(node-id NODE)</tt> returns the ID of the node.
  *  <tt>(node-data NODE)</tt> returns the s-expression.
  *  <tt>(node-parent NODE)</tt> returns the parent node (or #f).
  *  <tt>(node-children NODE)</tt> returns a list of child nodes.

(These come from a <tt>magic-pipes-runtime-tree</tt> module which is
automatically loaded.)

If no output expression is supplied, a default one is used which
renders the nodes as s-expressions with the node data as the first
element and the children thereafter, indented neatly to show the
structure.

** mpsort [-c] [-r] [-p <integer>] [<expr> [<expr>]]

The first expression must produce a two-argument comparison procedure,
and defaults to "smart<" if none is present. The second expression must
produce a single-argument key extraction procedure, which defaults to
the identity.

Reads in all the expressions from the input, sorts them by applying
the comparison procedure to the results of applying the extraction
procedure to the expressions, then returns the result.

If (-c) is specified, then the extraction procedure is assumed to be
expensive, and its result computed and cached at load time.

If (-r) is specified, then the sort order is reversed.

Provide smart< and smart> procedures, which compare things in a
type-agnostic way: < for numbers, string< for strings, recursive
testing for pairs and vectors.

As usual, the procedures have no access to current input or output
ports, but can write to the error port.

If (-p) is specified, then rather than sorting in-memory, we instead
start the specified number of threads, each of which reads
sexpressions from a bounded FIFO and sends them to a child mpsort
process. A master thread then reads sexpressions from standard input
and round-robins them to the FIFOs, skipping any FIFOs that are "full"
and blocking if they all are. Each child process also has a reader
thread that reads its sorted output and loads them into another FIFO,
and a final output thread merges the sorted FIFO outputs into a final
sorted output to standard output. #!eof is used as a marker in the
FIFOs to record the actual end of the file, to distinguish EOF from an
empty FIFO due to the source not having produced anything yet.

Or do we make a separate mpmerge tool that takes a list of filenames
on the command line along with extract and compare procedure
expressions, and invoke that using a set of FIFOs which the
sub-mpsorts feed out to?

Is it worth having an option to go multi-machine by running mpsort
from inetd (perhaps in parallel mode to use multiple cores) on remote
machines and parallelising via TCP rather than running a child
process? That would be kind of cool and not too hard.

Or for huge sorts (where there's not enough memory available), we
could have a flag that splits the input into temporary files of up to
a certain size, sorts them individually one by one, then merges the
results together.

** mplookup [-f|-F] {lookup <map file>|revlookup <map file>}

It would be convenient to have a simple command-line tool to handle
look-up tables, mapping one s-expression to one or more other
s-expressions. By default, each output s-expression is a list of
results from the corresponding input s-expression, which is empty if
the there is no mapping. If (-f) is specified then the first result is
returned only, not wrapped in a list, and #f used if there is none. If
(-F) is used then the first result is returned, and #f if there is
none or more than one.

File type detection is performed on the map file. There is support for
sqlite databases in a special format (ending .mbm; magic binary map),
or plain text files with a sequence of (<key> . <value>) pairs (ending
.msm; magic sexpr map) or /etc/aliases format files (default), which
are treated as string->string mappings.

*** mplookup-set <map file> [<expr1> <expr2>]

In the given map file (which, if nonexistant, is created), set expr1
to map to expr2.

If the exprs are omitted, then sexprs are read from standard input,
and must be pairs, the first element of which is treated as expr1 and
the second as expr2, and are all set into the map in order.

*** mplookup-delete <map file> [<expr>]

Deletes the given mapping from the given map file. If the expression
is omitted, then expressions are read from stdin and removed from the
map file. If the map file does not exist, an error is raised.

*** mplookup-dump <map file>

Spits out the contents of the map file as a sequence of pairs, with
the car being the key and the cdr the value. This can be piped into
mplookup-set to effect map file format conversions.

** FIXME: mprandom ???

Take random samples of the input - either pick any s-expression with a
given chance, or read all the s-expressions into RAM and pick N at
random

** FIXME: mpshuffle ???

Read input s-expressions into a list, shuffle, and output the result.

** FIXME: mpsysinfo ???

Can we portably get hold of system info tables - mounted filesystems,
network interfaces and addresses, process, memory usage, load average,
that sort of thing - and present a tool or suite of tools that turns
them into s-expressions on stdout? That would be handy for system
administration scripting.

** mpps [-u <user>]... [-p <pid>]... [-x <executable>]... [-c]
Outputs an information alist for every process matching the specified
criteria.

(-u) restricts output to processed owned by a specific user (or
users, if multiple instances of the option are used). Otherwise, all
user's processes are listed.

(-p) restricts output to the specific pid listed. Again, multiple
instances can be used to restrict to a set of pids.

(-x) restricts output to the specific executable listed. Once more,
multiple instances can be used to restrict to a set of executables.

If the above are combined in any way, then the different restrictions
are logically ANDed togather.

If (-c) is specified, then any process which is a child of a process
matched by the above restrictions is also included.

The process alist contains information from /proc, such as

mpname
mpexecutable
mpcwd
mpcmdline
mppid
mpppid
mpuid
mpgid
mppgrp
mpsid
mpnice
mpvsize
mprss
mpwchan

** FIXME: mphead/mptail ???
Picking (all but) the first or last N s-expressions is a useful
operation.
** mpflatten

Reads input s-expressions, and if they are lists, writes the elements
of the list as separate s-expressions, otherwise writes them as-is.

** mpfork [<command>|-x <integer>]...
Runs the given list of shell commands in parallel, distributing input
s-expressions to them atomically, and atomically merging their output
s-expressions to standard output. If any commands terminate before
their input is closed, mpfork terminates with an error.

(-x) specifies a multiplier factor; subsequent commands are
"repeated" that many times. (-x) defaults to 1, in practice.

Implementation: a pair of threads is spawned for each command, one
for input and one for output (standard error is left untouched). Each
thread has a single-sexpr buffer.

A master input thread reads s-exprs from standard input and places
them in the first empty input buffer in the list of command input
threads, in round-robin fashion, blocking if none are available.

A master output thread blocks until at least one output buffer is
full, then scans in round-robin fashion to find and empty it to
standard output.

Once input is closed, all the subprocess standard inputs are closed;
and once all the subprocesses have terminated, mpfork terminates.
** mpgroup [-a] [-t] [-f|-l] <expr>

The expression must be a single-argument procedure. It is applied to
each input s-expression to obtain a "key" for each input s-expression.

As usual, the procedure has no access to current input or output
ports, but can write to the error port.

If (-a) is specified, then the s-expressions are accumulated in memory
by their keys, into a hashtable. If (-f) is specified, the only the
first s-expression for each key is kept; if (-l) is specified, the
only the last is kept. At the end, the hash table is written out; if
(-t) is specified, it is written as one list per key, the first
element being the key value and the rest being the s-expressions with
that key. If (-t) is not specified, then it is just one list per key,
but without the key as the first element. The order of the keys listed
in undefined, but if neither (-f) nor (-l) are specified, the
s-expressions within a key are in the order they were read.

If (-a) is not specified, then the s-expressions are not accumulated
and spat out in a single batch; instead, they are output in the same
order that they were read in, but grouped into lists of s-expressions
having the same key in a contiguous run. If (-t) is specified, the key
value is prepended to the list. If (-f) is specified, then only the
first s-expression in each run of the same key value is listed (and if
(-t) is not specified, then it is output as-is rather than as a
single-element list). Likewise, if (-l) is specified, the only the
last s-expression in each run with of the same key value is listed,
and unless (-t) is specified, it's written as-is without a
single-element list enclosing it.

** mpforeach <expr>...

Run the supplied Scheme procedure(s) on each s-expression from the
input. Ignore anything returned, and the Scheme procedure can access
stdout/stderr if required, but has no access to stdin.


** FIXME: mpjson ???
Convert to/from JSON representation
** FIXME: mpxml ???
Parse xml->sxml, html->sxml (htmlprag), or sxml->xml
** FIXME: mpsxpath ???
Read in sexprs from stdin, treating them as sxml, and output sexprs
matching the supplied sxpath/xpath selector. Each input sexpr might
map to several output sexprs if there's multiple matches; have a flag
to wrap each input's outputs in a list.
** mpprintf [-n] <string>...

Calls "printf" on each input sexpr, with the arguments (concatenated
with spaces) as the format string. Appends a newline unless (-n) is
specified.

** mpls [-r|-R <expr>] [-f <expr>]... [<filename>]...

Write an "ls"-equivalent tool that outputs alists containing
information about each file, and optional filter expression(s) (-f)
which are ANDed together. By default, the filter accepts all files.

Give it the option (-r) to recurse, or (-R <expr>) to apply the
procedure to each directory encountered; recursion within will occur
if the expression is true. (-r) is really (-R 'any?').

Takes an optional list of files on the command line to just list
those, a la "ls".

In the expressions, current-input-port and current-output-port are
banned, but current-error-port is accessible.

mpls -r -f 'regular-file?' -f '(lambda (file) (file-older file (days 5)))'
 | mpforeach 'delete-file'
** FIXME: Some nice output plugins?
mpprintf is a start, but something feeding into the fmt egg, or some
kind of sxml template generation for HTML, might be cool in order to
make report generation easier. Perhaps wrappers for GraphViz and
gnuplot might be useful (converting sexpr syntax into that tool's
native syntax).
</pre>