Magic Pipes
Documentation

Magic Pipes is a suite of tools to construct powerful Unix shell pipelines that operate on structured data.

Conventional shell pipeline tools - grep, sed, awk, and friends - work on lines, and have rather crude support for handling structure within those lines. That makes them fine for line-oriented data with simple internal structure, but dealing with complex structure within the lines quickly descends into a hell of fragile separator handling; the shell pipelines become more minutiae than meat.
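To see the kind of fragility in question, consider splitting a CSV line with awk: it splits on every literal comma, with no notion of quoting, so a quoted field that contains the separator is mis-split.

```shell
# awk has no notion of CSV quoting, so a quoted field containing the
# separator character gets cut in half:
echo '"Smith, John",38' | awk -F, '{ print $1 }'
# "Smith
```

Handling this correctly with conventional tools means hand-writing quoting logic into every pipeline that touches the data.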

Magic Pipes aims to fix that. The Magic Pipes tools read from standard input and write to standard output so they can be combined into pipelines, but rather than dealing with a line at a time, they deal with an s-expression at a time.

The tools fall into a few groups: input tools generate s-expressions from other data formats (CSV files, for instance, or traditional line-oriented data) or sources (directory or process listings); output tools convert s-expressions into other formats; data processing tools map s-expressions to s-expressions in useful ways; and database access tools provide read and write access to external databases.

The original inspiration was my blog post at http://www.snell-pym.org.uk/archives/2009/06/25/magic-pipes/, but the spec has been refined since then.

Here are a few examples of what can be done:

$ cat test.csv
Name,Age
Alaric,38
Jean,11
Mary,6
$ mpcsv-read < test.csv | mptable2alist -H | mpjson-write
{"Name":"Alaric","Age":"38"}
{"Name":"Jean","Age":"11"}
{"Name":"Mary","Age":"6"}
$ echo '((id . 10) (name . "Pastie"))' |
    mpsqlite -m update test.sqlite foods id
...executes the following SQL against test.sqlite:
UPDATE foods SET name = 'Pastie' WHERE id = 10
$ cat /etc/passwd | mpre '(seq (=> user (* any)) ":" (* any) ":" (=> uid integer) ":" (=> gid integer) ":" (=> name (* any)) ":" (=> homedir (* any)) ":" (=> shell (* any)))'
((user . "root") (uid . "0") (gid . "0") (name . "System administrator") (homedir . "/root") (shell . "/run/current-system/sw/bin/bash"))
...
$ mpls -R /home/alaric | \
    mpfilter '(lambda (de) (and (dirent-filename de) (string=? (dirent-filename de) "magic-pipes.scm")))' | \
    mpmap dirent-path
"/home/alaric/personal/projects/magic-pipes/magic-pipes.scm"

General usage

In general, Magic Pipes tools read s-expressions from standard input and write them to standard output (although many do only one or the other of those, rather than both). They process the s-expressions one at a time, based on the command-line arguments, which will often include snippets of Scheme source code that evaluate to procedures which are applied to input s-expressions or intermediate results.
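As a sketch of the general pattern (assuming, as the examples above suggest, that mpmap evaluates its argument to a one-argument procedure and applies it to each input s-expression):

```shell
# Hypothetical sketch: the Scheme snippet supplied on the command line
# evaluates to a procedure, which mpmap applies to each s-expression
# read from standard input, writing the results to standard output.
echo '(1 2 3)' | mpmap '(lambda (x) (map (lambda (n) (* n n)) x))'
```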

The following command-line arguments can be used with any Magic Pipes tool to affect the environment in which supplied Scheme code is run, and to execute code before normal processing:

The options take effect in the order they are supplied. Code executed as a result of them has access to the standard input, output and error ports, so it can read some of the input before normal processing occurs, and likewise generate output before normal processing. Any definitions imported or created by these options are available to the code in subsequent options, and to user code executed during normal processing.

Finally, it is possible to run code after normal processing:

As you might expect, multiple expressions are executed in the order supplied; they can access any definitions generated or imported by earlier code, and have access to standard output and error. Standard input will have been fully consumed by this point, so attempting to read from it is an error.

Generally, user code executed during normal processing only has access to the standard error port; attempts to read from standard input or write to standard output are errors.
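A mapping procedure can therefore still emit diagnostics on standard error while leaving the data stream untouched. A hedged sketch (current-error-port is standard Scheme, but its availability to user code here is an assumption):

```shell
# Hypothetical sketch: the user-supplied procedure logs to standard
# error via Scheme's current-error-port; mpmap itself owns standard
# input and output, so the data stream is unaffected.
echo '(1 2 3)' | mpmap '(lambda (x)
  (display "saw one record\n" (current-error-port))
  x)'
```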

S-expressions are read from standard input and written to standard output using SRFI 38 syntax, which can express shared structure.
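SRFI 38 extends the written form of s-expressions with datum labels: #n= labels a datum, and #n# refers back to it, so shared (or circular) structure can survive a round trip through a pipeline. A hypothetical sketch:

```shell
# A list whose two elements are the *same* pair, in SRFI 38 notation;
# piping it through the identity procedure is assumed here to preserve
# the sharing in the output.
echo '(#0=(a b) #0#)' | mpmap '(lambda (x) x)'
```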

Runtime library

User code has a runtime library of useful things in its environment by default.

The Tools

Data processing

Input

Output

Database access