Meditations on a (Lisp-2) Macro

Buy one and get “Assorted Commentary on Lisp and Programming in General” for free!

Wednesday, 23 November 2022

I’d like to discuss a macro I happen to have written. The definition is not that long, and the idea is actually pretty simple. Have a look:

(defmacro subst-func-symbols (mode data &rest body)
  "Generic system of substituting function symbols in BODY.
Each function call and sharp-quoted function symbol in BODY is
recursively inspected, where the symbol in the function slot is
used to check for a substitution predicate.  The symbol is
checked for a property (that is determined by interring a symbol
will prepend `subst-' to MODE) for a substitution function.  If
found, the substitution function is called with two
arguments (the entire function call, and DATA), and is expected
to either return a new symbol that will replace the previous
function symbol, or nil in which case it will not be modified."
  (declare (indent 2))
  (unless (symbolp mode)
    (error "Mode must be a symbol"))
  (letrec ((prop (intern (format "subst-%S" mode)))
           (subst
            (lambda (form)
              (cond
               ((eq (car-safe form) 'quote) form)
               ((eq (car-safe form) 'function)
                `#',(let ((pred (get (cadr form) prop)))
                      (or (and pred (funcall pred form data))
                          (cadr form))))
               ((consp form)
                (cons
                 (let ((pred (get (car form) prop)))
                   (or (and pred (funcall pred form data))
                       (car form)))
                 (mapcar subst (cdr form))))
               (form)))))
    (macroexp-progn (mapcar subst body))))

All it does is replaces symbols designating functions.

This can be useful: Later on I will give an example of how this can be used to transform the Common Lisp-ish

(every #'plusp (loop for i from 1 to 10 collect i))

into valid and “modern” Emacs Lisp:

(cl-every #'cl-plusp (cl-loop for i from 1 to 10 collect i))

Note how every (a function) and loop (a macro), both in a function position got translated, but also plus, a (sharp-)quoted symbol was also caught.

Reminder: What makes a “Lisp 2”

(Warning: I am not a historian of Lisp nor an avid Implementer, so I might get a thing or two wrong here. If you catch any such mistake, please tell me.)

Lisp (in a broader sense) is split into two traditions: The slightly older “LISP 1.5”(McCarthy et al. 1962) family that nowadays mainly includes Scheme, and the slightly newer “Lisp 2” lineage that is what most people understand when they think of Lisp. Most old Lisp implementations (MACLISP, Interlisp, Lisp Machine Lisp, Franzlisp, …) along with Common Lisp and Emacs Lisp as two contemporary examples. By the way, when I say “slightly older” or “slightly newer”, I am talking about a difference of maybe a few years, which is totally negligible.

The schism between the two is mainly on the topic of namespacing. Prior to Lisp 2, a function was just another value that could be “assigned” to a symbol. This is nice because it makes passing a function as a value to a higher-order function more intuitive:

(map some-func some-lisp)

A disadvantage is that you have to watch out not to shadow your functions when binding variables:

;; Wrap a list in a list
(let ((list '(1 2 3)))
  (list list))

This would trigger a error because the value bound to list is (1 2 3) and not a function.

In a Lisp 2, we have a separate (so-called) name space for variable values and functions. This is done by having separate “variable slots” and “function slots”. While in most places, a symbol continues to evaluate to the value in the variable slot, a symbol in the position of a function is evaluated to the value in the function slot. In other words, a Lisp-1-eval given a list would execute

(apply (eval (car list)) (mapcar eval (cdr list)))

a Lisp-2-eval might do

(apply (symbol-function (car list)) (mapcar #'eval (cdr list)))

The function symbol-function would be responsible for querying the function slot (as opposed to symbol-value, that would query the value slot). Many people don’t like this, because you now can’t do things like

((if (remote? destination) send print) file)

but instead one is forced to either write

(funcall (if (remote-p destination) #'send #'print) file)

or worse still

(if (remote? destination)
    (send file)
    (print file))

This is because the first element of the list must be a symbol, so that it can be passed to symbol-function¹

At the same time, we notice that the function as a value is not bound to the value slot, so how do we use it. After all, this is Lisp: Functions are “first class citizens”. We have seen a few examples above, the trick is (sharp-)quoting. So what is what, and what does what do? In Common Lisp we would see the following:

(list 'foo #'foo)
;; => (foo #<FUNCTION foo>)

while Emacs Lisp would give us

(list 'foo #'foo)
;; => (foo foo)

Regular quoting inhibits evaluation, just as it always does, but sharp-quoting will either behave sort of like symbol-function and resolve the symbol to the functional object, or just behave like quoting. Both approaches work, since whenever you actually want to use the function you need to invoke apply/funcall, and they handle both a symbol with a bound function slot or a function object. The advantage of sharp-quoting in the latter case is that this can still be used as a hint to the Lisp compiler/interpreter that the symbol ought to have a bound function slot, and a warning may be issued if this is not know with certainty.

There is another slot that each symbol holds, namely that containing a property list. This can be understood as a sort of catch-all “etcetera” slot, that can be used for miscellaneous tagging of information to symbols. This is done with get and put:

(put 'foobar 'written-in-the-year 2022)
(get 'foobar 'written-in-the-year)      ;; => 2022

We can access the entire property list directly using symbol-plist:

(symbol-plist 'foobar)                  ;; => (written-in-the-year 2022)

Of course this is not necessary, you might just as well have a (global) hash table for each property you are concerned with:

(let ((table (make-hash-table)))
  (defun put-wity-tag (sym val)
    (assert (symbolp sym))
    (setf (gethash sym table) val))

  (defun get-wity-tag (sym val)
    (assert (symbolp sym))
    (gethash sym table)))

(set-wity-tag 'foo 2022)
(get-wity-tag 'foo)      ;; => 2022

Keeping these things in mind, one should be able to understand what the macro I want to discuss is using and what it is doing.

More details on this discussion can be found in (Gabriel and Pitman 1988).

The Parts of the Definitions

So let us look at the components of the definition, and why I consider it a very “Lisp-2” macro.

The Body, the Macro

(defmacro subst-func-symbols (mode data &rest body)

So of course we are dealing with some kind of a macro here. A variadic macro, that takes two arguments and all other arguments are interpreted as a “body”. The first two arguments don’t concern us for now. The third one is more interesting, because that is the reason we will be able to have the macro appears to be a part of the language:

(subst-func-symbols i-dont-care-about the-first-two-arguments
  (but whatever (is :in the))
  (body (will be) 'handled)
  (by "the" macro))

Most interesting languages have at least one lesson they teach a programmer, that applies beyond the language itself. Alan Perlis put it well:

A language that doesn’t affect the way you think about programming is not worth knowing.

In my opinion, one of the lessons of Lisp could be that the boundary we imagine between a language, the standard library and whatever we do in a language is imagined. Sure, when writing C or Java it is easier to draw a line and come up with some categorisation. But in the end, whenever you add a function or any other construct, you have extended the language in one way or another. Designing a language and designing a system are not as different as one would be inclined to imagine at first.

This is the sense in which programming is an art. Having a sense for how to design language – be it the at the core or at the periphery – is that part that is hard for formalise into a checklist, to statically check on the go or to teach in a course. That is not to say that these things don’t help. Many great paintings stand on the foundation of a solid understanding of geometric proportions and a careful study of color-theory. Like all art, there is an inter-subjective component that determines quality. The computer of course doesn’t care. When thinking about the subject in this way, the following famous quote from the SICP(Abelson and Sussman 1996) makes sense to me:

Programs must be written for people to read, and only incidentally for machines to execute.

An elegant language is often the foundation of a flexible system, capable of adjusting to changing requirements.

Stepping back, I find it interesting to reconsider what a macro is in a Lisp system. Compared to something like C, where the pre-processor is a language distinct language, Lisp macros behave just like any other procedure. They take arguments and return an expression, but while doing so can do any kind of computation or trigger side-effects that makes macro-expansion non-deterministic.

The real difference to a regular procedure is that the arguments it receives and result it returns are not evaluated. In Emacs Lisp (and to my knowledge a few Common Lisp implementations) this really means the implementation is a function denoted to be a macro. I remember being surprised to see that the function slot of a symbol used as a macro is really just a cons-cell, where the car is the symbol macro and the cdr is a regular function:

(symbol-function 'defun) ;; => (macro . ...)

Indentation Declaration

Skipping over the documentation string, we encounter

  (declare (indent 2))

What this does in Emacs Lisp is indicate how the macro should be indented

Indent calls to this function or macro according to INDENT-SPEC. This is typically used for macros, though it works for functions too. See Indenting Macros.

and specifically a number means

The first number arguments of the function are distinguished arguments; the rest are considered the body of the expression. A line in the expression is indented according to whether the first argument on it is distinguished or not. If the argument is part of the body, the line is indented lisp-body-indent more columns than the open-parenthesis starting the containing expression. If the argument is distinguished and is either the first or second argument, it is indented twice that many extra columns. If the argument is distinguished and not the first or second argument, the line uses the standard pattern.

Given a macro like

(defmacro foo (&rest args)
  "Just return ARGS."
  (declare (indent 2))
  args)

we see that it actually expands to

(prog1
    (defmacro foo (&rest args) ;; this is usually expanded further.
      "Just return ARGS."
      args)
  (function-put 'foo 'lisp-indent-function 2))

What each element in the declare argument list actually means, in other words what code should be appended after the actual definition of the macro (or procedure) is specified by the variable defun-declarations-alist. You can add anything you want to this list to define a new term².

In this case we set the symbol property lisp-indent-function to the value 2. Emacs takes care of the rest. There not much more to discuss on this point.

Error Handling

  (unless (symbolp mode)
    (error "Mode must be a symbol"))

As mentioned before, Macros are just functions and can behave like functions. One consequence of this is that we can raise errors during the macro-expansion phase³. Compare this to C where the best you could do is something like

#if some_condition
# error "Something went wrong"
#endif

but (to my knowledge) a preprocessor error cannot depend on the “value” of an argument to a macro function.

In this case I want to ensure that one argument – again, the non-evaluated element in the abstract syntax tree – is a symbol. That is to say that

(subst-func-symbols some-symbol ...)

will work, but

(subst-func-symbols (this-evaluates-to some-symbol) ...)

wont. The reason here is that I am interested in mode (the symbol designating the variable bound as an argument, that in turn is expected to have a symbol bound to it) at macro-expansion time. Passing mode to eval or symbol-value wouldn’t necessarily work, because this would give us what one might expect. The result would be the expression evaluated in the context of or variable bound during the macro-expansion.

Note that it would be possible to also shift the macro-expansion into the evaluation, by returning a macro that would construct the s-expression and pass it to eval. There would be little point in using a macro if this was always the indented use-case, but I believe that a general macro that could distinguish between compile-time constants that can ensure the macro is computed at macro-expansion time or if it has to be deferred to evaluation would be interesting.

Oh, and of course I should also say that we are using unless, one of the simplest useful macros. It is defined in Emacs Lisp as

(defmacro unless (cond &rest body)
  "If COND yields nil, do BODY, else return nil.
When COND yields nil, eval BODY forms sequentially and return
value of last one, or nil if there are none."
  (declare (indent 1) (debug t))
  (cons 'if (cons cond (cons nil body))))

And symbolp is a predicate function, that uses the traditional “…p” convention. To quote from the Jargon file:

The -P convention: turning a word into a question by appending the syllable “P”; from the LISP convention of appending the letter “P” to denote a predicate (a Boolean-values function). The question should expect a yes/no answer, though it needn’t. (See T and NIL.) At dinnertime: “Foodp?” “Yeah, I’m pretty hungry.” or “T!”; “State-of-the-world-P?” (Straight) “I’m about to go home.” (Humorous) “Yes, the world has a state.”

Variations on `let`

  (letrec

Not even considering the rest of the expression, I just wanted to comment on letrec. In general, we bind values in a body using let. This is known to be equivalent to the immediate application of a lambda expression (note that this is not related to η-conversion):

((lambda (x) ...) 5) ≡ (let ((x 5)) ...)

There are variations on let, to multiple ends. The most common one is let*, that when multiple forms are bound at once, each has access to each previous binding. So while

(let ((celsius 10)
      (farenheit (+ (/ celsius 9/5) 32)))
  (format t "~G°C is ~G°F" celsius farenheit))

would fail since celsius is not bound in (+ (/ celsius 9/5) 32),

(let* ((celsius 10)
       (farenheit (+ (/ celsius 9/5) 32)))
  (format t "~G°C is ~G°F" celsius farenheit))

works, because this is equivalent to

((lambda (celsius)
   ((lambda (farenheit)
      (format t "~G°C is ~G°F" celsius farenheit))
    (+ (/ celsius 9/5) 32))) 
 10)

while the let could be modelled using a variadic lambda expression and application.

Other variations include flet (“function let”) and labels (???) that bind the function slot of a symbol. Scoping-wise these two are respectively analogous to let and let*. macrolet does the same for local macros!

A popular macro is if-let, that combines let* (despite the name not let!) and if. This is useful for step-wise computations that might “fail” by returning nil. Only if all variables are “successfully” bound, we evaluate the consequent with the variables bound. Other variations of this macro include when-let (using when), and-let* (using and) and while-let (new in Emacs 29) and pcase-let* (using the pattern-matching macro pcase).

Another worthwhile mention from Elisp is cl-letf. I don’t know of any such implementation for Common Lisp – though it wouldn’t be difficult to write your own. What it does is not only bind “regular” variables, but also generalised variables. You can use this to emulate flet, by binding a function to the generalised variable (symbol-function foo).

Swinging-back to Lisp-1, Scheme has the “named let”, also implemented in Emacs Lisp as named-let⁴. This makes use of the association between let expressions and their equivalent functional forms, by allowing the let-binding to be recursively invoked as a function, where the arguments replace the bindings. This provides a “functional” alternative to using loops:

(let self ((i 1000) (x x0))
  (if (> i 0)
      (self (1- i) (- x (/ (f x) (df x))))
    x))

What might seem surprising to some is how self and refer it itself. Does this require implementation support, one might wonder. No, as mentioned above, in Emacs Lisp this can be implemented as a macro, that in turn makes use of the macro letrec.

I don’t know why, but understanding letrec took a while for me. The first time I remember encountering it was probably in “The Scheme Programming Language”(Dybvig 2009):

Like let, the letrec syntactic form includes a set of variable-value pairs, along with a sequence of expressions referred to as the body of the letrec.
(letrec ((var expr) ...) body1 body2 ...)
Unlike let, the variables var ... are visible not only within the body of the letrec but also within expr ....

Knowing what it does, and when it is used, I understand what is being said, but what helped me most is to consider letrec as a macro and take a look at the expanded form⁵:

(letrec ((foo bar) (baz qux)) ...)
≡
(let (foo baz) 
  (setq foo bar)
  (setq baz qux)
  ...)

What this allows us to do is to bind a function and ensure that the function has access to the binding, so the previous named-let example could also be expressed as either of the following:

...
≈
(letrec ((newton
          (lambda (i x)
            (if (> i 0)
                (funcall newton (1- i) (- x (/ (f x) (df x))))
              x))))
  (newton 1000 x0))
≈
(let ((i 1000) (x x0))
  (catch 'quit
    (while t
      (let ((x x) (i i))
    (if (> i 0)
        (setq x (- x (/ (f x) (df x)))
          i (1- i))
      (throw 'quit x))))))

Where the latter expansion would be preferable, as it does not involve function calls for each step of the computation.

But we won’t be using a named-let, that was just a distraction. The point was to demonstrate how letrec can be used, in this case to re-create something like labels (without the use of cl-labels in Emacs Lisp).

Symbols and their Representations

(prop (intern (format "subst-%S" mode)))

Moving on to the first binding of the letrec, we see that it has noting to do with what letrec has to offer, for it is just a regular, non-recursive expression bound to a symbol. So far, so boring. We will see more on that later on.

What I do think is interesting is the usage of intern. In principle, all it does is “convert” a string to a symbol, in such a way that the symbol will be eq to all other strings with the same string representation in the same obarray or package (depending on the Lisp). As to why this holds:

(eq (intern "foo") 'foo)                        ;=> t
(eq (make-symbol "foo") 'foo)                   ;=> nil
(eq (make-symbol "foo") (make-symbol "foo"))    ;=> nil

This is because make-symbol returns a symbol that is not “interned”, that is to say inserted into a collection of defined symbols. The critical lessen here is that comparing symbol names isn’t enough to conclude that two symbols are eq (or eql, equal, …):

(defun broken-sym-eq (s1 s2)
  (string= (symbol-name s1) (symbol-name s2)))

(broken-sym-eq (make-symbol "foo") (make-symbol "foo")) ;=> t

Here is my question: Is it “proper” to intern symbols? Sure, it must be done, at the very least by the reader, but should it ever be done if it is not necessary?

This depends on whether or not you want symbols to be opaque or not. As we have seen, in Lisp they are certainly not opaque, as each symbol has a print representation that you can query with symbol-name or an equivalent function. For symbols to be opaque, the reader would have to have some private, inaccessible obarray or be decoupled from the rest of the system – which I don’t think most people would say is worth it.

Some might question my motivation for distinguishing between opaque symbols and non-opaque, “introspectable” symbols in the first place. The fact that I haven’t given an argument for or against should indicate that this is an issue I am undecided on myself. The best sketch of a reason pro opaqueness is that if we want to understand a symbol in (a non-equality-)relation to any other object in the system, it should be strictly eq-ual to only itself. I’d even go as far as saying that the fact that symbols can be compared with one another in constant time is purely an implementation detail. The central point of a symbol could be argued to be the fact that you cannot gain access to a symbol that is eq to an existing symbol, unless you have access to it in the first place. This guarantee is broken when intern is introduced, because it allows us to access all symbols that have previously been interned – in so far we have access to the obarray that contains the symbol and was used to intern a symbol name.

To my understanding, Scheme doesn’t have the concept of an uninterned symbol: Both R6RS(Sperber et al. 2009, 49) and R7RS(Shinn et al. 2013, 44) just state that string->symbol just returns a symbol with the print representation of the input. And if we take a look at the specification of symbol=?, we get:

Returns #t if all the arguments are symbols and all have the same names in the sense of string=?. (Shinn et al. 2013, 44)

which is relevant because two symbols are eq? (ie. eq) if they are eql? (ie. eql)(Shinn et al. 2013, 31), and two symbols are eql? if they are symbol=?(Shinn et al. 2013, 30).

Interestingly, R6RS is less precise on what it means for symbols to be the same:

Returns #t if the symbols are the same, i.e., if their names are spelled the same(Sperber et al. 2009, 49).

I really like symbol, as a programming feature. It is one of the main things I tend to miss in conventional, “blub” language. But this is also not unique to Lisp, so I wanted to check how other languages that I am familiar but not that well versed in handle symbols.

Smalltalk: In the introduction of the Smalltalk-80: Language and Implementation Book(Goldberg and Robson 1983, 21) we read
Symbols are objects that represent strings used for names in the system. The literal representation of a symbol is a sequence of alphanumeric characters preceded by a pound sign, for example,
```
#:bill
#M63
```
There will never be two symbols with the same characters; each symbol is unique. This makes it possible to compare symbols efficiently.
(emphasis mine)

A little later on, when the formal definition of symbols is presented(Goldberg and Robson 1983, 168), where we not only see that symbols are treated the same way as Scheme does:

Symbols are arrays of Characters that are guaranteed to be unique.

Thus:

‘a string’ asSymbol == ‘a string’ asSymbol

answers true.

Where asSymbol is a method of the String class.

We also see that the Symbol class protocol defines a unary method intern, that would also behave like string->symbol, not Lisp’s intern.

Also interesting is that according to the class-hierarchy diagram(Goldberg and Robson 1983, 144), Symbol is actually a subclass of String?
Prolog: What we have been talking about as symbols are usually referred to as “Atoms” in Prolog (which are unrelated to “atoms” in Lisp, ie. any object that is not a cons-cell). The comparison is not that easy, because equality plays a different role in Prolog, than it does in Lisp or Smalltalk, because of the way Unification is used, but we can also check for equality directly:
```
?- foo == foo.
true.

?- foo == bar.
false.
```
Here foo and bar are symbols, while in the following
```
?- foo = Bar.
Bar = foo.
```
foo is a symbol, and Bar is a symbol that the system attempts to resolve, in this case by instantiating it with the symbol foo.

What == is test if two terms are identical(Covington, Nute, and Vellino 1993, 70), which in turns means they have the same AST(Covington, Nute, and Vellino 1993, 5). According to (Covington, Nute, and Vellino 1993, 21) the “sameness” of atoms (or “tokens”) is also just a matter of comparing their representation.

From what I see, and I might be mistaken here, there is no standard way to create a symbol from a string – or rather there is no such thing as a string in ISO Prolog? This might mean that standard Prolog actually has opaque symbols.

If we check actually existing implementation, we find atom_string for SWI Prolog, or atom_chars for GNU Prolog (the latter works on a list of symbols representing characters, instead of strings).

Edit (30Oct24): Florian points out, that ISO Prolog has atom_codes.

The Wikipedia article on symbols in programming goes into this in greater detail.

But returning to the initial code-snippet: What we are doing here is creating a symbol relative to the print representation of the symbol that was passed through as an argument to the macro. So if the input was foo, I actually want to get the symbol subst-foo. This is a common thing to do in Lisps. E.g. if I define a structure

(defstruct complex real imag) ;a structure with two members

the macro defstruct will also define and provide me with access-functions complex-p, complex-real, complex-imag that all were synthesised by interning the print representations of the structure and member names. This also means that even if one of the symbols wasn’t interned:

(let ((complex (make-symbol "cplx")))
  (eval `(cl-defstruct ,complex real imag)))

the symbols of the auxiliary function will still be accessible as “regular”, interned symbols from the default obarray:

(intern-soft "cplx-real") ;=> cplx-real

We will be using this symbol, specifically its symbol properties to decide how and what to substitute. As we will see later, this was absolutly not necessary (but I never said that this macro is a good demonstration of various Lisp-2 features).

Recursion with anonymous Functions

          (subst
            (lambda (form)
              (cond
               ((eq (car-safe form) 'quote) form)
               ((eq (car-safe form) 'function)
                `#',(let ((pred (get (cadr form) prop)))
                      (or (and pred (funcall pred form data))
                          (cadr form))))
               ((consp form)
                (cons
                 (let ((pred (get (car form) prop)))
                   (or (and pred (funcall pred form data))
                       (car form)))
                 (mapcar subst (cdr form))))
               (form))))

This is the “real meat” of the substitution procedure. The function will be invoked recursively on every expression of interest. To better discuss the macro, I’ll go through the procedure in blocks.

Conditional Expressions and pattern patching

            (lambda (form)
              (cond

While substituting, we will have to be considering the possible values an expression may consist of. I use cond in this case (as opposed to if), because we will be considering multiple alternative forms of expression, that are all “equally probable”.

The relation between cond and if is interesting, because both can be implemented as macros on top of one-another. Taking the Scheme variations (that don’t have convenience features like the implicit progn of the else-case for if or the special handling of empty bodies for each cond branch), they might look something like this:

(defmacro if-with-cond (test then else)
  `(cond 
    (,test ,then)
    (t ,else)))

(defmacro cond-with-if (&rest branchs)
  (let ((body nil))
    (dolist (branch (reverse branchs))
      (setq body `(if ,(car branch)
                      (progn ,@(cdr branch))
                    ,body)))
    body))

On most systems, both forms will be built into the language as special forms, but if you’d have to pick one the implement, I would have guessed that if would be the one to prefer, due to the relative simplicity. Nevertheless, one finds that in (McCarthy et al. 1962, 10) the S-Expression for the conditional form is the familiar cond. My guess is that this is due to the fact cond is more flexible and that macros were not available before 1963(Steele and Gabriel 1996, 274), (Hart 1963). Of course, given macros it would be (computationally) feasible to define either of the two in terms of lambda expressions, given a church-encoding for boolean values.

To my knowledge, the concept of a “conditional expression” had its genesis in LISP. Most other languages that derive from ALGOL, that have some kind of a conditional expression (often via a so-called “ternary operator”, e.g. in C test ? then : else), owe this to John McCarthy’s proposition(Perlis 1978, 85). It appears that the conditional expression as a concept in programming languages originated in Lisp, and it did not appear in any of Lisp’s ancestors. It also appears that the way we think of them wasn’t the initial conception(Stoyan 1984, p 302):

They are not listed as a kind of normal (nunerical) expressions. Instead they are one of of the alternatives for expressions that can occur as the argument of a GOTO-statement! These expressions are called “designational expressions”.

Returning back to the specific case, the point of cond here is to check what kind of an expression is contained in form. As seen in the next section, this will be done by manually inspecting the value. This is a low-level approach, that will work well enough, but anyone who has had experience with languages that support pattern matching will realise what I actually wanted to express. This is one of the criticisms that Philip Wadler raised against Scheme(Wadler 1987) that languages from the ML family (in his example Mirinda) handle better.

Of course this is not an inherent limitation of a Lisp system. Scheme has match.scm, Emacs Lisp has the powerful pcase macro, and Common Lisp as an unsurprising variation of solutions, in addition to the case-family of forms (that aren’t as powerful, since they don’t support pattern-matching via Unification).

Quoting, Sharp-quoting, Back-quoting and Un-quoting

               ((eq (car-safe form) 'quote) form)
               ((eq (car-safe form) 'function)
                `#',(let ((pred (get (cadr form) prop)))
                      (or (and pred (funcall pred form data))
                          (cadr form))))

This is the part of crux of the “real meat” of the substitution procedure. We want to do two things:

Inhibit substitution under quoted forms.
Handle the substitution of sharp-quoted symbols.

Both of these two checks rely on car-safe, an auxiliary function defined for Emacs Lisp. It is defined in the core, in C:

INLINE Lisp_Object
CAR_SAFE (Lisp_Object c)
{
  return CONSP (c) ? XCAR (c) : Qnil;
}

the intuitive translation into Lisp would be

(defun car-safe (c) (if (consp c) (car x) nil))

The reason for this is that car will raise an error indicating that the argument is not a cons-cell, if the argument is not a cons-cell.

Another approach would have been to have the cond body start with a

(cond 
  ((not (consp form)) form)
  ;; ...
  )

as to ensure that any non-cons-cell gets caught before we continue on to the other clauses. That way we would have the guarantee that car and cdr would be passed a legal value.

The first point is pretty trivial: If a form is quoted, we just return the form as such and replace nothing underneath. This point is open to debate, one might as well argue that the procedure should operate under quoting, which might be of use when passing a form to eval. Then again, there are plenty of other situations where this might break a program, so I’d prefer to err on the side of caution.

The second point, checks if we are dealing with a quoted function, ie. something of the form #'foo (or (function foo), which is what the reader expands the former into). As discussed above, what this actually does depends on the Lisp system. What we want to do is check if foo is to be substituted with some other symbol. We know the function will be sharp-quoted again, so the #' must remain. This is done using quasi-quoting. For those unfamiliar, this is a kind of quoting that has little holes we evaluate expressions in. We introduce a quasi-quoted term with a backtick (`), and the holes with a comma (,). The following two are equivalent:

`(this (list has) ,(+ 2 2) items)
≡
(list 'this (list 'list 'has) (+ 2 2) 'items)

Among other things, we note that

`',foo
≡
`(quote ,foo)
≡
(list 'quote foo)

and analogously

`'#,foo
≡
`(function ,foo)
≡
(list 'function foo)

This is precisely what happens in the function clause. As to what is being replaced, we will consider in the next section:

The Substitution Remains the Same

               ((consp form)
                (cons
                 (let ((pred (get (car form) prop)))
                   (or (and pred (funcall pred form data))
                       (car form)))
                 (mapcar subst (cdr form))))

The rule here is simple: For any cons-cell, which represents a function call, we try and replace the function symbol and apply the substitution procedure to all arguments. This mirrors function application in a Lisp-2.

So we finally arrive at the point where we figure out what is being replaced. This is where symbol properties enter the game: We will be using the symbol in the function slot⁶ to query a property that we constructed earlier using (intern (format "subst-%S" mode)). The property might hold a function that we invoke to query what the function slot is to be replaced with. If either no function was found or the function indicates that nothing is to be replaced, we re-insert the initial symbol into our new tree.

So what we conclude, is that this is just a framework for function substitution. How and what is replaced, is part of the input, specifically the mode argument. We’ll see a few examples soon.

Tying it all together

               (form))))
    (macroexp-progn (mapcar subst body))))

Finally, we have to start somewhere. We make use of macroexp-progn. If we check the macro signature, we see that body will contain a list of all arguments, which is not a valid expression in itself. It is the difference between (+ 1 1) and ((+ 1 1)). Since the macro has an “implicit progn”, we will expand to have an actual progn if necessary. The function macroexp-progn takes care of that by either prepending a progn to the list, or by just returning the car if the list consists of a single element.

Given this input expressions, we then proceed to invoke the substitution procedure discussed above, and might return a modified AST, which is also the return value of the macro – that is to say the code expansion.

A few Examples

Going back to the first example: Emacs used to be bundled with a “Common Lisp” compatibility library, that wasn’t really a “Common Lisp” compatibility library, but just attempts at reimplementing various Common Lisp functions in Emacs Lisp.

A while back the decision was made to deprecate this library, cl.el in favour of cl-lib.el, as a compromise(Monnier and Sperber 2020, 74:38). It provides the same functionality, but you have to prefix every function and macro with cl- to avoid namespaceing conflicts with its close relative, Emacs Lisp.

I have to admit, that there are certain cases where I’d be fine with stripping away the cl- prefixes. Especially when a section combines multiple common lisp compatibility functions and macros into one.

Initially I had assumed that the shorthand system, that was introduced in Emacs 28 – more hastily than carefully in my own opinion – would help me out. But it turns out that this can only expand a shorter, non-empty prefix into something else. It cannot be used to rename symbols entirely.

The topic of this text can help us remedy the inconvenience. To do so, we first pick a symbol that will be used to indicate what kind of substitution is taking place: We pick the intuitive cl. Next we iterate over all define symbols in the default obarray and find those that begin with cl- (taking advantage of symbol introspection). These are tagged with the symbol property subst-cl mapping to a constant function, that will return the respective prefixed form:

(require 'cl-lib)
(mapatoms
 (lambda (sym)
   (let ((name (symbol-name sym)))
     (when (string-match "\\`cl-\\([^-].*\\)" name)
       (put (intern (match-string 1 name))
        'subst-cl (lambda (_form _data) sym))))))

After having evaluated the code, we can try out a small text first:

(funcall (get 'first 'subst-cl) 'form 'data) ;=> cl-first

as expected. Now to the initial example can be tested:

(macroexpand-1
 '(subst-func-symbols cl nil
   (every #'plusp (loop for i from 1 to 10 collect i))))
;; => (cl-every #'cl-plusp (cl-loop for i from 1 to 10 collect i))

Just as was intended.

For the sake of convenience, we might also wrap this in yet another macro:

(defun common-lisp (&rest body)
  `(subst-func-symbols cl nil ,@body))

Another example where this could have been of use is ELPA package Compat, a backwards compatibility library for Emacs Lisp that I happen to maintain. In fact, the first prototype of this macro was mentioned on the development mailing list. The issue we were dealing with is that the library provides certain prefixed definitions, to avoid overwriting existing functions and macro⁷. E.g. in Emacs 26, the traditional, dyadic function assoc “acquired” a new optional argument testfn, comparable to Common Lisp’s :test keyword for assoc. One could use advice over override the default definition, but that would be too invasive for a package that is installed by default as a dependency on many systems nowadays.

Instead one might use a macro like this to perform the substitutions, similar to what was previously done with cl. The difference here would be that we could make use of that “data” argument to indicate the maximum version we wish to target:

(macroexpand-1
 '(subst-func-symbols compat 27
   (assoc some-key some-alist #'some-comparator)))
;; => (compat--26-assoc some-key some-alist #'some-comparator)
;; or
;; => (assoc some-key some-alist #'some-comparator)

What the exact expansion will be, depends on the current version of Emacs. If the proper definition of assoc is provided OOTB, we would want to use that, since as a core function, it will be more efficient than our replica. Otherwise we will fall back to some function we chose depending on the input and the data argument (in this case the value 27).

Note that the decision was ultimately made in favour of a simpler system, that wouldn’t involve the kind of traversal that the macro implements

So that is that. I found this an interesting investigation of how various Lisp-2 features could be used and how they relate to Lisp and Programming in general.

In conclusion, I have no idea who the target audience of this text is supposed to have been.

References

Abelson, Harold, and Gerald Jay Sussman. 1996. Structure and Interpretation of Computer Programs. The MIT Press.

Covington, Michael A, Donald Nute, and Andr e Vellino. 1993. “ISO Prolog.” A Summary of the Draft Proposed Standard.

Dybvig, R Kent. 2009. The Scheme Programming Language. MIT Press.

Gabriel, Richard P, and Kent M Pitman. 1988. “Endpaper: Technical Issues of Separation in Function Cells and Value Cells.” Lisp and Symbolic Computation 1 (1): 81–101. https://www.nhplace.com/kent/Papers/Technical-Issues.html.

Goldberg, Adele, and David Robson. 1983. Smalltalk-80: The Language and Its Implementation. Addison-Wesley Longman Publishing Co., Inc.

Hart, Timothy P. 1963. “MACRO Definitions for LISP.”

McCarthy, John, Paul W Abrahams, Daniel J Edwards, Timothy P Hart, and Michael I Levin. 1962. “LISP 1.5 Programmer’s Manual.”

Monnier, Stefan, and Michael Sperber. 2020. “Evolution of Emacs Lisp.” Proceedings of the ACM on Programming Languages 4 (HOPL): 1–55.

Perlis, Alan J. 1978. “The American Side of the Development of Algol.” In History of Programming Languages, 75–91.

Shinn, Alex, John Cowan, Arthur A Gleckler, et al. 2013. “Revised 7 Report on the Algorithmic Language Scheme.” Technical report. https://small.r7rs.org/attachment/r7rs.pdf.

Sperber, Michael, R Kent Dybvig, Matthew Flatt, Anton Van Straaten, Robby Findler, and Jacob Matthews. 2009. “Revised6 Report on the Algorithmic Language Scheme.” Journal of Functional Programming 19 (S1): 1–301. http://www.r6rs.org/final/r6rs.pdf.

Steele, Guy L, and Richard P Gabriel. 1996. “The Evolution of Lisp.” In History of Programming Languages—II, 233–330.

Stoyan, Herbert. 1984. “Early LISP History (1956-1959).” In Proceedings of the 1984 ACM Symposium on LISP and Functional Programming, 299–310.

Wadler, Philip. 1987. “A Critique of Abelson and Sussman or Why Calculating Is Better Than Scheming.” ACM SIGPLAN Notices 22 (3): 83–94. https://dl.acm.org/doi/pdf/10.1145/24697.24706.

Strictly speaking this is an oversimplification, as ((lambda (foo) (list foo foo)) 'bar) is valid and works like a let.↩︎
As an example, I’ve recently been hacking on Agda’s Emacs mode. One interesting fact about Agda’s Emacs mode is that it currently just receives strings from the interpreter/compiler, reads these in and passes them straight to eval. Possibly risky, but luckily there is a comment that agrees with me. My proposal will be to add a declaration to defun-declarations-alist that first of all points out that this function may be invoked by Agda and what “type” (CL not proper typing) the arguments are to have. The arguments are also not evaluated.↩︎
Note that the “macro-expansion phase” does not have to strictly precede the “evaluation phase” with a break in between. If we dynamically evaluate a term that includes a macro, the two will happen directly after one another. The NEWS file for Emacs 24 (etc/NEWS.24) has an interesting notice on this topic under the heading “Emacs tries to macroexpand interpreted (non-compiled) files during load.”↩︎
Note that named-let has a simple form of TCO-optimisation. If you are interested in how this is done, you can see the implementation in Compat ↩︎
For Emacs Lisp the macro will attempt to expand a letrec to let* if there are no recursive bindings, so bar and qux are best to be understood as metavariables↩︎
As pointed out before, this is actually not a valid assumption. The function slot may also contain a (lambda ...) expression. Worse yet, when considering the previous case with sharp-quoted symbols, Emacs Lisp permits #'(foo bar) while Common Lisp would raise an error.↩︎
Which would be possible. You can even overwrite core functionality, just try evaluating (fset 'eval #'identity) and see how Emacs will slowly deteriorate↩︎