Buy one and get “Assorted Commentary on Lisp and Programming in General” for free!
Wednesday, 23 November 2022
I’d like to discuss a macro I happen to have written. The definition is not that long, and the idea is actually pretty simple. Have a look:
defmacro subst-func-symbols (mode data &rest body)
("Generic system of substituting function symbols in BODY.
Each function call and sharp-quoted function symbol in BODY is
recursively inspected, where the symbol in the function slot is
used to check for a substitution predicate. The symbol is
checked for a property (that is determined by interring a symbol
will prepend `subst-' to MODE) for a substitution function. If
found, the substitution function is called with two
arguments (the entire function call, and DATA), and is expected
to either return a new symbol that will replace the previous
function symbol, or nil in which case it will not be modified."
declare (indent 2))
(unless (symbolp mode)
(error "Mode must be a symbol"))
(intern (format "subst-%S" mode)))
(letrec ((prop (subst
(lambda (form)
(cond
(eq (car-safe form) 'quote) form)
((eq (car-safe form) 'function)
((let ((pred (get (cadr form) prop)))
`#',(or (and pred (funcall pred form data))
(cadr form))))
(consp form)
((cons
(let ((pred (get (car form) prop)))
(or (and pred (funcall pred form data))
(car form)))
(mapcar subst (cdr form))))
(
(form)))))mapcar subst body)))) (macroexp-progn (
All it does is replaces symbols designating functions.
This can be useful: Later on I will give an example of how this can be used to transform the Common Lisp-ish
every #'plusp (loop for i from 1 to 10 collect i)) (
into valid and “modern” Emacs Lisp:
1 to 10 collect i)) (cl-every #'cl-plusp (cl-loop for i from
Note how every
(a function) and loop
(a
macro), both in a function position got translated, but also
plus
, a (sharp-)quoted symbol was also caught.
(Warning: I am not a historian of Lisp nor an avid Implementer, so I might get a thing or two wrong here. If you catch any such mistake, please tell me.)
Lisp (in a broader sense) is split into two traditions: The slightly older “LISP 1.5”(McCarthy et al. 1962) family that nowadays mainly includes Scheme, and the slightly newer “Lisp 2” lineage that is what most people understand when they think of Lisp. Most old Lisp implementations (MACLISP, Interlisp, Lisp Machine Lisp, Franzlisp, …) along with Common Lisp and Emacs Lisp as two contemporary examples. By the way, when I say “slightly older” or “slightly newer”, I am talking about a difference of maybe a few years, which is totally negligible.
The schism between the two is mainly on the topic of namespacing. Prior to Lisp 2, a function was just another value that could be “assigned” to a symbol. This is nice because it makes passing a function as a value to a higher-order function more intuitive:
map some-func some-lisp) (
A disadvantage is that you have to watch out not to shadow your functions when binding variables:
;; Wrap a list in a list
let ((list '(1 2 3)))
(list list)) (
This would trigger a error because the value bound to
list
is (1 2 3)
and not a function.
In a Lisp 2, we have a separate (so-called) name space for variable
values and functions. This is done by having separate “variable slots”
and “function slots”. While in most places, a symbol continues to
evaluate to the value in the variable slot, a symbol in the position of
a function is evaluated to the value in the function slot. In other
words, a Lisp-1-eval
given a list
would
execute
apply (eval (car list)) (mapcar eval (cdr list))) (
a Lisp-2-eval
might do
apply (symbol-function (car list)) (mapcar #'eval (cdr list))) (
The function symbol-function
would be responsible for
querying the function slot (as opposed to symbol-value
,
that would query the value slot). Many people don’t like this, because
you now can’t do things like
if (remote? destination) send print) file) ((
but instead one is forced to either write
funcall (if (remote-p destination) #'send #'print) file) (
or worse still
if (remote? destination)
(
(send file)print file)) (
This is because the first element of the list must be a symbol, so
that it can be passed to symbol-function
1
At the same time, we notice that the function as a value is not bound to the value slot, so how do we use it. After all, this is Lisp: Functions are “first class citizens”. We have seen a few examples above, the trick is (sharp-)quoting. So what is what, and what does what do? In Common Lisp we would see the following:
list 'foo #'foo)
(;; => (foo #<FUNCTION foo>)
while Emacs Lisp would give us
list 'foo #'foo)
(;; => (foo foo)
Regular quoting inhibits evaluation, just as it always does, but
sharp-quoting will either behave sort of like
symbol-function
and resolve the symbol to the functional
object, or just behave like quoting. Both approaches work, since
whenever you actually want to use the function you need to invoke
apply
/funcall
, and they handle both a symbol
with a bound function slot or a function object. The advantage of
sharp-quoting in the latter case is that this can still be used as a
hint to the Lisp compiler/interpreter that the symbol ought to have a
bound function slot, and a warning may be issued if this is not know
with certainty.
There is another slot that each symbol holds, namely that containing
a property list. This can be understood as a sort of catch-all
“etcetera” slot, that can be used for miscellaneous tagging of
information to symbols. This is done with get
and
put
:
2022)
(put 'foobar 'written-in-the-year get 'foobar 'written-in-the-year) ;; => 2022 (
We can access the entire property list directly using
symbol-plist
:
symbol-plist 'foobar) ;; => (written-in-the-year 2022) (
Of course this is not necessary, you might just as well have a (global) hash table for each property you are concerned with:
let ((table (make-hash-table)))
(defun put-wity-tag (sym val)
(assert (symbolp sym))
(setf (gethash sym table) val))
(
defun get-wity-tag (sym val)
(assert (symbolp sym))
(gethash sym table)))
(
2022)
(set-wity-tag 'foo ;; => 2022 (get-wity-tag 'foo)
Keeping these things in mind, one should be able to understand what the macro I want to discuss is using and what it is doing.
More details on this discussion can be found in (Gabriel and Pitman 1988).
So let us look at the components of the definition, and why I consider it a very “Lisp-2” macro.
defmacro subst-func-symbols (mode data &rest body) (
So of course we are dealing with some kind of a macro here. A variadic macro, that takes two arguments and all other arguments are interpreted as a “body”. The first two arguments don’t concern us for now. The third one is more interesting, because that is the reason we will be able to have the macro appears to be a part of the language:
(subst-func-symbols i-dont-care-about the-first-two-argumentsthe))
(but whatever (is :in
(body (will be) 'handled)"the" macro)) (by
Most interesting languages have at least one lesson they teach a programmer, that applies beyond the language itself. Alan Perlis put it well:
A language that doesn’t affect the way you think about programming is not worth knowing.
In my opinion, one of the lessons of Lisp could be that the boundary we imagine between a language, the standard library and whatever we do in a language is imagined. Sure, when writing C or Java it is easier to draw a line and come up with some categorisation. But in the end, whenever you add a function or any other construct, you have extended the language in one way or another. Designing a language and designing a system are not as different as one would be inclined to imagine at first.
This is the sense in which programming is an art. Having a sense for how to design language – be it the at the core or at the periphery – is that part that is hard for formalise into a checklist, to statically check on the go or to teach in a course. That is not to say that these things don’t help. Many great paintings stand on the foundation of a solid understanding of geometric proportions and a careful study of color-theory. Like all art, there is an inter-subjective component that determines quality. The computer of course doesn’t care. When thinking about the subject in this way, the following famous quote from the SICP(Abelson and Sussman 1996) makes sense to me:
Programs must be written for people to read, and only incidentally for machines to execute.
An elegant language is often the foundation of a flexible system, capable of adjusting to changing requirements.
Stepping back, I find it interesting to reconsider what a macro is in a Lisp system. Compared to something like C, where the pre-processor is a language distinct language, Lisp macros behave just like any other procedure. They take arguments and return an expression, but while doing so can do any kind of computation or trigger side-effects that makes macro-expansion non-deterministic.
The real difference to a regular procedure is that the arguments it
receives and result it returns are not evaluated. In Emacs Lisp (and to
my knowledge a few Common Lisp implementations) this really means the
implementation is a function denoted to be a macro. I remember being
surprised to see that the function slot of a symbol used as a macro is
really just a cons-cell, where the car
is the symbol
macro
and the cdr
is a regular function:
symbol-function 'defun) ;; => (macro . ...) (
Skipping over the documentation string, we encounter
declare (indent 2)) (
What this does in Emacs Lisp is indicate how the macro should be indented
Indent calls to this function or macro according to INDENT-SPEC. This is typically used for macros, though it works for functions too. See Indenting Macros.
and specifically a number means
The first number arguments of the function are distinguished arguments; the rest are considered the body of the expression. A line in the expression is indented according to whether the first argument on it is distinguished or not. If the argument is part of the body, the line is indented lisp-body-indent more columns than the open-parenthesis starting the containing expression. If the argument is distinguished and is either the first or second argument, it is indented twice that many extra columns. If the argument is distinguished and not the first or second argument, the line uses the standard pattern.
Given a macro like
defmacro foo (&rest args)
("Just return ARGS."
declare (indent 2))
( args)
we see that it actually expands to
prog1
(defmacro foo (&rest args) ;; this is usually expanded further.
("Just return ARGS."
args)2)) (function-put 'foo 'lisp-indent-function
What each element in the declare
argument list actually
means, in other words what code should be appended after the actual
definition of the macro (or procedure) is specified by the variable
defun-declarations-alist
. You can add anything you want to
this list to define a new term2.
In this case we set the symbol property
lisp-indent-function
to the value 2. Emacs takes care of
the rest. There not much more to discuss on this point.
unless (symbolp mode)
(error "Mode must be a symbol")) (
As mentioned before, Macros are just functions and can behave like functions. One consequence of this is that we can raise errors during the macro-expansion phase3. Compare this to C where the best you could do is something like
#if some_condition
# error "Something went wrong"
#endif
but (to my knowledge) a preprocessor error cannot depend on the “value” of an argument to a macro function.
In this case I want to ensure that one argument – again, the non-evaluated element in the abstract syntax tree – is a symbol. That is to say that
(subst-func-symbols some-symbol ...)
will work, but
(subst-func-symbols (this-evaluates-to some-symbol) ...)
wont. The reason here is that I am interested in mode
(the symbol designating the variable bound as an argument, that in turn
is expected to have a symbol bound to it) at macro-expansion time.
Passing mode
to eval
or
symbol-value
wouldn’t necessarily work, because this would
give us what one might expect. The result would be the expression
evaluated in the context of or variable bound during the
macro-expansion.
Note that it would be possible to also shift the macro-expansion into
the evaluation, by returning a macro that would construct the
s-expression and pass it to eval
. There would be little
point in using a macro if this was always the indented use-case, but I
believe that a general macro that could distinguish between compile-time
constants that can ensure the macro is computed at macro-expansion time
or if it has to be deferred to evaluation would be interesting.
Oh, and of course I should also say that we are using
unless
, one of the simplest useful macros. It is defined in
Emacs Lisp as
defmacro unless (cond &rest body)
("If COND yields nil, do BODY, else return nil.
When COND yields nil, eval BODY forms sequentially and return
value of last one, or nil if there are none."
declare (indent 1) (debug t))
(cons 'if (cons cond (cons nil body)))) (
And symbolp
is a predicate function, that uses the
traditional “…p” convention. To quote from the Jargon file:
The -P convention: turning a word into a question by appending the syllable “P”; from the LISP convention of appending the letter “P” to denote a predicate (a Boolean-values function). The question should expect a yes/no answer, though it needn’t. (See
T
andNIL
.) At dinnertime: “Foodp?” “Yeah, I’m pretty hungry.” or “T
!”; “State-of-the-world-P?” (Straight) “I’m about to go home.” (Humorous) “Yes, the world has a state.”
let
(letrec
Not even considering the rest of the expression, I just wanted to
comment on letrec
. In general, we bind values in a body
using let
. This is known to be equivalent to the immediate
application of a lambda expression (note that this is not related to
η-conversion):
lambda (x) ...) 5) ≡ (let ((x 5)) ...) ((
There are variations on let
, to multiple ends. The most
common one is let*
, that when multiple forms are bound at
once, each has access to each previous binding. So while
let ((celsius 10)
(+ (/ celsius 9/5) 32)))
(farenheit (format t "~G°C is ~G°F" celsius farenheit)) (
would fail since celsius
is not bound in
(+ (/ celsius 9/5) 32)
,
let* ((celsius 10)
(+ (/ celsius 9/5) 32)))
(farenheit (format t "~G°C is ~G°F" celsius farenheit)) (
works, because this is equivalent to
lambda (celsius)
((lambda (farenheit)
((format t "~G°C is ~G°F" celsius farenheit))
(+ (/ celsius 9/5) 32)))
(10)
while the let
could be modelled using a variadic
lambda
expression and application.
Other variations include flet
(“function let”) and
labels
(???) that bind the function slot of a symbol.
Scoping-wise these two are respectively analogous to let
and let*
. macrolet
does the same for local
macros!
A popular macro is if-let
, that combines
let*
(despite the name not let
!) and
if
. This is useful for step-wise computations that might
“fail” by returning nil
. Only if all variables are
“successfully” bound, we evaluate the consequent with the variables
bound. Other variations of this macro include when-let
(using when
), and-let*
(using
and
) and while-let
(new in Emacs 29) and
pcase-let*
(using the pattern-matching macro
pcase
).
Another worthwhile mention from Elisp is cl-letf
. I
don’t know of any such implementation for Common Lisp – though it
wouldn’t be difficult to write your own. What it does is not only bind
“regular” variables, but also generalised
variables. You can use this to emulate flet
, by binding
a function to the generalised variable
(symbol-function foo)
.
Swinging-back to Lisp-1, Scheme has the “named let
”,
also implemented in Emacs Lisp as named-let
4.
This makes use of the association between let
expressions
and their equivalent functional forms, by allowing the
let
-binding to be recursively invoked as a
function, where the arguments replace the bindings. This provides a
“functional” alternative to using loops:
let self ((i 1000) (x x0))
(if (> i 0)
(1- i) (- x (/ (f x) (df x))))
(self ( x))
What might seem surprising to some is how self
and refer
it itself. Does this require implementation support, one might wonder.
No, as mentioned above, in Emacs Lisp this can be implemented as a
macro, that in turn makes use of the macro letrec
.
I don’t know why, but understanding letrec
took a while
for me. The first time I remember encountering it was probably in “The
Scheme Programming Language”(Dybvig 2009):
Like let, the
letrec
syntactic form includes a set of variable-value pairs, along with a sequence of expressions referred to as the body of theletrec
.(letrec ((var expr) ...) body1 body2 ...)
Unlike
let
, the variablesvar ...
are visible not only within the body of theletrec
but also withinexpr ...
.
Knowing what it does, and when it is used, I understand what is being
said, but what helped me most is to consider letrec
as a
macro and take a look at the expanded form5:
(letrec ((foo bar) (baz qux)) ...)
≡let (foo baz)
(setq foo bar)
(setq baz qux)
( ...)
What this allows us to do is to bind a function and ensure that the
function has access to the binding, so the previous
named-let
example could also be expressed as either of the
following:
...
≈
(letrec ((newtonlambda (i x)
(if (> i 0)
(funcall newton (1- i) (- x (/ (f x) (df x))))
(
x))))1000 x0))
(newton
≈let ((i 1000) (x x0))
(catch 'quit
(t
(while let ((x x) (i i))
(if (> i 0)
(setq x (- x (/ (f x) (df x)))
(1- i))
i (throw 'quit x)))))) (
Where the latter expansion would be preferable, as it does not involve function calls for each step of the computation.
But we won’t be using a named-let
, that was just a
distraction. The point was to demonstrate how letrec
can be
used, in this case to re-create something like labels
(without the use of cl-labels
in Emacs Lisp).
intern (format "subst-%S" mode))) (prop (
Moving on to the first binding of the letrec
, we see
that it has noting to do with what letrec
has to offer, for
it is just a regular, non-recursive expression bound to a symbol. So
far, so boring. We will see more on that later on.
What I do think is interesting is the usage of intern
.
In principle, all it does is “convert” a string to a symbol, in such a
way that the symbol will be eq
to all other strings with
the same string representation in the same obarray or package (depending
on the Lisp). As to why this holds:
eq (intern "foo") 'foo) ;=> t
(eq (make-symbol "foo") 'foo) ;=> nil
(eq (make-symbol "foo") (make-symbol "foo")) ;=> nil (
This is because make-symbol
returns a symbol that is not
“interned”, that is to say inserted into a collection of defined
symbols. The critical lessen here is that comparing symbol names isn’t
enough to conclude that two symbols are eq
(or
eql
, equal
, …):
defun broken-sym-eq (s1 s2)
(string= (symbol-name s1) (symbol-name s2)))
(
make-symbol "foo") (make-symbol "foo")) ;=> t (broken-sym-eq (
Here is my question: Is it “proper” to intern
symbols?
Sure, it must be done, at the very least by the reader, but should it
ever be done if it is not necessary?
This depends on whether or not you want symbols to be opaque or not.
As we have seen, in Lisp they are certainly not opaque, as each symbol
has a print representation that you can query with
symbol-name
or an equivalent function. For symbols to be
opaque, the reader would have to have some private, inaccessible obarray
or be decoupled from the rest of the system – which I don’t think most
people would say is worth it.
Some might question my motivation for distinguishing between opaque
symbols and non-opaque, “introspectable” symbols in the first place. The
fact that I haven’t given an argument for or against should indicate
that this is an issue I am undecided on myself. The best sketch of a
reason pro opaqueness is that if we want to understand a symbol
in (a non-equality-)relation to any other object in the system, it
should be strictly eq
-ual to only itself. I’d even go as
far as saying that the fact that symbols can be compared with one
another in constant time is purely an implementation detail. The central
point of a symbol could be argued to be the fact that you
cannot gain access to a symbol that is eq
to an existing
symbol, unless you have access to it in the first place. This guarantee
is broken when intern
is introduced, because it allows us
to access all symbols that have previously been interned – in so far we
have access to the obarray that contains the symbol and was used to
intern a symbol name.
To my understanding, Scheme doesn’t have the concept of an uninterned
symbol: Both R6RS(Sperber et al. 2009,
49) and R7RS(Shinn et al. 2013, 44)
just state that string->symbol
just returns a symbol
with the print representation of the input. And if we take a look at the
specification of symbol=?
, we get:
Returns
#t
if all the arguments are symbols and all have the same names in the sense ofstring=?
. (Shinn et al. 2013, 44)
which is relevant because two symbols are eq?
(ie.
eq
) if they are eql?
(ie.
eql
)(Shinn et al. 2013, 31),
and two symbols are eql?
if they are
symbol=?
(Shinn et al. 2013,
30).
Interestingly, R6RS is less precise on what it means for symbols to be the same:
Returns
#t
if the symbols are the same, i.e., if their names are spelled the same(Sperber et al. 2009, 49).
I really like symbol, as a programming feature. It is one of the main things I tend to miss in conventional, “blub” language. But this is also not unique to Lisp, so I wanted to check how other languages that I am familiar but not that well versed in handle symbols.
Smalltalk: In the introduction of the Smalltalk-80: Language and Implementation Book(Goldberg and Robson 1983, 21) we read
Symbols are objects that represent strings used for names in the system. The literal representation of a symbol is a sequence of alphanumeric characters preceded by a pound sign, for example,
#:bill #M63
There will never be two symbols with the same characters; each symbol is unique. This makes it possible to compare symbols efficiently.
(emphasis mine)
A little later on, when the formal definition of symbols is presented(Goldberg and Robson 1983, 168), where we not only see that symbols are treated the same way as Scheme does:
Symbols are arrays of Characters that are guaranteed to be unique.
Thus:
‘a string’ asSymbol == ‘a string’ asSymbol
answers true.
Where asSymbol
is a method of the String
class.
We also see that the Symbol
class protocol defines a
unary method intern
, that would also behave like
string->symbol
, not Lisp’s intern
.
Also interesting is that according to the class-hierarchy
diagram(Goldberg and Robson 1983,
144), Symbol
is actually a subclass of
String
?
Prolog: What we have been talking about as
symbols are usually referred to as “Atoms” in Prolog (which are
unrelated to “atoms” in Lisp, ie. any object that is not a
cons
-cell). The comparison is not that easy, because
equality plays a different role in Prolog, than it does in Lisp or
Smalltalk, because of the way Unification is used, but we can also check
for equality directly:
?- foo == foo.
true.
?- foo == bar.
false.
Here foo
and bar
are symbols, while in the
following
?- foo = Bar.
Bar = foo.
foo
is a symbol, and Bar
is a symbol that
the system attempts to resolve, in this case by instantiating it with
the symbol foo
.
What ==
is test if two terms are identical(Covington, Nute, and Vellino 1993, 70),
which in turns means they have the same AST(Covington, Nute, and Vellino 1993, 5).
According to (Covington, Nute, and Vellino
1993, 21) the “sameness” of atoms (or “tokens”) is also just
a matter of comparing their representation.
From what I see, and I might be mistaken here, there is no standard way to create a symbol from a string – or rather there is no such thing as a string in ISO Prolog? This might mean that standard Prolog actually has opaque symbols.
If we check actually existing implementation, we find atom_string
for SWI Prolog, or atom_chars
for GNU Prolog (the latter works on a list of symbols representing
characters, instead of strings).
Edit (30Oct24): Florian points out, that ISO
Prolog has atom_codes
.
The Wikipedia article on symbols in programming goes into this in greater detail.
But returning to the initial code-snippet: What we are doing here is
creating a symbol relative to the print representation of the
symbol that was passed through as an argument to the macro. So if the
input was foo
, I actually want to get the symbol
subst-foo
. This is a common thing to do in Lisps. E.g. if I
define a structure
defstruct complex real imag) ;a structure with two members (
the macro defstruct
will also define and provide me with
access-functions complex-p
, complex-real
,
complex-imag
that all were synthesised by interning the
print representations of the structure and member names. This also means
that even if one of the symbols wasn’t interned:
let ((complex (make-symbol "cplx")))
(eval `(cl-defstruct ,complex real imag))) (
the symbols of the auxiliary function will still be accessible as “regular”, interned symbols from the default obarray:
(intern-soft "cplx-real") ;=> cplx-real
We will be using this symbol, specifically its symbol properties to decide how and what to substitute. As we will see later, this was absolutly not necessary (but I never said that this macro is a good demonstration of various Lisp-2 features).
subst
(lambda (form)
(cond
(eq (car-safe form) 'quote) form)
((eq (car-safe form) 'function)
((let ((pred (get (cadr form) prop)))
`#',(or (and pred (funcall pred form data))
(cadr form))))
(consp form)
((cons
(let ((pred (get (car form) prop)))
(or (and pred (funcall pred form data))
(car form)))
(mapcar subst (cdr form))))
( (form))))
This is the “real meat” of the substitution procedure. The function will be invoked recursively on every expression of interest. To better discuss the macro, I’ll go through the procedure in blocks.
lambda (form)
(cond (
While substituting, we will have to be considering the possible
values an expression may consist of. I use cond
in this
case (as opposed to if
), because we will be considering
multiple alternative forms of expression, that are all “equally
probable”.
The relation between cond
and if
is
interesting, because both can be implemented as macros on top of
one-another. Taking the Scheme variations (that don’t have convenience
features like the implicit progn
of the else-case for
if
or the special handling of empty bodies for each
cond
branch), they might look something like this:
defmacro if-with-cond (test then else)
(cond
`(
(,test ,then)t ,else)))
(
defmacro cond-with-if (&rest branchs)
(let ((body nil))
(dolist (branch (reverse branchs))
(setq body `(if ,(car branch)
(progn ,@(cdr branch))
(
,body))) body))
On most systems, both forms will be built into the language as
special forms, but if you’d have to pick one the implement, I would have
guessed that if
would be the one to prefer, due to the
relative simplicity. Nevertheless, one finds that in (McCarthy et al. 1962,
10) the S-Expression for the conditional form is the familiar
cond
. My guess is that this is due to the fact
cond
is more flexible and that macros were not available
before 1963(Steele and Gabriel
1996, 274), (Hart 1963). Of course, given macros it
would be (computationally) feasible to define either of the two in terms
of lambda expressions, given a church-encoding for boolean values.
To my knowledge, the concept of a “conditional expression” had its
genesis in LISP. Most other languages that derive from ALGOL, that have
some kind of a conditional expression (often via a so-called “ternary
operator”, e.g. in C test ? then : else
), owe this to John
McCarthy’s proposition(Perlis 1978, 85). It appears that the
conditional expression as a concept in programming languages originated
in Lisp, and it did not appear in any of Lisp’s ancestors. It also
appears that the way we think of them wasn’t the initial conception(Stoyan 1984, p
302):
They are not listed as a kind of normal (nunerical) expressions. Instead they are one of of the alternatives for expressions that can occur as the argument of a
GOTO
-statement! These expressions are called “designational expressions”.
Returning back to the specific case, the point of cond
here is to check what kind of an expression is contained in
form
. As seen in the next section, this will be done by
manually inspecting the value. This is a low-level approach, that will
work well enough, but anyone who has had experience with languages that
support pattern matching will realise what I actually wanted to express.
This is one of the criticisms that Philip Wadler raised against
Scheme(Wadler
1987) that languages from the ML family (in his example
Mirinda) handle better.
Of course this is not an inherent limitation of a Lisp system. Scheme
has match.scm
,
Emacs Lisp has the powerful pcase
macro, and Common Lisp as
an unsurprising
variation of solutions, in addition to the case
-family
of forms (that aren’t as powerful, since they don’t support
pattern-matching via Unification).
eq (car-safe form) 'quote) form)
((eq (car-safe form) 'function)
((let ((pred (get (cadr form) prop)))
`#',(or (and pred (funcall pred form data))
(cadr form)))) (
This is the part of crux of the “real meat” of the substitution procedure. We want to do two things:
Both of these two checks rely on car-safe
, an auxiliary
function defined for Emacs Lisp. It is defined in the core, in C:
INLINE Lisp_Object(Lisp_Object c)
CAR_SAFE {
return CONSP (c) ? XCAR (c) : Qnil;
}
the intuitive translation into Lisp would be
defun car-safe (c) (if (consp c) (car x) nil)) (
The reason for this is that car
will raise an error
indicating that the argument is not a cons-cell, if the argument is not
a cons-cell.
Another approach would have been to have the cond
body
start with a
cond
(not (consp form)) form)
((;; ...
)
as to ensure that any non-cons-cell gets caught before we continue on
to the other clauses. That way we would have the guarantee that
car
and cdr
would be passed a legal value.
The first point is pretty trivial: If a form is quoted, we just
return the form as such and replace nothing underneath. This point is
open to debate, one might as well argue that the procedure should
operate under quoting, which might be of use when passing a form to
eval
. Then again, there are plenty of other situations
where this might break a program, so I’d prefer to err on the side of
caution.
The second point, checks if we are dealing with a quoted function,
ie. something of the form #'foo
(or
(function foo)
, which is what the reader expands the former
into). As discussed above, what this actually does depends on the Lisp
system. What we want to do is check if foo
is to be
substituted with some other symbol. We know the function will be
sharp-quoted again, so the #'
must remain. This is done
using quasi-quoting. For those unfamiliar, this is a kind of quoting
that has little holes we evaluate expressions in. We introduce a
quasi-quoted term with a backtick (`), and the holes with a comma
(,
). The following two are equivalent:
list has) ,(+ 2 2) items)
`(this (
≡list 'this (list 'list 'has) (+ 2 2) 'items) (
Among other things, we note that
`',foo
≡quote ,foo)
`(
≡list 'quote foo) (
and analogously
`'#,foo
≡function ,foo)
`(
≡list 'function foo) (
This is precisely what happens in the function
clause.
As to what is being replaced, we will consider in the next
section:
consp form)
((cons
(let ((pred (get (car form) prop)))
(or (and pred (funcall pred form data))
(car form)))
(mapcar subst (cdr form)))) (
The rule here is simple: For any cons-cell, which represents a function call, we try and replace the function symbol and apply the substitution procedure to all arguments. This mirrors function application in a Lisp-2.
So we finally arrive at the point where we figure out what is being
replaced. This is where symbol properties enter the game: We will be
using the symbol in the function slot6 to query a property that
we constructed earlier using
(intern (format "subst-%S" mode))
. The property might hold
a function that we invoke to query what the function slot is to be
replaced with. If either no function was found or the function indicates
that nothing is to be replaced, we re-insert the initial symbol into our
new tree.
So what we conclude, is that this is just a framework for function
substitution. How and what is replaced, is part of the input,
specifically the mode
argument. We’ll see a few examples
soon.
(form))))mapcar subst body)))) (macroexp-progn (
Finally, we have to start somewhere. We make use of
macroexp-progn
. If we check the macro signature, we see
that body
will contain a list of all arguments, which is
not a valid expression in itself. It is the difference between
(+ 1 1)
and ((+ 1 1))
. Since the macro has an
“implicit progn
”, we will expand to have an actual
progn
if necessary. The function
macroexp-progn
takes care of that by either prepending a
progn
to the list, or by just returning the
car
if the list consists of a single element.
Given this input expressions, we then proceed to invoke the substitution procedure discussed above, and might return a modified AST, which is also the return value of the macro – that is to say the code expansion.
Going back to the first example: Emacs used to be bundled with a “Common Lisp” compatibility library, that wasn’t really a “Common Lisp” compatibility library, but just attempts at reimplementing various Common Lisp functions in Emacs Lisp.
A while back the decision was made to deprecate this library,
cl.el
in favour of cl-lib.el
, as a
compromise(Monnier and
Sperber 2020, 74:38). It provides the same functionality, but
you have to prefix every function and macro with cl-
to
avoid namespaceing conflicts with its close relative, Emacs Lisp.
I have to admit, that there are certain cases where I’d be fine with
stripping away the cl-
prefixes. Especially when a section
combines multiple common lisp compatibility functions and macros into
one.
Initially I had assumed that the shorthand system, that was introduced in Emacs 28 – more hastily than carefully in my own opinion – would help me out. But it turns out that this can only expand a shorter, non-empty prefix into something else. It cannot be used to rename symbols entirely.
The topic of this text can help us remedy the inconvenience. To do
so, we first pick a symbol that will be used to indicate what kind of
substitution is taking place: We pick the intuitive cl
.
Next we iterate over all define symbols in the default obarray and find
those that begin with cl-
(taking advantage of symbol
introspection). These are tagged with the symbol property
subst-cl
mapping to a constant function, that will return
the respective prefixed form:
require 'cl-lib)
(
(mapatomslambda (sym)
(let ((name (symbol-name sym)))
(when (string-match "\\`cl-\\([^-].*\\)" name)
(intern (match-string 1 name))
(put (lambda (_form _data) sym)))))) 'subst-cl (
After having evaluated the code, we can try out a small text first:
funcall (get 'first 'subst-cl) 'form 'data) ;=> cl-first (
as expected. Now to the initial example can be tested:
macroexpand-1
(nil
'(subst-func-symbols cl every #'plusp (loop for i from 1 to 10 collect i))))
(;; => (cl-every #'cl-plusp (cl-loop for i from 1 to 10 collect i))
Just as was intended.
For the sake of convenience, we might also wrap this in yet another macro:
defun common-lisp (&rest body)
(nil ,@body)) `(subst-func-symbols cl
Another example where this could have been of use is ELPA package Compat, a backwards
compatibility library for Emacs Lisp that I happen to maintain. In fact,
the first prototype of this macro was mentioned on the development
mailing list. The issue we were dealing with is that the library
provides certain prefixed definitions, to avoid overwriting existing
functions and macro7. E.g. in Emacs 26, the traditional,
dyadic function assoc
“acquired” a new optional argument
testfn
, comparable to Common Lisp’s
:test
keyword for assoc
. One could use advice
over override the default definition, but that would be too invasive for
a package that is installed by default as a dependency on many systems
nowadays.
Instead one might use a macro like this to perform the substitutions,
similar to what was previously done with cl
. The difference
here would be that we could make use of that “data” argument to indicate
the maximum version we wish to target:
macroexpand-1
(27
'(subst-func-symbols compat assoc some-key some-alist #'some-comparator)))
(;; => (compat--26-assoc some-key some-alist #'some-comparator)
;; or
;; => (assoc some-key some-alist #'some-comparator)
What the exact expansion will be, depends on the current version of
Emacs. If the proper definition of assoc
is provided OOTB,
we would want to use that, since as a core function, it will be more
efficient than our replica. Otherwise we will fall back to some function
we chose depending on the input and the data argument (in this case the
value 27
).
Note that the decision was ultimately made in favour of a simpler system, that wouldn’t involve the kind of traversal that the macro implements
So that is that. I found this an interesting investigation of how various Lisp-2 features could be used and how they relate to Lisp and Programming in general.
In conclusion, I have no idea who the target audience of this text is supposed to have been.
Strictly speaking this is an
oversimplification, as ((lambda (foo) (list foo foo)) 'bar)
is valid and works like a let
.↩︎
As an example, I’ve recently been
hacking on Agda’s Emacs mode. One interesting fact about Agda’s Emacs
mode is that it currently just receives strings from the
interpreter/compiler, reads these in and passes them straight to
eval
. Possibly risky, but luckily there is a comment that
agrees with me. My proposal will be to add a declaration to
defun-declarations-alist
that first of all points out that
this function may be invoked by Agda and what “type” (CL not proper
typing) the arguments are to have. The arguments are also not
evaluated.↩︎
Note that the “macro-expansion phase”
does not have to strictly precede the “evaluation phase” with a break in
between. If we dynamically evaluate a term that includes a macro, the
two will happen directly after one another. The NEWS file for Emacs 24
(etc/NEWS.24
) has an interesting notice on this topic under
the heading “Emacs tries to macroexpand interpreted (non-compiled) files
during load.”↩︎
Note that named-let
has
a simple form of TCO-optimisation. If you are interested in how this is
done, you can see the implementation in Compat↩︎
For Emacs Lisp the macro will attempt
to expand a letrec
to let*
if there are no
recursive bindings, so bar
and qux
are best to
be understood as metavariables↩︎
As pointed out before, this is
actually not a valid assumption. The function slot may also contain a
(lambda ...)
expression. Worse yet, when considering the
previous case with sharp-quoted symbols, Emacs Lisp permits
#'(foo bar)
while Common Lisp would raise an error.↩︎
Which would be possible. You can even
overwrite core functionality, just try evaluating
(fset 'eval #'identity)
and see how Emacs will slowly
deteriorate↩︎