Sunday, 16 June 2024
For the past few days I have had a lot of fun polishing a small program written in C. I call it “Trip” and the idea is to automatically intercept specific Libc functions and simulate their failure.
The background for writing a tool like this, is that I am a TA for a systems-programming course at my university. All assignments are to be written in C, and we place emphasis on robustness and portability. My intention is to make it easier to provoke specific errors that would usually only occur when the operating system refuses or is incapable of providing the (finite) resources it virtualises (like memory, time and access to peripheral devices).
One example, that is interesting when discussion the example of how a
Unix/CLI shell is implemented, is what happens when the
fork(2)
system call fails — and what the appropriate way to
handle this error is. Out of habit, a lot of people would just write
= fork();
pid_t pid if (pid == -1) {
("fork");
perror(EXIT_FAILURE);
exit}
/* ... */
But running into this case is tricky, as fork
usually
only fails when the number of (dead or alive) processes on a system is
too high, which is a generally uncomfortable situation.
So the problem I am interested in is a comfortable way to
intentionally but safely provoke fork
to
fail.
One could use prlimit -u
to limit the maximal number of
processes or GDB’s return
command to modify the process
state and return the “right” value indicating an error had occurred1. While feasible, it is not what I
wanted.
Instead, what I came up with looks like this:
icterid$ trip fork:0.5 bash # after this point we are in a new shell:
icterid$ date
bash: fork: Cannot allocate memory
icterid$ date
Sun Jun 16 03:59:21 PM CEST 2024
icterid$ date
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
Sun Jun 16 03:59:28 PM CEST 2024
This demonstrates the behaviour of Bash, when on average
fork(2)
were to fail 50% of the time. As you can see, it
might set errno
to ENOMEM
, in which case Bash
gives up immediately. Sometimes it works, and if errno
is
set to EAGAIN
then it tries to handle the request a few
times, and in this case eventually succeeding. In the last case, it
waits for a bit between attempts, which makes sense as the number of
processes on a system is time-dependent and will hopefully decrease.
Additionally, we could also list multiple functions that could fail,
give these different chances of failure and fix specific
errno
values that we want to provoke.
I think that this is a simple and neat tool that can help students think about what robust error handling entails.
My intention behind writing this text isn’t to advertise “Trip”. The core functionality isn’t what is fun to work on. What I want to highlight, are some of the inane decisions I insisted on following through, that led me to work on the program for over three years (on and off) before getting it into a functional state.
As mentioned above, one idea would be to take a debugger-like
approach and use something like ptrace
to control and
manipulate a program.
But I did not do that, instead I use the LD_PRELOAD
hack. By setting this environmental variable, we can instruct the
dynamic linker/loader (ld.so
) to look up the definition of
certain symbols in a shared object of our own, before it checks the
usual places, such as the actual standard library
(libc.so
).
The reasonable approach would be to have one executable and one
shared object, and have the executable set LD_PRELOAD
to
point to the shared object before exec
uting the intended
program (e.g. bash
in the above example).
But no, then I’d have to figure out where to find the shared object and the user would have to install two files! Just imagine all the wasted disk space of having a multi kilobyte ELF headers lying around. Entirely unreasonable!
Instead, I want one file, which is both an executable file, and a
shared object (seeing as both are just ELF files with different “Types”,
DYN
vs EXEC
). An example of this already
existing in some form is the aforementioned shared object for the C
standard library:
icterid$ /usr/lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.36-9+deb12u7) stable release version 2.36.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 12.2.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
Glibc does this by setting a custom entry point.
Instead Trip compiles a regular position
independent executable, which should be sufficient for
ld.so
to preload, but it is not. If you try that, you get
the error message
$ LD_PRELOAD=.../preload-this ./execute-this
ERROR: ld.so: object '.../preload-this'' from LD_PRELOAD cannot be preloaded (cannot dynamically load position-independent executable): ignored.
this is the output ouf ./execeute-this.
...
which is not what I want. Luckly, due to a helpful
program by Yubo Xie, I found a trick to circumvent this draconian
ordinance. And it involves editing the ELF file generated by GCC with
the intent of flipping a single bit (DF_1_PIE
). With this
change, ld.so
doesn’t object, while the file
remains executable. So that means I can preload the executable being
executed as it exec
utes some other program. This brings me
to the next step,
This is not that interesting, but worth mentioning
nevertheless. The issue is that LD_PRELOAD
requires an
absolute path to the shared object file it should use. But using
argv[0]
I can only determine what the string was that
invoked Trip.
Seeing as LD_PRELOAD
is not portable2, we
can use a Linux-specific hack to determine what the file of the program
currently being executed. This involves the /proc
filesystem, specifically the /proc/self
directory.
As you might know, each Process with some PID has a
/proc/[PID]/
directory with process metadata, such as the
command line arguments, the environmental variables, file descriptors,
and so on exposed using the file system interface. The
/proc/self/
is a shorthand for the /proc/[PID]
directory of the current process. And in this directory, there is a
symbolic link called exe
, that points to the file we are
looking for.
So the absolute path is just a readlink(2)
away, that we
can use to construct the environmental assignment that we realise using
execvpe(2)
, a GNU extension of execvp
that
allows passing a list of "VAR=VALUE"
strings to specify the
environment right after exec
uting the new file (we use
getopt(3)
to parse command line options, hence the
optint
):
(argv[optind], argv + optind, (char*[]) {preload, conf, NULL}); execvpe
What’s conf
? What we did not discuss up until now, is
how we preserve state between Trip “the executable” and Trip “the
library”. That is were we use conf
, a special environmental
variable encoding the options specified when invoking Trip.
____TRIP_CONFIGURATION
We are too deep into this rabbit-hole to give up now. There is no going back. Once again, there will be an obvious and necessary decision, though it might offend a few of the bluenoses amongst us.
As a reminder, we invoke Trip with a list of functions, optionally
specifying a chance it should “trip” (by default
)
and optionally fixing a specific errno
value to set
(otherwise this will also decided on randomly from the list of known
errno
values for the function, more on that below).
E.g.,
mkdir:0.25:ELOOP
says that mkdir(2)
will fail 25% of the time, but when
it does it will set errno
to ELOOP
(“Too many
levels of symbolic links”).
Environmental variables give us a C string to encode the value. So no
NUL-bytes! This means we cannot just encode arbitrary binary data, such
as a struct
containing a floating point value, that
might include a NUL-byte on
char
level.
Instead, we will to pass a list of triples, respectively specifying,
%a
), to use a radix with a higher information density – if
we disregard the 0x
prefix…errno
value, also in hexadecimal, or 0
to indicate an absence.How do we encode the triple? By using ASCII group separators of
course. ascii(7)
tells us these 35 in octal encodes these.
And the list? Isn’t it obvious… By using ASCII record separators,
naturellement (36 in octal). As these aren’t pretty to write,
we define two macros,
#define GS "\035" /* group separator */
#define RS "\036" /* record separator */
where the octal encoding of arbitrary charachters comes to use (a lot
of people are just familiar with the one special case, '\0'
to encode the NUL-byte, which is next to chmod
probably one
of the most common places where octal notation is used – be it that it
coincides with most other bases).
To construct the value itself, I’d like to use
printf(3)
-style format strings, to make parsing the value
easier, but now much memory will I need? Just any arbitrary constant
won’t do it! So I’ll allocate it dynamically, but here again, a mere
malloc
is so pedestrian. Instead using
open_memstream(3)
we write into a FILE*
that
will construct a string of sufficient size as needed.
Now we can write,
(h, "%s" GS "%a" GS "%x" RS,
fprintf[i].name, entries[i].chance,
entries[i].error) entries
which also makes use of C’s string concatenation of literals, as
GS
and RS
are just strings that are
substituted by the preprocessor.
Now we have encoded the intended state of the library. As discussed,
we set this while exec
ing and later on will decode it when
the library is initialised. But how does that happen? Some brief
comments on the topic follow below.
As we are relying on LD_PRELOAD
, all the functions that
we intend to support, will have a little stump which either will invoke
the function in the usual fashion or simulate a failure. To decide which
of the two we should do, Trip exposes a little function called
____trip_should_fail
(the _
s indicate that it
is very internal).
Basically it checks the configuration and rolls a dice if necessary.
But to do this, we first have to parse the configuration. To this end,
begin ____trip_should_fail
by calling the sensibly named
function called init
.
But wait, seeing the invocation of this function depends on the
execution of the tripped program, which might involve multiple threads
or otherwise concurrent execution, what happens when two functions wish
to check a yet-uninitialised ____trip_should_fail
contemporaneously? Why, that could mean that the internal state might be
initialised twice over, leaving us in an inconsistent disarray.
We might use a kind of “lock”, e.g. pthread_mutex
but
consider the costs of depending on yet another library, not to
speak of the enormous overhead that passive locking effectuates!
No, we need something more fancy appropriate. Luckly, C11
has just the thing: atomic_flag
! By statically allocating
such a flag inside init
,
static atomic_flag done = ATOMIC_FLAG_INIT;
we can use atomic_flag_test_and_set
to ensure that only
one execution will set the flag. The others we keep busy using a naive
spin-lock, we release as soon as the “chosen one” (whoever first
completed atomic_flag_test_and_set
) parses the state and
unlocks the spin-lock.
A nice of example of non-blocking synchronisation. I just think there is something neat about it.
errno
One thing I omitted to mention above is how we convert a symbol
representation of an errno
value, such as
ELOOP
, EAGAIN
, ENOENT
, … into a
hexadecimal number. The last part is easy, as soon as I have the
numerical representation we use the %x
format string (we
have seen that part already). The more interesting part is how the
rest.
Here, some might argue that I could just define an associative array, of the form
#define ENTRY(e) { .name = #e, .val = e },
struct errno_names {
char *name;
int val;
} names[] = {
(ELOOP),
ENTRY(EAGAIN),
ENTRY(ENOENT),
ENTRY/* ... */
}
but where’s the fun in that? Instead, how about this? Each time we
want to look up what ELOOP
does, we manually invoke
cpp
(the standalone preprocessor) and just write the value
we wish to expand into its standard input? And what better tool for that
than popen(3)
, fopen(3)
’s lesser known
relative (with alleged ties to system(3)
)?
But the issue is, that with popen(3)
, we can only read
or write, quote the Linux manpage,
Since a pipe is by definition unidirectional, the type argument may specify only reading or writing, not both; the resulting stream is correspondingly read-only or write-only.
Pity, the unimaginative and ill-willed might say.
No, I retort, there is a way! Remember, gentle reader that
popen
takes a shell command. We can easily include a pipe
in there, and with a little echo
inject the input into the
command being executed. So in effect, to resolve ELOOP
, we
execute
echo "ELOOP" | cpp -include errno.h
and then read the output, until we encounter a lone number, that is the value we were looking for! We just have to skip over some noise to get to this point (try executing the above yourself, and be surprise to see what you see).
There remains a nibble to quibble! Where do you get the memory
for that command, which you pass to popen(2)
, the
irksomely inquisitive inquire inappropriately. Why, we just allocate it
on the stack! After all, if you invoke snprintf
the right
way, it will tell you how much space a format string will require. We
reserve the requisive space with a simple VLA, and bada-bing
bada-bum, done.
But do we really want to invoke sprintf
ourselves
twice? Here’s the shtick: We don’t. Instead, we put on
our intermediate macrology hats and summon this construct:
FILE *cpp;
(cmd, "echo \"%s\" | cpp -include errno.h", name) {
$sprintf= popen(cmd, "r");
cpp }
Here, $sprintf
is a macro (we needlessly make use of the
fact that dollar-signs are (apparently3)
valid constituents of identifiers), which will format some data into a
local variable called cmd
, visible only in the subsequent
block. How done this? Using the old trick, that expands the above to a
for
-loop that executes exactly once. In effect, it would
look something like this:
FILE *cpp;
for (char cmd[snprintf(NULL, 0, "...", name) + 1], c = 1;
== 1 && (snprintf(buf, sizeof buf, "...", name), 1);
c = 0) {
c = popen(cmd, "r");
cpp }
Note that the second sprintf
is executed only once, due
to the short-circuiting of &&
and that we disregard
the return value of sprintf
even though tisn’t
necessary.
In Reality, we add some __COUNTER__
magic on top to
avoid variable capture (a problem which we do not encounter), which
requires us to glue together new symbols. Business as usual for the
hat-wearers.
We were adroitly reticent on a crucial matter, up until now. But behold, as all cards are now exposed, for all to see, for all detest and abhore!
I never really thought about it, but there is no easy,
machine-readable way to figure out what errno
values a
function might set. It is not encoded in the type system (if we might
even speak of such a thing), and parsing it out of the man pages is not
something I’d like to imagine doing reliably. So I must satisfy myself
by manually writing these facts down. And this turns out to be the
bottleneck of the entire operation.
We write these down, in a custom file format. In effect, these just list blocks of this form:
errno ENOMEM
fail NULL
name strdup
params const char *s
return char *
separated by empty lines. How does this help us? Easy, we use a little bit of AWK to convert these into C files. But do I really want to write C code in AWK? Perhaps, but not today (or whenever I wrote this initially). So instead, I define a little macro in C and convert the block into an expression that the macro will use. That will look like this:
(char *, strdup, (const char *s), ( s), (NULL), ENOMEM) DEF
When compiling the definition, this will expand to the kind of
function we described above: This will be our definition of
strdup
, which LD_PRELOAD
injects into the real
executable and calls ____trip_should_fail
from!
And while we are at it, the most devilishly delightful stratagem
remains: We need a table of all functions inside of the
trip
module itself, among other things to implement the
-l
flag that lists all supported functions. And while we
could create a temporary file with the requisite data, and
#include
that as usual, why bother our destitute disk and
not just inject what we need right into the array?
/* list of known commands */
#define DEF(ret, name, params, args, fail, ...) { #name, { __VA_ARGS__ }},
static struct entry_name {
char *name;
int errs[256/sizeof(int)-sizeof(char*)]; /* adjust if necessary */
} names[] = {
#include "/dev/stdin"
};
#undef DEF
After a moment, I am certain everyone will realise that including
from /dev/sdtdin
means that we are reading from the
standard input of the compiler herself! But what will the input be?
Here, as we are knee-deep into GNU-isms we use a target-specific
variable assignment, that will take effect only when compiling
trip.c
to trip.o
:
trip.o: CC := grep -h '^DEF' $(GENSRC) | sort -k2,2d -t, | $(CC) \
-DCOMPILER="\"$(shell $(CC) --version | sed 1q)\""
Where GENSRC
are the list of files containing
DEF
forms. Yes, we grep
through all of these,
sort by the function name (as we of course wish to use
bsearch(3)
later on when looking up an entry) and then pipe
this into the C compiler4. As prior to the
#include
statement, we have re-defined DEF
, we
will discard the uninteresting arguments and only use what we need to
construct the table in a single go.
The best part about this is watching the output of
make
:
$ time make -j8
awk -f gen.awk db/stdio.db > db/stdio.c
awk -f gen.awk db/stdlib.db > db/stdlib.c
awk -f gen.awk db/string.db > db/string.c
awk -f gen.awk db/sys-socket.db > db/sys-socket.c
awk -f gen.awk db/unistd.db > db/unistd.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer -ldl -fanalyzer -fsanitize=undefined,nonnull-attribute -fhardened fix-pie.c -o fix-pie
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer -c -o db/string.o db/string.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer -c -o db/stdio.o db/stdio.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer -c -o db/stdlib.o db/stdlib.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer -c -o db/sys-socket.o db/sys-socket.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer -c -o db/unistd.o db/unistd.c
grep -h '^DEF' db/stdio.c db/stdlib.c db/string.c db/sys-socket.c db/unistd.c | sort -k2,2d -t, | cc -DCOMPILER="\"cc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4)\"" -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer -c -o trip.o trip.c
cc -o trip db/stdio.o db/stdlib.o db/string.o db/sys-socket.o db/unistd.o trip.o -ldl -fanalyzer -fsanitize=undefined,nonnull-attribute -fhardened
./fix-pie trip
real 0m1.177s
user 0m1.504s
sys 0m0.170s
Another fun thing is that this breaks LSP servers…
And there we have it. These were the most fun highlights of my little
hobby program. There are a few more points of interest, such as
function-internal typedef
s, a custom assert
implementation, and a lagged Fibonacci random number generator taken
from TAOCP,
just to quote Knuth in a comment, the persistent usage of
dprintf
and user-dependent installation destinations, but
those interested in that can take a look on their own. I’m done
explaining my code – as the saying goes, if it was hard to write, it
should be hard to read!
Reflecting on the program, and having reminded myself of the internal structure over the last few days, I do think that “polishing” is the right word to describe what I enjoy. Compared to CS homework assignments, competitive programming or the hacks that keep my servers alive, the changes I make to this program are increasingly insignificant and the commit messages are progressively elaborate. I get to pin-point my focus on as little of a fragment of the code as I wish, and pointlessly rewrite it to no semantic effect. While at it, I get to try out ideas I had but no opportunity to apply and to skim manuals and standards for inspiration on what I could do.
It is obvious even to me that this isn’t everyone’s idea of a fun weekend, but for me there is something to it — and I don’t just mean the unvarying grin on my face as I explain to a visibly disturbed person how Trip works — especially if I know that I actually should be doing other stuff.
Update (26Aug24): As pointed out on Hacker News, it is possible to replicate what I am doing with strace↩︎
E.g. Linux takes a space-delimited list of files, while OpenBSD takes a colon-delimited list, and macOS/Darwin/XNU has an entirely different variable and file format↩︎
Edit: Turns out that
this is not true, GCC accepts $
but ISO C doesn’t require
this to be the case.↩︎
The macro definition
COMPILER
makes use of the surprising fact that most
compilers, even TCC support a long --version
flag. We use
this, instead of GCC’s __VERSION__
, as it appears to be
more informative↩︎