Consistently Making Wrong Decisions Whilst Writing Recreational C

Sunday, 16 June 2024

For the past few days I have had a lot of fun polishing a small program written in C. I call it “Trip”, and the idea is to automatically intercept specific libc functions and simulate their failure.

The background for writing a tool like this is that I am a TA for a systems-programming course at my university. All assignments are to be written in C, and we place emphasis on robustness and portability. My intention is to make it easier to provoke specific errors that would usually only occur when the operating system refuses to provide, or is incapable of providing, the (finite) resources it virtualises (like memory, time and access to peripheral devices).

One example, interesting when discussing how a Unix/CLI shell is implemented, is what happens when the fork(2) system call fails, and what the appropriate way to handle this error is. Out of habit, a lot of people would just write

pid_t pid = fork();
if (pid == -1) {
  perror("fork");
  exit(EXIT_FAILURE);
}
/* ... */

But running into this case is tricky, as fork usually only fails when the number of (dead or alive) processes on a system is too high, which is a generally uncomfortable situation.

So the problem I am interested in is a comfortable way to intentionally but safely provoke fork to fail.

One could use prlimit -u to limit the maximal number of processes, or GDB’s return command to modify the process state and return the “right” value indicating that an error occurred1. While feasible, it is not what I wanted.

Instead, what I came up with looks like this:

icterid$ trip fork:0.5 bash # after this point we are in a new shell:
icterid$ date
bash: fork: Cannot allocate memory
icterid$ date
Sun Jun 16 03:59:21 PM CEST 2024
icterid$ date
bash: fork: retry: Resource temporarily unavailable
bash: fork: retry: Resource temporarily unavailable
Sun Jun 16 03:59:28 PM CEST 2024

This demonstrates the behaviour of Bash when fork(2) fails, on average, 50% of the time. As you can see, errno might be set to ENOMEM, in which case Bash gives up immediately. Sometimes the call just works. And if errno is set to EAGAIN, Bash retries the request a few times, in this case eventually succeeding. In the last case, it waits for a bit between attempts, which makes sense, as the number of processes on a system is time-dependent and will hopefully decrease.
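Such a retry strategy is easy to replicate in one’s own code. Below is a minimal sketch of a fork wrapper that backs off on EAGAIN, in the spirit of what Bash does (the names fork_retry and spawn_and_wait are mine, purely for illustration):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to fork up to `attempts` times; on EAGAIN, back off and retry,
   on any other error (e.g. ENOMEM), give up immediately. */
pid_t fork_retry(int attempts)
{
    for (int i = 0; i < attempts; i++) {
        pid_t pid = fork();
        if (pid != -1)
            return pid;
        if (errno != EAGAIN)
            break;
        sleep(1u << i); /* the process count is time-dependent; wait it out */
    }
    return -1;
}

/* Demonstration: spawn a child that exits successfully and reap it. */
int spawn_and_wait(void)
{
    pid_t pid = fork_retry(3);
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0)
        _exit(0); /* child */
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real program would cap the back-off and report the final error to the user.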

Additionally, we could also list multiple functions that could fail, give these different chances of failure and fix specific errno values that we want to provoke.

I think that this is a simple and neat tool that can help students think about what robust error handling entails.

The horribly pretty and the pretty horrible

My intention behind writing this text isn’t to advertise “Trip”. The core functionality isn’t what is fun to work on. What I want to highlight are some of the inane decisions I insisted on following through on, which led me to work on the program for over three years (on and off) before getting it into a functional state.

The core mechanism

As mentioned above, one idea would be to take a debugger-like approach and use something like ptrace to control and manipulate a program.

But I did not do that; instead I use the LD_PRELOAD hack. By setting this environment variable, we can instruct the dynamic linker/loader (ld.so) to look up the definitions of certain symbols in a shared object of our own before it checks the usual places, such as the actual standard library (libc.so).
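To make the mechanism concrete, here is a minimal sketch of such a preloadable shim; this is illustrative, not Trip’s actual code. It overrides fork and forwards to the real implementation via dlsym(RTLD_NEXT, ...), unless it decides to simulate a failure:

```c
/* Build as a shared object:  cc -shared -fPIC -o shim.so shim.c
   Then run, e.g.:            LD_PRELOAD=$PWD/shim.so bash        */
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

pid_t fork(void)
{
    /* resolve the real fork in the next object in search order (libc) */
    pid_t (*real_fork)(void) = (pid_t (*)(void))dlsym(RTLD_NEXT, "fork");
    if (rand() % 2) { /* simulate a failure about half of the time */
        errno = EAGAIN;
        return -1;
    }
    return real_fork();
}
```

With this preloaded, roughly every other fork in the shell fails with EAGAIN, much like in the session shown above.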

The reasonable approach would be to have one executable and one shared object, and have the executable set LD_PRELOAD to point to the shared object before executing the intended program (e.g. bash in the above example).

But no, then I’d have to figure out where to find the shared object, and the user would have to install two files! Just imagine all the wasted disk space of having multi-kilobyte ELF headers lying around. Entirely unreasonable!

Instead, I want one file which is both an executable and a shared object (seeing as both are just ELF files with different types, DYN vs. EXEC). An example of this already existing in some form is the aforementioned shared object of the C standard library:

icterid$ /usr/lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.36-9+deb12u7) stable release version 2.36.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 12.2.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.

Glibc does this by setting a custom entry point.

Trip, by contrast, is compiled as a regular position-independent executable, which you would think should be sufficient for ld.so to preload. But it is not; if you try, you get the error message

$ LD_PRELOAD=.../preload-this ./execute-this
ERROR: ld.so: object '.../preload-this' from LD_PRELOAD cannot be preloaded (cannot dynamically load position-independent executable): ignored.
this is the output of ./execute-this.
...

which is not what I want. Luckily, thanks to a helpful program by Yubo Xie, I found a trick to circumvent this draconian ordinance. It involves editing the ELF file generated by GCC to flip a single bit (DF_1_PIE). With this change, ld.so doesn’t object, while the file remains executable. That means I can preload the executable being executed while it executes some other program. This brings me to the next step,

Preloading oneself

This is not that interesting, but worth mentioning nevertheless. The issue is that LD_PRELOAD requires an absolute path to the shared object file it should use. But from argv[0] I can only determine the string that was used to invoke Trip.

Seeing as LD_PRELOAD is not portable2 anyway, we can use a Linux-specific hack to determine the file of the program currently being executed. This involves the /proc file system, specifically the /proc/self directory.

As you might know, each process with some PID has a /proc/[PID]/ directory with process metadata, such as the command-line arguments, the environment variables, file descriptors, and so on, exposed using the file-system interface. /proc/self/ is a shorthand for the /proc/[PID] directory of the current process. And in this directory there is a symbolic link called exe that points to the file we are looking for.
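Reading that link takes a single call; a minimal sketch (self_path is a hypothetical helper name, not Trip’s actual function):

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Fill `buf` with the absolute path of the running executable (Linux).
   Returns 0 on success, -1 on failure. */
int self_path(char *buf, size_t size)
{
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0'; /* readlink(2) does not NUL-terminate */
    return 0;
}
```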

So the absolute path is just a readlink(2) away, which we can use to construct the environment assignment that we realise using execvpe(3), a GNU extension of execvp that allows passing a list of "VAR=VALUE" strings to specify the environment of the newly executed file (we use getopt(3) to parse command-line options, hence the optind):

execvpe(argv[optind], argv + optind, (char*[]) {preload, conf, NULL});

What’s conf? What we did not discuss up until now is how we preserve state between Trip “the executable” and Trip “the library”. That is where we use conf, a special environment variable encoding the options specified when invoking Trip.

____TRIP_CONFIGURATION

We are too deep into this rabbit-hole to give up now. There is no going back. Once again, there is an obvious and necessary decision to be made, though it might offend a few of the bluenoses amongst us.

As a reminder, we invoke Trip with a list of functions, optionally specifying a chance that each should “trip” (by default P(X) = 1) and optionally fixing a specific errno value to set (otherwise this too will be decided randomly, from the list of known errno values for the function; more on that below).

E.g.,

mkdir:0.25:ELOOP

says that mkdir(2) will fail 25% of the time, but when it does it will set errno to ELOOP (“Too many levels of symbolic links”).

Environment variables give us a C string to encode the value in. So no NUL bytes! This means we cannot just encode arbitrary binary data, such as a struct containing a floating-point value, which might include a NUL byte at the char level.

Instead, we will pass a list of triples, respectively specifying,

  1. The function to trip
  2. A chance encoded by a floating-point value serialised in hexadecimal (%a), to use a radix with a higher information density – if we disregard the 0x prefix…
  3. An errno value, also in hexadecimal, or 0 to indicate an absence.

How do we encode the triple? By using ASCII group separators, of course; ascii(7) tells us that 35 in octal encodes them. And the list? Isn’t it obvious… By using ASCII record separators, naturellement (36 in octal). As these aren’t pretty to write, we define two macros,

#define GS "\035"           /* group separator */
#define RS "\036"           /* record separator */

where the octal encoding of arbitrary characters comes to use (a lot of people are only familiar with the one special case, '\0' to encode the NUL byte, which, next to chmod, is probably one of the most common places where octal notation is used, be it that there it coincides with most other bases).

To construct the value itself, I’d like to use printf(3)-style format strings, to make parsing the value easier. But how much memory will I need? Just any arbitrary constant won’t do! So I’ll allocate it dynamically, but here again, a mere malloc is so pedestrian. Instead, using open_memstream(3), we write into a FILE* that grows a string of sufficient size as needed.

Now we can write,

fprintf(h, "%s" GS "%a" GS "%x" RS,
        entries[i].name, entries[i].chance,
        entries[i].error);

which also makes use of C’s string concatenation of literals, as GS and RS are just strings that are substituted by the preprocessor.

Now we have encoded the intended state of the library. As discussed, we set this while execing and later on will decode it when the library is initialised. But how does that happen? Some brief comments on the topic follow below.
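For illustration, decoding such a configuration string on the library side could look roughly like this (a sketch with names of my own choosing; Trip’s actual parser may differ):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GS "\035" /* group separator  */
#define RS "\036" /* record separator */

struct entry { char name[32]; double chance; int error; };

/* Split `conf` (modified in place) into RS-separated records of
   GS-separated fields; returns the number of entries decoded. */
size_t decode(char *conf, struct entry *out, size_t max)
{
    size_t n = 0;
    char *p = conf;
    while (*p && n < max) {
        char *end = strchr(p, *RS); /* end of this record */
        if (!end)
            break;
        *end = '\0';
        char *g1 = strchr(p, *GS);                   /* name | chance  */
        char *g2 = g1 ? strchr(g1 + 1, *GS) : NULL;  /* chance | errno */
        if (!g1 || !g2)
            break;
        *g1 = *g2 = '\0';
        snprintf(out[n].name, sizeof out[n].name, "%s", p);
        out[n].chance = strtod(g1 + 1, NULL);          /* %a parses back */
        out[n].error  = (int)strtol(g2 + 1, NULL, 16); /* %x parses back */
        n++;
        p = end + 1;
    }
    return n;
}
```

Note that strtod happily reads the hexadecimal floating-point notation that %a produced, so the chance survives the round-trip exactly.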

Break on through to the other side of the moon

As we are relying on LD_PRELOAD, all the functions that we intend to support have a little stub, which either invokes the function in the usual fashion or simulates a failure. To decide which of the two to do, Trip exposes a little function called ____trip_should_fail (the underscores indicate that it is very internal).

Basically, it checks the configuration and rolls the dice if necessary. But to do this, we first have to parse the configuration. To this end, ____trip_should_fail begins by calling the sensibly named function init.

But wait: seeing as the invocation of this function depends on the execution of the tripped program, which might involve multiple threads or otherwise concurrent execution, what happens when two functions call a yet-uninitialised ____trip_should_fail contemporaneously? Why, the internal state might be initialised twice over, leaving us in inconsistent disarray.

We might use a kind of “lock”, e.g. a pthread_mutex_t, but consider the costs of depending on yet another library, not to speak of the enormous overhead that passive locking effectuates!

No, we need something more appropriate. Luckily, C11 has just the thing: atomic_flag! By statically allocating such a flag inside init,

static atomic_flag done = ATOMIC_FLAG_INIT;

we can use atomic_flag_test_and_set to ensure that only one thread of execution sets the flag. The others we keep busy with a naive spin-lock, which the “chosen one” (whoever first completed atomic_flag_test_and_set) releases as soon as it has parsed the state.
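Stripped to its essentials, the pattern could look like this (a sketch with made-up names; the placeholder assignment stands in for the actual configuration parsing):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag claimed = ATOMIC_FLAG_INIT; /* has someone taken the job? */
static atomic_bool ready;                      /* spin until this is true    */
static int config;                             /* the shared, parsed state   */

static void init(void)
{
    if (!atomic_flag_test_and_set(&claimed)) {
        config = 42;                 /* "chosen one": parse the configuration */
        atomic_store(&ready, true);  /* release the others */
    } else {
        while (!atomic_load(&ready)) /* naive spin-lock */
            ;
    }
}

int should_fail(void)
{
    init(); /* safe to call from any thread, any number of times */
    return config;
}
```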

A nice example of non-blocking synchronisation. I just think there is something neat about it.

As an aside, parsing errno

One thing I omitted to mention above is how we convert a symbolic representation of an errno value, such as ELOOP, EAGAIN, ENOENT, …, into a hexadecimal number. The last part is easy: as soon as I have the numerical representation, we use the %x format string (we have seen that part already). The more interesting part is the rest.

Here, some might argue that I could just define a lookup table of the form

#define ENTRY(e) { .name = #e, .val = e },
struct errno_names {
    char *name;
    int val;
} names[] = {
    ENTRY(ELOOP),
    ENTRY(EAGAIN),
    ENTRY(ENOENT),
    /* ... */
};

but where’s the fun in that? Instead, how about this: each time we want to look up what ELOOP is, we manually invoke cpp (the standalone preprocessor) and write the name we wish to expand into its standard input. And what better tool for that than popen(3), fopen(3)’s lesser-known relative (with alleged ties to system(3))?

But the issue is that with popen(3) we can only read or write; to quote the Linux man page,

Since a pipe is by definition unidirectional, the type argument may specify only reading or writing, not both; the resulting stream is correspondingly read-only or write-only.

Pity, the unimaginative and ill-willed might say. No, I retort, there is a way! Remember, gentle reader, that popen takes a shell command. We can easily include a pipe in there, and with a little echo inject the input into the command being executed. So in effect, to resolve ELOOP, we execute

echo "ELOOP" | cpp -include errno.h

and then read the output until we encounter a lone number; that is the value we were looking for! We just have to skip over some noise to get to that point (try executing the above yourself, and be surprised by what you see).

There remains a nibble to quibble! Where do we get the memory for that command, which we pass to popen(3), the irksomely inquisitive inquire inappropriately. Why, we just allocate it on the stack! After all, if you invoke snprintf the right way, it will tell you how much space the formatted string will require. We reserve the requisite space with a simple VLA, and bada-bing bada-bum, done.

But do we really want to invoke snprintf ourselves twice? Here’s the shtick: we don’t. Instead, we put on our intermediate macrology hats and summon this construct:

FILE *cpp;
$sprintf(cmd, "echo \"%s\" | cpp -include errno.h", name) {
    cpp = popen(cmd, "r");
}

Here, $sprintf is a macro (we needlessly make use of the fact that dollar signs are (apparently3) valid constituents of identifiers), which formats some data into a local variable called cmd, visible only in the subsequent block. How is this done? Using the old trick of expanding the above into a for loop that executes exactly once. In effect, it would look something like this:

FILE *cpp;

for (char cmd[snprintf(NULL, 0, "...", name) + 1], c = 1;
     c == 1 && (snprintf(cmd, sizeof cmd, "...", name), 1);
     c = 0) {
    cpp = popen(cmd, "r");
}

Note that the second snprintf is executed only once, due to the short-circuiting of &&, and that we disregard its return value, even though that isn’t strictly necessary.

In reality, we add some __COUNTER__ magic on top to avoid variable capture (a problem which we do not actually encounter), which requires us to glue together new symbols. Business as usual for the hat-wearers.
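For reference, such a macro could be defined along these lines; this is a sketch that uses a plain name instead of $sprintf and omits the __COUNTER__ mangling:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format into a VLA `var` that is visible only in the following block.
   First snprintf measures, the second one writes. */
#define SCOPED_SPRINTF(var, ...)                                        \
    for (char var[snprintf(NULL, 0, __VA_ARGS__) + 1], var##_once = 1;  \
         var##_once && (snprintf(var, sizeof var, __VA_ARGS__), 1);     \
         var##_once = 0)
```

The loop variable is derived from the buffer name here, which sidesteps capture in most uses; the __COUNTER__ variant makes it watertight.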

One more time

We were adroitly reticent on a crucial matter up until now. But behold, as all cards are now exposed, for all to see, for all to detest and abhor!

I never really thought about it, but there is no easy, machine-readable way to figure out what errno values a function might set. It is not encoded in the type system (if we may even speak of such a thing), and parsing it out of the man pages is not something I’d like to imagine doing reliably. So I must satisfy myself with manually writing these facts down. And this turns out to be the bottleneck of the entire operation.

We write these down in a custom file format. In effect, these files just list blocks of this form:

errno   ENOMEM
fail    NULL
name    strdup
params  const char *s
return  char *

separated by empty lines. How does this help us? Easy: we use a little bit of AWK to convert these into C files. But do I really want to write C code in AWK? Perhaps, but not today (or whenever I wrote this initially). So instead, I define a little macro in C and have AWK convert each block into an invocation of that macro. That will look like this:

DEF(char *, strdup, (const char *s), ( s), (NULL), ENOMEM)

When compiled, this will expand to the kind of function we described above: this will be our definition of strdup, which LD_PRELOAD injects into the real executable and which calls ____trip_should_fail!
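To make this concrete, the interception side might expand to something like the following sketch; the signature of ____trip_should_fail is my guess, and the toggle armed is a trivial stand-in for the real decision logic:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <errno.h>
#include <string.h>

/* Stand-in for Trip's decision function: here just a manual toggle. */
static int armed;
static int ____trip_should_fail(const char *name, int *err)
{
    (void)name;
    *err = ENOMEM;
    return armed;
}

/* A DEF-like macro: either simulate failure or forward to the real
   function, found via dlsym(RTLD_NEXT, ...). */
#define DEF(ret, name, params, args, fail, ...)             \
    ret name params                                         \
    {                                                       \
        int e;                                              \
        if (____trip_should_fail(#name, &e)) {              \
            errno = e;                                      \
            return fail;                                    \
        }                                                   \
        static ret (*real) params;                          \
        if (!real)                                          \
            real = (ret (*) params)dlsym(RTLD_NEXT, #name); \
        return real args;                                   \
    }

DEF(char *, strdup, (const char *s), (s), (NULL), ENOMEM)
```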

And while we are at it, the most devilishly delightful stratagem remains: we need a table of all functions inside the trip module itself, among other things to implement the -l flag that lists all supported functions. And while we could create a temporary file with the requisite data and #include it as usual, why bother our destitute disk and not just inject what we need right into the array?

/* list of known commands */
#define DEF(ret, name, params, args, fail, ...) { #name, { __VA_ARGS__ }},
static struct entry_name {
    char *name;
    int errs[256/sizeof(int)-sizeof(char*)]; /* adjust if necessary */
} names[] = {
#include "/dev/stdin"
};
#undef DEF

After a moment, I am certain everyone will realise that including from /dev/stdin means that we are reading from the standard input of the compiler herself! But what will the input be? Here, as we are knee-deep in GNU-isms, we use a target-specific variable assignment that takes effect only when compiling trip.c to trip.o:

trip.o: CC := grep -h '^DEF' $(GENSRC) | sort -k2,2d -t, | $(CC) \
              -DCOMPILER="\"$(shell $(CC) --version | sed 1q)\""

where GENSRC is the list of files containing DEF forms. Yes, we grep through all of these, sort by function name (as we of course wish to use bsearch(3) later on when looking up an entry) and then pipe the result into the C compiler4. As we re-defined DEF prior to the #include directive, we discard the uninteresting arguments and use only what we need to construct the table in a single go.

The best part about this is watching the output of make:

$ time make -j8
awk -f gen.awk db/stdio.db > db/stdio.c
awk -f gen.awk db/stdlib.db > db/stdlib.c
awk -f gen.awk db/string.db > db/string.c
awk -f gen.awk db/sys-socket.db > db/sys-socket.c
awk -f gen.awk db/unistd.db > db/unistd.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer  -ldl -fanalyzer -fsanitize=undefined,nonnull-attribute -fhardened  fix-pie.c   -o fix-pie
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer   -c -o db/string.o db/string.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer   -c -o db/stdio.o db/stdio.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer   -c -o db/stdlib.o db/stdlib.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer   -c -o db/sys-socket.o db/sys-socket.c
cc -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer   -c -o db/unistd.o db/unistd.c
grep -h '^DEF' db/stdio.c db/stdlib.c db/string.c db/sys-socket.c db/unistd.c | sort -k2,2d -t, | cc -DCOMPILER="\"cc (GCC) 14.1.1 20240522 (Red Hat 14.1.1-4)\"" -std=c11 -Wall -Wextra -Wformat=2 -Wuninitialized -Warray-bounds -Os -pipe -fsanitize=undefined,nonnull-attribute -fhardened -fanalyzer   -c -o trip.o trip.c
cc -o trip db/stdio.o db/stdlib.o db/string.o db/sys-socket.o db/unistd.o trip.o -ldl -fanalyzer -fsanitize=undefined,nonnull-attribute -fhardened
./fix-pie trip

real    0m1.177s
user    0m1.504s
sys 0m0.170s

Another fun thing is that this breaks LSP servers…

And there we have it: these were the most fun highlights of my little hobby program. There are a few more points of interest, such as function-internal typedefs, a custom assert implementation, a lagged Fibonacci random number generator taken from TAOCP (just to be able to quote Knuth in a comment), the persistent usage of dprintf, and user-dependent installation destinations, but those interested can take a look on their own. I’m done explaining my code; as the saying goes, if it was hard to write, it should be hard to read!


Reflecting on the program, and having reminded myself of its internal structure over the last few days, I do think that “polishing” is the right word to describe what I enjoy. Compared to CS homework assignments, competitive programming or the hacks that keep my servers alive, the changes I make to this program are increasingly insignificant and the commit messages progressively elaborate. I get to pinpoint my focus on as small a fragment of the code as I wish, and pointlessly rewrite it to no semantic effect. While at it, I get to try out ideas I have had no opportunity to apply, and to skim manuals and standards for inspiration on what I could do next.

It is obvious even to me that this isn’t everyone’s idea of a fun weekend, but for me there is something to it — and I don’t just mean the unvarying grin on my face as I explain to a visibly disturbed person how Trip works — especially if I know that I actually should be doing other stuff.


  1. Update (26Aug24): As pointed out on Hacker News, it is possible to replicate what I am doing with strace↩︎

  2. E.g. Linux takes a space-delimited list of files, while OpenBSD takes a colon-delimited list, and macOS/Darwin/XNU has an entirely different variable and file format↩︎

  3. Edit: Turns out that this is not true, GCC accepts $ but ISO C doesn’t require this to be the case.↩︎

  4. The macro definition COMPILER makes use of the surprising fact that most compilers, even TCC, support a long --version flag. We use this instead of GCC’s __VERSION__, as it appears to be more informative↩︎