Unix Considered Harmful

Friday, 19 April 2019

I’ve been meaning to write on this topic for a while now, but when I recently saw a post about Electron on the fediverse, I finally got a good starting point.

It’s sad, but true. Writing for the web has become more comfortable and easier for the developer, and it seems that with ever more development in the field of browser engines, “most” users don’t really care anymore how many megabytes of JavaScript are loaded and interpreted.

How this relates to Unix is what I intend to answer in the following text, by looking at how the above-mentioned development happened: why have we piled layer upon layer of abstraction to reach this upside-down computer world we now live in and love to hate?

Failed Primitives

An early Unix™ infomercial by Bell Labs claims

Ken Thompson and Dennis Ritchie were aiming to keep their system simple, and they found a collection of primitives that enabled them to do a great deal with a very few primitives.

These primitives, around which the (so-called) “Unix Philosophy” developed, are explained to be, among other things, concepts such as:

This can be considered common knowledge among everyone who has used a Unix system in a slightly technical way.

One classical example from Unix folklore, to show how elegant these principles are, was in response to a little problem posed by Jon Bentley, asking for a program to roughly do this:

Calculate the n most used words in a text and output these, sorted by their respective frequencies.

Bentley asked Knuth to write a demonstration for his column in Communications of the ACM, “Programming Pearls”. Knuth then proceeded to write a lengthy, literate WEB program spanning 6 pages, in some versions over 10 pages.

In response to this, Douglas McIlroy, then head of the Computing Techniques Research Department at Bell Labs, wrote a simple shell script:

tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q

If you are unfamiliar with the syntax or the programs used, read the above-linked article.

Although McIlroy doesn’t recommend the use of pipelines to solve this problem (after all: sort(1), used twice, has a worst-case complexity of O(n log(n))), it is most certainly more elegant. I’m even quite sure that most people, upon hearing the problem, might come up with a similar solution.
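To make the individual stages easier to follow, the pipeline can be wrapped in a small shell function with one comment per stage. This is only a sketch: the function name `top_words` and the sample sentence are my own choices for illustration and not part of McIlroy’s script.

```sh
#!/bin/sh
# top_words N: print the N most frequent words read from standard input.
# (The name top_words is made up for this sketch.)
top_words() {
    tr -cs A-Za-z '\n' |   # turn every run of non-letters into one newline
    tr A-Z a-z |           # fold upper case to lower case
    sort |                 # bring identical words next to each other
    uniq -c |              # collapse each group into "count word"
    sort -rn |             # highest counts first
    sed "${1}q"            # quit after N lines
}

# Sample input: "the" occurs three times, "and" twice, the rest once.
printf 'The cat and the dog and the bird\n' | top_words 2
```

For the sample sentence this should list “the” with a count of 3 and “and” with a count of 2; the exact column padding depends on the uniq(1) implementation.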

When stepping back to reconsider the solution, one might realise that the problem itself fits rather well into the Unix world. Text (a file) is processed (using pipes) and a list is generated (in the form of lines) – Unix’ decisive advantage is that its primitive datatype is “text”.

In other words: Unix is a glorified typewriter.

An alternative perspective

The standard story we tell each other about Unix goes something along these lines:

When Bell Labs left the Multics Project, Thompson and Ritchie started working on a simpler time sharing operating system for the DEC PDP-7. Over time this became more popular, and was translated into C, a high-level language that enabled a higher degree of portability, which together with cheap licensing for universities added to the popularity and spread of the Unix system.

Rewriting History

While it’s factually not wrong, we do leave out some relevant information in this narrative. Ritchie himself, while discussing the history of Unix, mentions that part of the “legitimisation” for purchasing a new PDP-11 was (emphasis mine)

not merely to write some (unspecified) operating system, but instead to create a system specifically designed for editing and formatting text, what might today be called a “word-processing system”.

In fact, when reading the Unix Reader, specifically an initial section “1.1 The People”, one finds out that quite a lot of people worked on or contributed to “word-processing”-related programs and problems:

etc.

Of course it would be an oversimplification to claim that everyone was working on text processing. Besides actual work on the operating system, many worked on other things, such as circuit design (Steve Bourne, for example, the inventor of the Bourne shell, bash’s predecessor) or networking.

A prehistoric fossil

Another interesting point is that the above-mentioned pipeline system that Unix is now well known for wasn’t developed until v3:

Pipes appeared in Unix in 1972, well after the PDP-11 version of the system was in operation, at the suggestion (or perhaps insistence) of M. D. McIlroy, a long-time advocate of the non-hierarchical control flow that characterizes coroutines.

It’s even said that although McIlroy had thought of pipes as a concept for computers as far back as 1964, Ken and Dennis were initially not that interested. But the advent of pipes changed the general aura. Core utilities were redesigned to fit the new paradigm, but interestingly the desk calculator dc(1) stayed the same (Reader, 3.4):

The PDP-11 assembler, a desk calculator dc, and B itself were written in B to bootstrap the system to the PDP-11. Because it could run before the disk had arrived, dc – not the assembler – became the first language to run on our PDP-11.

If you haven’t used dc before, the Wikipedia article linked above gives a good introduction. The main point I want to make is that anyone familiar with Unix will admit that this syntax:

36[d1-d1<F+]dsFxp

is not “Unix-ish” (whatever that means) – just try guessing what it does. Keep in mind that this isn’t that “intentionally” obfuscated; see this article for a fuller explanation. It should also be pointed out that this example far exceeds what the original dc was capable of.
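For readers who would rather not guess: as far as I can trace it, the one-liner pushes 36, stores the bracketed macro in register F and runs it; the macro keeps pushing the next smaller number while that number is greater than 1, and then adds everything up on the way back, so what gets printed is the sum 36 + 35 + ... + 1. A rough shell equivalent, with the made-up function name `sum_down`, might look like this:

```sh
#!/bin/sh
# sum_down N: add up N + (N-1) + ... + 1, which is what the dc one-liner
# appears to compute for N = 36. (sum_down is a hypothetical name.)
sum_down() {
    n=$1
    total=0
    while [ "$n" -ge 1 ]; do
        total=$((total + n))
        n=$((n - 1))
    done
    printf '%s\n' "$total"
}

sum_down 36   # prints 666
```

The loop is several times longer than the dc version, but each step can be read without knowing what is currently on a stack.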

These kinds of utilities became ever rarer, as it became more and more common to implicitly assume the abstraction of words or lines over the aforementioned sequence of bytes. Even ed(1), which at first glance may seem to use just as cryptic single-letter commands, enforced one command per line. (sed(1) could be seen as a popular exception, although not many people use it beyond substitution or other single-command operations.)

Another way to argue this point is to quote McIlroy, recalling: “Thompson saw that file arguments weren’t going to fit with this scheme of things and he went in and changed all those programs in the same night. I don’t know how…and the next morning we had this orgy of one-liners” (source), meaning that the introduction of pipes changed both the “core” userland utilities and the system interface (the shell).

I understand if some people find this example less convincing (I’m not even fully satisfied with the argument myself), but the main points I want to convey are the following:

The point regarding the technical limitations has been discussed elsewhere in the very much related context of C.

So what?

I always like analysing the limitations of a system by looking at its attempts to overcome these, either by internal additions (e.g. symbolic links, although Multics already had these) or by external ones (e.g. programs we use instead of the shell to interact with the OS).

If you use GNU/Linux, macOS or another BSD derivative on a regular basis, and then decide to play around with an older Unix system, you might be surprised to see how much has changed.

A good way to identify these shortcomings is to ask the dogmatists what they hate. In the case of *nix, they will hasten to demonstrate the absolute depth of their purity without necessarily having to elaborate on every point – it’s obvious, after all.

But people have tried to go “beyond” Unix, not because they are evil or ignorant, but because the fundamental assumptions elaborated above don’t fit all use-cases, or at the very least prove to be a hindrance. (I experienced this for the first time when attempting to send floating point numbers between two independent programmes: wanting to obey the Unix idioms, I had to accept formatting these into ASCII strings and parsing them back.)
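The cost of pushing floating point numbers through text is easy to reproduce with awk(1), whose `print` by default formats non-integer numbers with six significant digits. The following sketch uses the made-up function names `producer`, `careful_producer` and `consumer`, and assumes an awk that stores numbers as IEEE 754 doubles (which the common implementations do):

```sh
#!/bin/sh
# Hypothetical producer: computes 1/3 and prints it in awk's default
# output format, which keeps only six significant digits.
producer() {
    awk 'BEGIN { print 1 / 3 }'
}

# A more careful producer: 17 significant digits are enough to carry a
# double through decimal text and back without loss.
careful_producer() {
    awk 'BEGIN { printf "%.17g\n", 1 / 3 }'
}

# Hypothetical consumer: parses the text back into a number and prints
# whatever it received with full precision.
consumer() {
    awk '{ printf "%.17g\n", $1 + 0 }'
}

producer | consumer           # differs from 1/3 after the sixth digit
careful_producer | consumer   # same digits as careful_producer alone
```

Both sides have to agree on a textual format and pay for formatting and parsing, and a careless default on either side silently loses information.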

Unix in particular is all the more interesting, since people have decided to stick to it and its ideas, while overcoming them internally. Some examples could be:

With this point I return to the first issue I raised, namely that of browser-based frameworks like Electron or React gaining more and more popularity. The lazy way to explain this away would be to say that the teams and programmers employing these tools are themselves just too lazy to bother with proper implementations, as if social and economic questions, such as the time needed to prototype, ship and support, were in no shape or form relevant. It is my opinion that this position does not suffice: of course the ability to work with higher-level abstractions is more attractive, but the fact that we need to layer abstraction upon abstraction says more about what we see the need to abstract from than about where we are abstracting towards.

Some Ideas for Going Forward

It is naturally easy to point out the many shortcomings we can see in the current situation, but harder to say what should be done to fix these (some might say impossible).

But so as not to end on an entirely negative note, I will try to elaborate on tendencies I have observed and my interpretations of these (with a bit more complaining in between).

The Value of Hypertext

As said above, the World Wide Web has proven the utility of going beyond just plain text: text that is read without being interpreted, viewed without being modified, and worked on with nothing hidden between the human and the hard disk.

But of course, HTML isn’t perfect. While some might say that it actually suffices for all needs and that most people just don’t realise this, the fact that the web has been extended, and the manner in which this happened, is strong evidence to the contrary. A few examples I could give would be:

Any non-embedded operating system, meant for users first, has to place the ability to create rich hypertext as an important priority.

On Plaintext

Generally we take a “plain text” format to be preferable to “custom formats”. The reasoning is simple. While we depend on specific or even proprietary programmes to process “non-plain” files, plain text is the lingua franca of any proper program. Any text editor will do. grep(1), awk(1), sed(1)? No problem. Why, even ed(1) will do…

We can work with any text, old and yet to come – it’s the universal format for the universal age of computing.

Well, except for punchcards. (STRAWMAN!) It’s easy to say that this is irrelevant: who uses punchcards now? They are all lost or forgotten, and if not, they are stored in some file cabinets, to be lost or forgotten. But if the argument for plain text is its eternal nature, the same could certainly have been said for punchcards a few decades ago. While they do share similarities (sequential ordering of characters, as defined by some encoding), they probably differ in every other aspect. The difference ultimately lies in the different kinds of computing environments someone finds themselves in now as compared to the 1960s – I see no reason why we can claim with such certainty that “plain text” will still be the default in another 50 years.

But then again, do we really want to accept incomprehensible formats? I can’t say for sure, but if we accept that there is a point in having programmes share memory instead of serialising into plain text and interpreting it again, maybe there could be something in having a protocol to communicate on a “more native” level, if possible?

Unix as a Programming Language

Dogmatists defend Unix as a programming environment, yes, even as an “IDE”! But I prefer to look down on it as a rather peculiar programming language.

One-way dynamic scoping (environment variables), heavy function calls (fork) and the lack of any meaningful type system. If anyone were to introduce a new language and proclaim

Functions in this language can take any number of variables, all of which are strings (actually arrays with special symbols, but what’s the difference), and additionally, there are also undefined – but for the most part just three – character streams that will also exist!

he or she would be ignored, ridiculed or adored by contrarians (as studies have shown). But that’s basically what every C programmer accepts when they write

int main(int argc, char *argv[]) { /* ... */

Peculiar.
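Two of these peculiarities can be observed from the shell itself. In the sketch below (the variable name `GREETING` is made up for the illustration), a child process sees the exported variable, but its own assignment never travels back to the parent; and since arguments and streams carry no types, sort(1) orders the strings 10 and 9 character by character unless told otherwise:

```sh
#!/bin/sh
# GREETING is a hypothetical variable used only for this illustration.
GREETING=parent
export GREETING

# The child gets a copy of the environment, so it can read the value ...
child_view=$(sh -c 'printf "%s" "$GREETING"')

# ... but an assignment in the child only changes the child's own copy.
sh -c 'GREETING=child'

printf 'child saw: %s\n' "$child_view"
printf 'parent still has: %s\n' "$GREETING"

# Everything is a string: without -n, sort compares character by
# character, so "10" is sorted before "9".
first=$(printf '%s\n' 10 9 | sort | head -n 1)
printf 'sorted first: %s\n' "$first"
```

Each `sh -c` here also pays for a full process creation, which is the “heavy function call” mentioned above.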

Yet this isn’t the true tragedy in my eyes. Rather, it is saddening to see that, while the field of programming language theory has developed in various and critical ways over the last 50 years, operating systems have stayed focused on more or less the same topics, or at most on attempts to improve speed and security for a fixed set of issues. Couldn’t operating systems even bring new possibilities to programming languages, in terms of persistence and hardware exploitation, which we can’t assume as standard from just any programming environment?

One might now say that an operating system is far less flexible than the field of possibilities a language designer has at her or his disposal – after all, language designers are not bound to the hardware in the same way as the OS architect – but then again, not only has the world of programming languages been turned on its head multiple times over, but so has the world of hardware, which has developed and changed drastically, allowing not only more of what we previously had less of, but opening entirely new perspectives. Is this not at least an impetus to also reconsider what we should expect from our dear mediator?

This brings me back to my fundamental point, once again. If Unix can be taken to be a “glorified typewriter”, a device that manipulates text on paper, then it’s not surprising we should be content with viewing everything as text, if necessary manually converting back and forth. But what a waste of potential this is.

In our age, where computers have become such ever-present parts of our lives, it’s far too easy to forget what miraculous devices, contraptions, and inventions they are. While mysteries like P=NP, naming variables and many others aren’t resolved, we still are capable of incredible and previously inconceivable things. In that sense it’s almost insulting. Is the power of the computer merely to turn digital what was previously not? With all due respect to the people and the limits they experienced in their time, is it not embarrassing that we still think and talk about the digital versions of the file cabinets and desks the inventors probably sat next to when inventing their digital analogues? After all, doesn’t “everything is a file” sound more like something from Kafka than the maxim of an operating system?

Henry Ford once said “If I had asked people what they wanted, they would have said faster horses,” and in some sense the same, but inverted, could be applied to the world of operating systems. We base ourselves on what we obviously don’t really want or need anymore, and build on top or away from it.

This is why I consider Unix to be harmful – it limits the horizons of what we take computers to be, prohibiting the fascinating exploration of new and long overdue possibilities.

Disclaimers

Operating systems are famously one of the “Holy Wars” of computer science, so writing on this topic is expected to provoke unpleasant responses. I know that because I was on the other side of the argument not too long ago. My first confrontation with a critique of Unix and Unix-like operating systems provoked negative feelings. After all, I had spent many years teaching myself the commands, learning to read shell scripts, practising the chastity of “minimalism”. The thought of being able to use an operating system from the 1970s was a blessing. I took pride in resisting configuration of certain programmes or understanding how to use ed(1).

All of this in the name of the “Unix Philosophy”. (I consistently see this to be a misnomer, since there is little proper philosophy going on and more tips on how to cooperate with, or subjugate oneself to, the environment under which one worked.) Anyone outside it just couldn’t understand it. After all, didn’t Ritchie say:

UNIX is very simple, it just needs a genius to understand its simplicity.

But over time, I became estranged, and especially my deeper engagement with Emacs over the last year has offered insight into what the limits of Unix could be (which is not to say that any Emacsen is limitless or perfect). Although not quite up to date, I’d also recommend reading The UNIX-HATERS Handbook or the original Jargon File, in case I’ve provoked any interest in the reader to find out more. Even better, and what I’m currently interested in, is a critical engagement with the history of operating systems, to find out which ideas have been wrongly forgotten while others have prevailed for no real reason, and what we can learn from this.

All in all, though, this text is not primarily directed at those who work on the Linux kernel, participants in the *BSD projects or operating system researchers. Without any doubt, they know far more about these subjects than I do, and I wouldn’t be too surprised if it bored them to read this. I address the hobbyists and mythologists, the people who reject (for example) Emacs or browsers because they are “one big programme”, while not questioning the fact that the whole operating system is also just “one big programme” that switches back and forth between address spaces.

While all of this is true, I am still a *nix user myself, and have difficulties using anything else: on most workstations I use, I assume some kind of Linux to be installed. I rarely don’t have a terminal emulator open on some workspace (at the time of writing it was 9 instances), and I have to admit that I quite enjoy using OpenBSD on my server. This site is just a giant *nix hack. Using any version of Windows, on the other hand, is most exhausting – I miss my single-hierarchy file system, my neat shell tricks and my package manager. (One reason for Unix advocacy, I suspect, is a radical rejection of Windows: from experience, most younger or newer users of, for example, Linux have a history with Microsoft’s OS. The step beyond often involves difficulties and annoyances that have to be legitimised, sadly taking the form of dogmatism in many cases.) After all, I have to admit, in the end, all I want is a comfortable computer environment where I never have to fear the operating system, or any program for that matter, becoming an obstacle to my work.


This text has many greyed out, not quite articulated points. I have used these as weaker arguments, pointers, or notes in between, and hope to articulate these when I get around to doing so. I am open to criticism or questions, especially if I enjoyed writing something in a more roundabout way than necessary. Please send me a mail or a message if you think you can help.

I’d also like to thank @greg@animal.church for pointing out quite a few spelling and grammatical mistakes.

Comments and discussions can be found on lobste.rs.