Saturday, 25 October 2025
Summary: I wrote a CGI “script” in Go to maintain my public bookmarks. It dumps and reads a binary file storing all the data. In this article I comment on the implementation and my motivations.
I love hyperlinks. So it is not a surprise that I am keen on this WWW thing. There is so much to see that it is hard to remember, and so easy to forget. Having surfed the web since my early childhood, there is a lot I know I won’t find again, be it due to link rot or the gradual deterioration of search engines as adversarial strategies to climb their rankings become more and more widespread.
I started bookmarking at a young age as well. Sometime around 2002 or 2003 I distinctly remember being proud of the little bookmark collection I had in Internet Explorer on our family computer. When we later switched to Firefox, we imported them, and I had a sub-directory of the IE links. When Chrome came up we repeated that, and by the time I decided to switch back to Firefox, the directory structure of my bookmarks was a linked list of four elements. I got into Linux around that time and was re-installing operating systems behind my parents’ backs, and an unfortunate dd command was enough to blow my cover. Luckily we didn’t lose everything, but my little bookmark collection, which had to be 10 years old at that point, fell victim to the overwriting of those first “few” megabytes of the file system. My punishment was that I was promoted to my family’s system administrator for life 😓.
Either way, I still collect links and find delight in finding just the right kind of web pages. There are different properties that evoke this sensation, and it can be anything from elegant simplicity to baroque detail, thought-provoking ideas or clever implementations. Whatever it is, I know I won’t remember it, but I am capable of recalling the necessary information to locate it again. But as you can imagine, these trails often disappear or degenerate.
So back to bookmarking. I never got to use in-browser bookmarking in a reliable way. Social media sites like Reddit or Lobsters are a nuisance to use and have too much unnecessary drama. I tried keeping track of interesting sites in a local text file (which I also used to log every web search and what I found out as a consequence of it), but that too is cumbersome and limited to only some devices. My most productive setup to date was an Org file on my old undergraduate university website, which I exported to HTML on a monthly basis.
$ curl https://wwwcip.cs.fau.de/~oj14ozun/links.html | hxwls | wc -l
reveals that I had accumulated some 1500 links. A real hub, I would say. Some good properties included:
Disadvantages included that, to modify the list, I had to connect to a server via SSH, and that the lack of metadata (or indeed any structured data) made it more complicated to change anything.
So after graduating, and moving over to the SDF cluster I decided to rethink my approach. Initially, I played around with a concept that involved Recutils, but I was not satisfied with the usability. This article will document what I came up with: I think it is great, others might be shocked and chagrined by what I will describe, mortified and stupefied by the details of the implementation. But perhaps someone might just like it and use it as well?
While previously I had a .org file that I exported to a
.html file, I have now severely complicated the situation.
It involves some 6-7 files, give or take. The main ones are:
The CGI script is password protected, and the password is salted and stored in the database. This might not scale in the future, as I currently have to load the entire database just to check the password. If that does end up being an issue, I can store the password separately and only load that.
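The check itself boils down to a single bcrypt comparison; a minimal sketch, assuming the hash is kept as a []byte (the function name here is mine, not the script’s):

import "golang.org/x/crypto/bcrypt"

// checkPassword reports whether the submitted password matches the stored
// bcrypt hash; bcrypt hashes embed their own salt, so no separate salt
// handling is needed.
func checkPassword(stored []byte, submitted string) bool {
	return bcrypt.CompareHashAndPassword(stored, []byte(submitted)) == nil
}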
Initially I intended to have the CGI script have a “read” view and an “edit” view, but then I figured that I would be reading the page a lot more than editing it, so I might as well export the links from the database only after editing it (which happens a few times a week) to a static file (links.html), and have the static file link back to the main CGI script for interaction.
It would be more traditional to use SQLite or something along those lines to store the data. I decided against that, as encoding/gob makes it trivial to dump any object to an io.Writer and load it back from an io.Reader. The main limitation is that I cannot change the form of the data willy-nilly. To make this less of a problem, the “database” is presently a struct with three fields:
type Data struct {
	Links []Link         // an arbitrarily branching tree type
	Auth  []byte         // contains the Bcrypt'ed password
	Conf  map[string]any // any other miscellaneous data
}
Keeping Conf underspecified is the key to allowing myself flexibility in the future, but it is also something to be wary of in terms of possible complexity. The big advantage I have is that this is just my script and I am the only user: any inconvenience I inflict upon myself is my own fault, which makes me, as the user, most understanding.
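To give an impression of how little code this approach needs, here is a rough sketch of dumping and loading such a struct through encoding/gob and compress/gzip; the helper names and the fields of Link are my own invention, not necessarily what the script does:

package main

import (
	"compress/gzip"
	"encoding/gob"
	"os"
)

// Link is an illustrative stand-in for the arbitrarily branching tree type.
type Link struct {
	Title    string
	URL      string
	Children []Link
}

// Data mirrors the struct shown above.
type Data struct {
	Links []Link
	Auth  []byte
	Conf  map[string]any // concrete types stored here must be registered with gob.Register
}

// save gob-encodes d into a gzip-compressed file.
func save(path string, d *Data) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	zw := gzip.NewWriter(f)
	if err := gob.NewEncoder(zw).Encode(d); err != nil {
		return err
	}
	return zw.Close() // flush the compressed stream before the file is closed
}

// load reads the compressed file back into a Data value.
func load(path string) (*Data, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var d Data
	if err := gob.NewDecoder(zr).Decode(&d); err != nil {
		return nil, err
	}
	return &d, nil
}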
The CGI script has some basic documentation on the start page, but the main interface is revealed by links pointing from links.html (these are abbreviated to @s beside each node). Requests with a query string of the form ?add=/0/2/1/4 indicate a path through the tree to a specific node. Underneath this node I can add more nodes. Each node may have a title, a “source” link, some commentary (written in Markdown, though I might scrap this at some point because Goldmark, a Markdown library written in Go, is currently my only real external dependency, the other one being bcrypt from the extended standard library) and a URL. Of course, the interesting nodes all have URLs, but the non-URL nodes are there to add structure to the page.
The ?add=/x/y/z page also allows me to delete the node (and recursively all nodes below it) or to move it around in the tree. None of these operations need to load the database. Editing an existing node actually has to load the existing data, so that is on a separate page (?edit=/x/y/z).
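For what it is worth, resolving such a path is little more than a loop over the index segments. A sketch, reusing the illustrative Link type from the snippet above (the script’s actual traversal may well look different):

import (
	"fmt"
	"strconv"
	"strings"
)

// resolve follows a path like "/0/2/1/4" through the tree of links and
// returns a pointer to the addressed node.
func resolve(links []Link, path string) (*Link, error) {
	var node *Link
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		i, err := strconv.Atoi(seg)
		if err != nil {
			return nil, fmt.Errorf("bad path segment %q: %w", seg, err)
		}
		if i < 0 || i >= len(links) {
			return nil, fmt.Errorf("index %d out of range", i)
		}
		node = &links[i]
		links = node.Children
	}
	if node == nil {
		return nil, fmt.Errorf("empty path")
	}
	return node, nil
}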
I won’t beat around the bush: the entire UI is very bare-bones and utilitarian. It is a different matter that this is the kind of UI you couldn’t sell to an average computer user, even though the UX is probably better than that of most software they admit to being routinely annoyed by (which is not to say that my UI/UX is perfect, by no means! The best I can claim is that it might be close to an optimum given the constraints of classical web development and a UN*X-oid drive towards a conception of simplicity). I use html/template to generate both the UI and the exported HTML, and despite parsing the template every time the script runs, it is very quick (though probably not as fast as a language that would actually support compiling a template and other DSLs, such as regular expressions, at compile time…). The main slowdown, in my estimation, is the NFS overhead. This is where compress/gzip comes in handy: even now, with a miniature database of 13K, compressing the file means that I only have to read and write 7K.
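To illustrate the template side of things, here is a minimal sketch of the export step; the template text and names are placeholders that only render the top level of the tree (the real template recurses and is more involved), and it reuses the illustrative Data type from above:

import (
	"html/template"
	"os"
)

// pageTmpl is parsed anew on every run of the script, which is cheap enough
// in practice.
var pageTmpl = template.Must(template.New("links").Parse(`<!DOCTYPE html>
<title>Links</title>
<ul>
{{range .Links}}<li><a href="{{.URL}}">{{.Title}}</a></li>
{{end}}</ul>
`))

// export renders the database into the static links.html.
func export(path string, d *Data) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return pageTmpl.Execute(f, d)
}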
The CGI script may also be invoked from the command line and has a
few sub-commands to set the password (which also initialises the
database the first time you use it) or to set options that enable
generating an Atom feed. Note that if you invoke a CGI script with a
query string like ?foo, then foo will be
passed as argv[1] to the script. So it is necessary to
check that we are not in a CGI environment to avoid people setting the
password with a clever HTTP request.
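One way to tell the two modes apart is to look for the environment variables that RFC 3875 obliges the web server to set; roughly like this (a sketch with an invented sub-command name, not the script’s exact logic):

import (
	"net/http"
	"net/http/cgi"
	"os"
)

// inCGI reports whether the program was started by the web server as a CGI
// script rather than from a shell: the server always sets GATEWAY_INTERFACE
// and REQUEST_METHOD for CGI requests.
func inCGI() bool {
	return os.Getenv("GATEWAY_INTERFACE") != "" || os.Getenv("REQUEST_METHOD") != ""
}

func main() {
	if !inCGI() && len(os.Args) > 1 {
		// Only now is it safe to treat os.Args[1] as a sub-command,
		// e.g. a hypothetical "passwd" to (re)set the password.
		switch os.Args[1] {
		case "passwd":
			// ... prompt for a new password and store its bcrypt hash ...
		}
		return
	}
	// In CGI mode a query string like ?foo shows up as os.Args[1] ("foo"),
	// so it must never reach the sub-command dispatch above.
	cgi.Serve(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... normal request handling ...
	}))
}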
The main reason I added the Atom feed is to have an overview of the newest links. The CGI script already remembers when I added each node, so I don’t have to take care of that manually. If enabled, generating the Atom feed will traverse the tree of nodes and pick out the newest 20 (by default, though you can configure that). The feed is then generated using encoding/xml and some annotated structs. I stole an XSLT stylesheet and, after trying to get an LLM to generate a blog post on how to inject HTML into the output of the stylesheet (which pointed me in the direction of ensuring my output was XHTML instead of wrapping it in a CDATA block), I made the feed readable in a browser without needing any JavaScript…
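The annotated structs do most of the work; roughly along these lines (the names, field selection and the stylesheet reference are illustrative, not the script’s actual definitions):

import (
	"encoding/xml"
	"io"
	"time"
)

// Minimal Atom structures; encoding/xml derives element names and the
// namespace from the struct tags, and time.Time values are written as
// RFC 3339 text.
type feed struct {
	XMLName xml.Name    `xml:"http://www.w3.org/2005/Atom feed"`
	Title   string      `xml:"title"`
	Updated time.Time   `xml:"updated"`
	Entries []feedEntry `xml:"entry"`
}

type feedEntry struct {
	Title   string    `xml:"title"`
	ID      string    `xml:"id"`
	Link    feedLink  `xml:"link"`
	Updated time.Time `xml:"updated"`
}

type feedLink struct {
	Href string `xml:"href,attr"`
}

// writeFeed emits the XML declaration, a reference to the XSLT stylesheet so
// browsers render the feed as a page, and then the encoded feed itself.
func writeFeed(w io.Writer, f feed) error {
	if _, err := io.WriteString(w, xml.Header); err != nil {
		return err
	}
	if _, err := io.WriteString(w, `<?xml-stylesheet type="text/xsl" href="feed.xsl"?>`+"\n"); err != nil {
		return err
	}
	enc := xml.NewEncoder(w)
	enc.Indent("", "  ")
	return enc.Encode(f)
}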
In conclusion, I am starting to think that the design of this little site, especially if it is just to be used by a single person, is not that absurd after all. The choice of dumping the database in a single file is controversial, but also involves a negligible overhead on modern machines.
If you are interested in reading the source code, just append ?source to the end of the CGI URL, and your browser will download the .go file (just like with all my CGI scripts). You need a modern Go toolchain, since I have used this as an opportunity to actually play around with and use “newer” features like generics, the slices library, or new utility functions like WaitGroup.Go.
Using Go remains a good choice for this project, as I can trivially cross-compile an executable on my GNU/Linux laptop to run on the NetBSD SDF server, and the standard library provides a lot of useful functionality OOTB. The source code is in the public domain and if you decide to use it, you are on your own. The only version control I am using is the numbered backup files that Emacs generates for me. This is gossip-ware: if you use it, you maintain it. There is no upstream, release tarball or issue tracker.
To conclude the conclusion, I have two fears: The first is that my inclination for interesting-but-non-standard solutions might come back to bite me in the years to come, at which point it will become a major headache to address the issues. The second is that by publicly writing about how this works, malicious actors might try to exploit it. I am not arguing for security-by-obscurity, but drawing attention to it doesn’t help either. In my experience, Go has provided a reasonably secure foundation and best practices that prevent common security loopholes, so in my threat model the most likely issue would be a DDoS attack, which would affect all SDF members.
So make of this all what you will; I felt like writing about this because it was raining all day here and I had coded myself into a trance, adding a number of missing features that were bugging me.