Saturday, 25 October 2025
Summary: I wrote a CGI “script” in Go to maintain my public bookmarks. It dumps and reads a binary file storing all the data. In this article I comment on the implementation and my motivations.
I love hyperlinks. So it is not a surprise that I am keen on this WWW thing. There is so much to see that it is hard to remember, and so easy to forget. Having surfed the web since my early childhood, there is a lot I know I won’t find again, be it due to link rot or the gradual deterioration of search engines as adversarial strategies to climb their rankings become more and more widespread.
I started bookmarking at a young age as well. Sometime around 2002 or 2003 I distinctly remember being proud of the little bookmark collection I had in Internet Explorer on our family computer. When we later switched to Firefox, we imported them, and I had a sub-directory of the IE links. When Chrome came up we repeated that, and by the time I decided to switch back to Firefox, the directory structure of my bookmarks was a linked list of four elements. I got into Linux around that time and was re-installing operating systems behind my parents’ backs, and an unfortunate dd command was enough to blow my cover. Luckily we didn’t lose everything, but my little bookmark collection, which had to be 10 years old at that point, fell victim to the overwriting of those first “few” megabytes of the file system. My punishment was that I was promoted to my family’s system administrator for life 😓.
Either way, I still collect links and find delight in finding just the right kind of web pages. There are different properties that evoke this sensation, and it can be anything from elegant simplicity to baroque detail, thought-provoking ideas or clever implementations. Whatever it is, I know I won’t remember it, but I am capable of recalling the necessary information to locate it again. But as you can imagine, these trails often disappear or degenerate.
So back to bookmarking. I never got to use in-browser bookmarking in a reliable way. Social media sites like Reddit or Lobsters are a nuisance to use and have too much unnecessary drama. I tried keeping track of interesting sites in a local text file (which I also used to log every web search and what I found out as a consequence of it), but that too is cumbersome and limited to only some devices. My most productive setup to date was an Org file on my old undergraduate university website, which I exported to HTML on a monthly basis.
$ curl https://wwwcip.cs.fau.de/~oj14ozun/links.html | hxwls | wc -l
reveals that I had accumulated some 1500 links. A real hub, I would say. Some good properties included:
Disadvantages included that, to modify the list, I had to connect to a server via SSH, and that the lack of metadata (or indeed any structured data) made it more complicated to change anything.
So after graduating, and moving over to the SDF cluster I decided to rethink my approach. Initially, I played around with a concept that involved Recutils, but I was not satisfied with the usability. This article will document what I came up with: I think it is great, others might be shocked and chagrined by what I will describe, mortified and stupefied by the details of the implementation. But perhaps someone might just like it and use it as well?
While previously I had a .org file that I exported to a
.html file, I have now severely complicated the situation.
It involves some 6-7 files, give or take. The main ones are:
The CGI script is password protected, and the password is salted and stored in the database. This might not scale in the future, as I currently have to load the entire database just to check the password. If that does end up being an issue, I can store the password separately and only load that.
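The check itself boils down to a single bcrypt comparison; a minimal sketch, assuming the hash is kept as a []byte (the function name here is mine, not the script’s):

import "golang.org/x/crypto/bcrypt"

// checkPassword reports whether the submitted password matches the stored
// bcrypt hash; bcrypt hashes embed their own salt, so no separate salt
// handling is needed.
func checkPassword(stored []byte, submitted string) bool {
	return bcrypt.CompareHashAndPassword(stored, []byte(submitted)) == nil
}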
Initially I intended to have the CGI script have a “read” view and an “edit” view, but then I figured that I would be reading the page a lot more than editing it, so I might as well export the links from the database only after editing it (which happens a few times a week) to a static file (links.html), and have the static file link back to the main CGI script for interaction.
It would be more traditional to use SQLite or something along those lines to store the data. I decided against that, as encoding/gob makes it trivial to dump any object to an io.Writer and load it back from an io.Reader. The main limitation is that I cannot change the form of the data willy-nilly. To make this less of a problem, the “database” is presently a struct with three fields:
type Data struct {
	Links []Link         // an arbitrarily branching tree type
	Auth  []byte         // contains the Bcrypt'ed password
	Conf  map[string]any // any other miscellaneous data
}
Keeping Conf underspecified is the key to allowing myself flexibility in the future, but it is also something to be wary of in terms of possible complexity. The big advantage I have is that this is just my script and I am the only user: any inconvenience I inflict upon myself is my own fault, which makes me, as the user, most understanding.
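To give an impression of how little code this approach needs, here is a rough sketch of dumping and loading such a struct through encoding/gob and compress/gzip; the helper names and the fields of Link are my own invention, not necessarily what the script does:

package main

import (
	"compress/gzip"
	"encoding/gob"
	"os"
)

// Link is an illustrative stand-in for the arbitrarily branching tree type.
type Link struct {
	Title    string
	URL      string
	Children []Link
}

// Data mirrors the struct shown above.
type Data struct {
	Links []Link
	Auth  []byte
	Conf  map[string]any // concrete types stored here must be registered with gob.Register
}

// save gob-encodes d into a gzip-compressed file.
func save(path string, d *Data) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	zw := gzip.NewWriter(f)
	if err := gob.NewEncoder(zw).Encode(d); err != nil {
		return err
	}
	return zw.Close() // flush the compressed stream before the file is closed
}

// load reads the compressed file back into a Data value.
func load(path string) (*Data, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	zr, err := gzip.NewReader(f)
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var d Data
	if err := gob.NewDecoder(zr).Decode(&d); err != nil {
		return nil, err
	}
	return &d, nil
}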
The CGI script has some basic documentation on the start page, but the main interface is revealed by links pointing from links.html (these are abbreviated to @s beside each node). Requests with a query string of the form ?add=/0/2/1/4 indicate a path through the tree to a specific node. Underneath this node I can add more nodes. Each node may have a title, a “source” link, some commentary (written in Markdown, though I might scrap this at some point because Goldmark, a Markdown library written in Go, is currently my only real external dependency, the other one being bcrypt from the extended standard library) and a URL. Of course, the interesting nodes all have URLs, but the non-URL nodes are there to add structure to the page.
The ?add=/x/y/z page also allows me to delete the node (and recursively all nodes below it) or to move it around in the tree. None of these operations need to load the database. Editing an existing node actually has to load the existing data, so that is on a separate page (?edit=/x/y/z).
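For what it is worth, resolving such a path is little more than a loop over the index segments. A sketch, reusing the illustrative Link type from the snippet above (the script’s actual traversal may well look different):

import (
	"fmt"
	"strconv"
	"strings"
)

// resolve follows a path like "/0/2/1/4" through the tree of links and
// returns a pointer to the addressed node.
func resolve(links []Link, path string) (*Link, error) {
	var node *Link
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		i, err := strconv.Atoi(seg)
		if err != nil {
			return nil, fmt.Errorf("bad path segment %q: %w", seg, err)
		}
		if i < 0 || i >= len(links) {
			return nil, fmt.Errorf("index %d out of range", i)
		}
		node = &links[i]
		links = node.Children
	}
	if node == nil {
		return nil, fmt.Errorf("empty path")
	}
	return node, nil
}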
I won’t beat around the bush: the entire UI is very bare-bones and utilitarian. It is a different matter that this is the kind of UI you couldn’t sell to an average computer user, even though the UX is probably better than that of most software they admit to being routinely annoyed by (which is not to say that my UI/UX is perfect, by no means! The best I can claim is that it might be close to an optimum given the constraints of classical web development and a UN*X-oid drive towards a conception of simplicity). I use html/template to generate both the UI and the exported HTML, and despite parsing the template every time the script runs, it is very quick (though probably not as fast as a language that would actually support compiling a template and other DSLs, such as regular expressions, at compile time…). The main slowdown, in my estimation, is the NFS overhead. This is where compress/gzip comes in handy: even now, with a miniature database of 13K, compressing the file means that I only have to read and write 7K.
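To illustrate the template side of things, here is a minimal sketch of the export step; the template text and names are placeholders that only render the top level of the tree (the real template recurses and is more involved), and it reuses the illustrative Data type from above:

import (
	"html/template"
	"os"
)

// pageTmpl is parsed anew on every run of the script, which is cheap enough
// in practice.
var pageTmpl = template.Must(template.New("links").Parse(`<!DOCTYPE html>
<title>Links</title>
<ul>
{{range .Links}}<li><a href="{{.URL}}">{{.Title}}</a></li>
{{end}}</ul>
`))

// export renders the database into the static links.html.
func export(path string, d *Data) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return pageTmpl.Execute(f, d)
}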
The CGI script may also be invoked from the command line and has a
few sub-commands to set the password (which also initialises the
database the first time you use it) or to set options that enable
generating an Atom feed. Note that if you invoke a CGI script with a
query string like ?foo, then foo will be
passed as argv[1] to the script. So it is necessary to
check that we are not in a CGI environment to avoid people setting the
password with a clever HTTP request.
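One way to tell the two modes apart is to look for the environment variables that RFC 3875 obliges the web server to set; roughly like this (a sketch with an invented sub-command name, not the script’s exact logic):

import (
	"net/http"
	"net/http/cgi"
	"os"
)

// inCGI reports whether the program was started by the web server as a CGI
// script rather than from a shell: the server always sets GATEWAY_INTERFACE
// and REQUEST_METHOD for CGI requests.
func inCGI() bool {
	return os.Getenv("GATEWAY_INTERFACE") != "" || os.Getenv("REQUEST_METHOD") != ""
}

func main() {
	if !inCGI() && len(os.Args) > 1 {
		// Only now is it safe to treat os.Args[1] as a sub-command,
		// e.g. a hypothetical "passwd" to (re)set the password.
		switch os.Args[1] {
		case "passwd":
			// ... prompt for a new password and store its bcrypt hash ...
		}
		return
	}
	// In CGI mode a query string like ?foo shows up as os.Args[1] ("foo"),
	// so it must never reach the sub-command dispatch above.
	cgi.Serve(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... normal request handling ...
	}))
}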
The main reason I added the Atom feed is to have an overview of the newest links. The CGI script already remembers when I added each node, so I don’t have to take care of that manually. If enabled, generating the Atom feed will traverse the tree of nodes and pick out the newest 20 (by default, though you can configure that). The feed is then generated using encoding/xml and some annotated structs. I stole an XSLT stylesheet and, after trying to get an LLM to generate a blog post on how to inject HTML into the output of the stylesheet (which pointed me in the direction of ensuring my output was XHTML instead of wrapping it in a CDATA block), I made the feed readable in a browser without needing any JavaScript…
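The annotated structs do most of the work; roughly along these lines (the names, field selection and the stylesheet reference are illustrative, not the script’s actual definitions):

import (
	"encoding/xml"
	"io"
	"time"
)

// Minimal Atom structures; encoding/xml derives element names and the
// namespace from the struct tags, and time.Time values are written as
// RFC 3339 text.
type feed struct {
	XMLName xml.Name    `xml:"http://www.w3.org/2005/Atom feed"`
	Title   string      `xml:"title"`
	Updated time.Time   `xml:"updated"`
	Entries []feedEntry `xml:"entry"`
}

type feedEntry struct {
	Title   string    `xml:"title"`
	ID      string    `xml:"id"`
	Link    feedLink  `xml:"link"`
	Updated time.Time `xml:"updated"`
}

type feedLink struct {
	Href string `xml:"href,attr"`
}

// writeFeed emits the XML declaration, a reference to the XSLT stylesheet so
// browsers render the feed as a page, and then the encoded feed itself.
func writeFeed(w io.Writer, f feed) error {
	if _, err := io.WriteString(w, xml.Header); err != nil {
		return err
	}
	if _, err := io.WriteString(w, `<?xml-stylesheet type="text/xsl" href="feed.xsl"?>`+"\n"); err != nil {
		return err
	}
	enc := xml.NewEncoder(w)
	enc.Indent("", "  ")
	return enc.Encode(f)
}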
In conclusion, I am starting to think that the design of this little site, especially if it is just to be used by a single person, is not that absurd after all. The choice of dumping the database in a single file is controversial, but also involves a negligible overhead on modern machines.
If you are interested in reading the source code, just append ?source to the end of the CGI URL, and your browser will download the .go file (just like with all my CGI scripts). You need a modern Go toolchain, since I have used this as an opportunity to actually play around with and use “newer” features like generics, the slices library, or new utility functions like WaitGroup.Go.
Using Go remains a good choice for this project, as I can trivially cross-compile an executable on my GNU/Linux laptop to run on the NetBSD SDF server, and the standard library provides a lot of useful functionality OOTB. The source code is in the public domain and if you decide to use it, you are on your own. The only version control I am using is the numbered backup files that Emacs generates for me. This is gossip-ware: if you use it, you maintain it. There is no upstream, release tarball or issue tracker.
To conclude the conclusion, I have two fears: The first is that my inclination for interesting-but-non-standard solutions might come back to bite me in the years to come, at which point it will become a major headache to address the issues. The second is that by publicly writing about how this works, malicious actors might try to exploit it. I am not arguing for security-by-obscurity, but drawing attention to it doesn’t help either. In my experience, Go has provided a reasonably secure foundation and best practices that prevent common security loopholes, so in my threat model the most likely issue would be a DDoS attack, which would affect all SDF members.
So make of this all what you will; I felt like writing about this because it was raining all day here and I had coded myself into a trance, adding a number of missing features that were bugging me.