I’ve always been fascinated by screens, especially those with video games on them. Somehow whole universes can emerge from grids of tiny lights, and the mystery of “how this works” is what originally drew me to HCI. I wanted to design systems that dance with imagination and to study the interface between the human and the computer. So far that has taught me a lot about both.
In general I feel we under-think our interfaces in modern software in favour of convention and convenience[1]. I don’t distinguish by medium when I use the word “interface”, though in practice I mostly mean textual, audiovisual or mechanical ones. The latest trend in under-thinking interface design is to remove the interface entirely and replace it with an LLM[2]. This is a fun mental move and seems like it could unlock a world of new abilities, but it’s backwards.
Interfaces Are Already Languages
An interface is more than a control panel covered in buttons[3]. It’s the shared vocabulary that emerges when two different systems of representation need to communicate.
Look at any modern software application: buttons are verbs, boxes with drop-shadows are nouns, API requests are grammatical structures. We’re not “using” interfaces so much as speaking them. When you pick up a new piece of software you can usually operate it, but you lack fluency; you’re still learning the dialect.
This fluency is how humans develop “muscle memory” for computer interfaces. Experienced users can operate hardware and software without conscious thought. They’ve internalized the language. Working on video games makes this extremely obvious, but it applies to web pages and CLIs all the same.
Don’t Go All In On Chat
Traditional interfaces feel rigid because they’re solving a rigid problem: machines can only be operated by one interface, the one they were built with. Humans must meet the machine at whatever static model of reality it has. In practice this means performing long, tedious chains of reliable operations.
Enter ChatGPT, promising to replace everything with “natural language.” There’s a sharp tradeoff here: what you gain in flexibility, you lose in reliability[4]. I can get away with less precision, but I will never develop fluency in an interface by explaining my whims to an LLM.
LLMs (by their static nature) also cannot develop fluency in an interface beyond what can be re-explained in a new context window. They are models of language, not language itself: statistical snapshots of evolving patterns of speech. Real language evolves through use, builds shared meaning through repetition and develops specialised grammar & vocabulary for specific contexts.
When we make the LLM the entire interface, we create a stochastic obfuscation that prevents the formation of shared meaning. You can’t develop fluency with something that responds randomly.
The Real Opportunity
LLMs aren’t meant to be the interface. They’re translators.
Everything is easier if you think of them as fuzzy translators that speak almost every dialect, bridging between human requests and system capabilities. They augment the interface rather than replace it. They can teach you the basics so you can develop your own fluency. They can even act as custodians of the vocabulary, helping humans ensure coherence.
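Here’s a minimal sketch of that division of labour in Python. Everything in it is a made-up stand-in: the COMMANDS vocabulary, the Command type, and translate, which substitutes simple keyword matching for a real model call returning structured output. The point is the shape, not the plumbing: the fixed vocabulary is the interface, and the model only maps requests onto it.

```python
from dataclasses import dataclass
from typing import Optional

# The system's real interface: a small, stable vocabulary of verbs.
# This is what humans develop fluency in, and it doesn't shift
# underneath them between sessions.
COMMANDS = {
    "list_tables": "show every table in the database",
    "describe": "show the columns of one table",
    "sample": "show a few rows from one table",
}

@dataclass
class Command:
    verb: str                     # must be a key of COMMANDS
    target: Optional[str] = None  # e.g. a table name

def translate(request: str) -> Command:
    """Stand-in for the model call: a real setup would ask an LLM to
    map the request onto the fixed vocabulary as structured output."""
    words = request.lower().split()
    if "tables" in words:
        return Command("list_tables")
    if "columns" in words:
        return Command("describe", target=words[-1])
    return Command("sample", target=words[-1])

def handle(request: str) -> None:
    cmd = translate(request)
    if cmd.verb not in COMMANDS:
        raise ValueError(f"unknown verb: {cmd.verb}")
    # The translation is surfaced before anything runs, so the human
    # keeps reading (and learning) the system's own language.
    print(f"translated to: {cmd.verb} {cmd.target or ''}".strip())

handle("what tables are there")       # translated to: list_tables
handle("list the columns of orders")  # translated to: describe orders
```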
The reason that “the tools that are most ‘ergonomic’ for agents also end up being surprisingly intuitive to grasp as humans” is that both are operating in the same linguistic space.
Let’s start basic: imagine you are exploring a database you’ve never seen before. You hook up an LLM to a database client and ask it to list the tables and explore the data. Then you ask follow-up questions via queries you can audit. Finally, the LLM converts these queries into views or stored procedures. At each stage, you and the LLM are free to swap in and out of the driver’s seat, and you are exposed directly to the underlying concepts and data.
Later, if you return without an LLM, you can still access the familiar perspective you encoded into the views and procedures. The interface evolves, but your perspective is stable.
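In code, the whole arc might look something like this minimal sketch, using Python’s built-in sqlite3 module and an in-memory database. The schema and the “LLM-proposed” query are invented for illustration; what matters is that every step is plain SQL you can audit, and the final view outlives the conversation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders (customer, total) VALUES
        ('ada', 120.0), ('ada', 80.0), ('grace', 45.5);
""")

# Stage 1: "list the tables" -- behind the scenes this is just a
# catalogue query, which you can read and verify yourself.
print(conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall())

# Stage 2: the LLM proposes a follow-up query; you audit the SQL
# before running it.
proposed = """
    SELECT customer, SUM(total) AS lifetime_total
    FROM orders
    GROUP BY customer
"""
print(conn.execute(proposed).fetchall())

# Stage 3: encode the perspective as a view, so it outlives the chat.
conn.execute(f"CREATE VIEW customer_totals AS {proposed}")

# Later, with no LLM involved, the familiar perspective is still there.
print(conn.execute("SELECT * FROM customer_totals").fetchall())
```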
Unfortunately, even in my example above, the cracks start to show. What if I actually want to change the names of tables, or remove a column? What if a collaborator disagrees with my perspective on the best model? What if the LLM deletes all my data?
If interfaces are languages, how do we design digital languages that can safely evolve with use?
✌️ Ben
What I’ve been thinking about
[1] almost entirely ignoring hardware and the depth of sensory input
[2] or maybe an obnoxious voice
[3] though that’s often what it feels like in practice
[4] you are also confronted with the challenge of articulating what you really want upfront, instead of letting it unfold gradually