What are we really looking for?

Published on August 17, 2024

Past, Present, and Future of Search

Agostino Ramelli's 1588 invention, The Book Wheel

In light of some of the major moments in the world of internet search (SearchGPT, the antitrust lawsuits), I'm writing a two-part series of short posts on the past, present, and future of search. The utopian case for the internet has, to my mind, always been the ability to navigate all accumulated human knowledge, so to me the fight over search represents a larger fight over the future of the internet.

The Context of the Problem

There is one central problem, blessedly simple and cruelly complex, at the heart of being someone who wants to learn something new. The pathway toward that knowledge exists somewhere, maybe in a dusty library at the edge of the Earth, maybe in the brain of some eccentric shut-in intellectual, but finding out where it is and how to access it is a vertiginous process. There is more stuff out there than we can possibly imagine, and finding an easy way to sift through it all for what you need is a challenge we, as a species, have never fully solved.

I lay that down mostly as a challenge, because what I want to examine is some of the common myths undergirding our conception of "search," both online and offline, particularly the bias toward understanding search solely as a means of retrieving information you know exists but don't yet know. "When is the Super Bowl?" "Who won the Grammy for Album of the Year in 1988?" (It was U2's The Joshua Tree. Somehow Prince's Sign o' the Times wasn't even nominated.) This is one use case, for sure, but it is not exhaustive, nor does it give us a sufficient mental model for how building out a knowledge base actually works.

But if we stick with retrieval as the use case for a moment, the central tension has always been the cognitive load of sorting through results that only half-address your question or mention a potential answer obliquely. If you were to Google "does lifting weights make you shorter?" you're likely to get a menagerie of things: men's health magazines saying "no, it doesn't," Reddit threads from people who claim they got shorter after lifting too much, esoteric academic papers that give you a qualified half-answer, and a series of YouTube videos by people you've never heard of talking about something kind of similar.

No one is getting paid to answer your specific question, so what you are left with is a series of sources that leave the answer in your hands, albeit a bit more equipped with the tools to assemble it.

Paul Otlet, a major early-20th-century figure in information science, imagined a more precise and efficient retrieval process. It involved cataloging all the world’s knowledge on cross-referenced, tagged index cards, so that, through the use of "radio telescopes," one would be able to find a card on exercise --> weightlifting --> effect on height.

"Paul Otlet’s vision of a project to unbundle and itemize all knowledge, showing individual ideas flowing into authors who put them into books which get decomposed into an index card based encyclopedia answering granular questions."

The problem, of course, is that this kind of precision leaves something out: you, as the searcher, learn unexpected things in the process of piecing together imperfect answers from imperfect sources. There might be important information outside the scope of my original question that proves helpful in forming a well-rounded answer: exercises to strengthen my back and improve my posture, for example. This isn't what I was asking for per se, but it deepens my contextual understanding of my original question.

The Memex and Hypertext

In 1945, Vannevar Bush, who led the U.S. Office of Scientific Research and Development, published an article in The Atlantic titled "As We May Think." The most famous part of the essay comes around section six, where he frames the problem endemic to any indexing endeavor:

"When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain."

Bush's answer to our associative way of thinking was a speculative invention: the memex, a desk-like device equipped with screens, levers, and buttons, with a large storage capacity for books, records, and communications, all of which would be stored on microfilm. The user of the memex would have a huge catalogue of information at his fingertips and could call up specific media using different codes or triggers.

A rendering of the memex.

The most revolutionary part of the memex wasn't necessarily its interaction paradigm but Bush's idea of associative indexing: the user could connect any piece of information to another, creating trails of related information. Others could examine and annotate these trails, slowly building out a collaborative knowledge base in which one piece of information could lead a user down many different trains of thought, depending on the context of the search.

"The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. Specifically he is studying why the short Turkish bow was apparently superior to the English long bow in the skirmishes of the Crusades. He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item. When it becomes evident that the elastic properties of available materials had a great deal to do with the bow, he branches off on a side trail which takes him through textbooks on elasticity and tables of physical constants. He inserts a page of longhand analysis of his own. Thus he builds a trail of his interest through the maze of materials available to him."

It's easy to imagine the different ways this same search could branch off into more and more nodes and side trails as links give way to more links. Rather than arriving at the elasticity of the bow's materials, the searcher might branch off earlier onto a new trail about how the Turkish bow's success shaped the long-term course of the Crusades and the presence of Christian missionaries in the Middle East.

What's most important to note here is that the search experience envisioned by the memex doesn't treat the goal of search as retrieving specific information, but as capturing more fully the context of an ask and the end-to-end sense-making arc one follows to reach a provisional stopping point. Answers, of course, are important, but they are ultimately secondary, a downstream effect of a process that begins with recognizing your intent to answer a question, identifying the context in which you are asking it (are you interested in the construction of bows and arrows, or in the Crusades?), combing through a diverse array of sources (some of which match your intent, some of which don't), and constructing a trail of connected information that shows how each piece of knowledge relates to the others, so that someone could feasibly retrace your steps from wood elasticity to the Crusades.

What associative indexing provides is the possibility that, in the process of searching, you might learn something you didn't mean to learn but that nonetheless proves valuable. Maybe all you wanted to know was the history of Turkish-English skirmishes in the Crusades, but associative links to other sources of knowledge show you that the elasticity of certain types of wood is crucial to understanding those skirmishes. Now you know about wood and the Crusades.

Broadening Our Understanding

No one built the memex, nor did anyone use radio telescopes to comb through the world's biggest vat of notecards. These were speculative inventions, thought experiments illustrating different approaches to knowledge-gathering at the confluence of how our brains work and how technology works.

Unconstrained by things like SEO hacking, selling advertisements, and data scraping, Bush, Otlet, and their predecessors and contemporaries (people like H.G. Wells, Emanuel Goldberg, Patrick Geddes, Otto Neurath, and Wilhelm Ostwald) imagined information technology that captured the full spectrum of knowledge acquisition: not just a way to find the answer to a question, but a way to pile on layers of context so that we more fully understand the process by which we arrive at a provisional answer to questions that might be more ambiguous than "what day is Easter this year?"

And it's provisional because, apart from the most straightforward questions, what we're really doing in any search process is assembling, piece by piece, sense-making arcs that draw on a multiplicity of sources and opinions, each offering a different angle on the same larger picture. Shift the frame, and you learn something new.

Insofar as it's useful to revisit some of the earlier major moments in information technology, it's to cast the process of engaging with a search engine in a new light. There will always be the "retrieval" use case, but it covers only a fraction of how we actually behave when we're trying to understand something new. Connecting you to a trustworthy source for an indisputable fact is the lowest common denominator. What really makes a search tool worthy of its name is how it helps us work with imperfect information: answers that are slightly askew of the question we asked but that contribute to a larger, more holistic understanding of what we are trying to learn about. This is the useful friction of a search tool that surfaces things we didn't know we needed, or at least information that is off base in a way that still proves generative.
