To search or not to search

Johannes Stiehler

November 2022

Technology

#Search

Socks and science

Search is remarkably pervasive in both our analog and our digital lives. In analog life, we don’t like to admit it: if I have to search for something, I have either forgotten where it is (so my memory is weak) or never put it in a sensible place to begin with (so my sense of order is underdeveloped).

Searching only becomes acceptable when a large amount of information has to be managed, e.g. in a library. Of course, even there we don’t search through an unordered pile of books. Usually they are pre-sorted, first by category, then alphabetically by author. But what if I only remember the name of the main character? Or if I’m looking for a rare subcategory, such as “daguerreotype” within photography? Arrangement alone can no longer solve this. Nobody wants arbitrarily small subcategory shelves, and nobody wants to buy books two or three times so that they can sit in several overlapping sections. For this purpose, libraries used to keep a subject catalog: a cabinet full of index cards, each headed by a word (“daguerreotype”) or phrase (“categorical imperative”) and listing references to all the books related to it.

Digital distress

In the digital realm, we are almost always dealing with so much information that categories and a single ordering method (alphabetical) are not enough to access subjects in a meaningful way. Even an ordinary online store has so many items that I have to select half a dozen filters (size, color, type, cut) to find something I like, and even then only sometimes. On the other hand, the digital world makes things possible that are very difficult in the analog one, for example applying several different category trees to the same data set: I can group books by literary period (Romanticism → Late Romanticism, Middle Ages → Early Middle Ages), geographically (Latin America → Brazil, Asia → Mongolia) or by genre (novel → crime novel → regional crime novel → regional crime novel in Swabian dialect). These hierarchies can be refined and extended at will, which is impossible in the analog domain but fairly commonplace in the digital one.
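To make the idea of several category trees over one data set concrete, here is a minimal Python sketch. It assumes that each book stores one category path per dimension; the dimension names (“period”, “region”, “genre”) and the placeholder titles are hypothetical. Browsing a category simply checks whether a book’s path in the chosen dimension starts with the requested path. Search engines typically offer this as hierarchical facets; the code below only illustrates the principle.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    # One category path per dimension, from the root to the most specific node.
    # The dimension names used below are hypothetical examples.
    facets: dict[str, tuple[str, ...]] = field(default_factory=dict)


def in_category(book: Book, dimension: str, path: tuple[str, ...]) -> bool:
    """True if the book's path in `dimension` starts with `path`."""
    own_path = book.facets.get(dimension, ())
    return own_path[: len(path)] == path


def browse(books: list[Book], dimension: str, path: tuple[str, ...]) -> list[str]:
    """List the titles found under `path` in one of several independent trees."""
    return [b.title for b in books if in_category(b, dimension, path)]


# Hypothetical sample data: the same books, classified along three trees at once.
books = [
    Book("Book A", {
        "period": ("Romanticism", "Late Romanticism"),
        "region": ("Europe", "Germany"),
        "genre": ("novel",),
    }),
    Book("Book B", {
        "period": ("20th century",),
        "region": ("Latin America", "Brazil"),
        "genre": ("novel", "crime novel"),
    }),
    Book("Book C", {
        "period": ("20th century",),
        "region": ("Europe", "Germany", "Swabia"),
        "genre": ("novel", "crime novel", "regional crime novel"),
    }),
]

print(browse(books, "genre", ("novel", "crime novel")))  # ['Book B', 'Book C']
print(browse(books, "region", ("Europe", "Germany")))    # ['Book A', 'Book C']
```

Because each dimension is an independent tree, adding a new classification means adding one more path per book rather than moving or duplicating anything, which is exactly what physical shelves cannot do.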

Full text rules

And how great would it be to have a keyword catalog that simply records all the words from all the books? Voilà, that’s full-text search. It does nothing more than create a queryable list of all words, each of which has pointers to the “books” that contain it.
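The “keyword catalog of all words” is what is usually called an inverted index. The following minimal Python sketch builds one over a few hypothetical book titles and answers queries with AND semantics (a document must contain every query term). The naive tokenizer and the document IDs are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, then keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every token to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical "books", reduced to their titles.
docs = {
    "book1": "An early history of the daguerreotype",
    "book2": "Kant and the categorical imperative",
    "book3": "Photography from the daguerreotype to digital",
}
index = build_index(docs)

print(sorted(search(index, "daguerreotype")))           # ['book1', 'book3']
print(sorted(search(index, "Categorical imperative")))  # ['book2']
```

Production engines add much more on top (ranking, stemming, term positions for phrase queries), but the core data structure is this mapping from words to the documents that contain them.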

Search is everywhere and always necessary, but it is of course never an end in itself. Ideally, you shouldn’t even notice that you’re searching. The less effort searching takes, the more successful the service usually is; the less a user has to type to reach their goal, the better. This principle has been around for a long time and culminates in an approach we used to call “zero term search”, i.e. search that is triggered solely by what we know about the user rather than by anything they type.

Customer-driven vs. commodity

In the early 2000s, the “search engine” was such a dominant topic that many of our customers had their own sophisticated ideas about which problems they could solve with it. Others bought themselves an expensive search platform simply because everyone else had one. Many of these search solutions were tremendously exciting and well thought out but ahead of their time; others were just terrible because they were ill-considered and out of touch with users. And some were thoughtful and got to the heart of the user’s problem. Those were the cases where skillful use of off-the-shelf software led to real business success.

Today, search is a commodity built into other applications, a feature users simply expect. The upside is that you can usually find your stuff. The downside is that little thought goes into search as an “enabler” of complex applications. Often an unoptimized full-text search is dropped onto a web page and then becomes the “service offering”. Or customers have to make do with the search built into standard shop software to find the right products. If a product is called “???” (The Three Question Marks) or it is entirely unclear whether the name is transliterated as Bulgakov, Bulgakoff or Bulgakow in German, well, that’s just bad luck, and you will have to read something by Miller or Smith with a “normal” title instead.
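Both problems can be mitigated in the text analysis step, but only if someone thinks about them. The minimal Python sketch below works under deliberately simple assumptions: a naive tokenizer that keeps only word characters returns nothing for a title like “???”, so it falls back to the raw string; and a single hypothetical rewrite rule turns a word-final “ff” or “w” after “o” or “e” into “v”, so that the Bulgakov variants share one key. A real system would use proper transliteration or phonetic matching; this toy rule will also conflate some unrelated words.

```python
import re

# Hypothetical, deliberately tiny rule: after "o" or "e", a word-final "ff" or "w"
# becomes "v", so "bulgakoff" and "bulgakow" both turn into "bulgakov".
_ENDING_RULE = re.compile(r"(?<=[oe])(?:ff|w)\b")


def tokenize(text: str) -> list[str]:
    words = re.findall(r"\w+", text.lower())
    # A naive tokenizer drops punctuation-only titles such as "???" entirely,
    # so fall back to the stripped raw string to keep them searchable.
    if not words and text.strip():
        return [text.strip()]
    return words


def normalize(token: str) -> str:
    """Map common spelling variants of Russian surname endings to one key."""
    return _ENDING_RULE.sub("v", token)


def analyze(text: str) -> list[str]:
    # The same analysis must be applied at index time and at query time.
    return [normalize(t) for t in tokenize(text)]


print(analyze("Bulgakoff"), analyze("Bulgakow"), analyze("Bulgakov"))
# ['bulgakov'] ['bulgakov'] ['bulgakov']
print(analyze("???"))  # ['???']
```

The key design point is symmetry: whatever normalization is applied to the indexed text must be applied to the query as well, otherwise the variants still fail to meet.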

Search is key

From our point of view, search functions are essential to the success of a wide range of software. Even today, they must not be treated as a neglected side issue. Nor can they simply be standardized, because users’ use cases are not that uniform either. I own a book called “Search Patterns” from 2010. Among other things, it describes the “triumvirate” of auto-complete, search and filtering, a long-established way for users to reach the result they want quickly. The book is now more than ten years old, and yet many websites still ship the antipattern “search box → bad result” instead. No wonder people turn to Google to find information on such sites.
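The following minimal Python sketch chains the three steps of that triumvirate: auto-complete suggests terms for a typed prefix, search retrieves matching items, and filters narrow the hits down. The vocabulary, the product records and the `brake_type` attribute are hypothetical (borrowed from the bicycle spare-parts example mentioned later), and the book describes the interaction pattern, not this code.

```python
import bisect

# Hypothetical sorted vocabulary, e.g. extracted from a bicycle spare-parts catalog.
VOCABULARY = sorted(["brake pad", "brake cable", "bottom bracket",
                     "cassette", "chain", "chainring"])

# Hypothetical product records with one filterable attribute.
PRODUCTS = [
    {"name": "brake pad", "brake_type": "disc"},
    {"name": "brake pad", "brake_type": "rim"},
    {"name": "brake cable", "brake_type": "rim"},
]


def autocomplete(prefix: str, limit: int = 5) -> list[str]:
    """Step 1: suggest up to `limit` vocabulary terms starting with `prefix`."""
    prefix = prefix.lower()
    # Binary search for the first term >= prefix; all matches follow contiguously.
    start = bisect.bisect_left(VOCABULARY, prefix)
    suggestions = []
    for term in VOCABULARY[start:]:
        if not term.startswith(prefix) or len(suggestions) == limit:
            break
        suggestions.append(term)
    return suggestions


def search(term: str) -> list[dict]:
    """Step 2: return products whose name contains the chosen term."""
    return [p for p in PRODUCTS if term in p["name"]]


def apply_filters(hits: list[dict], **filters: str) -> list[dict]:
    """Step 3: narrow the hits down by exact attribute values."""
    return [p for p in hits if all(p.get(k) == v for k, v in filters.items())]


print(autocomplete("bra"))  # ['brake cable', 'brake pad']
hits = search("brake pad")
print(apply_filters(hits, brake_type="disc"))
# [{'name': 'brake pad', 'brake_type': 'disc'}]
```

Keeping the vocabulary sorted lets auto-complete use a binary search instead of scanning every term, which matters because suggestions are recomputed on every keystroke.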

Yes, that’s right: why don’t we simply send everyone to Google? Isn’t that the cheapest (because free) way to find something? That is often true, e.g. for purely informational portals, but by doing so the information owner completely gives up control of the search experience. Why spend hours optimizing menu structures that hardly anyone clicks on, only to hand the essential tool users rely on to interact with my site over to a third party?

You can already tell: we’re not done with search. While the necessary software is increasingly becoming a commodity (either as a standard feature of commercial software or as open source), configuring, optimizing and using search to delight users remains a real challenge, one that requires effort but also promises high returns.

In the past, we have worked primarily on use cases that Google cannot solve adequately. Google is largely agnostic to the use case, which is both its strength and its weakness. On the one hand, this lets it appropriate content that others have created, use it however it likes and make billions from it. On the other hand, it can hardly respond to the specifics of the content or the user’s interests, because it would then no longer be “general-purpose”.

So if you have very specific content (e.g. scientific articles or spare parts for bicycles) and you know a lot about the intentions and needs of your users, you will still benefit from search-based solutions. We are happy to help with that.

Johannes Stiehler

Co-Founder NEOMO GmbH

Johannes has spent his entire professional career working on software solutions that process, enrich and surface textual information.
