System Design — Autocomplete

Dingding Wang
4 min readSep 27, 2019

Product Requirements

  • Give a list of query suggestions based on user input (as a prefix)
  • Suggestions ordered by some ranking score

Non-product Requirements

  • High performance, response should be quicker than user type speed, let’s say < 200ms
  • Result accuracy and system availability are not strictly required

Main Data Structure

Trie is a data structure that fits prefix search well.

To save space, instead of presenting each char with one node, we present common substring in a node. For example, ‘rest’, ‘restaurant’, ‘restroom’ and ‘restriction’ share the common prefix ‘rest’. In our trie, ‘rest’ is stored in a parent node with a bit set to True indicating the the node itself is a completion (‘rest’), and its children branches are ‘restaurant’, ‘restroom’ and ‘restriction’.

Now if there’s a user input of length L, and we want to return K results from the trie that contains N words, average lookup time would be: O(L) + O(Nlog(K))

O(L): find the node where prefix ends, worst case we’ll need to traverse L nodes to get that node.

O(Nlog(K)): after getting to prefix end, we have to traverse every sub-nodes so that to gather all completions with the prefix. Worst case we have to traverse the whole trie. And of course, if we want to return top K results, we parse all completions through a min heap of size K, this requires Nlog(K) time.

Advanced Data Structure and Higher Level Architecture

Since response speed is the most important criteria, let’s see if we can do better.

  1. Store sorted result in node

To prevent traversing every subtree for a prefix, we can do some pre-computations and store results in that prefix-end node. In our case, the sorted list of top K words with that prefix.

With this data structure, we sacrifice space to get better performance. A prefix…

--

--

Dingding Wang

Former Yelper, now a Snapchatter. Focus on Payment transaction system, Search system, Web API server and Internationalization.