Chapters

Hide chapters

Data Structures & Algorithms in Dart

First Edition · Flutter · Dart 2.15 · VS Code 1.63

Section VI: Challenge Solutions

Section 6: 20 chapters
Show chapters Hide chapters

11. Tries
Written by Kelvin Lau & Jonathan Sande

Heads up... You're reading this book for free, with parts of this chapter shown beyond this point as scrambled text.

The trie, pronounced “try”, is a tree that specializes in storing data that can be represented as a collection, such as English words:

root C A T. T. E. U T O. A.
A trie containing the words CAT, CUT, CUTE, TO and A

Each string character maps to a node where the last node is terminating. These are marked in the diagram above with a dot. The benefits of a trie are best illustrated by looking at it in the context of prefix matching.

In this chapter, you’ll first compare the performance of a trie to a list. Then you’ll implement the trie from scratch!

List vs. Trie

You’re given a collection of strings. How would you build a component that handles prefix matching? Here’s one way:

class EnglishDictionary {
  final List<String> words = [];

  List<String> lookup(String prefix) {
    return words.where((word) {
      return word.startsWith(prefix);
    }).toList();
  }
}

lookup will go through the collection of strings and return those that match the prefix.

This algorithm is reasonable if the number of elements in the words list is small. But if you’re dealing with more than a few thousand words, the time it takes to go through the words list will be unacceptable. The time complexity of lookup is O(k × n), where k is the longest string in the collection, and n is the number of words you need to check.

Imagine the number of words Google needs to parse
Imagine the number of words Google needs to parse

The trie data structure has excellent performance characteristics for this problem. Since it’s a tree with nodes that support multiple children, each node can represent a single character.

You form a word by tracing the collection of characters from the root to a node with a special indicator — a terminator — represented by a black dot. An interesting characteristic of the trie is that multiple words can share the same characters.

To illustrate the performance benefits of the trie, consider the following example in which you need to find the words with the prefix CU. First, you travel to the node containing C. That quickly excludes other branches of the trie from the search operation:

root C A T. T. E. U T O. A.

Next, you need to find the words that have the next letter, U. You traverse to the U node:

root C A T. T. E. U T O. A.

Since that’s the end of your prefix, the trie would return all collections formed by the chain of nodes from the U node. In this case, the words CUT and CUTE would be returned.

Imagine if this trie contained hundreds of thousands of words. The number of comparisons you can avoid by employing a trie is substantial.

root C A D G O. A R T. T. T. E. U T O. S. D. D. A. Z O O.

Implementation

As always, open up the starter project for this chapter.

TrieNode

You’ll begin by creating the node for the trie. Create a lib folder in the root of your project and add a file to it named trie_node.dart. Add the following to the file:

class TrieNode<T> {
  TrieNode({this.key, this.parent});

  // 1
  T? key;

  // 2
  TrieNode<T>? parent;

  // 3
  Map<T, TrieNode<T>?> children = {};

  // 4
  bool isTerminating = false;
}

Trie

Next, you’ll create the trie itself, which will manage the nodes. Since strings are one of the most common uses for tries, this chapter will walk you through building a String-based trie. In Challenge 2 at the end of the chapter, you’ll create a generic trie that can handle any iterable collection.

import 'trie_node.dart';

class StringTrie {
  TrieNode<int> root = TrieNode(key: null, parent: null);
}

Insert

Tries work on collections internally, so you’ll need to take whatever string is inserted and convert its code units into TrieNode keys.

void insert(String text) {
  // 1
  var current = root;

  // 2
  for (var codeUnit in text.codeUnits) {
    current.children[codeUnit] ??= TrieNode(
      key: codeUnit,
      parent: current,
    );
    current = current.children[codeUnit]!;
  }

  // 3
  current.isTerminating = true;
}

Contains

contains is very similar to insert. Add the following method to StringTrie:

bool contains(String text) {
  var current = root;
  for (var codeUnit in text.codeUnits) {
    final child = current.children[codeUnit];
    if (child == null) {
      return false;
    }
    current = child;
  }
  return current.isTerminating;
}
import 'package:starter/string_trie.dart';

void main() {
  final trie = StringTrie();
  trie.insert("cute");
  if (trie.contains("cute")) {
    print("cute is in the trie");
  }
}
cute is in the trie

Remove

Removing a node from the trie is a bit more tricky. You need to be particularly careful since multiple collections can share nodes.

void remove(String text) {
  // 1
  var current = root;
  for (final codeUnit in text.codeUnits) {
    final child = current.children[codeUnit];
    if (child == null) {
      return;
    }
    current = child;
  }
  if (!current.isTerminating) {
    return;
  }
  // 2
  current.isTerminating = false;
  // 3
  while (current.parent != null &&
        current.children.isEmpty &&
        !current.isTerminating) {

    current.parent!.children[current.key!] = null;
    current = current.parent!;
  }
}
final trie = StringTrie();
trie.insert('cut');
trie.insert('cute');

assert(trie.contains('cut'));
print('"cut" is in the trie');
assert(trie.contains('cute'));
print('"cute" is in the trie');

print('\n--- Removing "cut" ---');
trie.remove('cut');
assert(!trie.contains('cut'));
assert(trie.contains('cute'));
print('"cute" is still in the trie');
"cut" is in the trie
"cute" is in the trie

--- Removing "cut" ---
"cute" is still in the trie

Prefix Matching

The most iconic algorithm for a trie is the prefix-matching algorithm. Write the following at the bottom of StringTrie:

List<String> matchPrefix(String prefix) {
  // 1
  var current = root;
  for (final codeUnit in prefix.codeUnits) {
    final child = current.children[codeUnit];
    if (child == null) {
      return [];
    }
    current = child;
  }

  // 2 (to be implemented shortly)
  return _moreMatches(prefix, current);
}
List<String> _moreMatches(String prefix, TrieNode<int> node) {
  // 1
  List<String> results = [];
  if (node.isTerminating) {
    results.add(prefix);
  }
  // 2
  for (final child in node.children.values) {
    final codeUnit = child!.key!;
    results.addAll(
      _moreMatches(
        '$prefix${String.fromCharCode(codeUnit)}',
        child,
      ),
    );
  }
  return results;
}
final trie = StringTrie();
trie.insert('car');
trie.insert('card');
trie.insert('care');
trie.insert('cared');
trie.insert('cars');
trie.insert('carbs');
trie.insert('carapace');
trie.insert('cargo');

print('Collections starting with "car"');
final prefixedWithCar = trie.matchPrefix('car');
print(prefixedWithCar);

print('\nCollections starting with "care"');
final prefixedWithCare = trie.matchPrefix('care');
print(prefixedWithCare);
Collections starting with "car"
[car, card, care, cared, cars, carbs, carapace, cargo]

Collections starting with "care"
[care, cared]

Challenges

How was this chapter for you? Are you ready to take it a bit further? The following challenges will ask you to add functionality to and generalize what you’ve already accomplished. Check out the Challenge Solutions section or the supplemental materials that come with the book if you need any help.

Challenge 1: Additional Properties

The current implementation of StringTrie is missing some notable operations. Your task for this challenge is to augment the current implementation of the trie by adding the following:

Challenge 2: Generic Trie

The trie data structure can be used beyond strings. Make a new class named Trie that handles any iterable collection. Implement the insert, contains and remove methods.

Key Points

  • Tries provide great performance metrics for prefix matching.
  • Tries are relatively memory efficient since individual nodes can be shared between many different values. For example, “car,” “carbs,” and “care” can share the first three letters of the word.
Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum here.
© 2024 Kodeco Inc.

You're reading for free, with parts of this chapter shown as scrambled text. Unlock this book, and our entire catalogue of books and videos, with a Kodeco Personal Plan.

Unlock now