The Trie is an ordered tree data structure used usually to store strings, in which case each node defines an association with a letter of the alphabet. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
1. Example
The best way to understand this data structure is with an example. The above example shows a trie for the complete English words: "A","to", "tea", "ted", "ten", "i", "in", and "inn". The English words are listed in the nodes. Each complete English word has an arbitrary integer value associated with it, e.g. "ten" has the value 12. The value can used to identify if the node is a leaf node or to indicate the index position of the information related to the English word in another array. Thus, the Trie can be used as a fast search data structure.
2. Time complexity
best | average / worst case | |
space | O(k*n) | O(k*n) |
search | O(1) | O(L) |
insert | O(1) | O(L) |
delete | O(1) | O(L) |
As we can see in the Table, in the best and worst cases the space complexity is O(k*n), where k is the alphabet size and n is the total number of nodes, which in practice is sometime unaffordable. Thereby, the Trie needs a total of k*n pointers. For the three operations (search, insert and delete), the time complexity of the Trie in the best case is constant (1) and in the average and worst cases is L, where L is the size of the input string for the operation.
3. Where to use it ?
- To replace other data structures
-
Binary search trees.
-
Hash table, over which it has the following advantages:
-
Looking up data in a trie is faster in the worst case, O(L) time (where L is the length of a search string), compared to an imperfect hash table. An imperfect hash table can have key collisions. A key collision is the hash function mapping of different keys to the same position in a hash table. The worst-case lookup speed in an imperfect hash table is O(N) time, but far more typically is O(1), with O(m) time spent evaluating the hash.
-
There are no collisions of different keys.
-
Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
-
There is no need to provide a hash function or to change hash functions as more keys are added to a trie.
-
A trie can provide an alphabetical ordering of the entries by key.
-
-
- Dictionay representation
-
Storing a predictive text or autocomplete dictionary, such as found on a mobile telephone. Such applications take advantage of a trie's ability to quickly search for, insert, and delete entries; however, if storing dictionary words is all that is required (i.e., storage of information auxiliary to each word is not required), a minimal deterministic acyclic finite state automaton (DAFSA) would use less space than a trie. This is because a DAFSA can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored.
-
Tries are also well suited for implementing approximate matching algorithms, including those used in spell checking and hyphenation software.
-
A discrimination tree term index stores its information in a trie data structure.
-
4. Drawbacks
- Tries can be slower in some cases than hash tables for looking up data, especially if the data is directly accessed on a hard disk drive or some other secondary storage device where the random-access time is high compared to main memory.
- Some keys, such as floating point numbers, can lead to long chains and prefixes that are not particularly meaningful. Nevertheless, a bitwise trie can handle standard IEEE single and double format floating point numbers.
- Some tries can require more space than a hash table, as memory may be allocated for each character in the search string, rather than a single chunk of memory for the whole entry, as in most hash tables.
5. Source code
- C++: can be downloaded from the [GitHub] - [Used to solve tutorials in Hackerrank]
- Java: can be downloaded from the [GitHub] - [Used to solve tutorials in Hackerrank]
Please, feel free to give me your comments :)