Suffix Tree

I. OVERVIEW

A "trie" is a tree-like data structure whose nodes store the letters of an alphabet. A Suffix Tree is simply a compressed trie for all suffixes of a given text, and is extremely useful in searching for patterns. It is a space-efficient data structure to store strings that allows many kinds of queries to be answered quickly.

This repository includes a Javascript implementation/demo of a Generalized Suffix Tree. The demo can also be found at: https://projects.aziztitu.com/sau/ai/suffix-tree-demo/

II. HOW DOES IT WORK?

A Suffix Tree works by storing all the suffixes of the word(s) in a tree like structure.

For example, the suffixes of the word "Banana" are:

Banana
anana
nana
ana
na
a

Now, the suffixes with common prefixes can be combined:

"anana" and "ana"
"nana" and "na"
"anana" and "a"

After combining the suffixes, they look something like this:

Before Combining (A Regular Tree):	After Combining (A Standard Trie):

The next and final step is to reduce the number of unnecessary edges by compressing the nodes. This is done by joining chains of single nodes together, and representing them as one node. It looks something like this:

After Compressing (A Suffix Tree):

Constructing the Suffix Tree in code can get very complex, and a naive implementation for generating a suffix tree would have O(n²) or even O(n³) time complexity.

But using Ukkonen's algorithm, the entire suffix tree can be built on-line (on-the-fly) in O(n) time complexity where n is the length of the text to be preprocessed.

Pattern Searching:

Now that the suffix tree is built by preprocessing the text, we can search for any pattern in O(m) time complexity where m is the length of the pattern.

To search for a pattern on the suffix tree:

Starting from the root node, do the following:
- If there is an edge from the current node leading to the current character in the pattern,
  - Follow that edge to get to the next node
  - Move to the next character in the pattern.
- If not,
  - The pattern does not exist in the text.
If all the characters in the pattern have been processed, the pattern is found.

Generalized Suffix Tree:

A Generalized Suffix tree is a variation on the regular suffix tree in which suffixes for multiple strings can be stored.

This project includes an implementation of a Generalized Suffix tree that uses Ukkonen's algorithm.

For more information on Generalized Suffix Tree, take a look at: https://www.geeksforgeeks.org/generalized-suffix-tree-1/

III. HOW IS THIS DATA STRUCTURE USEFUL?

Some of the applications of suffix trees include:

Searching for large sequences of genomes
Text auto-completion
Finding longest repeating substring
Finding the longest common substring
Finding the longest palindrome in a string

There are other approaches to some of these applications, but Suffix trees are an ideal fit as they have a linear time complexity for "on-line" construction as well as for searching.

Other approaches include algorithms such as KMP, and Boyer-Moore that preprocess the pattern to make the pattern searching better, whereas the Suffix Tree preprocesses the text to make any pattern searching better.

Also since Suffix Tree is more general, you can perform a wide variety of queries on it. The downside however is that it uses more disk space and takes a little longer to preprocess the text.

IV. FURTHER READING

"On-line Contruction of suffix trees" by Esko Ukkonen: https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf

Ukkonen's Algorithm in plain english: https://stackoverflow.com/questions/9452701/ukkonens-suffix-tree-algorithm-in-plain-english

Suffix tree and it's construction with some psuedo code: http://www.allisons.org/ll/AlgDS/Tree/Suffix/

Pattern searching using Suffix Trees: https://www.geeksforgeeks.org/pattern-searching-using-suffix-tree/

Ukkonen's algorithm for Generalized Suffix Trees: https://stackoverflow.com/questions/28278802/ukkonens-algorithm-for-generalized-suffix-trees

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
images		images
js		js
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md
index.html		index.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Suffix Tree

I. OVERVIEW

II. HOW DOES IT WORK?

Pattern Searching:

Generalized Suffix Tree:

III. HOW IS THIS DATA STRUCTURE USEFUL?

IV. FURTHER READING

About

Releases

Packages

Contributors 2

Languages

License

aziztitu/suffix-tree-demo

Folders and files

Latest commit

History

Repository files navigation

Suffix Tree

I. OVERVIEW

II. HOW DOES IT WORK?

Pattern Searching:

Generalized Suffix Tree:

III. HOW IS THIS DATA STRUCTURE USEFUL?

IV. FURTHER READING

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages