Thursday, March 08, 2007

Phone Directory using Trie in C#


Trie is an ordered tree data structure that uses strings as keys. Unlike Binary Trees, Tries do not store keys associated with the node. The key is actually determined based on the position of the node on the tree. Any descendants of a node shares the common prefix of the key string associated with that node. Hence trie is also called as Prefix Tree. The word “trie” comes from Retrieval, and it is pronounced as “try”. To read more about Trie click here.

Since this data structure is a prefix tree, trie is commonly used in Dictionaries, Phone Directories and matching algorithms. Trie is best suited for phone directory (any matching application for that matter!) because it is very efficient in matching strings.

So I have decided to implement Trie myself in C#. I have created three classes:


Node: Represents a single tree node;
NodeCollection: Represents the children of a node;
Trie: Trie implementation to insert and search nodes.

Node: Node represents a basic tree node. Node implements both Depth First and Breadth First algorithms to search its children. It also contains its Parent node and Children nodes. Node has a key and a Value; Key contains the single character and value has the actual data of the node. The key of the node is determined by suffixing the single key character to its parent’s key. Node has a special property called IsTerminal and it is set to true if the key or a value represents a complete string. See the picture below:





NodeCollection: This class is a simple collection class which implements the IEnumerable interface for iteration operations. This class contains an internal List class of type Node. NodeCollection implements the standard list methods such as Add ( ),Contains ( ), Remove ( ) etc.

Trie: This class implements the Trie algorithms such as Insert and Find methods. Here is the InsertNode method:


//Inserts Names into the Trie data structure
public static Node InsertNode(string name, Node root)
{
//Is name null?
if (string.IsNullOrEmpty(name))
throw new ArgumentNullException("Null Key");

//set the index, start inserting characters
int index = 1;
//key
string key;
//start with the root node
Node currentNode = root;
//loop for all charecters in the name
while (index <= name.Length) { //get the key character
key = name[index - 1].ToString();
//does the node with same key already exist?
Node resultNode = currentNode.Children.GetNodeByKey(key);
//No, this is a new key
if (resultNode == null)
{
//Add a node
Node newNode = new Node(key, name.Substring(0, index));
//If reached the last charaecter, this is a valid full name
if (index == name.Length)
newNode.IsTerminal = true;
//add the node to currentNode(i.e. Root
// node for the first time)
currentNode.Children.Add(newNode);
//set as the current node
currentNode = newNode;
}
else
{
//node already exist, set as the current node
//and move to the next character in the name
;currentNode = resultNode;
}
//move to the next character in the name
index++;
}
//all done, return root node
return root;
}


The InsertNode method inserts the string as one character at a time. It starts with the first character; if the first character doesn’t already exist in the root node it adds a new node with the that character and returns the new node. Otherwise it returns the node which has that character to add the remaining characters. It loops until it adds the entire string to the root node. Once it reaches the last character, it marks that last node as a terminal node since the node represents a complete string in the tree hierarchy.

The Find methods is implemeted by Depth First search algorithm. The tree is searched until the complete string is found. Below is the code.


//Find a node given the key ex. "Jo"
public static bool Find(Node node, string key)
{
//Is key empty
if (string.IsNullOrEmpty(key))
return true;//terminal Node
//get the first character
string first = key.Substring(0, 1);
//get the tail: key - first character
string tail = key.Substring(1);
Node curNode = node.Children.GetNodeByKey(first);
//loop until you locate the key i.e. "Jo"
if (curNode != null)
{
return Find(curNode, tail);
}
else
{
//not found, return false
return false;
}
}


I‘ve attached the entire source code below. The source code contains the Trie class library and a console application to test the Trie library. The console application loads a set of names (stored in names.txt in debug folder) in to the tree and provides options to run Depth First & Breadth First algorithms. The application also provides options for Directory Look-Up and Find functionalities.

The class library can be further used to develop a web based phone directory. The data can also be stored on the client (if it is too small!) and the Trie can be implemented in JavaScript.

Source Code: <trie.zip>

Happy Coding,
Madhu

No comments: