Let's discuss some important data structures, algorithms, and related concepts.

  • Data Structures: Linked Lists, Binary Trees, Tries, Stacks, Queues, Vectors / ArrayLists, Hash Tables
  • Algorithms: Breadth First Search, Depth First Search, Binary Search, Merge Sort, Quick Sort
  • Concepts: Bit Manipulation, Singleton Design Pattern, Factory Design Pattern, Memory (Stack vs Heap), Recursion, Big-O Time

Introduction to Data Structures

A data structure is a way of collecting and organizing data so that we can perform operations on it effectively, considering space and time constraints.

A data structure is about:
  • Storing a collection of objects in memory
  • The operations that can be performed on that data
  • Algorithms for those operations
  • The time and space efficiency of those algorithms

Think about which data structures are used in the following scenarios:

  • Google Search
  • Computer Networking
  • Working of RAM
  • DNA matching

Concept of Data Types

Data type is a classification of data which tells the compiler or interpreter how the programmer intends to use the data.

The data type of a particular variable or constant determines how many bits are used for that data item, and how those bits are to be interpreted.

  • Primitive Data Types
    • e.g. int, bool, etc.
  • Composite Data Types
    • Collections or groupings of multiple data items into a single entity
    • e.g. Array, Classes, etc.

Primitive Data Type

Primitive types:

  • values immediately map to machine representations
  • operations immediately map to machine instructions.

Examples: 

Data type   Set of values            Examples of operations
Integer     integers                 add, subtract, multiply, divide
Float       floating-point numbers   add, subtract, multiply, divide
Char        characters               compare
Pointer     references               link to another cell/memory location

 



Abstract Data Types (ADT)

Data abstraction separates:

  • What you can do with the data, from
  • How it is represented

An ADT separates:

  • Specification 
    • What kind of thing we're working with and what operations can be performed on it.
  • Implementation 
    • How the thing and its operations are actually implemented

An abstract data type (ADT) consists of the following:

  • A collection of data
  • A set of operations on the data or subsets of the data
  • A set of axioms, or rules of behavior governing the interaction of operations

Examples:

  • Stack:
    • operations are "push an item onto the stack", "pop an item from the stack"; implementation may be as an array or linked list or whatever.
  • Queue:
    • Operations are "add to the end of the queue", "delete from the beginning of the queue"; implementation may be as an array or linked list or heap.

 

Abstract Data Types (ADT)

  • Operations on objects of the type are those provided by the abstraction

  • Implementation is hidden

Two parts of ADT:

  1. The public (or external) part, which consists of:
    • the conceptual picture
      • the user's view of what the object looks like,
      • how the structure is organized
    • the conceptual operations (what the user can do to the ADT)
  2. The private (or internal) part, which consists of:
    • the representation
      • how the structure is actually stored
    • the implementation of the operations

 

Abstract Data Types are a way to encapsulate and hide the implementation of a data structure, while presenting a clean interface to other programmers.

Some more examples: 

Data type   Set of values            Examples of operations
Color       three 8-bit integers     get red component, brighten
Picture     2D array of colors       get/set color of pixel (i, j)
String      sequence of characters   length, substring, compare

 


Data Organizing Principles

Data organizing principles:

  • Ordering:

    • Put keys into some order so that we know something about where each key is relative to the other keys.
    • e.g. Phone books are easier to search because they are alphabetized.
    • Data Structure:
  • Linking:

    • Add pointers to each record so that we can find related records quickly.
    • e.g. The index in the back of the book provides links from words to the pages on which they appear.
    • Data Structure:
  • Partitioning:

    • Divide the records into 2 or more groups, each group sharing a particular property.
    • e.g. Multi-volume encyclopedias (Aa-Be, W-Z), Folders on your hard drive 
    • Data Structure:

Array

  • Array is a data structure used to store homogeneous elements at contiguous locations.
  • The size of an array must be provided before storing data.

Arrays have the following advantages:

  1. Random access is possible.
  2. Memory requirements are fixed and known in advance.

Arrays have the following limitations:

  1. The size of the array is fixed: we must know the upper limit on the number of elements in advance. Also, the allocated memory is generally equal to that upper limit irrespective of the usage.
  2. Deletion and insertion are expensive. Inserting a new element is expensive because room has to be created for it, and to create room the existing elements have to shift.
    • e.g. Suppose a system maintains a sorted list of IDs in an array arrayExample[] = [100, 103, 105, 200, 204] (see the sketch below).
      • Insertion: to insert a new value 102 while maintaining sorted order, we have to move all the elements after 100.

      • Deletion: to delete 103 from arrayExample[], everything after 103 has to be moved.
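
A minimal Java sketch of these shifts, assuming a plain int[] with a separate count of used slots (the array and variable names are illustrative only):

public class SortedArrayShifts {
    public static void main(String[] args) {
        int[] arrayExample = new int[10];             // fixed capacity, as with any static array
        int count = 5;
        int[] ids = {100, 103, 105, 200, 204};
        System.arraycopy(ids, 0, arrayExample, 0, count);

        // Insertion: to place 102 after 100, every later element shifts one slot right.
        int value = 102;
        int pos = 1;                                  // index where 102 belongs
        for (int i = count; i > pos; i--) {
            arrayExample[i] = arrayExample[i - 1];    // shift right
        }
        arrayExample[pos] = value;
        count++;                                      // now 100, 102, 103, 105, 200, 204

        // Deletion: to remove 103, every element after it shifts one slot left.
        int del = 2;                                  // index of 103 after the insertion
        for (int i = del; i < count - 1; i++) {
            arrayExample[i] = arrayExample[i + 1];    // shift left
        }
        count--;                                      // now 100, 102, 105, 200, 204

        for (int i = 0; i < count; i++) System.out.print(arrayExample[i] + " ");
    }
}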

Complexity of Array Data Structure

 

                    Time Complexity (Average)               Time Complexity (Worst)                 Space Complexity (Worst)
Data Structure      Access    Search    Insertion Deletion  Access    Search    Insertion Deletion
Array               Θ(1)      Θ(n)      Θ(n)      Θ(n)      O(1)      O(n)      O(n)      O(n)      O(n)

 

Explanation:

Let the size of the array be 'n'.
Accessing Time:

O(1)  - This is possible because elements are stored at contiguous locations

 
Search Time:  

O(n) for Sequential Search
O(log n) for Binary Search if Array is sorted


Insertion Time:

O(n) - The worst case occurs when insertion happens at the beginning of an array and requires shifting all of the elements.


Deletion Time:

O(n) - The worst case occurs when deletion happens at the beginning of an array and requires shifting all of the elements.

Arrays as an ADT:

  • String
    • Array of characters
  • Stack
    • Array of elements of a given data type

Array Representation

  1. Using One-Dimensional Array

    • e.g. String - Array of characters

  2. Using Two-Dimensional Array

// Reads a picture, flips it vertically, and shows the result
// (uses a Picture class such as the one in the Princeton standard library).
Picture source = new Picture(args[0]);
int width = source.width();
int height = source.height();
Picture target = new Picture(width, height);
for (int i = 0; i < width; i++)
    for (int j = 0; j < height; j++)
        target.set(i, height - j - 1, source.get(i, j));   // mirror each column top-to-bottom
target.show();

 


Array Operations

Operations on Array:

  • Traversing an array.
  • Insert
    • Append a new node (to the end) of array
    • Prepend a new node (to the beginning) of the array
    • Inserting a new node to a specific position on the array
  • Deleting a node from the array
  • Updating a node in the array

Linked List

A linked list is a linear data structure (like an array) in which each element is a separate object. Each element (node) of the list comprises two items: the data and a reference to the next node.

Why do we need a Linked List?

  • Static arrays are structures whose size is fixed at compile time and therefore cannot be extended or reduced to fit the data set.
  • A dynamic array can be extended by doubling the size but there is overhead associated with the operation of copying old data and freeing the memory associated with the old data structure.
  • One potential problem of using arrays for storing data is that arrays require a contiguous block of memory which may not be available if the requested contiguous block is too large.
  • On the other hand, the advantage of arrays is that each element can be accessed very efficiently using an index.
  • Linked lists are a better fit for applications whose data can be managed without contiguous memory.

Types of Linked List:

  1. Singly Linked List:
    • In this type of linked list, every node stores address or reference of next node in the list and the last node has next address or reference as NULL.
  2. Doubly Linked List:
    • In this type of Linked list, there are two references associated with each node, One of the reference points to the next node and one to the previous node.
    • The advantage of this data structure is that we can traverse in both the directions and for deletion we don’t need to have explicit access to the previous node. 
  3. Circular Linked List:
    • Circular linked list is a linked list where all nodes are connected to form a circle. There is no NULL at the end.
    • A circular linked list can be a single circular linked list or doubly circular linked list. The advantage of this data structure is that any node can be made as starting node.
    • This is useful in the implementation of the circular queue in linked list.
  4. Multi-Linked List: 
    • It is a more general linked list with multiple links from nodes. 
    • for example, in storing and organizing a musical song database that can be iterated/viewed/played in chronological (insertion) order, in alphabetical (sorted) order, and in random order (a.k.a. “shuffle” mode).

Advantages over arrays:

  • Dynamic size
  • Ease of insertion/deletion

Drawbacks over arrays:

  • Random access is not allowed. We have to access elements sequentially starting from the first node. So we cannot do a binary search with linked lists.
  • Nodes are not stored contiguously, so linked lists lose the locality and direct index-based access that arrays get from a contiguous block of memory.
  • Extra memory space for a pointer is required with each element of the list.

The complexity of Linked List:

 

                    Time Complexity (Average)               Time Complexity (Worst)                 Space Complexity (Worst)
Data Structure      Access    Search    Insertion Deletion  Access    Search    Insertion Deletion
Singly-Linked List  Θ(n)      Θ(n)      Θ(1)      Θ(1)      O(n)      O(n)      O(1)      O(1)      O(n)
Doubly-Linked List  Θ(n)      Θ(n)      Θ(1)      Θ(1)      O(n)      O(n)      O(1)      O(1)      O(n)

 

Explanation:

  • Accessing time of an element: O(n)
  • Search time of an element: O(n)
  • Insertion of an element: O(1), if we are already at the position where the element has to be inserted.
  • Deletion of an element: O(1), if we know the address of the node preceding the node to be deleted.

Linked List Representation:

 



Linked Lists Operations

Operations on Linked Lists:

  • Traversing a linked list.
  • Insert
    • Append a new node (to the end) of a list
    • Prepend a new node (to the beginning) of the list
    • Inserting a new node to a specific position on the list
  • Deleting a node from the list
  • Updating a node in the list
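
A minimal Java sketch of a singly linked list supporting the operations listed above (the class and method names are illustrative, not from any particular library):

public class SinglyLinkedList {
    static class Node {
        int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    Node head;

    void prepend(int value) {                 // insert at the beginning: O(1)
        Node node = new Node(value);
        node.next = head;
        head = node;
    }

    void append(int value) {                  // insert at the end: O(n) without a tail pointer
        Node node = new Node(value);
        if (head == null) { head = node; return; }
        Node cur = head;
        while (cur.next != null) cur = cur.next;
        cur.next = node;
    }

    void delete(int value) {                  // remove the first node holding value
        if (head == null) return;
        if (head.data == value) { head = head.next; return; }
        Node cur = head;
        while (cur.next != null && cur.next.data != value) cur = cur.next;
        if (cur.next != null) cur.next = cur.next.next;    // unlink the node
    }

    void traverse() {                         // visit every node: O(n)
        for (Node cur = head; cur != null; cur = cur.next)
            System.out.print(cur.data + " -> ");
        System.out.println("null");
    }

    public static void main(String[] args) {
        SinglyLinkedList list = new SinglyLinkedList();
        list.append(2);
        list.append(3);
        list.prepend(1);
        list.delete(2);
        list.traverse();                      // prints: 1 -> 3 -> null
    }
}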

 


Stack

A stack or LIFO (last in, first out) is an abstract data type that serves as a collection of elements.

It has the following principal operations:

  1. Push - which adds an element to the collection.
  2. Pop - which removes the last element that was added. In a stack, both push and pop take place at the same end, the top of the stack.
  3. Peek - which returns the topmost item without removing it.

A stack is a sequence of elements such that each new element is added (pushed) onto one end, called the top of the stack, and an element is removed (popped) from the same end. Also known as a FILO (First In, Last Out) buffer.

There are two ways to implement a stack: using an array, or using a linked list.

The complexity of Stack:

 

                    Time Complexity (Average)               Time Complexity (Worst)                 Space Complexity (Worst)
Data Structure      Access    Search    Insertion Deletion  Access    Search    Insertion Deletion
Stack               Θ(n)      Θ(n)      Θ(1)      Θ(1)      O(n)      O(n)      O(1)      O(1)      O(n)

 

Examples:

  • Application
    • Undo operation,
    • Back feature in web browser
    • Backtracking,
    • Knight's tour problem,
    • Rat in a maze,
    • N-queen problem
    • Sudoku 
  • Algorithm
    • Tower of Hanoi,
    • Tree traversals,
    • Stock span problem,
    • Histogram problem
    • Depth First Search

Stack Operations

Stack Operations :

The following primitive operations are to be supported:

  1. CreateStack()
  2. DestroyStack()
  3. Push(element)
  4. Pop(element)

The following support operations are also useful:

  1. Size() - returns the number of items in the stack
  2. IsEmpty() - returns true if the stack is empty, otherwise false
  3. Display() - gives a picture of the stack contents on the screen


Stack as an ADT

Collection: elements of some proper type T
Operations:

  • void push (T t)
  • void pop ()
  • T top ()
  • bool empty ()
  • unsigned int size ()
  • constructor and destructor

Axioms: (for any stack S)

  1. S.size(), S.empty(), S.push(t) are always defined
  2. S.pop() and S.top() are defined iff S.empty() is false
  3. S.empty(), S.size(), S.top() do not change S
  4. S.empty() is true iff 0 = S.size()
  5. S.push(t) followed by S.pop() leaves S unchanged
  6. after S.push(t), S.top() returns t
  7. S.push(t) increases S.size() by 1
  8. S.pop() decreases S.size() by 1
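
One possible implementation of this ADT is a linked-list-backed generic stack; the Java sketch below is illustrative only, but each operation is intended to respect the axioms above:

import java.util.NoSuchElementException;

public class LinkedStack<T> {
    private static class Node<T> {
        T item;
        Node<T> next;
    }

    private Node<T> top;      // most recently pushed node
    private int size;

    public boolean isEmpty() { return size == 0; }
    public int size()        { return size; }

    public void push(T item) {                    // add on top: O(1)
        Node<T> node = new Node<>();
        node.item = item;
        node.next = top;
        top = node;
        size++;
    }

    public T pop() {                              // remove and return the top: O(1)
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        T item = top.item;
        top = top.next;
        size--;
        return item;
    }

    public T peek() {                             // return the top without removing it
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        return top.item;
    }

    public static void main(String[] args) {
        LinkedStack<String> s = new LinkedStack<>();
        s.push("a");
        s.push("b");
        System.out.println(s.pop());              // b  (last in, first out)
        System.out.println(s.peek());             // a
        System.out.println(s.size());             // 1
    }
}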



Arithmetic expressions

An arithmetic expression is an expression using addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (**).

An arithmetic expression is an expression that results in a numeric value.

Types of numeric values

  1. Integers (whole numbers), and 
  2. Real or floating point numbers (numbers containing a decimal point). 

An arithmetic expression is a syntactically correct combination of numbers, operators, parenthesis, and variables.

Examples:

  • 1 + 3 is 4
  • 1.23 - 0.45 is 0.78
  • 3 * 8 is 24
  • 6.5/1.25 is 5.2
  • 8.4/4.2 is 2.0 rather than 2, since the result must be of Real type.
  • -5**2 is -25
  • 12/4 is 3
  • 13/4 is 3 rather than 3.25. Since 13/4 is a single mode arithmetic expression and since all of its operands are of INTEGER type, the result must also be of INTEGER type. The computer will truncate the mathematical result (3.25) making it an integer. Therefore, the result is 3.
  • 3/5 is 0 rather than 0.6.
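
A small Java illustration of the integer-versus-real division rules above (Java uses / for both kinds of division and has no ** operator, so the exponent examples are omitted):

public class DivisionDemo {
    public static void main(String[] args) {
        System.out.println(12 / 4);      // 3    (integer division)
        System.out.println(13 / 4);      // 3    (3.25 truncated, since both operands are int)
        System.out.println(3 / 5);       // 0    (0.6 truncated)
        System.out.println(13 / 4.0);    // 3.25 (one operand is real, so the result is real)
        System.out.println(8.4 / 4.2);   // prints a value very close to 2.0 (floating-point)
    }
}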

Operator Priority:

Operator    Meaning                   Priority
()          parenthetical grouping    0
^           power                     1
-           negation (unary minus)    2
/           division                  3
*           multiplication            3
+           addition                  4
-           subtraction               4

If operators have the same priority, evaluate them from left to right.

Infix, Prefix and Postfix Expressions:

Examples:

Infix Expression      Prefix Expression     Postfix Expression
A + B                 + A B                 A B +
A + B * C             + A * B C             A B C * +
(A + B) * C           * + A B C             A B + C *
A + B * C + D         + + A * B C D         A B C * + D +
(A + B) * (C + D)     * + A B + C D         A B + C D + *
A * B + C * D         + * A B * C D         A B * C D * +
A + B + C + D         + + + A B C D         A B + C + D +

 

 

 


Converting an expression from infix to postfix


Algorithm for converting an expression from infix to postfix:

  1. Create an empty stack called myStack for keeping operators. Create an empty list for output.
  2. Convert the input infix string to a list by using the string method split.
  3. Scan the token list from left to right.
    1. If the token is an operand, append it to the end of the output list.
    2. If the token is a left parenthesis, push it onto myStack.
    3. If the token is a right parenthesis, pop myStack until the corresponding left parenthesis is removed. Append each popped operator to the end of the output list.
    4. If the token is an operator (*, /, +, or -), push it onto myStack. However, first remove any operators already on myStack that have higher or equal precedence and append them to the output list.
  4. When the input expression has been completely processed, check myStack. Any operators still on the stack can be popped and appended to the end of the output list.
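
A Java sketch of this conversion algorithm for the four binary operators and parentheses, assuming tokens are separated by spaces (as in the split step); the class and helper names are illustrative:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

public class InfixToPostfix {
    // Higher number = higher precedence here (unlike the 0-4 table above, where 0 is highest).
    private static final Map<String, Integer> PREC = Map.of("+", 1, "-", 1, "*", 2, "/", 2);

    public static String convert(String infix) {
        Deque<String> myStack = new ArrayDeque<>();    // operator stack
        List<String> output = new ArrayList<>();
        for (String token : infix.trim().split("\\s+")) {
            if (PREC.containsKey(token)) {
                // pop operators of higher or equal precedence before pushing this one
                while (!myStack.isEmpty() && PREC.containsKey(myStack.peek())
                        && PREC.get(myStack.peek()) >= PREC.get(token)) {
                    output.add(myStack.pop());
                }
                myStack.push(token);
            } else if (token.equals("(")) {
                myStack.push(token);
            } else if (token.equals(")")) {
                // pop until the matching left parenthesis
                while (!myStack.peek().equals("(")) output.add(myStack.pop());
                myStack.pop();                         // discard the "("
            } else {
                output.add(token);                     // operand goes straight to the output
            }
        }
        while (!myStack.isEmpty()) output.add(myStack.pop());
        return String.join(" ", output);
    }

    public static void main(String[] args) {
        System.out.println(convert("A + B * C"));      // A B C * +
        System.out.println(convert("( A + B ) * C"));  // A B + C *
    }
}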

 


Evaluating Postfix expressions


Algorithm for evaluating postfix expression:

  1. Create an empty stack called myStack.
  2. Convert the string to a list by using the string method split.
  3. Scan the token list from left to right.
    1. If the token is an operand, convert it from a string to an integer and push the value onto myStack.
    2. If the token is an operator (*, /, +, or -), it will need two operands. Pop myStack twice: the first pop is the second operand and the second pop is the first operand. Perform the arithmetic operation and push the result back onto myStack.
  4. When the input expression has been completely processed, the result is on the stack. Pop myStack and return the value.
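
A matching Java sketch of the evaluation algorithm, again assuming space-separated tokens and integer operands (names are illustrative):

import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    public static int evaluate(String postfix) {
        Deque<Integer> myStack = new ArrayDeque<>();
        for (String token : postfix.trim().split("\\s+")) {
            if (token.length() == 1 && "+-*/".contains(token)) {
                int second = myStack.pop();            // first pop = second operand
                int first = myStack.pop();             // second pop = first operand
                switch (token) {
                    case "+": myStack.push(first + second); break;
                    case "-": myStack.push(first - second); break;
                    case "*": myStack.push(first * second); break;
                    default:  myStack.push(first / second); break;
                }
            } else {
                myStack.push(Integer.parseInt(token)); // operand: convert and push
            }
        }
        return myStack.pop();                          // the final result
    }

    public static void main(String[] args) {
        System.out.println(evaluate("7 8 + 3 2 + /")); // (7 + 8) / (3 + 2) = 3
    }
}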

Queue

A queue or FIFO (first in, first out) is an abstract data type that serves as a collection of elements.

It has the following principal operations:

  1. Enqueue - the process of adding an element to the collection (the element is added at the rear).
  2. Dequeue - the process of removing the element that was added first (the element is removed from the front).
  3. Front - Get the front item from queue.
  4. Rear - Get the last item from queue.

A queue is a sequence of elements such that each new element is added (enqueued) to one end, called the back of the queue, and an element is removed (dequeued) from the other end, called the front. Also known as a FIFO (First In First Out) buffer.

There are two ways to implement a queue: using an array, or using a linked list.

A queue is useful in the following scenarios:

  • When a resource is shared among multiple consumers
    • e.g. Include CPU scheduling, Disk Scheduling, etc.
  • When data is transferred asynchronously between two processes. 
    • e.g. Include IO Buffers, pipes, file IO, etc.

The complexity of Queue:

 

                    Time Complexity (Average)               Time Complexity (Worst)                 Space Complexity (Worst)
Data Structure      Access    Search    Insertion Deletion  Access    Search    Insertion Deletion
Queue               Θ(n)      Θ(n)      Θ(1)      Θ(1)      O(n)      O(n)      O(1)      O(1)      O(n)

 

Examples:

  • Applications:
    • CPU Scheduling
    • Disk Scheduling
  • Algorithm:
    • Breadth First Search

 

 


Queues as an ADT

Collection:

  • Elements of some proper type T


Operations:

  • void push (T t)
  • void pop ()
  • T front ()
  • bool empty ()
  • unsigned int size ()
  • (plus constructor and destructor)

Axioms: (for any Queue Q)

  • Q.size(), Q.empty(), Q.push(t) are always defined
  • Q.pop() and Q.front() are defined iff Q.empty() is false
  • Q.empty(), Q.size(), Q.front() do not change Q
  • Q.empty() is true iff 0 = Q.size()
  • Suppose n = Q.size() and the next element pushed onto Q is t; then, after n elements have been popped from Q, t = Q.front()
  • Q.push(t) increases Q.size() by 1
  • Q.pop() decreases Q.size() by 1
  • If t = Q.front() then Q.pop() removes t from Q


Circular Queue

A circular queue follows the basic rules of a queue data structure. It has the first in first out mechanism, but its size is restricted. We just need to keep track of two terms to understand, implement, and use a circular queue; they are:

  1. Front position (from where you can dequeue)
  2. Rear position (the last element's position in the queue)
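
An array-backed circular queue sketch in Java; the fixed capacity, the wrap-around index arithmetic, and the front/size bookkeeping are the points to notice (class and method names are illustrative):

public class CircularQueue {
    private final int[] items;
    private int front = 0;     // index of the element to dequeue next
    private int size = 0;      // number of stored elements

    public CircularQueue(int capacity) { items = new int[capacity]; }

    public boolean isEmpty() { return size == 0; }
    public boolean isFull()  { return size == items.length; }

    public void enqueue(int value) {                       // add at the rear
        if (isFull()) throw new IllegalStateException("queue is full");
        int rear = (front + size) % items.length;          // wrap around the array
        items[rear] = value;
        size++;
    }

    public int dequeue() {                                 // remove from the front
        if (isEmpty()) throw new IllegalStateException("queue is empty");
        int value = items[front];
        front = (front + 1) % items.length;                // wrap around the array
        size--;
        return value;
    }

    public static void main(String[] args) {
        CircularQueue q = new CircularQueue(3);
        q.enqueue(1); q.enqueue(2); q.enqueue(3);
        System.out.println(q.dequeue());                   // 1 (first in, first out)
        q.enqueue(4);                                      // reuses the freed slot
        System.out.println(q.dequeue());                   // 2
    }
}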


Operations On Queues

The following primitive queue operations are to be supported:

  • enqueue(o):
    • Insert o at rear of queue – Input: Object; Output: None
  • dequeue():
    • Remove object at front; error if empty – Input: None; Output: Object removed

The following support operations are also to be supported:

  • size():
    • Return number of objects in queue – Input: None; Output: Integer
  • isEmpty():
    • Return a boolean indicating queue empty – Input: None; Output: Boolean
  • first():
    • Return object at front without removing; error if empty – Input: None; Output: Object

Priority Queue

 

A priority queue can be implemented using many of the data structures that we've already studied (an array, a linked list, or a binary search tree (BST)). However, those data structures do not provide the most efficient operations. To make all of the operations very efficient, we'll use a new data structure called a heap.

Implementing priority queues using heaps

A priority queue is a data structure that stores priorities (comparable values) and perhaps associated information. A priority queue supports inserting new priorities and removing/returning the highest priority. When a priority queue is implemented using a heap, the worst-case times for both insert and removeMax are logarithmic in the number of values in the priority queue.
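
A minimal array-based binary max-heap sketch in Java with the two priority-queue operations described above, insert and removeMax, both O(log n); the class is illustrative only:

import java.util.ArrayList;
import java.util.List;

public class MaxHeapPriorityQueue {
    private final List<Integer> heap = new ArrayList<>();  // heap.get(0) is the maximum

    public void insert(int priority) {
        heap.add(priority);
        int i = heap.size() - 1;
        while (i > 0 && heap.get((i - 1) / 2) < heap.get(i)) {   // bubble up past smaller parents
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    public int removeMax() {
        int max = heap.get(0);
        heap.set(0, heap.get(heap.size() - 1));                  // move the last element to the root
        heap.remove(heap.size() - 1);
        int i = 0;
        while (true) {                                           // sink it below larger children
            int left = 2 * i + 1, right = 2 * i + 2, largest = i;
            if (left < heap.size() && heap.get(left) > heap.get(largest)) largest = left;
            if (right < heap.size() && heap.get(right) > heap.get(largest)) largest = right;
            if (largest == i) break;
            swap(i, largest);
            i = largest;
        }
        return max;
    }

    private void swap(int a, int b) {
        int tmp = heap.get(a);
        heap.set(a, heap.get(b));
        heap.set(b, tmp);
    }

    public static void main(String[] args) {
        MaxHeapPriorityQueue pq = new MaxHeapPriorityQueue();
        pq.insert(3); pq.insert(10); pq.insert(7);
        System.out.println(pq.removeMax());                      // 10
        System.out.println(pq.removeMax());                      // 7
    }
}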

Examples:

  • Dijkstra’s Algorithm
    • One of the fastest algorithms for solving this problem has a runtime of O(E*V*Log(V)), where E is the number of road segments, and V is the number of intersections.
    • The algorithm would take about 2 seconds to find the shortest path in a city with 10,000 intersections, and 20,000 road segments (there are usually about 2 road segments per intersection). 

Deque

Deque stands for double-ended queue. In a deque, values can be inserted at either the front or the back, and similarly the deque allows values to be removed from either the front or the back.


Recursion

Recursion is the technique in which a procedure or function calls itself (or a definition refers to itself), with at least one base/termination case.

Examples of Recursion:

  • Application
    • Factorial function
    • Check if word is palindrome
    • Computing powers of a number
    • The Sierpinksi gasket
    • Counting employees under a manager in a hierarchy
    • The 8-Queens problem
    • Hierarchies
      • A tree traversal algorithm
    • Networks
      • Google Map
    • Graphs
      • Business’s organization chart
    • A minesweeper game
  • Algorithm
    • Euclid's algorithm (Greatest Common Divisor)
    • Towers of Hanoi

Factorial Function

Factorial Function: The product of the integers 1 through a given number.

It is represented using '!'; the factorial of 3 is written as 3!.

For example, 3! equals 1*2*3 (i.e. 6)

function factorial(n)
{
   if (n <= 1)
      return 1;                       // base case: 0! = 1! = 1
   else
      return n * factorial(n - 1);    // recursive case
}

 

How factorial/recursion works: 

e.g.

3! = 3*2!  
  = 3*2*1!  (base case reached)  
  = 3*2*1  (recursion is terminated)  

 

 


Tree

 

The tree with no nodes is called the null or empty tree.

A tree that is not empty consists of a root node and potentially many levels of additional nodes that form a hierarchy.

Tree: undirected, connected graph with no cycles

Examples:

  • The file system on the computer where we want to sort files by folder structure or size in the same folder.
  • Manipulate hierarchical data.
  • Make information easy to search
  • Manipulate sorted lists of data.
  • Multi-stage decision-making
  • Authorization (e.g. Admin--> All--> RW--> R)
  • Javascript document object model

 

 

                    Time Complexity (Average)                       Time Complexity (Worst)                         Space Complexity (Worst)
Data Structure      Access      Search      Insertion   Deletion    Access      Search      Insertion   Deletion
Binary Search Tree  Θ(log(n))   Θ(log(n))   Θ(log(n))   Θ(log(n))   O(n)        O(n)        O(n)        O(n)        O(n)
Cartesian Tree      N/A         Θ(log(n))   Θ(log(n))   Θ(log(n))   N/A         O(n)        O(n)        O(n)        O(n)
B-Tree              Θ(log(n))   Θ(log(n))   Θ(log(n))   Θ(log(n))   O(log(n))   O(log(n))   O(log(n))   O(log(n))   O(n)
Red-Black Tree      Θ(log(n))   Θ(log(n))   Θ(log(n))   Θ(log(n))   O(log(n))   O(log(n))   O(log(n))   O(log(n))   O(n)
Splay Tree          N/A         Θ(log(n))   Θ(log(n))   Θ(log(n))   N/A         O(log(n))   O(log(n))   O(log(n))   O(n)
AVL Tree            Θ(log(n))   Θ(log(n))   Θ(log(n))   Θ(log(n))   O(log(n))   O(log(n))   O(log(n))   O(log(n))   O(n)
KD Tree             Θ(log(n))   Θ(log(n))   Θ(log(n))   Θ(log(n))   O(n)        O(n)        O(n)        O(n)        O(n)

Basic Terminology Of Trees

 

A tree is a hierarchical data structure (an abstract data type) made up of nodes (vertices) and edges, without any cycle.

Tree Terminologies: 

  • Root
    • The node at the top of the tree is called root.
  • Parent
    • The node connected one level above a given node; every node except the root has exactly one parent
  • Child
    • A node connected directly below a given node by a downward edge
  • Leaf
    • The node which does not have any child node
  • Level
    • The level of a node represents the generation of a node. If the root node is at level 0, then its next child node is at level 1, its grandchild is at level 2, and so on.
  • Height
    • The height is the maximum number of nodes on a root-to-leaf path
  • SubTree
    • A subtree represents a node and its descendants, having at least one parent-child relationship

 


Binary Tree

A binary tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child.

https://upload.wikimedia.org/wikipedia/commons/thumb/f/f7/Binary_tree.svg/300px-Binary_tree.svg.png

It is implemented mainly using Links.

Binary Tree Representation:

 A tree is represented by a pointer to the topmost node in the tree. If the tree is empty, then the value of root is NULL. A Binary Tree node contains following parts.

  1. Data
  2. Pointer to left child
  3. Pointer to right child

A Binary Tree can be traversed in two ways (see the code sketch after this list):

  • Depth First Traversal: It can be implemented with a stack. A left-first walk around a tree starts at the root and walks along the edges, keeping the edges always on the left, until returning to the root after visiting all the edges.
    • Inorder Traversal (Left-Root-Right) - Record each leaf as soon as you see it and each internal vertex the second time you see it.
      • Push root onto stack
      • While stack is not empty
        • v := top(stack)
        • While v has a left-child
          • Push leftchild(v) onto stack
          • v:= leftchild(v)
        • v := pop(stack)
        • List v
        • If v has a right-child
          • Push rightchild(v) onto stack
          • v := rightchild(v)
        • Else
          • While stack not empty and v has no right-child
            • v := pop(stack)
            • List v
          • If v has a right-child
            • Push rightchild(v) onto stack
            • v := rightchild(v)
    • Preorder Traversal (Root-Left-Right) - Record each vertex the first time it is seen (and not again).
      • Push root onto stack.
      • While stack is not empty:
        • Pop a vertex off stack, and write it on output list.
        • Push its children, right to left onto stack.
    • Postorder Traversal (Left-Right-Root) - Record each vertex the last time it is seen (and not before).
      • Push root onto stack
      • While stack is not empty
        • If top(stack) is unmarked, mark it and push its children right to left onto stack,
        • Else, pop stack to output list.
  • Breadth First Traversal: It can be implemented with a queue.
    • Level Order Traversal - The level-order of an ordered tree is a listing of the vertices in increasing order of depth, such that the vertices of equal depth are listed according to their prescribed order.
      • Enqueue root.
      • While queue is not empty:
        • Dequeue a vertex and write it on the output list.
        • Enqueue its children left to right.
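
A compact Java sketch of the four traversals on a small hand-built tree; the recursive versions stand in for the explicit-stack walks above, and the level-order version uses a queue as described (names are illustrative):

import java.util.ArrayDeque;
import java.util.Queue;

public class TreeTraversals {
    static class Node {
        int data;
        Node left, right;
        Node(int data) { this.data = data; }
    }

    static void inorder(Node n) {            // Left-Root-Right
        if (n == null) return;
        inorder(n.left);
        System.out.print(n.data + " ");
        inorder(n.right);
    }

    static void preorder(Node n) {           // Root-Left-Right
        if (n == null) return;
        System.out.print(n.data + " ");
        preorder(n.left);
        preorder(n.right);
    }

    static void postorder(Node n) {          // Left-Right-Root
        if (n == null) return;
        postorder(n.left);
        postorder(n.right);
        System.out.print(n.data + " ");
    }

    static void levelOrder(Node root) {      // breadth-first, queue-based
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            System.out.print(n.data + " ");
            if (n.left != null) queue.add(n.left);
            if (n.right != null) queue.add(n.right);
        }
    }

    public static void main(String[] args) {
        // Build:    1
        //          / \
        //         2   3
        //        / \
        //       4   5
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        root.left.left = new Node(4);
        root.left.right = new Node(5);
        inorder(root);    System.out.println(" <- inorder (4 2 5 1 3)");
        preorder(root);   System.out.println(" <- preorder (1 2 4 5 3)");
        postorder(root);  System.out.println(" <- postorder (4 5 2 3 1)");
        levelOrder(root); System.out.println(" <- level order (1 2 3 4 5)");
    }
}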

Binary Tree Properties:

  • The maximum number of nodes at level l is 2^(l-1), counting the root as level 1 (equivalently 2^l if the root is level 0).
  • The maximum number of nodes in a tree of height h is 2^h - 1.
    • Height is counted as the maximum number of nodes on the root-to-leaf path.
  • The minimum possible height is ceil(log2(n+1)).
  • The number of leaf nodes is always one more than the number of nodes with two children.

Examples:

  • Decision tree with Yes/No action

 

 


Binary tree representation as an array and Linked list

  • Array vs Linked List Representation of Binary Tree

    • Arrays cannot represent arbitrarily-shaped binary trees, only complete trees. A complete binary tree is one in which all levels are full, OR all levels except the deepest level are full and the deepest level has all of its nodes as far to the left as possible. (You can imagine that the levels are filled with nodes from left to right, and one level has to be filled before the next level can begin.)
    • Heaps are, by definition, complete binary trees - hence the array implementation is used due to its superior memory efficiency. On the other hand, binary search trees that must support insertion and removal at arbitrary locations (and thus may not be complete trees) cannot use the array implementation.

 

 


Application of Trees

  • Decision Making

    • Next Move in games
    • Computer chess games build a huge tree (training) which they prune at runtime using heuristics to reach an optimal move.
  • Networking

    • Router algorithms -Network Routing, where the next path/route of the packet is determined
    • Social networking is the current buzzword in CS research. It goes without saying that connections/relations are very naturally modeled using graphs. Often, trees are used to represent/identify more interesting phenomena.
  • Representation

    • Chemical formulas representation
    • XML/Markup parsers use trees
    • Producers/consumers often use a balanced tree implementation to store a document in memory.
  • Manipulate Hierarchical Data

    • Make information easy to search
    • Manipulate sorted lists of data.
  • Workflow

    • As a workflow for compositing digital images for visual effects.
  • Organizing Things

    • Folders/ files in the Operating system
    • HTML Document Object Model (DOM)
    • Company Organisation Structures
    • PDF is a tree-based format. It has a root node followed by a catalog node followed by a pages node which has several child page nodes.
  • Faster Lookup

    • Auto-correct applications and spell checker
    • Syntax Tree in Compiler
  • Task Tracker

    • Undo function in a text editor
  • Encoding

  • Other applications of Binary tree:

    • Binary Search Tree - Used in many search applications where data is constantly entering/leaving, such as the map and set objects in many languages' libraries.
    • Binary Space Partition - Used in almost every 3D video game to determine what objects need to be rendered.
    • Binary Tries - Used in almost every high-bandwidth router for storing router-tables.
    • Hash Trees - used in p2p programs and specialized image signatures in which a hash needs to be verified, but the whole file is not available.
    • Heaps - Used in implementing efficient priority queues, which in turn are used for scheduling processes in many operating systems, Quality-of-Service in routers, and A* (path-finding algorithm used in AI applications, including robotics and video games). Also used in heapsort.
    • Huffman Coding Tree - used in compression algorithms, such as those used by the .jpeg and .mp3 file formats.
    • GGM Trees - Used in cryptographic applications to generate a tree of pseudo-random numbers.
    • Syntax Tree - Constructed by compilers and (implicitly) calculators to parse expressions.
    • Treap - Randomized data structure used in wireless networking and memory allocation.
    • T-tree - Though most databases use some form of B-tree to store data on the drive, databases which keep all (most) their data in memory often use T-trees to do so.

 


Threaded Binary Tree

What is Threaded Binary Tree?

It is a binary tree in which the null pointers of nodes are threaded to the in-order predecessor or successor (left or right): all right null pointers point to the inorder successor, or all left null pointers point to the inorder predecessor.

A binary tree is threaded by making all right child pointers that would normally be null point to the inorder successor of the node (if it exists), and all left child pointers that would normally be null point to the inorder predecessor of the node

 

Why do we need Threaded Binary Tree?

  • Binary trees have a lot of wasted space: the leaf nodes each have 2 null pointers. We can use these pointers to help us in inorder traversals.
  • A threaded binary tree makes tree traversal faster, since we do not need a stack or recursion for traversal.

ref: https://upload.wikimedia.org/wikipedia/commons/7/7a/Threaded_tree.svg

Types of threaded binary trees:

  • Single Threaded:
    • Each node is threaded towards either the in-order predecessor or successor (left or right): all right null pointers point to the inorder successor, OR all left null pointers point to the inorder predecessor.

  • Double Threaded:
    • Each node is threaded towards both the in-order predecessor and successor (left and right): all right null pointers point to the inorder successor, AND all left null pointers point to the inorder predecessor.

 


AVL trees (Russian inventors Adelson-Velskii and Landis)

Balanced tree - A tree where no leaf is much farther away from the root than any other leaf.

Different balancing schemes allow different definitions of "much farther" and different amounts of work to keep them balanced. 

A height-balanced tree, also known as an AVL tree (after its inventors G.M. Adelson-Velskii and E.M. Landis), is a self-balancing binary search tree.

An AVL tree is one that requires heights of left and right children of every node to differ by at most ±1.

AVL Tree (Height Balanced Tree) - A tree whose subtrees differ in height by no more than one and the subtrees are height-balanced, too.

An empty tree is height-balanced.
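
A short Java sketch that checks the AVL (height-balance) property as defined above; heights are counted in nodes, an empty subtree has height 0, and the rebalancing rotations themselves are omitted:

public class AvlCheck {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Height counted in nodes on the longest root-to-leaf path; an empty tree has height 0.
    static int height(Node n) {
        if (n == null) return 0;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // A tree is height-balanced (AVL) if every node's children differ in height by at most 1
    // and both subtrees are themselves height-balanced.
    static boolean isAvl(Node n) {
        if (n == null) return true;                              // an empty tree is balanced
        int diff = Math.abs(height(n.left) - height(n.right));
        return diff <= 1 && isAvl(n.left) && isAvl(n.right);
    }

    public static void main(String[] args) {
        Node root = new Node(2);
        root.left = new Node(1);
        root.right = new Node(3);
        System.out.println(isAvl(root));                         // true

        Node skewed = new Node(1);
        skewed.right = new Node(2);
        skewed.right.right = new Node(3);
        System.out.println(isAvl(skewed));                       // false: subtree heights differ by 2 at the root
    }
}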

 

                    Time Complexity (Average)                       Time Complexity (Worst)                         Space Complexity (Worst)
Data Structure      Access      Search      Insertion   Deletion    Access      Search      Insertion   Deletion
AVL Tree            Θ(log(n))   Θ(log(n))   Θ(log(n))   Θ(log(n))   O(log(n))   O(log(n))   O(log(n))   O(log(n))   O(n)

Graph

A graph is a data structure that consists of the following two components:

  1. A finite set of vertices, also called nodes.
  2. A finite set of ordered pairs of the form (u, v), called edges.

 

  • The pair is ordered because (u, v) is not the same as (v, u) in the case of a directed graph (digraph).
  • The pair (u, v) indicates that there is an edge from vertex u to vertex v. Edges may carry a weight/value/cost.
    • V -> Number of Vertices.
    • E -> Number of Edges.

 

Graph Classification:

Graphs can be classified in many ways; below are the two most common classifications:

  1. By direction:
    • Undirected Graph: a graph in which all the edges are bidirectional.
    • Directed Graph: a graph in which all the edges are unidirectional.
  2. By weight:
    • Weighted Graph: a graph in which a weight is associated with each edge.
    • Unweighted Graph: a graph in which no weight is associated with the edges.

 

Time Complexities:

  • Time complexities in the case of an Adjacency Matrix:
    • Traversal (by BFS or DFS): O(V^2)
    • Space: O(V^2)
  • Time complexities in the case of an Adjacency List:
    • Traversal (by BFS or DFS): O(V + E)
    • Space: O(V + E)

Examples :

  • Google map
  • Represent network
    • Telephone
    • Social Networking sites

Representations of Graph:

  1. Adjacency Matrix
    • A 2D array of size V x V, where V is the number of vertices in the graph.
  2. Adjacency List
    • Implemented using an array of linked lists (see the sketch below).
    • The size of the array is equal to the number of vertices.
    • Can be used to represent a weighted graph:
      • Weights of edges can be stored in the nodes of the linked lists.
  3. Incidence Matrix
  4. Incidence List
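
A Java sketch of the adjacency-list representation with a breadth-first traversal over it, illustrating the O(V + E) traversal cost; vertex labels are simply 0..V-1 and all names are illustrative:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class AdjacencyListGraph {
    private final List<List<Integer>> adj;    // adj.get(u) holds the neighbours of vertex u

    public AdjacencyListGraph(int vertices) {
        adj = new ArrayList<>();
        for (int v = 0; v < vertices; v++) adj.add(new ArrayList<>());
    }

    public void addEdge(int u, int v) {       // undirected: store the edge in both lists
        adj.get(u).add(v);
        adj.get(v).add(u);
    }

    public void bfs(int start) {              // O(V + E) traversal using a queue
        boolean[] visited = new boolean[adj.size()];
        Queue<Integer> queue = new ArrayDeque<>();
        visited[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            int u = queue.remove();
            System.out.print(u + " ");
            for (int v : adj.get(u)) {
                if (!visited[v]) {
                    visited[v] = true;
                    queue.add(v);
                }
            }
        }
        System.out.println();
    }

    public static void main(String[] args) {
        AdjacencyListGraph g = new AdjacencyListGraph(4);   // vertices 0..3 (say A, B, C, D)
        g.addEdge(0, 1); g.addEdge(0, 2); g.addEdge(0, 3);
        g.addEdge(1, 3); g.addEdge(2, 3);
        g.bfs(0);                                           // prints: 0 1 2 3
    }
}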

 

 


Basic Terminology of Graphs

  • Define a Graph:

    • G = (V, E) by defining a pair of sets:
      1. V = a set of vertices
      2. E = a set of edges
  • Edges:

    • Each edge is defined by a pair of vertices
    • An edge connects the vertices that define it
    • In some cases, the vertices can be the same
  • Vertices:

    • Vertices also called nodes
    • Denote vertices with labels
  • Representation:

    • Represent vertices with circles, perhaps containing a label
    • Represent edges with lines between circles
  • Example:

    • V = {A,B,C,D}
    • E = {(A,B),(A,C),(A,D),(B,D),(C,D)}
  • Path:

    • sequence of vertices in which each pair of successive vertices is connected by an edge
  • Cycle:

    • a path that starts and ends on the same vertex
  • Simple path:

    • a path that does not cross itself
      • That is, no vertex is repeated (except first and last)
      • Simple paths cannot contain cycles
  • The length of a path:

    • Number of edges in the path
      • Sometimes the sum of the weights of the edges
  • The degree of a Node:

    • The degree of a node is the number of edges the node is used to define
    • In the example above:
      • Degree 2: B and C
      • Degree 3: A and D
      • A and D have odd degree, and B and C have even degree
    • Can also define in-degree and out-degree
      • In-degree: Number of edges pointing to a node
      • Out-degree: Number of edges pointing from a node


Sorting

Time Complexity comparison of Sorting Algorithms


 

Algorithm        Time Complexity
                 Best          Average       Worst
Quick Sort       O(n log(n))   O(n log(n))   O(n^2)
Mergesort        O(n log(n))   O(n log(n))   O(n log(n))
Heapsort         O(n log(n))   O(n log(n))   O(n log(n))
Bubble Sort      O(n)          O(n^2)        O(n^2)
Insertion Sort   O(n)          O(n^2)        O(n^2)
Selection Sort   O(n^2)        O(n^2)        O(n^2)
Bucket Sort      O(n+k)        O(n+k)        O(n^2)
Radix Sort       O(nk)         O(nk)         O(nk)

 


Classification of sorting techniques

  • Internal sorting - does not require external memory

    • Bucket sort
    • Bubble sort
    • Insertion sort
    • Selection sort
    • Heapsort
    • Mergesort
  • External Sorting - requires external memory

 


Bubble Sort

Input:

A sequence <a1, a2, ..., an> of numbers

Output:

A permutation <a1', a2', ..., an'> of the input such that a1' <= a2' <= ... <= an'

Description:
  • Swap the first two elements if the 1st element is greater than the 2nd
  • Swap the 2nd and 3rd elements if the 2nd element is greater than the 3rd
  • Repeat these passes over the array until it is sorted (see the sketch below)
Running Time:

Space: O(1)

Worst Case:

O(n^2)

Average Case:

O(n^2)
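
A Java sketch of bubble sort as described above, with the usual early-exit check that gives the O(n) best case on already-sorted input:

import java.util.Arrays;

public class BubbleSort {
    static void bubbleSort(int[] a) {
        for (int pass = 0; pass < a.length - 1; pass++) {
            boolean swapped = false;
            // each pass bubbles the largest remaining element to the end
            for (int i = 0; i < a.length - 1 - pass; i++) {
                if (a[i] > a[i + 1]) {                 // swap adjacent elements that are out of order
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break;                       // already sorted: stop early
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2, 8};
        bubbleSort(a);
        System.out.println(Arrays.toString(a));        // [1, 2, 4, 5, 8]
    }
}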


Quick Sort

 
 
Worst Case:

O(n^2)

Average Case:

O(n log(n))
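
A Java quicksort sketch using the last element as the pivot (Lomuto partitioning, one common choice); the O(n^2) worst case above shows up when the chosen pivot is repeatedly the smallest or largest remaining element:

import java.util.Arrays;

public class QuickSort {
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;                          // 0 or 1 elements: already sorted
        int p = partition(a, lo, hi);
        quickSort(a, lo, p - 1);                       // sort the elements smaller than the pivot
        quickSort(a, p + 1, hi);                       // sort the elements larger than the pivot
    }

    // Lomuto partition: place a[hi] (the pivot) in its final position and return that index.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                swap(a, i, j);
                i++;
            }
        }
        swap(a, i, hi);
        return i;
    }

    static void swap(int[] a, int x, int y) {
        int tmp = a[x]; a[x] = a[y]; a[y] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {9, 3, 7, 1, 4};
        quickSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));        // [1, 3, 4, 7, 9]
    }
}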


Insertion Sort Algorithm

Input:

A sequence <a1, a2, ..., an> of numbers

Output:

A permutation <a1', a2', ..., an'> of the input such that a1' <= a2' <= ... <= an'

Pseudocode:

INSERTION-SORT(A)              // sorts A[1...n]
FOR j <-- 2 TO n
    DO key <-- A[j]
       i <-- j-1
       WHILE i > 0 AND A[i] > key
           DO A[i+1] <-- A[i]
              i <-- i-1
       A[i+1] <-- key

 

Running Time:
  • Depends on input
    • If the input is reverse sorted, the running time is the longest
    • If the input is already sorted, the running time is the shortest
  • Depends on the input size.
    • Fast for small input sizes
    • Slow when the input size is very large
  • Depends on computer
    • relative speed (same system)
    • absolute speed (different system)
Worst Case:

Scenario: when the input is reverse sorted.

T(n) = Θ(n^2)
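
The same pseudocode as runnable Java (0-based indexing instead of the 1-based pseudocode):

import java.util.Arrays;

public class InsertionSort {
    static void insertionSort(int[] a) {
        for (int j = 1; j < a.length; j++) {
            int key = a[j];                        // element to insert into the sorted prefix a[0..j-1]
            int i = j - 1;
            while (i >= 0 && a[i] > key) {
                a[i + 1] = a[i];                   // shift larger elements one slot right
                i--;
            }
            a[i + 1] = key;                        // drop the key into the gap
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 6, 1, 3};
        insertionSort(a);
        System.out.println(Arrays.toString(a));    // [1, 2, 3, 4, 5, 6]
    }
}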


Merge Sort Algorithm

Input:

A sequence <a1, a2, ..., an> of numbers

Output:

A permutation <a1', a2', ..., an'> of the input such that a1' <= a2' <= ... <= an'

Pseudocode:

Merge Sort (Recursive sorting)

  1. IF n=1, DONE //already sorted
  2. Recursively sort 
    1. A[1,...(n/2)]              AND
    2. A[(n/2)+1,...n]
  3. Merge the two sorted lists (subroutine)

 

Description:

Merge sort is a recursive algorithm. It works as follows.

  1. Divide the array into two halves, sort each half recursively, and then merge them back together.
  2. Each half again performs step 1, until a sub-array has only one element remaining.

 

Running Time:

O(n log(n)); the merge subroutine itself runs in O(n).

Worst Case:

O(n log(n))

Average Case:

O(n log(n))
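
A Java merge sort sketch following the pseudocode above: split, recursively sort each half, then merge the two sorted halves with the O(n) subroutine:

import java.util.Arrays;

public class MergeSort {
    static void mergeSort(int[] a, int lo, int hi) {
        if (hi - lo <= 1) return;                     // 0 or 1 elements: already sorted
        int mid = (lo + hi) / 2;
        mergeSort(a, lo, mid);                        // sort the left half  a[lo..mid-1]
        mergeSort(a, mid, hi);                        // sort the right half a[mid..hi-1]
        merge(a, lo, mid, hi);                        // merge the two sorted halves
    }

    // Merge subroutine: combines two sorted runs using O(n) extra time and space.
    static void merge(int[] a, int lo, int mid, int hi) {
        int[] tmp = new int[hi - lo];
        int i = lo, j = mid, k = 0;
        while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid) tmp[k++] = a[i++];            // copy whatever is left of the left run
        while (j < hi)  tmp[k++] = a[j++];            // copy whatever is left of the right run
        System.arraycopy(tmp, 0, a, lo, tmp.length);
    }

    public static void main(String[] args) {
        int[] a = {38, 27, 43, 3, 9, 82, 10};
        mergeSort(a, 0, a.length);
        System.out.println(Arrays.toString(a));       // [3, 9, 10, 27, 38, 43, 82]
    }
}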