COMP202 - Data Structures Study Guide
A concise, exam-oriented reference covering every data type from the course up to (but not including) Graphs. For each: definition, operations, Java implementation, complexity, use cases, and trade-offs.
1. Arrays
Definition
An array is the most fundamental data structure in computer science: a fixed-size, contiguous block of memory that stores elements of the same type, indexed from 0 to n-1. "Contiguous" means the elements are stored in consecutive memory addresses - the address of element i is computed as base_address + i × element_size. This formula is why random access (jumping directly to any index) takes constant time: it is a single arithmetic calculation followed by a single memory access, regardless of the array's size.
Because the size of an array is fixed at creation time, it cannot grow or shrink. This is both its greatest strength (predictable memory layout) and its primary limitation (inflexibility).
Key Operations
- access(i) - retrieve
A[i]directly via address arithmetic. - set(i, x) - write
xintoA[i]directly. - search(x) - scan linearly for target
x; no shortcut without sorting. - insert(i, x) - place
xat indexi, shifting elements at indicesithroughn-1one position to the right. - remove(i) - delete
A[i], shifting elements at indicesi+1throughn-1one position to the left.
Time and Space Complexity
| Operation | Time | Why |
|---|---|---|
| access / set | O(1) | Address arithmetic: base + i × size is a single calculation regardless of n. |
| search | O(n) | Without any ordering guarantee, every element may need to be checked. |
| insert at end (if capacity) | O(1) | Simply write to A[n] - no shifting needed. |
insert/remove at arbitrary i |
O(n) | In the worst case (inserting at index 0), all n elements must be shifted by one position. On average, ~n/2 elements are shifted. |
Space: O(n) - the array itself occupies memory proportional to its capacity, with no additional per-element overhead.
Use Cases
Lookup tables, matrix representations, buffers, foundation for ArrayList/Heap/Hashtable.
Advantages
Fastest random access, cache-friendly (contiguous memory layout means sequential elements are loaded into CPU cache together), minimal memory overhead (no pointers or metadata per element).
Drawbacks
Fixed size (must know capacity at creation), expensive insertions/removals at arbitrary positions (shifting), wasted space if over-allocated.
Special Concept - Growable Arrays
Two resizing strategies when full:
- Incremental (
+c): resize by a constant → O(n²) total for n pushes. Each resize copies all existing elements, and over n insertions, you resize ~n/c times, each costing O(n) - summing to O(n²). - Doubling (
×2): geometric growth → O(n) total, amortized O(1) per push. The total copy work across all resizes is1 + 2 + 4 + ... + n ≈ 2n, which is O(n). This is whyArrayListuses doubling.
2. Singly Linked Lists
Singly Linked Lists Definition
A singly linked list is a linear data structure composed of a chain of nodes, where each node stores two things: an element (the data) and a reference (next pointer) to the subsequent node. The list is accessed exclusively through a head reference pointing to the first node; the last node's next is null, signaling the end of the chain.
Unlike arrays, nodes are scattered throughout memory - they are not contiguous. This means there is no address arithmetic to jump to the i-th element; you must follow the chain of next pointers from the head, one hop at a time. However, this scattered allocation means the list can grow and shrink dynamically by simply creating or discarding individual nodes - no resizing or copying required.
head → [A|•] → [B|•] → [C|•] → null
Singly Linked Lists Key Operations
- addFirst(e) - create a new node pointing to the current head, then update head to the new node.
- addLast(e) - traverse from head to the last node, then attach a new node after it.
- removeFirst() - save head's element, advance head to
head.next, return the saved element. - removeLast() - requires traversing to the second-to-last node (since there's no backward pointer), then setting its
nexttonull. - size / isEmpty / traversal - walk the chain via
nextpointers.
Java Implementation
private static class Node<E> {
E element;
Node<E> next;
Node(E e, Node<E> n) { element = e; next = n; }
}
private Node<E> head = null;
private int size = 0;
public void addFirst(E e) {
head = new Node<>(e, head);
size++;
}
public E removeFirst() {
if (head == null) return null;
E ans = head.element;
head = head.next;
size--;
return ans;
}
public void addLast(E e) {
if (head == null) { addFirst(e); return; }
Node<E> walk = head;
while (walk.next != null) walk = walk.next;
walk.next = new Node<>(e, null);
size++;
}
Singly Linked Lists Time and Space Complexity
| Operation | Time | Why |
|---|---|---|
| addFirst / removeFirst | O(1) | Only the head pointer is updated - no traversal needed. |
| addLast (without tail pointer) | O(n) | Must walk through all n nodes to find the last one. |
| addLast (with tail pointer) | O(1) | A maintained tail reference gives direct access to the last node. |
| removeLast | O(n) | Even with a tail pointer, you must find the predecessor of the tail by traversing from head, because there is no backward pointer. |
| search / access by index | O(n) | No random access - must follow next pointers from head sequentially. |
Space: O(n) - each node stores one extra pointer (next), so total memory is n × (element_size + pointer_size).
Singly Linked Lists Use Cases
Stack/Queue implementation, undo chains, adjacency lists.
Advantages vs Arrays
Dynamic size (grows/shrinks by individual nodes), O(1) head insert/remove (no shifting), no wasted capacity.
Singly Linked Lists Drawbacks
No random access (must traverse sequentially), cache-unfriendly (nodes are scattered in memory), removeLast is O(n) because there is no backward pointer to find the predecessor, extra memory for one pointer per node.
Special Concepts
- Tail pointer enables O(1)
addLast, butremoveLaststill needs O(n) because after removing the tail, you need to updatetailto its predecessor - and there's noprevpointer to find it. - Serves as the backbone for stack (push/pop at head) and queue (enqueue at tail, dequeue at head) linked implementations.
3. Doubly Linked Lists
Doubly Linked Lists Definition
A doubly linked list extends the singly linked list by giving each node two pointers: prev (pointing to the preceding node) and next (pointing to the following node). This bidirectional linkage means you can traverse the list in both directions and, critically, you can remove any node in O(1) time if you already have a reference to it - something impossible in a singly linked list where you need to find the predecessor first.
In practice, doubly linked lists use header and trailer sentinel nodes - dummy nodes that bracket the real data. The header's next points to the first real element, and the trailer's prev points to the last. The sentinels themselves hold no data; they exist solely to eliminate null-checking edge cases at the boundaries of the list. With sentinels, every real node is guaranteed to have non-null prev and next, so insertions and deletions never need special logic for "is this the first or last node?"
header ⇄ [A] ⇄ [B] ⇄ [C] ⇄ trailer
Key Operations (all O(1) given a node reference)
- addFirst / addLast / addBetween / addBefore / addAfter
- removeFirst / removeLast / remove(node)
- Traversal in both directions via
next/prev.
Doubly Linked Lists Java Implementation
private static class Node<E> {
E element; Node<E> prev, next;
Node(E e, Node<E> p, Node<E> n) { element = e; prev = p; next = n; }
}
private Node<E> header, trailer; // sentinels
// constructor: header = new Node<>(null,null,null);
// trailer = new Node<>(null,header,null);
// header.next = trailer;
private void addBetween(E e, Node<E> pred, Node<E> succ) {
Node<E> newest = new Node<>(e, pred, succ);
pred.next = newest;
succ.prev = newest;
size++;
}
private E remove(Node<E> node) {
node.prev.next = node.next;
node.next.prev = node.prev;
size--;
return node.element;
}
Doubly Linked Lists Time and Space Complexity
| Operation | Time | Why |
|---|---|---|
| addFirst / addLast | O(1) | Insert between header/trailer and the first/last node - just rewire 4 pointers. |
| addBetween / addBefore / addAfter | O(1) | Given predecessor and successor references, creating and linking a new node is pointer arithmetic only. |
| remove(node) | O(1) | With both prev and next available, you simply bypass the node: node.prev.next = node.next and node.next.prev = node.prev. |
| search / access by index | O(n) | Still no random access - must traverse from header or trailer. |
Space: O(n) with ~2 pointers overhead per node (roughly double the pointer overhead of a singly linked list).
Doubly Linked Lists Use Cases
Deques, LRU caches, text editors (cursor movement in both directions), backbone of PositionalList and java.util.LinkedList.
Advantages vs Singly Linked
O(1) removeLast (the prev pointer provides direct access to the predecessor), bidirectional traversal, simpler insertion/deletion with a known node reference.
Doubly Linked Lists Drawbacks
Higher memory overhead (two pointers per node instead of one), slightly more pointer bookkeeping on each insert/remove (4 pointer updates instead of 2).
Special Concept - Sentinels
Header/trailer dummy nodes remove "is this the first/last?" checks. Every real node is guaranteed non-null prev and next, so one addBetween method handles every insertion case uniformly. This eliminates a whole class of null-pointer bugs.
4. ArrayList (Dynamic Array)
ArrayList (Dynamic Array) Definition
An ArrayList (also called a dynamic array) implements the List ADT on top of a resizable array. It stores n elements at indices [0, n-1] in a backing array whose capacity may be larger than n. When the number of elements reaches the backing array's capacity, the ArrayList allocates a new, larger array (typically double the size), copies all existing elements into it, and discards the old array. This gives the ArrayList the random-access speed of a plain array while allowing it to grow dynamically.
The key insight is that although an individual resize operation is expensive (O(n) to copy everything), resizes happen so infrequently (capacity doubles each time) that the cost, amortized over all insertions, is O(1) per insertion.
ArrayList (Dynamic Array) Key Operations
- get(i) / set(i, e) - direct indexing into the backing array.
- add(i, e) - shift elements right to make room, possibly triggering a resize.
- add(e) - append at the end (no shifting, possible resize).
- remove(i) - shift elements left to fill the gap.
- size / isEmpty.
Java Implementation (Doubling)
private E[] data;
private int size = 0;
public void add(int i, E e) {
if (size == data.length) resize(2 * data.length); // doubling
for (int k = size - 1; k >= i; k--) data[k+1] = data[k];
data[i] = e;
size++;
}
public E remove(int i) {
E ans = data[i];
for (int k = i; k < size - 1; k++) data[k] = data[k+1];
data[--size] = null;
return ans;
}
@SuppressWarnings("unchecked")
private void resize(int cap) {
E[] temp = (E[]) new Object[cap];
for (int k = 0; k < size; k++) temp[k] = data[k];
data = temp;
}
ArrayList (Dynamic Array) Time and Space Complexity
| Operation | Time | Why |
|---|---|---|
| get / set | O(1) | Direct index into the backing array - same as a plain array. |
| add at end (doubling) | O(1) amortized | Most appends just write to data[size]. Resizes (copying all n elements) happen only when capacity is full, and because capacity doubles, the total copy cost across n insertions is O(n), giving O(1) per insertion amortized. |
| add at end (incremental +c) | O(n) amortized | Resizes happen every c insertions, each copying all existing elements. Over n insertions, total copy cost is O(n²), so per-insertion cost is O(n). |
| add(i, e) / remove(i) | O(n) | Elements after index i must be shifted right (insert) or left (remove). Worst case is index 0, shifting all n elements. |
Space: O(n) - though the backing array may have unused capacity (up to ~2× the number of stored elements with doubling).
Amortized Analysis Sketch
With doubling, across n pushes, the total copy work is 1 + 2 + 4 + ... + n = 2n − 1 = O(n), so per-push cost is O(1). With incremental growth by c, you perform n/c resizes, each copying O(n) elements, giving O(n²/c) = O(n²) total.
ArrayList (Dynamic Array) Use Cases
Default "list" when random access and appending matter more than middle-insertion. Used as java.util.ArrayList.
ArrayList (Dynamic Array) Advantages
O(1) random access; amortized O(1) append; cache-friendly (contiguous memory); simple API.
ArrayList (Dynamic Array) Drawbacks
O(n) arbitrary-position insert/remove (shifting); occasional resize cost spikes (though amortized away); some unused space from over-allocation.
5. Positional List
Positional List Definition
A Positional List is a list ADT where elements are referenced by Positions - opaque tokens that represent a location in the list - rather than by integer indices. A Position<E> is an abstract handle that remains valid even as other elements are inserted or removed elsewhere in the list; it is invalidated only when the specific element it refers to is removed.
The motivation is simple: in an index-based list, inserting an element at position 3 silently shifts every element after it - any variable holding index 5 now refers to a different element. Positions don't suffer from this: a position always refers to the same element regardless of what happens around it. This makes positions ideal for iterators, cursors in text editors, or any scenario where you need a stable, long-lived reference to a specific location.
Why Positions Over Indices?
Indices shift whenever you insert/delete anywhere before them. Positions don't - they act like pointers into the list structure. This makes them ideal for iterators, cursors in editors, and any pointer-like reference you want to keep live across structural modifications.
Key Operations (all O(1))
first(), last(), before(p), after(p), addFirst(e), addLast(e), addBefore(p, e), addAfter(p, e), set(p, e), remove(p).
Positional List Java Implementation
Built on a doubly linked list with sentinels where each internal Node implements the Position interface (so the node is the position).
public interface Position<E> { E getElement() throws IllegalStateException; }
// Node implements Position; getElement() throws if node has been removed.
private Node<E> validate(Position<E> p) {
if (!(p instanceof Node)) throw new IllegalArgumentException("Invalid p");
Node<E> node = (Node<E>) p;
if (node.next == null) throw new IllegalArgumentException("p is no longer in the list");
return node;
}
Positional List Time and Space Complexity
| Operation | Time | Why |
|---|---|---|
| first / last / before / after | O(1) | Direct pointer traversal - follow one next or prev link. |
| addBefore / addAfter / addFirst / addLast | O(1) | Built on the doubly linked list's addBetween - just pointer rewiring. |
| set(p, e) / remove(p) | O(1) | Direct access to the node via the position token - no searching needed. |
| find by value (not a standard op) | O(n) | Without an index or hash, you must traverse the list linearly. |
Space: O(n). Each node carries two pointers (prev, next) plus the element.
Positional List Use Cases
Text editors (cursor/bookmarks), stable iterators that survive structural changes, undo/redo systems, the natural replacement for indices when you need references that don't become stale.
Positional List Advantages
O(1) insert/remove anywhere given a position; positions never shift and remain valid across modifications.
Positional List Drawbacks
No random access by index (to reach the k-th element, you must traverse from the head/tail); extra pointer memory per node; cache-unfriendly due to non-contiguous node allocation.
6. Stack
Stack Definition
A Stack is a LIFO (Last-In, First-Out) collection - the most recently added element is the first one to be removed. All insertions and removals happen at a single end called the top. You can think of it like a stack of plates: you can only add or remove the plate on the very top.
The stack is one of the simplest and most widely used abstract data types. Despite its simplicity (it supports only push, pop, and peek), it is foundational to computing - function call execution itself relies on a stack (the "call stack"), and many algorithms convert recursion into an explicit stack.
Stack Key Operations (all O(1))
- push(e) - add element to the top.
- pop() - remove and return the top element (null or exception if empty).
- top() / peek() - inspect the top element without removing it.
- size / isEmpty.
Java Implementation - Array-Based
public class ArrayStack<E> {
private E[] S;
private int t = -1; // index of top
@SuppressWarnings("unchecked")
public ArrayStack(int capacity) { S = (E[]) new Object[capacity]; }
public void push(E e) {
if (t == S.length - 1) throw new IllegalStateException("Stack full");
S[++t] = e;
}
public E pop() {
if (t < 0) return null;
E e = S[t]; S[t--] = null; return e;
}
public E top() { return t < 0 ? null : S[t]; }
public int size() { return t + 1; }
public boolean isEmpty() { return t < 0; }
}
Java Implementation - Linked
public class LinkedStack<E> {
private SinglyLinkedList<E> list = new SinglyLinkedList<>(); // assumes a singly linked list
public void push(E e) { list.addFirst(e); }
public E pop() { return list.removeFirst(); }
public E top() { return list.first(); }
public int size() { return list.size(); }
public boolean isEmpty() { return list.isEmpty(); }
}
Use a singly linked list; push = addFirst, pop = removeFirst - both O(1). Grows dynamically without any capacity limit (other than available heap memory).
Stack Time and Space Complexity
| Operation | Time | Why |
|---|---|---|
| push | O(1) | Array: increment index and write. Linked: create node and update head pointer. |
| pop | O(1) | Array: read and decrement index. Linked: save head's data and advance head. |
| top / peek | O(1) | Simply read S[t] (array) or head.element (linked) - no modification. |
| size / isEmpty | O(1) | Maintained as a counter or computed from the top index. |
Space: O(n). Array version has fixed capacity; linked version uses per-node pointer overhead.
Stack Use Cases
Function call stack, undo/redo, browser back button, matching parentheses/HTML tags, postfix expression evaluation, reversing a sequence, DFS traversal (explicit stack replaces recursion).
Advantages / Drawbacks
Array: compact, cache-friendly, fixed capacity → FullStackException.
Linked: grows dynamically, no overflow (until heap exhaustion), but pointer overhead and worse cache behavior.
7. Queue
Queue Definition
A Queue is a FIFO (First-In, First-Out) collection - elements enter at the rear and leave at the front, like a line of people waiting. The first person to join the line is the first to be served.
Queues are essential for any scenario involving fairness or order-preservation: process scheduling, print spooling, breadth-first search, and producer-consumer buffering. The key design challenge is implementing FIFO behavior efficiently on a fixed-size array, which is solved by the circular array technique.
Queue Key Operations (all O(1))
- enqueue(e) - add element to the rear.
- dequeue() - remove and return the front element (null or exception if empty).
- first() / peek() - inspect the front element without removing it.
- size / isEmpty.
Java Implementation - Circular Array
The key trick: use modular arithmetic so the queue wraps around the array, avoiding the need to shift elements on every dequeue.
private E[] Q;
private int f = 0; // index of front
private int sz = 0; // number of elements
public void enqueue(E e) {
if (sz == Q.length) throw new IllegalStateException("Queue full");
int r = (f + sz) % Q.length; // rear index
Q[r] = e;
sz++;
}
public E dequeue() {
if (sz == 0) return null;
E ans = Q[f];
Q[f] = null;
f = (f + 1) % Q.length;
sz--;
return ans;
}
public E first() { return sz == 0 ? null : Q[f]; }
public int size() { return sz; }
public boolean isEmpty() { return sz == 0; }
A naïve left-to-right array would force O(n) shifting on every dequeue (to move all remaining elements towards index 0); the wrap-around avoids this entirely.
Queue Java Implementation - Linked
public class LinkedQueue<E> {
// Singly linked list with both head and tail pointers
private SinglyLinkedList<E> list = new SinglyLinkedList<>();
public void enqueue(E e) { list.addLast(e); }
public E dequeue() { return list.removeFirst(); }
public E first() { return list.first(); }
public int size() { return list.size(); }
public boolean isEmpty() { return list.isEmpty(); }
}
Singly linked list with both head and tail pointers: enqueue = append at tail (O(1) with tail pointer), dequeue = remove from head (O(1)). Note: removeFirst is used for dequeue (not removeLast), which is why a singly linked list suffices - no backward pointer needed.
Queue Time and Space Complexity
| Operation | Time | Why |
|---|---|---|
| enqueue | O(1) | Circular array: compute rear index with modular arithmetic and write. Linked: append at tail pointer. |
| dequeue | O(1) | Circular array: read and advance front index with modular arithmetic. Linked: advance head pointer. |
| first / peek | O(1) | Read Q[f] or head.element directly. |
| size / isEmpty | O(1) | Maintained as a counter. |
Space: O(n) - circular array has fixed capacity; linked list has per-node pointer overhead.
Queue Use Cases
Printer queues, CPU scheduling (Round-Robin), BFS, buffer between producer/consumer, level-order tree traversal, simulation of waiting lines.
Queue Advantages / Drawbacks
Circular array: O(1) ops, no shifting, but fixed capacity. Linked: dynamic size, extra pointer overhead.
Note - Deque (Double-Ended Queue)
Supports addFirst / addLast / removeFirst / removeLast / first / last - all O(1) on a doubly linked list. A deque generalizes both stacks and queues: you can use it as either. (The Ch06 slides focus on Stack/Queue; Deque is a natural extension.)
8. Priority Queue
Priority Queue Definition
A Priority Queue (PQ) is a collection of entries (key, value) where retrieval is governed by priority (defined by the key), not by insertion order. The key represents the priority; the value is the associated payload. The entry with the smallest key (highest priority, in a min-PQ) is always the one accessible for removal.
Unlike a stack (LIFO) or queue (FIFO), a priority queue imposes no ordering based on when elements arrived - it always serves the most "urgent" element first. This makes it the ideal structure for any scenario where tasks or events have varying importance: process scheduling, event-driven simulation, shortest-path algorithms, etc.
Priority Queue Key Operations
- insert(k, v) - add a new entry with key
kand valuev. - removeMin() - remove and return the entry with the smallest key.
- min() - inspect the minimum-key entry without removing it.
- size / isEmpty.
- Uses an
Entry<K, V>abstraction:getKey(),getValue().
Java Implementation - Unsorted List
public void insert(K k, V v) { data.addLast(new MapEntry<>(k, v)); } // O(1)
public Entry<K,V> removeMin() {
if (data.isEmpty()) return null;
Position<Entry<K,V>> bestPos = data.first();
for (Position<Entry<K,V>> p : data.positions()) {
if (comp.compare(p.getElement().getKey(), bestPos.getElement().getKey()) < 0) {
bestPos = p;
}
}
return data.remove(bestPos); // O(n)
}
Java Implementation - Sorted List
// Assuming a PositionalList (doubly-linked) backing structure
private PositionalList<Entry<K,V>> data = new LinkedPositionalList<>();
public void insert(K k, V v) {
Entry<K,V> newest = new MapEntry<>(k, v);
Position<Entry<K,V>> walk = data.last();
// Traverse backward across larger keys
while (walk != null &and comp.compare(newest.getKey(), walk.getElement().getKey()) < 0) {
walk = data.before(walk);
}
if (walk == null) data.addFirst(newest);
else data.addAfter(walk, newest);
}
public Entry<K,V> removeMin() {
if (data.isEmpty()) return null;
return data.remove(data.first()); // O(1) removal of the lowest key
}
insert walks to the correct position to maintain sorted order - O(n) because it may need to scan all existing entries. removeMin just removes the first element - O(1) because the minimum is always at the front.
Complexity Comparison
| Implementation | insert | removeMin | min | Why |
|---|---|---|---|---|
| Unsorted list | O(1) | O(n) | O(n) | Insert just appends (no ordering maintained). Finding min requires scanning every element. |
| Sorted list | O(n) | O(1) | O(1) | Insert must find the correct position to maintain order (linear scan). Min is always at the front. |
| Heap | O(log n) | O(log n) | O(1) | The heap's tree shape has height O(log n), and both insert/remove traverse at most one root-to-leaf path. Min is always at the root. |
Space O(n) in all cases.
Priority Queue Use Cases
Event-driven simulation, OS process scheduling, ER triage, Dijkstra / Prim, Huffman coding, top-k selection.
Priority Queue Advantages / Drawbacks
- Unsorted list: cheap insert, expensive min-extraction - good when insertions dominate and removals are rare.
- Sorted list: opposite trade-off - good when you frequently need the min but rarely insert.
- Heap balances both - O(log n) for each, making it the default choice for general use.
Priority Queue Special Concepts
- Comparator / Comparable. A
Comparator<K>defines a total order on keys. A total order must be antisymmetric (ifa ≤ bandb ≤ a, thena = b), transitive (ifa ≤ bandb ≤ c, thena ≤ c), and total (for anya, b, eithera ≤ borb ≤ a). These properties are required for the priority queue to behave correctly - without them, the notion of "smallest key" is undefined. - PQ-Sort: insert all
nelements, thenremoveMinntimes → sorted output.- Unsorted PQ ⇒ Selection Sort (O(n²)): each
removeMinscans the unsorted list. - Sorted PQ ⇒ Insertion Sort (O(n²)): each
insertfinds the correct sorted position. - Heap PQ ⇒ Heap Sort (O(n log n)): each operation is O(log n).
- Unsorted PQ ⇒ Selection Sort (O(n²)): each
9. Tree
Tree Definition
A Tree is a hierarchical (non-linear) data structure where elements are organized in a parent-child relationship. Unlike linear structures (arrays, linked lists), trees model one-to-many relationships: a single parent can have multiple children, but each child has exactly one parent (except the root, which has no parent).
Key terminology:
- Root: the unique top-level node with no parent. Every tree has exactly one root.
- Internal node: a node with ≥ 1 child.
- External node / leaf: a node with 0 children (the "endpoints" of the tree).
- Depth(v): the number of ancestors of node
v(equivalently, the number of edges on the path from the root tov). The root has depth 0. - Height(T): the maximum depth of any leaf in tree
T. An empty tree has height 0 (or -1, depending on convention); a single-node tree has height 0. - Subtree rooted at v: the tree consisting of
vand all its descendants.
Binary Tree: a tree where each node has at most 2 children, distinguished as left and right. This left/right distinction is what makes binary trees "ordered" - it matters which child is which.
Binary Search Tree (BST): a binary tree with the BST property: for every node v, all keys in the left subtree are less than key(v), and all keys in the right subtree are greater than key(v). This structural invariant enables efficient searching by eliminating half the remaining nodes at each step.
Tree Key Operations
root(), parent(p), children(p), numChildren(p), isInternal(p), isExternal(p), isRoot(p), size(), isEmpty(), height(p), depth(p). BST adds get(k), put(k, v), remove(k).
Tree Java Implementation
General tree (linked)
class Node<E> { E element; Node<E> parent; List<Node<E>> children; }
Binary tree (linked)
class Node<E> { E element; Node<E> parent, left, right; }
Binary tree (array): Store node at index i; left(i) = 2i+1, right(i) = 2i+2, parent(i) = (i-1)/2. Efficient for complete trees (used by heaps); wasteful for sparse/unbalanced ones because missing nodes leave gaps in the array.
BST search
Node<K,V> treeSearch(K k, Node<K,V> v) {
if (v == null) return null; // key not found
int cmp = k.compareTo(v.key);
if (cmp == 0) return v;
return cmp < 0 ? treeSearch(k, v.left) : treeSearch(k, v.right);
}
BST insert: run treeSearch; if not found, attach new leaf where the search fell off (at the null position where the search terminated).
Tree Time and Space Complexity
| Operation | Complexity | Why |
|---|---|---|
| root / parent / left / right | O(1) | Direct pointer access - each node stores references to parent and children. |
| children(p) | O(c_p) | Must enumerate all children of node p; c_p is the number of children. |
| depth(v) | O(d_v) | Walk from v to the root, following parent pointers - the number of steps equals the depth. |
| height(T) | O(n) | Must visit every node to find the deepest leaf (recursive: each node's height = 1 + max child height). |
BST get / put / remove |
O(h): O(log n) balanced, O(n) worst | Each operation follows one root-to-leaf path. In a balanced tree, h = O(log n); in a degenerate tree (every node has one child), h = n − 1. |
Space: O(n) - one node per element, each with a constant number of pointers.
Traversals - In Detail
(a) Preorder - Root → Left → Right. Visit the node before recursing into its children.
void preOrder(Node<E> v) {
if (v == null) return;
visit(v);
preOrder(v.left);
preOrder(v.right);
}
Uses: print outline/document structure, serialize a tree (the preorder sequence + structure info can reconstruct the tree), copy tree, prefix notation of expressions.
(b) Postorder - Left → Right → Root. Visit the node after recursing into its children.
void postOrder(Node<E> v) {
if (v == null) return;
postOrder(v.left);
postOrder(v.right);
visit(v);
}
Uses: compute subtree size/height (you need children's results first), evaluate expression trees (compute operands before applying operator), delete tree safely (delete children before parent), postfix notation.
(c) Inorder - Left → Root → Right (binary trees only)
void inOrder(Node<E> v) {
if (v == null) return;
inOrder(v.left);
visit(v);
inOrder(v.right);
}
Uses: print BST keys in sorted order - this is the BST's defining benefit. Inorder traversal of a BST visits keys in ascending order because: all left-subtree keys (visited first) are smaller, then the current key, then all right-subtree keys (visited last) which are larger.
(d) Level-order (BFS). Uses a queue, not recursion.
void levelOrder(Node<E> root) {
Queue<Node<E>> q = new LinkedList<>();
if (root != null) q.add(root);
while (!q.isEmpty()) {
Node<E> v = q.remove();
visit(v);
if (v.left != null) q.add(v.left);
if (v.right != null) q.add(v.right);
}
}
Uses: shortest path in unweighted graphs/trees, layer-by-layer printing, finding the minimum-depth leaf.
Traversal complexities
- Time: All traversals are O(n) - each node is visited exactly once.
- Space: Recursive traversals (pre/in/post) use O(h) stack space, where
his the height of the tree (each recursive call adds one frame, and the maximum nesting depth equals the height). Level-order uses O(w) queue space, wherewis the maximum width (the largest number of nodes at any single level). For a complete binary tree,wcan be up ton/2(the last level), making this O(n) in the worst case.
Tree Use Cases
File systems, HTML/XML DOM, expression trees, BSTs for ordered data, decision trees, compiler syntax trees.
Tree Advantages / Drawbacks
Natural for hierarchical data; balanced BST gives O(log n) search/insert/remove. But an unbalanced BST degrades to O(n) - a sequence of sorted insertions produces a linear chain. This motivates self-balancing variants (AVL, Red-Black, 2-4 trees; covered more deeply later).
Tree Special Concepts
- Euler tour: generic framework unifying pre/in/post orders. A single walk around the tree touching each node up to three times (left visit = preorder, bottom visit = inorder, right visit = postorder).
- Proper (full) binary tree: every internal node has exactly 2 children. Property:
# leaves = # internal nodes + 1. - Complete binary tree: every level is completely filled except possibly the last, which fills left-to-right. This is the shape used by heaps; it guarantees height
⌊log₂ n⌋. - Why inorder is for binary trees only: inorder requires visiting the left subtree, then the node, then the right subtree - this "left vs right" distinction is only meaningful when there are exactly two distinguished children. For a general tree with an arbitrary number of children, there is no natural point to visit the parent "between" the children.
10. Heap
Heap Definition
A binary heap is a specialized binary tree that satisfies two invariants simultaneously:
-
Heap-order property (min-heap): for every non-root node
v,key(parent(v)) ≤ key(v). This means the smallest key percolates up to the root - the root always holds the global minimum. Note: unlike a BST, there is no ordering constraint between siblings or between left and right subtrees. The left child may be larger or smaller than the right child. -
Complete binary tree shape: every level is completely filled except possibly the last, which fills strictly left-to-right. This shape constraint guarantees the height is exactly
⌊log₂ n⌋, which is what makes heap operations O(log n).
Contrast with BST: a BST enforces a global ordering (left < parent < right at every level), enabling sorted traversal but not guaranteeing shape. A heap enforces shape and parent-child ordering only, giving O(1) access to the minimum but no sorted traversal capability.
Heap Key Operations
- insert(entry) - append the new entry at the next available position (the leftmost open spot on the last level, maintaining completeness), then up-heap (sift-up) to restore heap-order by repeatedly swapping the entry with its parent while it is smaller.
- removeMin() - return the root (the minimum), replace the root with the last entry (rightmost on the last level), then down-heap (sift-down) to restore heap-order by repeatedly swapping with the smaller child.
- min() - O(1) peek at root.
- heapify(array) - build a heap from an unordered array in O(n) using bottom-up down-heap calls.
- Helpers: upheap / sift-up, downheap / sift-down.
Array-Based Heap (the common form)
For node at index i: left = 2i+1, right = 2i+2, parent = (i-1)/2. The complete tree shape means no gaps in the array - indices 0 through n-1 are all occupied.
private List<Entry<K,V>> heap = new ArrayList<>();
private Comparator<K> comp;
public Entry<K,V> insert(K k, V v) {
Entry<K,V> e = new MapEntry<>(k, v);
heap.add(e);
upheap(heap.size() - 1);
return e;
}
private void upheap(int j) {
while (j > 0) {
int p = (j - 1) / 2;
if (comp.compare(heap.get(j).getKey(), heap.get(p).getKey()) >= 0) break;
swap(j, p);
j = p;
}
}
public Entry<K,V> removeMin() {
if (heap.isEmpty()) return null;
Entry<K,V> ans = heap.get(0);
Entry<K,V> last = heap.remove(heap.size() - 1);
if (!heap.isEmpty()) {
heap.set(0, last);
downheap(0);
}
return ans;
}
private void downheap(int j) {
int n = heap.size();
while (2*j + 1 < n) {
int left = 2*j + 1, right = 2*j + 2, small = left;
if (right < n &and comp.compare(heap.get(right).getKey(), heap.get(left).getKey()) < 0)
small = right;
if (comp.compare(heap.get(j).getKey(), heap.get(small).getKey()) <= 0) break;
swap(j, small);
j = small;
}
}
// Bottom-up heap construction - O(n)
private void heapify() {
for (int i = (heap.size() - 2) / 2; i >= 0; i--) downheap(i);
}
Heap Time and Space Complexity
| Operation | Complexity | Why |
|---|---|---|
| min | O(1) | The minimum is always at the root (index 0) - just read it. |
| insert | O(log n) | The new entry may need to up-heap from the last level to the root - at most h = ⌊log₂ n⌋ swaps. |
| removeMin | O(log n) | The replacement entry at the root may need to down-heap from the root to a leaf - at most h swaps. |
| upheap / downheap | O(log n) | Both traverse at most one root-to-leaf path, whose length is bounded by the tree height h = O(log n). |
| heapify (bottom-up) | O(n) | Not O(n log n) as one might expect. Most nodes are near the leaves and require very few swaps: leaf nodes need 0 swaps, their parents need ≤ 1 swap, etc. The total work is Σ over all nodes of (height of that node), which sums to O(n) by the geometric series convergence. |
Space: O(n); no pointer overhead in array form - parent/child relationships are implicit via index arithmetic.
Heap Sort
heapifythe input → O(n).- Repeatedly
removeMax(for max-heap) and place the extracted element at the end of the array → n × O(log n).
Total: O(n log n). Can be in-place with O(1) extra space (use a max-heap, extract elements from the back of the array).
Heap Use Cases
Priority queue implementation (the standard backing structure), Heap Sort, Dijkstra / Prim, top-k selection (use a min-heap of size k), running median (min-heap + max-heap), event simulation, Huffman coding.
Heap Advantages / Drawbacks
O(log n) for both insert and removeMin (beats both list-based PQs). Array form is cache-friendly and pointer-free. Downsides: no efficient sorted iteration (unlike BST); higher constant factors than QuickSort in practice for sorting; not stable (equal-priority elements may not preserve insertion order).
Heap Special Concepts
- Last-node trick:
insertplaces the new entry at the end of the array (the unique correct spot for a complete tree), then up-heaps.removeMinswaps root with last, removes last, then down-heaps. This maintains the complete tree shape invariant. - Min-heap vs max-heap: same algorithms, reversed comparison. Min-heap has the smallest key at the root; max-heap has the largest.
- Height is O(log n) because a complete binary tree with n nodes has height
⌊log₂ n⌋- each level doubles the number of nodes, solog₂ nlevels are needed to holdnnodes.
11. Map
Map Definition
A Map ADT (also known as an associative array, symbol table, or dictionary in common usage) stores entries (key, value) with the constraint that keys must be unique - each key maps to at most one value. You retrieve, update, or delete values by providing their associated key.
The uniqueness constraint is what distinguishes a Map from a Multimap or Dictionary ADT: calling get(k) is always unambiguous because there is at most one entry with key k. If you call put(k, v) with a key that already exists, the old value is overwritten (not duplicated).
Maps are arguably the most frequently used data structure in software engineering. They provide the abstraction of "looking something up by name" - symbol tables in compilers, caches, configuration stores, and database indices are all maps at their core.
Map Key Operations
- get(k) - return the value associated with key
k, or null if absent. - put(k, v) - insert the entry
(k, v), or overwrite the existing value for keyk; return the old value or null. - remove(k) - remove the entry with key
kand return its value, or null if absent. - size / isEmpty / entrySet / keySet / values.
Java Implementation - Unsorted List Map
public V get(K key) {
for (Entry<K,V> e : entries)
if (e.getKey().equals(key)) return e.getValue();
return null;
}
public V put(K key, V value) {
for (Entry<K,V> e : entries)
if (e.getKey().equals(key)) {
V old = e.getValue();
((MapEntry<K,V>)e).setValue(value);
return old;
}
entries.add(new MapEntry<>(key, value));
return null;
}
Java Implementation - Sorted Search Table (Array-Based)
private ArrayList<MapEntry<K,V>> table = new ArrayList<>();
// Returns index of key, or the index where it should be inserted
private int findIndex(K key) {
int low = 0, high = table.size() - 1;
while (low <= high) {
int mid = (low + high) / 2;
int comp = key.compareTo(table.get(mid).getKey());
if (comp == 0) return mid; // found
else if (comp < 0) high = mid - 1;
else low = mid + 1;
}
return low; // not found, insertion point
}
public V get(K key) {
int j = findIndex(key);
if (j == table.size() || key.compareTo(table.get(j).getKey()) != 0) return null;
return table.get(j).getValue();
}
Other implementations: sorted search table (binary search on a sorted array), hashtable, balanced BST, skip list.
Map Time and Space Complexity
| Implementation | get / put / remove | Why |
|---|---|---|
| Unsorted list | O(n) | Every operation must scan the list linearly to find the key - no ordering to exploit. |
| Sorted search table | O(log n) search, O(n) modify | Binary search finds the key in O(log n), but inserting/removing requires shifting array elements. |
| Hashtable | O(1) average, O(n) worst | Hash function maps keys to array indices in O(1). Worst-case O(n) occurs when all keys collide into the same bucket. |
| Balanced BST | O(log n) | The balanced tree height is O(log n), and every operation traverses one root-to-leaf path. |
Space: O(n).
Map Use Cases
Symbol tables (compilers), caches (URL → page), DB indices, frequency counters, config systems, any "look up a value by key" problem.
Map Advantages / Drawbacks
Natural key/value semantics; hash-backed maps give near-O(1) access. Drawbacks: no inherent ordering unless using a sorted implementation (sorted table or BST); worst-case O(n) for hashtables if hash function is poor.
Map Special Concepts
Entry<K, V>abstraction:getKey(),getValue(),setValue().- Unique keys required so that
get(k)is unambiguous. Need duplicates? Use a Multimap or Dictionary ADT. - Map, Set, Dictionary are siblings: Set = keys only (no values); Dictionary = may allow duplicate keys; Multimap = one key → many values.
12. Hashtable
Hashtable Definition
A Hashtable is a concrete implementation of the Map ADT that achieves O(1) average-case performance for get, put, and remove. It works by storing entries in a bucket array A of size N and using a hash function h: Keys → [0, N−1] to compute an array index for each key. Instead of searching for a key, you compute where it should be - this is the fundamental insight that makes hashing fast.
The trade-off is that the hash function may map different keys to the same index (a collision), requiring a collision-resolution strategy. The performance guarantee is average-case only - in the worst case (all keys collide), performance degrades to O(n).
Hash Functions (Two Stages)
Stage 1 - Hash code h₁(k): keys → integers. Examples: memory address, integer cast, component sum (for long, double), polynomial accumulation (strings): p(z) = a₀ + a₁z + a₂z² + ... where each aᵢ is the character code and z is a constant (commonly 31 or 33).
Stage 2 - Compression h₂(y): integer → [0, N-1]:
- Division:
y mod N(use primeNto spread keys well - primes avoid patterns with common factors). - MAD (Multiply-Add-Divide):
((a·y + b) mod p) mod N, with randoma, b,pprime,a mod p ≠ 0. Provides stronger, more uniform distribution than simple division.
A good hash function is fast (constant time to compute), deterministic (same key always produces the same hash), and uniformly distributing (spreading distinct keys evenly across [0, N-1] to minimize collisions).
Collision Handling
Collisions occur when h(k₁) = h(k₂) for distinct keys k₁ ≠ k₂. Two main strategies:
(a) Separate Chaining
Each bucket holds a small list/map (called a "chain") of all entries that hashed to that index.
public V get(K key) {
List<MapEntry<K,V>> chain = table[hash(key)];
if (chain == null) return null;
for (MapEntry<K,V> e : chain)
if (e.getKey().equals(key)) return e.getValue();
return null;
}
public V put(K key, V value) {
int i = hash(key);
if (table[i] == null) table[i] = new ArrayList<>();
for (MapEntry<K,V> e : table[i])
if (e.getKey().equals(key)) return e.setValue(value);
table[i].add(new MapEntry<>(key, value));
size++;
return null;
}
- Simple; deletion is trivial (just remove from the chain); tolerates load factor λ > 1. − Pointer overhead (each chain is a separate list); poor cache locality (chains are scattered in memory).
(b) Open Addressing - Linear Probing
All entries live directly in the bucket array (no separate lists). On collision, probe the next slots: A[(i+1) mod N], A[(i+2) mod N], ... until finding the key, an empty cell (key not found), or exhausting the table.
private int findSlot(K key) {
int N = table.length;
int i = hash(key), firstDefunct = -1;
for (int j = 0; j < N; j++) {
if (table[i] == null)
return firstDefunct >= 0 ? firstDefunct : -(i+1); // not found
if (table[i] == DEFUNCT) {
if (firstDefunct < 0) firstDefunct = i;
} else if (table[i].getKey().equals(key)) {
return i; // found
}
i = (i + 1) % N;
}
return firstDefunct >= 0 ? firstDefunct : -1;
}
Primary clustering: contiguous runs of filled cells grow and merge, forming long clusters that slow future probes. The longer a cluster, the more likely a new key hashes into it and extends it further.
Deletion: don't leave a null - it would break later probe sequences (a search would stop at the null, missing entries that were placed beyond it). Use a tombstone (DEFUNCT) marker; searches skip over it, inserts may overwrite it.
- Cache-friendly (sequential memory access), no pointer overhead. − Clustering degrades performance, needs tombstones (which can accumulate and slow probing), load factor must stay well below 1.
(c) Quadratic Probing
Probe A[(h(k) + j²) mod N] for j = 0, 1, 2, .... Reduces primary clustering because colliding keys jump to non-contiguous slots. However, keys that hash to the same initial slot still follow the identical probe sequence - this is called secondary clustering. Requires N prime and λ < 0.5 to guarantee that a free cell is found.
(d) Double Hashing
Use a secondary hash function d(k) as the step size: probe A[(h(k) + j · d(k)) mod N]. A common choice: d(k) = q − (k mod q) with q prime, q < N. Because different keys get different step sizes, their probe sequences diverge - this avoids both primary and secondary clustering.
d(k) must be nonzero and ideally coprime to N (guaranteed if N is prime) so the full table is probed.
Load Factor and Rehashing
Load factor λ = n/N (number of entries / table capacity).
| Scheme | Typical threshold |
|---|---|
| Separate chaining | λ ≤ ~0.9 |
| Linear probing | λ ≤ ~0.5–0.75 |
| Double hashing | λ ≤ ~0.75 |
When exceeded, rehash: allocate a new (typically doubled and kept prime) table and reinsert every entry. This is O(n) per rehash, but amortized O(1) per insertion if you double the table size each time (same argument as ArrayList doubling).
Hashtable Time and Space Complexity
| Average | Worst | Why | |
|---|---|---|---|
| get / put / remove | O(1) | O(n) | Average: hash computation + constant expected probing (with good hash and low λ). Worst: all keys hash to the same slot, degenerating to a linear scan of a list or a full probe sequence. |
| space | O(N + n) | - | The bucket array occupies O(N) space, plus O(n) for the entries themselves. With rehashing, N = O(n). |
Hashtable Use Cases
Compiler symbol tables, caches, DB indices, DNS, spell-checking, frequency counting, set membership, backing store for HashSet and HashMap.
Hashtable Advantages / Drawbacks
| Scheme | + | − |
|---|---|---|
| Chaining | simple, high λ ok, easy delete | pointer overhead, cache-unfriendly |
| Linear probing | cache-friendly, no pointers | primary clustering, tombstones needed |
| Double hashing | no clustering | two hash computations, more complex |
Hashtable Special Concepts
hashCode() / equals()contract: ifa.equals(b), thena.hashCode() == b.hashCode(). Violating this breaks hashtables - two "equal" keys could hash to different buckets, sogetwouldn't find an entry thatputstored.- Rehashing is triggered by load factor exceeding the threshold - or by tombstone buildup that degrades probing performance.
- Bad hash functions (constant, poorly mixed, or not utilizing all key bits) collapse performance to O(n) by concentrating entries in a few buckets.
13. Set
Set Definition
A Set is an unordered collection of unique elements - it models the mathematical concept of a set. Conceptually, a set is a Map that stores only keys (with no associated values, or equivalently, with a dummy value). The defining property is uniqueness: adding an element that is already present has no effect.
Sets answer the fundamental question: "Is this element present?" They provide no ordering, no indexing, and no duplicates.
Set Key Operations
- add(e) - insert element if absent; do nothing if already present.
- remove(e) - delete element if present.
- contains(e) - membership test: is
ein the set? - size / isEmpty.
- Set-algebra: union (A ∪ B), intersection (A ∩ B), difference (A \ B).
Java Implementation - Map-Backed (HashSet / TreeSet)
public class MapSet<E> {
// Underlying map (e.g., HashMap) with dummy values
private Map<E, Object> map = new HashMap<>();
private static final Object PRESENT = new Object();
public boolean add(E e) { return map.put(e, PRESENT) == null; }
public boolean remove(E e) { return map.remove(e) != null; }
public boolean contains(E e) { return map.containsKey(e); }
public int size() { return map.size(); }
}
HashSet wraps a HashMap (element = key, dummy constant = value); TreeSet wraps a Red-Black tree (a balanced BST). All set operations are forwarded to the underlying map's key operations.
Java Implementation - List-Backed (Unsorted Array)
public class ListSet<E> {
private ArrayList<E> list = new ArrayList<>();
public boolean add(E e) {
if (contains(e)) return false; // Enforce uniqueness
list.add(e);
return true;
}
public boolean remove(E e) { return list.remove(e); }
public boolean contains(E e) { return list.contains(e); }
}
The List-backed Set checks every element on add to ensure uniqueness, resulting in O(n) additions.
Set Time and Space Complexity
| Implementation | add / remove / contains | Why |
|---|---|---|
| HashSet | O(1) average | Delegates to HashMap - hash computation + constant expected probing. |
| TreeSet | O(log n) | Delegates to Red-Black tree - operations traverse a balanced tree of height O(log n). |
Space: O(n).
Set Use Cases
De-duplication, membership queries, mathematical set operations, uniqueness constraints, graph algorithms (visited-node tracking).
Set Advantages / Drawbacks
Use a Set when you only care about presence/absence. If you need associated values → Map. If you need duplicates → Multiset/Bag. If you need insertion order → LinkedHashSet. TreeSet provides sorted iteration; HashSet does not.
Special Concepts - Set Operations on Sorted Sets (Generic Merge)
Two-pointer linear sweep through two sorted sets A, B:
- Union: Compare current elements. Emit the smaller element and advance its pointer. If elements are equal, emit one and advance both pointers.
- Intersection: Compare current elements. If they are equal, emit the element and advance both. Otherwise, advance the pointer of the smaller element (do not emit).
- Difference (A \ B): Compare current elements. If A's element is strictly smaller, emit it and advance A. If B's element is smaller, advance B. If they are equal, advance both (do not emit).
All run in O(|A| + |B|) on sorted sets. Hash-based sets achieve the same expected time on average.
Set vs Multiset (Bag): a set stores each element at most once; a multiset tracks multiplicities (e.g., {a, a, b} has a with count 2).
14. Multimap
Multimap Definition
A Multimap is a generalization of the Map ADT that allows multiple values to be associated with the same key. In a standard Map, put(k, v) would overwrite any existing value for key k; in a Multimap, put(k, v) adds a new (k, v) entry alongside any existing entries with the same key.
Conceptually, Multimap<K, V> ≈ Map<K, Collection<V>>, but a Multimap provides a cleaner API with built-in collection management - you don't have to manually check for null, create empty lists, or remove empty collections.
Multimap Key Operations
- put(k, v) - add the entry
(k, v). Does not overwrite existing values fork; adds alongside them. - get(k) - return the collection of all values associated with key
k(empty collection if key absent). - remove(k, v) - remove one specific
(k, v)pair. - removeAll(k) - remove every entry with key
k. - entries() / keys() / keySet().
Multimap Java Implementation
private Map<K, List<V>> data = new HashMap<>();
public void put(K key, V value) {
data.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
}
public List<V> get(K key) {
return data.getOrDefault(key, Collections.emptyList());
}
public boolean remove(K key, V value) {
List<V> list = data.get(key);
if (list == null) return false;
boolean ok = list.remove(value);
if (list.isEmpty()) data.remove(key);
return ok;
}
Multimap Time and Space Complexity
| Operation | Time | Why |
|---|---|---|
| put(k, v) | O(1) average | HashMap lookup for the key's bucket is O(1); appending to the value list is O(1). |
| get(k) | O(1) average | HashMap lookup returns the value list directly. (Iterating the returned list is O(s) where s = number of values.) |
| remove(k, v) | O(s) average | HashMap lookup is O(1), but finding and removing v from the value list requires O(s) scanning, where s is the number of values for key k. |
Space: O(n) across all entries (sum of all key-value pairs stored).
Multimap Use Cases
Grouping (students by major), inverted indexes (word → documents), graph adjacency lists (vertex → neighbors), phone directory (name → numbers), book indexes (term → page numbers).
Multimap Advantages / Drawbacks
Cleaner API than manual Map<K, List<V>>: no null-check boilerplate, no forgetting to create an empty list. Slight memory overhead from maintaining a list even when a key has just one value.
Special Concept - Multimap vs Multiset
- Multimap: key → many values (structured one-to-many relationship).
- Multiset (Bag): element → frequency count (just tracks how many times each element appears, with no associated values).
15. Dictionary
Dictionary Definition
A Dictionary ADT is a searchable collection of key-value entries that permits multiple entries with the same key. While the term "dictionary" is used colloquially (and in Python) as a synonym for "map," the course distinguishes them: a Map enforces unique keys, whereas a Dictionary allows duplicate keys.
In practice, a Dictionary is functionally equivalent to a Multimap - both allow multiple entries per key. The distinction is largely historical/terminological and varies across textbooks.
Dictionary Key Operations
- find(k) - return a single entry with key
k(or null if none exist). - findAll(k) - return all entries with key
k. - insert(k, v) - add a new entry
(k, v). Duplicate keys are permitted - this never overwrites. - remove(e) - remove a specific entry
e(not by key alone, since the key may not be unique).
Java Implementation - Unordered Dictionary (List-based)
private List<Entry<K,V>> data = new ArrayList<>();
public Entry<K,V> find(K key) {
for (Entry<K,V> e : data)
if (e.getKey().equals(key)) return e;
return null;
}
public Iterable<Entry<K,V>> findAll(K key) {
List<Entry<K,V>> matches = new ArrayList<>();
for (Entry<K,V> e : data)
if (e.getKey().equals(key)) matches.add(e);
return matches;
}
public void insert(K key, V value) {
data.add(new MapEntry<>(key, value)); // Duplicates are allowed
}
- Unordered (hashtable-backed): O(1) average find/insert/remove. Effectively a multimap built on a hashtable where each bucket can hold multiple entries with the same key.
- Ordered (sorted search table): entries kept sorted by key;
findis O(log n) via binary search,findAll(k)is O(log n + s) wheresis the number of matches (binary search to find one, then scan adjacent entries). Insert/remove cost O(n) from shifting.
Dictionary Time and Space Complexity
| Implementation | find | findAll | insert | remove | Why |
|---|---|---|---|---|---|
| Unordered (hash) | O(1) avg | O(1 + s) avg | O(1) avg | O(1) avg | Hash function maps to the bucket in O(1); scanning within a bucket is proportional to entries there. s is the number of entries with that key. |
| Ordered (sorted table) | O(log n) | O(log n + s) | O(n) | O(n) | Binary search locates the key in O(log n). Insert/remove require shifting array elements to maintain sorted order. |
Space: O(n).
Dictionary Use Cases
Language dictionaries (word → many definitions), bibliographic records, DNS when multiple IPs per domain, phone books (multiple numbers per person), database records with repeated keys.
Advantages vs Map
Natively supports one-to-many key-value relationships without requiring the caller to manage collections of values.
Dictionary Special Concepts
- Ordered vs unordered: ordered dictionaries support range queries (
findAll keys in [a, b]) and sorted traversal; unordered prioritize raw speed. - Dictionary ≈ Multimap in modern terminology - the ADT concepts overlap heavily.
- Note on naming: Java's
java.util.Dictionaryis an obsolete abstract class from Java 1.0 - modern code usesMap/HashMap. The ADT term still appears in textbooks (including this course's).
Quick-Reference Complexity Summary
| Structure | Access/Find | Insert | Delete | Min | Space |
|---|---|---|---|---|---|
| Array | O(1) by index | O(n) | O(n) | O(n) | O(n) |
| Singly LL | O(n) | O(1) head | O(1) head / O(n) tail | O(n) | O(n) |
| Doubly LL | O(n) | O(1) w/ node ref | O(1) w/ node ref | O(n) | O(n) |
| ArrayList | O(1) by index | O(1) amort end / O(n) mid | O(n) | O(n) | O(n) |
| Positional List | O(1) w/ position / O(n) by value | O(1) | O(1) | O(n) | O(n) |
| Stack | O(1) top only | O(1) | O(1) | - | O(n) |
| Queue | O(1) front only | O(1) | O(1) | - | O(n) |
| PQ (sorted list) | O(1) min | O(n) | O(1) min | O(1) | O(n) |
| PQ (unsorted list) | O(n) min | O(1) | O(n) min | O(n) | O(n) |
| Heap | O(1) min only | O(log n) | O(log n) min | O(1) | O(n) |
| BST (balanced) | O(log n) | O(log n) | O(log n) | O(log n) | O(n) |
| Hashtable | O(1) avg | O(1) avg | O(1) avg | O(n) | O(n) |