Unit 5
- A persistent data structure is a data structure that always preserves the previous version of itself when it is modified.
- updates are not in-place
Partially Persistent
-> Access to all versions, but modify only the latest version
Fully persistent
-> If every version can be both accessed and modified
Confluently persistent
is when we merge two or more versions to get a new version.- Example
- Linked List Concatenation
- We have two lists of size n,m ; n > m; we have to concatenate the lists.
- We also have to maintain the version.
- First Approach -> Make copy of all the nodes of the list O(n+m+1) for traversal and O(1) for copying and linking
- Second Approach -> pick list with less nodes, and copy them, here O(m) for traversal and O(1) for copying and linking
- We need to copy in order for the original form to persist!
- BST
- Make a copy of the node we are about to update, proceed for update, cascade the change back through the data structure till you reach the root.
- attach the other children as the unchanged subtree from the previous tree
- Linked List Concatenation
Approaches to make data structures persistent- Path Copying
- As done in the BST
- Copy the node to be updated; Cascade the change back through the data structure;
- How to access the state at time t? Maintain an array of roots indexed by timestamp.
- Fat Nodes
- Every node stores its modification history, thus making it 'fat'
- Nodes with boxes
- Amortized complexity of O(1)
- For a tree, it involves using a BOX that can hold
- One modification to the node (or to the node's key or to some other node specific data)
- Time when the mod was applied.
- With this algo, at an y given time
t
, at most one modification box exists in the data structure with time t - The modification at time t splits the tree into three parts
- data from before time t
- data from after time t
- part unaffected by the modification
- Non tree data structures may require more than one modification box.
- Path Copying
Copy-on-write- a new version of the data structure whenever a modification is made to it.
- old version is still available for reading, while the new version is used for writes.
- memory-efficient, as most of the time, only one version of the data structure is active
Snapshotting- a snapshot of the data structure at a particular moment in time and then storing the snapshot
- simple to implement but can be memory-inefficient as multiple snapshots of the data structure are kept in memory.
Persistent Arrays- Creating a new version of the data structure whenever a modification is made to it, but only modifying the elements that have changed
- memory-efficient but can be complex to implement.
Path Copying- Creating new version of data structure whenever a modification is made to it, but only copying the portion of the data structure that has changed
- Memory efficient approach & simple to implement
- Results in long chains of small update which can slow down the system
Log-structured data structures- logging all modifications made to the data structure, and using the log to recreate the data structure at a later time.
- Memory efficient
- complex to implement and may result in slow updates.
Cache Oblivious Algorithms
-
I/O Model
- Fast (cache) memory of size M; divided into B sized blocks
- Cost: the number of block transfers (or I/Os) from slow memory to fast memory
- Algos are based on M and B
-
Cache Oblivious Algorithms
- Algos not parameterized by B or M
- Analyze the ideal cache model - same as I/O Model but optimal replacement is assumed
- A size-𝑁 array occupies at most 𝑁/𝐵 + 1 = Θ(1 + 𝑁/𝐵)blocks, since you cannot control cache alignment