Unit 5

A persistent data structure is a data structure that always preserves the previous version of itself when it is modified.
updates are not in-place
Partially Persistent
-> Access to all versions, but modify only the latest version
Fully persistent
-> If every version can be both accessed and modified
Confluently persistent
is when we merge two or more versions to get a new version.
Example
1. Linked List Concatenation
  - We have two lists of size n,m ; n > m; we have to concatenate the lists.
  - We also have to maintain the version.
  - First Approach -> Make copy of all the nodes of the list O(n+m+1) for traversal and O(1) for copying and linking
  - Second Approach -> pick list with less nodes, and copy them, here O(m) for traversal and O(1) for copying and linking
  - We need to copy in order for the original form to persist!
2. BST
  - Make a copy of the node we are about to update, proceed for update, cascade the change back through the data structure till you reach the root.
  - attach the other children as the unchanged subtree from the previous tree
Approaches to make data structures persistent
- Path Copying
  - As done in the BST
  - Copy the node to be updated; Cascade the change back through the data structure;
  - How to access the state at time t? Maintain an array of roots indexed by timestamp.
- Fat Nodes
  - Every node stores its modification history, thus making it 'fat'
- Nodes with boxes
  - Amortized complexity of O(1)
  - For a tree, it involves using a BOX that can hold
    1. One modification to the node (or to the node's key or to some other node specific data)
    2. Time when the mod was applied.
  - With this algo, at an y given time t, at most one modification box exists in the data structure with time t
  - The modification at time t splits the tree into three parts
    1. data from before time t
    2. data from after time t
    3. part unaffected by the modification
  - Non tree data structures may require more than one modification box.
Copy-on-write
- a new version of the data structure whenever a modification is made to it.
- old version is still available for reading, while the new version is used for writes.
- memory-efficient, as most of the time, only one version of the data structure is active
Snapshotting
- a snapshot of the data structure at a particular moment in time and then storing the snapshot
- simple to implement but can be memory-inefficient as multiple snapshots of the data structure are kept in memory.
Persistent Arrays
- Creating a new version of the data structure whenever a modification is made to it, but only modifying the elements that have changed
- memory-efficient but can be complex to implement.
Path Copying
- Creating new version of data structure whenever a modification is made to it, but only copying the portion of the data structure that has changed
- Memory efficient approach & simple to implement
- Results in long chains of small update which can slow down the system
Log-structured data structures
- logging all modifications made to the data structure, and using the log to recreate the data structure at a later time.
- Memory efficient
- complex to implement and may result in slow updates.

Cache Oblivious Algorithms
-
I/O Model
- Fast (cache) memory of size M; divided into B sized blocks - Cost: the number of block transfers (or I/Os) from slow memory to fast memory - Algos are based on M and B -
Cache Oblivious Algorithms
- Algos not parameterized by B or M - Analyze the ideal cache model - same as I/O Model but optimal replacement is assumed - A size-𝑁 array occupies at most 𝑁/𝐵 + 1 = Θ（1 + 𝑁/𝐵）blocks, since you cannot control cache alignment