Unit 5

  • A persistent data structure is a data structure that always preserves the previous version of itself when it is modified.
  • updates are not in-place

  • Partially Persistent

    -> Access to all versions, but modify only the latest version

  • Fully persistent

    -> If every version can be both accessed and modified

  • Confluently persistent

    is when we merge two or more versions to get a new version.
  • Example
    1. Linked List Concatenation
      • We have two lists of size n,m ; n > m; we have to concatenate the lists.
      • We also have to maintain the version.
      • First Approach -> Make copy of all the nodes of the list O(n+m+1) for traversal and O(1) for copying and linking
      • Second Approach -> pick list with less nodes, and copy them, here O(m) for traversal and O(1) for copying and linking
      • We need to copy in order for the original form to persist!
    2. BST
      • Make a copy of the node we are about to update, proceed for update, cascade the change back through the data structure till you reach the root.
      • attach the other children as the unchanged subtree from the previous tree

  • Approaches to make data structures persistent

    • Path Copying
      • As done in the BST
      • Copy the node to be updated; Cascade the change back through the data structure;
      • How to access the state at time t? Maintain an array of roots indexed by timestamp.
    • Fat Nodes
      • Every node stores its modification history, thus making it 'fat'
    • Nodes with boxes
      • Amortized complexity of O(1)
      • For a tree, it involves using a BOX that can hold
        1. One modification to the node (or to the node's key or to some other node specific data)
        2. Time when the mod was applied.
      • With this algo, at an y given time t, at most one modification box exists in the data structure with time t
      • The modification at time t splits the tree into three parts
        1. data from before time t
        2. data from after time t
        3. part unaffected by the modification
      • Non tree data structures may require more than one modification box.

  • Copy-on-write

    • a new version of the data structure whenever a modification is made to it.
    • old version is still available for reading, while the new version is used for writes.
    • memory-efficient, as most of the time, only one version of the data structure is active

  • Snapshotting

    • a snapshot of the data structure at a particular moment in time and then storing the snapshot
    • simple to implement but can be memory-inefficient as multiple snapshots of the data structure are kept in memory.

  • Persistent Arrays

    • Creating a new version of the data structure whenever a modification is made to it, but only modifying the elements that have changed
    • memory-efficient but can be complex to implement.

  • Path Copying

    • Creating new version of data structure whenever a modification is made to it, but only copying the portion of the data structure that has changed
    • Memory efficient approach & simple to implement
    • Results in long chains of small update which can slow down the system

  • Log-structured data structures

    • logging all modifications made to the data structure, and using the log to recreate the data structure at a later time.
    • Memory efficient
    • complex to implement and may result in slow updates.


Cache Oblivious Algorithms

-
I/O Model

- Fast (cache) memory of size M; divided into B sized blocks - Cost: the number of block transfers (or I/Os) from slow memory to fast memory - Algos are based on M and B -
Cache Oblivious Algorithms

- Algos not parameterized by B or M - Analyze the ideal cache model - same as I/O Model but optimal replacement is assumed - A size-𝑁 array occupies at most 𝑁/𝐵 + 1 = Θ(1 + 𝑁/𝐵)blocks, since you cannot control cache alignment