Foundations of Data Systems

Reliable, Scalable, and Maintainable Systems

The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? ~ Alan Kay

  • Many applications today are data-intensive rather than compute-intensive.

    • CPU power is rarely a limiting factor.
    • The bigger problems are usually the amount of data, its complexity, and the speed at which it changes.
  • We usually think of databases, queues, and caches as different tools:

    • A database and a message queue are superficially similar: both store data for some time.
    • But, they have very different access patterns, resulting in different performance characteristics and different implementations.
  • Tricky questions that come up when you are designing a data system or service:

    • How do you ensure that the data remains correct and complete, even when things go wrong internally?
    • How do you provide consistently good performance to clients, even when parts of the system are degraded?
    • How do you scale to handle an increase in load?
    • What does a good API for the service look like?
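The point above about databases and message queues having different access patterns can be sketched with two toy classes. This is only an illustration, not how real systems are built: `TinyStore` and `TinyQueue` are hypothetical names, with a plain dict standing in for a database and a `collections.deque` standing in for a message queue.

```python
from collections import deque


class TinyStore:
    """Toy stand-in for a database: data is looked up by key and stays until deleted."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        # Reading does not remove the data; the same key can be read many times.
        return self._rows.get(key)


class TinyQueue:
    """Toy stand-in for a message queue: data is consumed once, in arrival order."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        # Consuming removes the message; returns None when the queue is empty.
        return self._messages.popleft() if self._messages else None


store = TinyStore()
store.put("user:1", "Ada")
first_read = store.get("user:1")
second_read = store.get("user:1")  # still there after the first read

queue = TinyQueue()
queue.publish("job-a")
queue.publish("job-b")
consumed = [queue.consume(), queue.consume(), queue.consume()]
```

Both classes "store data for some time", but the store is read repeatedly by key while the queue hands each message out once, which is why the two end up with different implementations.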

Reliability

The system should work correctly, even in the face of adversity.

  • For software, reliability roughly means the following:
    • It performs its functions as the user expects.
    • It tolerates the user making mistakes or using the software in unexpected ways.
    • Its performance is good enough for the required use case, under the expected load and data volume.
    • The system prevents any unauthorised access and abuse.
Fault
  • Things that can go wrong are called faults.
  • A fault is different from a #Failure.

    Fault - A component of the system deviating from its spec

Failure

The system as a whole stops providing the required service to the user.
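The fault/failure distinction can be made concrete with a small sketch. It assumes a hypothetical setup with redundant replicas: one replica being down is a fault, and it only becomes a failure when no replica can answer. All names here (`Replica`, `read_from_any`, the two exception classes) are made up for illustration.

```python
class ReplicaDown(Exception):
    """A fault: one component deviates from its spec."""


class ServiceUnavailable(Exception):
    """A failure: the system as a whole cannot serve the request."""


class Replica:
    def __init__(self, data, healthy=True):
        self.data = data
        self.healthy = healthy

    def read(self, key):
        if not self.healthy:
            raise ReplicaDown()
        return self.data[key]


def read_from_any(replicas, key):
    """Try each replica in turn; fail only if every replica is faulty."""
    for replica in replicas:
        try:
            return replica.read(key)
        except ReplicaDown:
            continue  # fault tolerated, try the next replica
    raise ServiceUnavailable(key)


data = {"k": "v"}
one_fault = [Replica(data, healthy=False), Replica(data)]
all_faulty = [Replica(data, healthy=False), Replica(data, healthy=False)]

value = read_from_any(one_fault, "k")  # the fault is masked by the second replica

try:
    read_from_any(all_faulty, "k")
    failed = False
except ServiceUnavailable:
    failed = True  # every replica is faulty, so the user sees a failure
```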

Fault Tolerance
  • Systems that can anticipate faults and can cope with them are fault-tolerant or resilient.
  • We can only tolerate certain kinds of faults; we can never make a system 100% fault-tolerant.
    • What if Earth goes boom? You would need the budget to set up a server on a different planet.
  • Counterintuitively, it can make sense to increase the rate of faults by triggering them deliberately. Reference -> [ The Netflix Chaos Monkey ] ( #todo hyperlink this)
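The idea of triggering faults deliberately can be sketched as a wrapper that makes a fraction of calls fail on purpose, so that the fault-handling path (here, a bounded retry) gets exercised all the time. This is a minimal sketch under assumed names (`with_fault_injection`, `call_with_retries`, `InjectedFault`); it is not how Chaos Monkey itself works.

```python
import random


class InjectedFault(Exception):
    """Raised on purpose to simulate a component fault."""


def with_fault_injection(func, fault_rate, rng):
    """Wrap func so that roughly fault_rate of calls fail deliberately."""
    def wrapper(*args, **kwargs):
        if rng.random() < fault_rate:
            raise InjectedFault()
        return func(*args, **kwargs)
    return wrapper


def call_with_retries(func, attempts, *args):
    """Fault-handling path under test: retry a bounded number of times."""
    for _ in range(attempts - 1):
        try:
            return func(*args)
        except InjectedFault:
            continue
    return func(*args)  # last attempt: let the fault propagate


def square(x):
    return x * x


rng = random.Random(42)  # seeded so the run is repeatable
flaky_square = with_fault_injection(square, fault_rate=0.3, rng=rng)

results = []
gave_up = 0
for n in range(100):
    try:
        results.append(call_with_retries(flaky_square, 5, n))
    except InjectedFault:
        gave_up += 1  # all five attempts hit an injected fault
```

Running the retry logic against injected faults on every run is the point: a fault-handling path that is only exercised during real incidents is much more likely to be broken when it is finally needed.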

Scalability

Maintainability

Keywords ( #todo add definition for the following)
  • Datastore
  • Message queue