Foundations of Data Systems
Reliable, Scalable, and Maintainable Systems
The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? ~ Alan Kay
Many applications today are data-intensive rather than compute-intensive.
- Raw CPU power is rarely the limiting factor.
- The bigger problems are usually the amount of data, its complexity, and the speed at which it changes.
We usually think of databases, queues, and caches as different kinds of tools:
- A database and a message queue have a superficial similarity: both store data for some time.
- But they have very different access patterns, which leads to different performance characteristics and therefore very different implementations.
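To make the difference in access patterns concrete, here is a minimal sketch using in-memory Python structures as stand-ins. The class names `KeyValueStore` and `MessageQueue` are hypothetical and only for illustration; real databases and message brokers are far more involved. The store can be read by key any number of times, while the queue hands out each message once, in arrival order.

```python
from collections import deque


class KeyValueStore:
    """Toy stand-in for a database: values are looked up by key, repeatedly."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Reading does not remove the value; it stays until overwritten or deleted.
        return self._data.get(key)


class MessageQueue:
    """Toy stand-in for a message queue: messages are consumed once, in FIFO order."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Receiving removes the message; returns None when the queue is empty.
        return self._messages.popleft() if self._messages else None


store = KeyValueStore()
store.put("user:1", "Ada")
first_read = store.get("user:1")
second_read = store.get("user:1")  # the same value is still there

queue = MessageQueue()
queue.send("job-a")
queue.send("job-b")
first_msg = queue.receive()   # oldest message comes out first
second_msg = queue.receive()
third_msg = queue.receive()   # nothing left to consume
```

Both objects "store data for some time", but the store is optimised for lookups by key and the queue for ordered, one-time delivery.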
Tricky questions that come up when you are designing a data system or service:
- How do you ensure that the data remains correct and complete, even when things go wrong internally?
- How do you provide consistently good performance to clients, even when parts of the system are degraded?
- How do you scale to handle an increase in load?
- What does a good API for the service look like?
Reliability
The system should continue to work correctly, even in the face of adversity.
- For software, reliability roughly means the following:
  - It performs the function the user expects.
  - It tolerates the user making mistakes or using the software in unexpected ways.
  - Its performance is good enough for the required use case, under the expected load and data volume.
  - It prevents unauthorised access and abuse.
Fault
- The things that can go wrong are called faults.
- A fault is not the same as a #Failure.
A fault is one component of the system deviating from its spec.
Failure
A failure is when the system as a whole stops providing the required service to the user.
Fault Tolerance
- Systems that anticipate faults and can cope with them are called fault-tolerant or resilient.
- We can only tolerate certain kinds of faults; no system can be made 100% fault-tolerant.
  - What if Earth goes boom? You would need the budget to set up a server on a different planet.
- Counterintuitively, it can make sense to increase the rate of faults by triggering them deliberately, so that the fault-tolerance machinery is continually exercised and tested. Reference -> [ The Netflix Chaos Monkey ] ( #todo hyperlink this)
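The distinction between a fault and a failure, and the idea of triggering faults on purpose, can be sketched as follows. This is a toy model under stated assumptions, not how Chaos Monkey itself works: `Replica`, `ReplicatedService`, and `inject_faults` are hypothetical names invented for this example. A single replica going down is a fault that the service masks; only when every replica is down does the service as a whole fail.

```python
import random


class ReplicaDown(Exception):
    """A single replica cannot serve the request (a fault)."""


class ServiceUnavailable(Exception):
    """No replica can serve the request (a failure of the whole system)."""


class Replica:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def read(self):
        if not self.healthy:
            raise ReplicaDown(self.name)
        return f"data from {self.name}"


class ReplicatedService:
    def __init__(self, replicas):
        self.replicas = replicas

    def read(self):
        # Try each replica in turn; a faulty replica is skipped, not fatal.
        for replica in self.replicas:
            try:
                return replica.read()
            except ReplicaDown:
                continue
        # Every replica faulted, so the system as a whole has failed.
        raise ServiceUnavailable("all replicas are down")


def inject_faults(service, count, rng):
    """Deliberately take down `count` randomly chosen replicas (hypothetical helper)."""
    victims = rng.sample(service.replicas, count)
    for replica in victims:
        replica.healthy = False
    return victims


service = ReplicatedService([Replica("a"), Replica("b"), Replica("c")])
rng = random.Random(0)  # fixed seed so the experiment is repeatable

inject_faults(service, 2, rng)
survivor_reply = service.read()  # two faults, but the service still answers

inject_faults(service, 3, rng)   # now every replica is down
try:
    service.read()
    failed = False
except ServiceUnavailable:
    failed = True
```

Running the injection regularly, rather than waiting for faults to happen naturally, is what shows whether the fallback path in `read` actually works.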
Scalability
Maintainability
Keywords (#todo add definitions for the following)
- Datastore
- Message queue