Two categories of large-scale production system overloading issues

published on 2023/07/02
  1. Top-down overload or "Reddit Hug of Death": This is what Bluesky experienced today - suddenly there was a HUGE demand surge and the servers just couldn't for a while. This also happens after Super Bowl ads or when pop stars announce tours or during DDoS attacks.

  2. Bottom-up: This is the less obvious and more common scenario, when something inside the system fails in a way that makes the system unable to serve normal load. If you lose a Redis cache and everything is reading from the DB, you will drastically reduce your ability to serve requests.

Maggie Johnson-Pint

The whole thread is worth reading.
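
To put a rough number on the second category, here is a minimal back-of-the-envelope sketch in Python. It is not from the thread: the function names, the 10,000 requests per second, and the 95% cache hit ratio are all illustrative assumptions, and it assumes every cache miss turns into exactly one database read.

```python
def db_read_rate(request_rate: float, cache_hit_ratio: float) -> float:
    """Reads per second that fall through the cache and reach the database.

    Assumes each cache miss causes exactly one database read (hypothetical model).
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return request_rate * (1.0 - cache_hit_ratio)


def load_multiplier_on_cache_loss(cache_hit_ratio: float) -> float:
    """Factor by which database reads grow if the cache disappears entirely."""
    if not 0.0 <= cache_hit_ratio < 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1)")
    return 1.0 / (1.0 - cache_hit_ratio)


# Illustrative numbers only, not measurements from any real system.
normal_traffic = 10_000.0  # requests per second
hit_ratio = 0.95           # fraction of reads served by the cache

with_cache = db_read_rate(normal_traffic, hit_ratio)
without_cache = db_read_rate(normal_traffic, 0.0)

print(f"DB reads/s with cache:    {with_cache:.0f}")
print(f"DB reads/s without cache: {without_cache:.0f}")
print(f"Load multiplier:          {load_multiplier_on_cache_loss(hit_ratio):.0f}x")
```

Under these assumptions the traffic never changes, yet the database goes from roughly 500 to 10,000 reads per second, about a 20x jump. That is why a bottom-up failure can look like a demand surge from the database's point of view even though normal load is all that is arriving.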