When "letting it crash" is not enough

published on 2024/02/08

The Erlang/OTP programming platform has an interesting approach to failure handling in computer systems, widely known as "let it crash". The core idea is that modern applications can find themselves in a huge number of states. The more complex your application is, the more variables you need to keep track of everything, and eventually it becomes impossible for developers to predict every combination of values those variables can take. Once your app gets into an undesirable state, the best thing you can do is reset it and start again from a fresh, well-known, correct state.

This works amazingly well together with Erlang's property that each process has a separate memory space, which allows us to selectively reset only parts of our application while keeping the system as a whole running. We can then arrange the application into a tree-like structure: when restarting one part does not help, we restart the larger part that contains it, and keep moving up until we reach the root and are forced to restart the whole thing. This concept is known as supervision trees.
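
To make the restart-and-escalate idea described above a bit more concrete, here is a minimal toy model in Python. It is only a sketch and not how Erlang/OTP implements supervisors: the names `Worker`, `Supervisor`, `crash`, `max_restarts`, and the example tree (`root`, `storage`, `cache`, `db`, `web`) are all made up for illustration. Real OTP supervisors count restarts within a time window and offer several restart strategies, both of which this toy ignores.

```python
class Worker:
    """A leaf of the tree that holds some state (hypothetical toy class)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.reset()

    def reset(self):
        # Go back to a fresh, well-known and correct state.
        self.state = {"counter": 0}


class Supervisor:
    """Restarts failed children; gives up and escalates after max_restarts."""

    def __init__(self, name, children, max_restarts=2):
        self.name = name
        self.parent = None
        self.children = children
        self.max_restarts = max_restarts
        self.restarts = 0
        for child in children:
            child.parent = self

    def reset(self):
        # Restarting a supervisor restarts everything below it.
        self.restarts = 0
        for child in self.children:
            child.reset()

    def child_failed(self, child):
        """Handle a failed child and return the name of whoever restarted it."""
        if self.restarts < self.max_restarts:
            self.restarts += 1
            child.reset()  # only this part of the tree is reset
            return self.name
        if self.parent is None:
            # Nothing above the root: the whole system has to restart.
            raise RuntimeError("root supervisor gave up")
        # Too many restarts here, so this supervisor counts as failed too.
        return self.parent.child_failed(self)


def crash(worker):
    """Simulate a crash in `worker` and report which supervisor handled it."""
    return worker.parent.child_failed(worker)


# Assumed example tree: root -> [storage -> [cache, db], web]
cache, db, web = Worker("cache"), Worker("db"), Worker("web")
storage = Supervisor("storage", [cache, db])
root = Supervisor("root", [storage, web])

db.state["counter"] = 7
handled_by = [crash(cache) for _ in range(3)]
print(handled_by)  # ['storage', 'storage', 'root']
print(db.state)    # {'counter': 0}
```

The first two crashes of `cache` are absorbed by `storage`, and `db` keeps its state during them. On the third crash `storage` has used up its restart budget, so the failure moves one level up and `root` restarts the whole `storage` subtree, which is why `db` ends up reset as well.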