Using load shedding to avoid overload

A server can always be hit by a flash mob, or by requests that are more expensive than the ones it was scaled for at that moment. Sure, autoscaling helps a bunch, even at serverless speeds, but there's always some configured maximum capacity available at any point in time.

In a lot of ways, this is a no-win scenario. There isn't enough capacity to go around, so you need to decide who you're going to disappoint. To make matters worse, servers that aren't designed to handle overload often curl up into the fetal position and fail everyone by becoming too slow to be useful for any request. It's much better to design your system to shed load, so the requests it does take on are served quickly, and the requests it doesn't take on are rejected cheaply, right away.
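To make "rejected cheaply right away" a little more concrete, here's a minimal sketch of one way to shed load: cap the number of requests in flight and turn away anything beyond that cap before doing any real work. This is an illustration, not how any particular service does it. The names `LoadShedder`, `max_in_flight`, and `handle` are made up for this example, and a real limit would have to come from load testing rather than a guess.

```python
import threading


class LoadShedder:
    """Hypothetical helper: admits work only while in-flight requests are under a cap."""

    def __init__(self, max_in_flight):
        # max_in_flight is an assumed, pre-tuned limit for this sketch.
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        # The admission check is deliberately cheap: one lock and one comparison.
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self):
        with self._lock:
            self._in_flight -= 1


def handle(shedder, work):
    """Run work() if there is capacity, otherwise reject immediately."""
    if not shedder.try_acquire():
        # Shed: fail fast instead of queueing and slowing everyone down.
        return 503, "overloaded, try again later"
    try:
        return 200, work()
    finally:
        # Always give the slot back, even if work() raises.
        shedder.release()
```

The choice that matters here is that the rejection path does almost nothing, so an overloaded server spends its capacity on the requests it accepted rather than on the ones it turned away.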

Comparison of service behavior with and without load shedding under overload. Without load shedding, latency and error rates spiral as load increases. With it, the service degrades gracefully by shedding excess work.

I wrote about this topic in a bunch of detail in the Amazon Builders' Library, so go check it out over there!