Using load shedding to avoid overload
A server could always be hit with a flash mob, or with more expensive requests than it was scaled for at that instant. Sure, autoscaling, and even serverless-speed autoscaling, helps a bunch, but there's always some configured maximum capacity available at any point in time.
In a lot of ways, this is a no-win scenario. There isn't enough capacity to go around, so you need to decide who you're going to disappoint. But to make matters worse, servers that weren't designed to handle overload often curl up into the fetal position and fail everyone by becoming too slow to be useful for any request. It's much better to design your system to shed load, so the requests that it does take on will be served quickly, and the requests that it doesn't take on will be rejected cheaply right away.
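To make that concrete, here's a minimal sketch of one way to shed load: cap the number of requests in flight and turn away anything beyond the cap before doing any real work. The names `LoadShedder`, `try_acquire`, and `handle` are made up for this illustration, and the fixed cap is an assumption; a real service would pick its limit from load testing and might shed on other signals entirely.

```python
import threading


class LoadShedder:
    """Hypothetical helper: admit at most max_in_flight concurrent requests."""

    def __init__(self, max_in_flight):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()
        self.rejected = 0  # count of shed requests, handy for metrics

    def try_acquire(self):
        """Return True if the request is admitted, False if it should be shed."""
        with self._lock:
            if self._in_flight >= self._max:
                self.rejected += 1
                return False
            self._in_flight += 1
            return True

    def release(self):
        """Mark one admitted request as finished."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


def handle(shedder, work):
    """Run work() if there is capacity; otherwise reject cheaply with a 503."""
    if not shedder.try_acquire():
        # Rejection path: no queueing and no expensive work, just a fast "no".
        return 503, "overloaded, try again later"
    try:
        return 200, work()
    finally:
        shedder.release()
```

The point of the sketch is the shape of the rejection path: it takes a lock, bumps a counter, and returns, so saying "no" costs almost nothing compared to serving the request. For example, with `LoadShedder(2)` and two requests already in flight, a third call to `handle` returns the 503 tuple immediately instead of piling onto the work already underway.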
I wrote about this topic in a bunch of detail in the Amazon Builders' Library, so go check it out over there!