Monitoring Production Services at Amazon
🎥 talk
Practical tips on how to keep your alarms from crying wolf, instrument your service to measure queries of different "cost" separately, measure client errors when they're actually your fault, do distributed tracing, and avoid being misled by single-percentile aggregates.
→ Watch the talk
https://www.youtube.com/watch?v=hnPcf_Czbvw