Tracing in Production: When 1–150ms Turns Into 700ms
We enabled tracing on a production ML system expecting a small latency overhead. Instead, inference latency increased by ~700ms — far beyond the documented 1–150ms range — revealing how benchmarking assumptions can break under real workloads.
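The tracing stack in question isn't specified here, so as a minimal sketch: the overhead claim can be checked by timing the same inference call with and without a span-recording wrapper. Everything below (`traced`, `SPANS`, the toy `infer` function) is hypothetical scaffolding, not the system from this post.

```python
import time
from functools import wraps

SPANS = []  # hypothetical in-memory span store standing in for a real tracer

def traced(name):
    """Wrap a function so each call records a (name, duration) span."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                SPANS.append((name, time.perf_counter() - start))
        return wrapper
    return deco

def infer(x):
    # Stand-in for model inference; a real benchmark would call the model.
    return x * 2

traced_infer = traced("inference")(infer)

def p50_latency(fn, n=200):
    """Median wall-clock latency over n calls."""
    samples = []
    for i in range(n):
        t0 = time.perf_counter()
        fn(i)
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[n // 2]

baseline = p50_latency(infer)
traced_p50 = p50_latency(traced_infer)
print(f"baseline p50: {baseline * 1e6:.1f}us, traced p50: {traced_p50 * 1e6:.1f}us")
```

Comparing medians rather than means keeps one slow outlier (GC pause, cold cache) from dominating the comparison; a production benchmark would also look at tail percentiles, which is where tracing overhead often shows up first.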