We Enabled Tracing on Our ML Serving Platform. Documentation Said 1–150ms. We Measured 700ms.
We run machine learning models in production, serving real-time traffic. When our managed serving platform introduced built-in tracing, we were keen to turn it on. Tracing promised visibility into inference behaviour — input shapes, intermediate steps, output payloads. The documentation stated an expected latency overhead of 1–150ms. We enabled it on a