A Crash Course in Queueing Theory

https://runwayml.com/news/borrowing-the-night-reclaiming-idle-inference-gpus-for-research (runwayml.com)

Production inference demand for GPUs follows a daily cycle, which forces a choice between over-provisioning for the peak and accepting long queues during high traffic. The post describes a capacity controller that reallocates GPUs between production and research in step with that cycle: it lends GPUs to research overnight, during off-peak hours, and reclaims them for production before the morning rush. Queueing theory informs the allocation by predicting demand and estimating how many servers are needed to meet latency targets. The reported result is better utilization of the GPU fleet and more compute for research, without degrading production performance.
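To make the "how many servers for a latency target" step concrete, here is a minimal sketch using the textbook M/M/c (Erlang C) model. The summary does not say which queueing model the controller uses, so the model choice, the function names, and the parameters (`arrival_rate`, `service_rate`, `max_mean_wait`) are all assumptions made for illustration, not the article's method.

```python
import math


def erlang_c(offered_load: float, servers: int) -> float:
    """Probability that an arriving request has to queue in an M/M/c system.

    offered_load is arrival_rate / service_rate (in Erlangs).
    Assumes Poisson arrivals and exponential service times.
    """
    if servers <= offered_load:
        return 1.0  # unstable: the queue grows without bound
    # Erlang B via the standard recursion, which avoids large factorials.
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    utilization = offered_load / servers
    # Convert Erlang B (loss) to Erlang C (delay).
    return blocking / (1.0 - utilization * (1.0 - blocking))


def mean_queue_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time a request spends waiting in the queue (excluding service)."""
    offered_load = arrival_rate / service_rate
    if servers <= offered_load:
        return math.inf
    return erlang_c(offered_load, servers) / (servers * service_rate - arrival_rate)


def min_servers(arrival_rate: float, service_rate: float, max_mean_wait: float) -> int:
    """Smallest server count whose mean queue wait meets the target."""
    if max_mean_wait <= 0:
        raise ValueError("max_mean_wait must be positive")
    # Start at the smallest stable server count and grow until the target holds.
    servers = math.floor(arrival_rate / service_rate) + 1
    while mean_queue_wait(arrival_rate, service_rate, servers) > max_mean_wait:
        servers += 1
    return servers
```

Under these assumptions, a controller could call `min_servers` with the forecast overnight arrival rate and treat the fleet size minus that result as the number of GPUs it can lend to research. Real inference traffic is rarely this well behaved (bursty arrivals, batching, non-exponential service times), so a production system would likely add headroom on top of such an estimate.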

0 points • by hdt • 1 hour ago
