
Heaps do lie: debugging a memory leak in vLLM.

https://mistral.ai/news/debugging-memory-leak-in-vllm (mistral.ai)
A memory leak was investigated in the vLLM inference library while running a disaggregated serving setup for a large model. The issue manifested as a slow, linear increase in system memory, but only under specific conditions involving KV cache transfer. Standard Python memory-profiling tools failed to detect the problem, indicating the leak was occurring outside the managed heap. Using lower-level tools such as Heaptrack and pmap, the team observed that the resident set size (RSS) was growing because anonymous memory mappings kept expanding. This finding pointed the investigation toward system-level memory calls such as `mremap` inside the underlying communication libraries responsible for the data exchange.
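The "leak outside the managed heap" symptom can be reproduced in a few lines: memory allocated by native code grows the process RSS while tracemalloc, which only tracks Python-level allocations, stays flat. The sketch below is illustrative and is not code from the article; it assumes Linux (it parses /proc/self/status for VmRSS) and simulates a native-side leak with libc malloc via ctypes.

```python
import ctypes
import tracemalloc


def rss_kb() -> int:
    """Return the process resident set size (VmRSS) in kB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return -1


# Bind libc so we can allocate memory that the Python allocator never sees.
libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]

tracemalloc.start()
leaked = []  # keep pointers around and never free() them, simulating a native leak

for step in range(5):
    size = 16 * 1024 * 1024  # 16 MiB per iteration
    ptr = libc.malloc(size)
    libc.memset(ptr, 0, size)  # touch the pages so they actually become resident
    leaked.append(ptr)

    py_current, _ = tracemalloc.get_traced_memory()
    print(f"step {step}: tracemalloc={py_current / 1024:.0f} kB, RSS={rss_kb()} kB")
```

Running this shows RSS climbing by roughly 16 MiB per step while the tracemalloc figure barely moves, which is the same discrepancy that pushed the investigation from Python profilers to tools like pmap and Heaptrack that observe the process's memory mappings directly.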
