Run a vLLM Server on HF Jobs in One Command

https://huggingface.co/blog/vllm-jobs(huggingface.co)

A private, OpenAI-compatible LLM endpoint can be created on Hugging Face infrastructure using a single command. The process utilizes `hf jobs run` to deploy a vLLM server from a Docker image on specified GPU hardware. Once launched, the endpoint can be queried from anywhere using tools like curl or the OpenAI Python client by providing an HF token for authentication. The guide also covers scaling to larger models, managing costs by stopping the job, and extending functionality with a Gradio UI or as a backend for a coding agent.

0 points•by ogg•2 hours ago

Comments (0)

No comments yet. Be the first to comment!

Want to join the discussion?