0
Run a vLLM Server on HF Jobs in One Command
https://huggingface.co/blog/vllm-jobs(huggingface.co)A private, OpenAI-compatible LLM endpoint can be created on Hugging Face infrastructure using a single command. The process utilizes `hf jobs run` to deploy a vLLM server from a Docker image on specified GPU hardware. Once launched, the endpoint can be queried from anywhere using tools like curl or the OpenAI Python client by providing an HF token for authentication. The guide also covers scaling to larger models, managing costs by stopping the job, and extending functionality with a Gradio UI or as a backend for a coding agent.
0 points•by ogg•2 hours ago