
Gemma 4 VLA Demo on Jetson Orin Nano Super

https://huggingface.co/blog/nvidia/gemma4
A detailed tutorial explains how to run a Gemma 4 Vision Language Agent (VLA) demo locally on an NVIDIA Jetson Orin Nano Super. The process involves setting up the hardware, installing system packages, creating a Python environment, and freeing up RAM. It guides the user through building `llama.cpp` with CUDA support, downloading the quantized Gemma 4 model and its vision projector, and serving them via `llama-server`. The final demo script integrates Parakeet for speech-to-text and Kokoro for text-to-speech: users ask questions aloud, and the model decides whether it needs to capture and analyze a webcam image before answering.
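The serving step described above can be sketched as a small client. This is a hypothetical illustration, not code from the tutorial: it assumes `llama-server` is running locally with its OpenAI-compatible `/v1/chat/completions` endpoint, and the URL and image handling are placeholder assumptions.

```python
import base64
import json

# Assumed local endpoint; llama-server exposes an OpenAI-compatible API
# when started with a model and vision projector (--mmproj).
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_vision_request(question: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-style chat payload embedding a webcam frame
    as a base64 data URI alongside the user's spoken question."""
    data_uri = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": data_uri}},
                ],
            }
        ],
        "max_tokens": 256,
    }


if __name__ == "__main__":
    # In the real demo the question would come from Parakeet (STT) and
    # the bytes from a webcam capture; here both are stand-ins.
    payload = build_vision_request("What do you see?", b"\xff\xd8fake-jpeg")
    print(json.dumps(payload)[:120])
```

The payload would then be POSTed to the server, with the reply passed to Kokoro for speech synthesis.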
0 points by will22 2 hours ago
