AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems

https://huggingface.co/blog/ServiceNow-AI/aprielguard (huggingface.co)

AprielGuard is an 8-billion-parameter safeguard model designed to improve the safety and security of modern Large Language Model (LLM) systems. It detects 16 categories of safety risks and a wide range of adversarial attacks, including prompt injections and jailbreaks. The model is built specifically for complex agentic workflows, multi-turn conversations, and long-context scenarios. Trained on a large-scale synthetic dataset, AprielGuard can run either in a low-latency classification mode or in an explainable reasoning mode that justifies its decisions.

0 points • by will22 • 6 months ago
