DeepSeek-V4: a million-token context that agents can actually use
https://huggingface.co/blog/deepseekv4

DeepSeek-V4 is a new Mixture-of-Experts (MoE) model with a one-million-token context window, designed for efficient, long-running agentic tasks. Its primary innovation is a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). This architecture drastically reduces KV-cache memory and inference FLOPs, making the large context practical to deploy. The model also adds agent-focused features, such as preserving reasoning history across multi-turn tool calls and a new XML-based schema for more reliable tool use. These agent behaviors were trained with reinforcement learning in a custom sandbox environment called DSec.
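To see why KV-cache compression matters at this scale, here is a back-of-envelope sketch of the cache size for standard dense attention at a one-million-token context. All configuration numbers below (layer count, head count, head dimension, fp16 storage) are illustrative assumptions, not DeepSeek-V4's actual architecture:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_param=2):
    """Uncompressed KV-cache size for one sequence.

    Factor of 2 covers keys and values, each stored per layer,
    per KV head, per token; bytes_per_param=2 assumes fp16.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_param

# Hypothetical config: 60 layers, 8 KV heads (grouped-query attention),
# head dimension 128, one million tokens of context.
size = kv_cache_bytes(n_layers=60, n_kv_heads=8, head_dim=128, seq_len=1_000_000)
print(f"{size / 2**30:.1f} GiB per sequence")  # ~228.9 GiB
```

Even with these modest assumed settings, a single sequence needs hundreds of gibibytes of cache, which is why the blog post frames aggressive attention compression as the enabler for deploying the full window.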
0 points • by hdt • 2 hours ago