TriAttention open-sourced to enable 32B model long-context reasoning on 24GB GPUs via 10.7x KV cache compression

Coding · 📅 2026/04/07
#Developer #GitHub #KV Cache #Low Risk #Manual Trigger #Reusable #Semi-Automatic #Code #Code Repository #LLM Inference #GPU Memory Optimization
We’re thrilled to open-source TriAttention! 🚀

Running OpenClaw (32B) on a 24GB GPU but hitting OOM with long contexts? We’ve got you covered. TriAttention achieves 10.7x KV cache compression, making long-context reasoning smooth on a single RTX 4090. No more out-of-memory errors, just efficiency.
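To see why 10.7x compression matters at this scale, here is a back-of-envelope KV cache sizing sketch. The layer count, KV head count, and head dimension below are illustrative assumptions for a 32B-class model with grouped-query attention, not OpenClaw's actual architecture, and the compression is applied as a simple division rather than TriAttention's real mechanism:

```python
# Back-of-envelope KV cache sizing.
# num_layers, num_kv_heads, head_dim are assumed values for a generic
# 32B-class GQA model -- NOT OpenClaw's published config.
def kv_cache_bytes(seq_len, num_layers=64, num_kv_heads=8,
                   head_dim=128, bytes_per_elem=2):
    # 2x accounts for storing both the K and V tensors at every layer;
    # bytes_per_elem=2 assumes fp16/bf16 cache entries.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

ctx = 128_000  # a long-context request
raw = kv_cache_bytes(ctx)
compressed = raw / 10.7  # the 10.7x compression ratio claimed above

print(f"raw KV cache:        {raw / 2**30:.2f} GiB")
print(f"after 10.7x squeeze: {compressed / 2**30:.2f} GiB")
```

Under these assumed numbers, a 128K-token cache alone approaches the full 24GB budget before weights are even loaded, while the compressed cache fits in a few gigabytes, which is the gap the release is targeting.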

Thanks to @WeianMaoX @Erix035 @AaronWeiHuang @YuxinXie4 @TianfuF  @supremeZhuang @songhan_mit @yukangchen_