Open-source TriAttention: 10.7x KV cache compression enables smooth long-context inference with a 32B model on 24 GB of VRAM.
Code implementation 📅 2026/04/07
#Developers #GitHub #KV cache #Low risk #Manually triggered #Reusable #Semi-automated #Code #Code repository #LLM inference #VRAM optimization
We’re thrilled to open-source TriAttention! 🚀 Running OpenClaw (32B) on a 24GB GPU but hitting OOM with long contexts? We’ve got you covered. TriAttention achieves 10.7x KV cache compression, making long-context reasoning smooth on a single RTX 4090. No more memory errors, just efficiency. Thanks to @WeianMaoX @Erix035 @AaronWeiHuang @YuxinXie4 @TianfuF @supremeZhuang @songhan_mit @yukangchen_
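To put the headline number in perspective, here is a back-of-envelope sketch of what a 10.7x KV cache compression saves at long context. The layer count, KV head count, and head dimension below are illustrative assumptions for a 32B-class model, not OpenClaw's actual configuration:

```python
# Back-of-envelope KV cache sizing. All architecture numbers below are
# illustrative assumptions, not OpenClaw's published config.
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, dtype_bytes=2):
    # 2x accounts for separate key and value tensors cached per layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical 32B-class config: 64 layers, GQA with 8 KV heads, head_dim 128,
# fp16 cache (2 bytes per element), 128K-token context.
full = kv_cache_bytes(num_layers=64, num_kv_heads=8, head_dim=128,
                      seq_len=128 * 1024)
compressed = full / 10.7  # the announced compression ratio

print(f"fp16 KV cache @ 128K tokens: {full / 2**30:.1f} GiB")
print(f"after 10.7x compression:     {compressed / 2**30:.1f} GiB")
```

Under these assumptions, the uncompressed cache alone (32 GiB) overflows a 24 GB card before model weights are even counted, while the compressed cache (about 3 GiB) leaves room for a quantized 32B model.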
