The new open reasoning model delivers 30B-class intelligence in a 16B-parameter footprint, with 3.1B active parameters, validated independently on NVIDIA accelerated computing infrastructure.
LCLMs compress LLM context before decoding, running 8.8x faster at 16x compression and beating every KV cache method tested. Open-sourced by NYU and Columbia.
Multiverse Computing SL, a startup with technology that reduces the hardware footprint of artificial intelligence models, is reportedly raising new capital. Sources told Bloomberg today the Spanish ...