This leap is made possible by near-lossless accuracy under 4-bit weight and KV cache quantization, allowing developers to process massive datasets without server-grade infrastructure.
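To make the quantization claim concrete, here is a minimal, illustrative sketch of symmetric 4-bit weight quantization with a per-tensor scale. This is a generic textbook scheme, not the actual method used by any particular model or runtime; the function names and the per-tensor granularity are assumptions for illustration.

```python
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Map float weights to signed 4-bit integers in [-7, 7] with one scale per tensor.
    (Illustrative scheme; real deployments often use per-group scales.)"""
    max_abs = np.abs(w).max()
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
# Rounding error per weight is bounded by half the scale step.
max_err = np.abs(w - w_hat).max()
print(f"max reconstruction error: {max_err:.4f} (scale/2 = {s / 2:.4f})")
```

"Near-lossless" in practice means this rounding error is small enough that downstream accuracy is essentially unchanged; halving the bits for both weights and the KV cache is what frees the memory for larger contexts on consumer hardware.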
OpenAI has expanded the availability of its GPT-5.3-Codex model to third-party developers via API and Microsoft Foundry.
The new Mercury 2 AI model uses diffusion reasoning to generate 1,000 tokens per second, running about 5x faster than Haiku; speed limits are ...