Starting from the code at "https://github.com/NVIDIA/Model-Optimizer/tree/main/examples/gpt-oss", we are trying to apply QAT after training with SFT. When using QATSFTTrainer with a ...
Production-ready test-time compute optimization framework for LLM inference. Implements Best-of-N, Sequential Revision, and Beam Search strategies. Validated with models up to 7B parameters.
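As a rough illustration of the Best-of-N strategy named above, the following is a minimal sketch and not this framework's actual API. The names `best_of_n`, `toy_generate`, and `toy_score` are hypothetical; in practice `generate` would sample from an LLM and `score` would call a reward model or verifier.

```python
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n candidates for the prompt and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Draw n independent candidates; the index stands in for a sampling seed.
    candidates = [generate(prompt, seed) for seed in range(n)]
    # Score every candidate and keep the (score, candidate) pair with the top score.
    scored = [(score(prompt, c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


# Hypothetical stand-ins for a model and a reward function (assumptions for the demo).
def toy_generate(prompt: str, seed: int) -> str:
    return f"{prompt} answer-{seed}"


def toy_score(prompt: str, candidate: str) -> float:
    # Pretend the candidate produced with seed 2 is the best one.
    seed = int(candidate.rsplit("-", 1)[1])
    return float(-abs(seed - 2))


if __name__ == "__main__":
    print(best_of_n("q", toy_generate, toy_score, n=4))
```

Sequential Revision and Beam Search differ mainly in how candidates are produced: the former feeds each answer back for refinement, the latter scores partial generations and prunes as it goes. The selection step sketched here stays conceptually similar.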