flash-attention-with-sink implements an attention variant used in GPT-OSS 20B that integrates a "sink" step into FlashAttention. This repo focuses on the forward path and provides an experimental ...
This server operates in READ-ONLY mode for safety. It can read and analyze memory but cannot modify it. All operations are logged for security auditing.