MIT researchers developed Attention Matching, a KV cache compaction technique that compresses LLM memory by 50x in seconds — ...
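The snippet doesn't say how Attention Matching itself works, but the general idea behind KV cache compaction can be sketched: evict cached key/value entries for tokens that receive little attention, keeping only the most-attended ones. The function name, the keep ratio, and the use of cumulative attention scores below are illustrative assumptions, not MIT's actual algorithm.

```python
import numpy as np

def compact_kv_cache(keys, values, attn_scores, keep_ratio=0.02):
    """Illustrative KV cache compaction by attention-score eviction.

    keys, values: (seq_len, d) cached tensors for one head.
    attn_scores: (seq_len,) cumulative attention each cached token received.
    keep_ratio=0.02 targets roughly 50x compression (an assumption here,
    chosen to match the headline figure).
    """
    seq_len = keys.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    # Indices of the most-attended tokens, restored to original order
    top = np.sort(np.argsort(attn_scores)[-n_keep:])
    return keys[top], values[top], top

# Toy demo: a 1000-token cache shrinks to 20 entries (50x fewer)
rng = np.random.default_rng(0)
seq_len, d = 1000, 64
keys = rng.standard_normal((seq_len, d))
values = rng.standard_normal((seq_len, d))
scores = rng.random(seq_len)

small_k, small_v, kept = compact_kv_cache(keys, values, scores)
print(seq_len / len(kept))  # → 50.0
```

Real systems differ in how they score tokens and whether evicted entries are merged rather than dropped; this sketch only shows the score-then-evict skeleton.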
Databricks' KARL agent uses reinforcement learning to generalize across six enterprise search behaviors — the problem that breaks most RAG pipelines.
Nota AI, an AI optimization technology company, announced that it has developed a next-generation ...
AI infrastructure can't evolve as fast as model innovation. Memory architecture is one of the few levers capable of accelerating deployment cycles. Enter SOCAMM2 ...
Abstract: With the widespread deployment of long-context large language models (LLMs), efficient and high-quality generation is becoming increasingly important. Modern LLMs employ batching and ...
Two background processes (Observer + Reflector) compress your conversation history from multiple AI coding agents into a single shared long-term memory. Every agent reads it on startup and instantly ...
Abstract: This research develops an IoT-based multi-sensor data transformation and compression system using the Discrete Cosine Transform (DCT) algorithm to improve the ...
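The abstract is truncated, but the core DCT trick for sensor data is standard: project a window of samples onto an orthonormal DCT-II basis and transmit only the lowest-frequency coefficients, since smooth physical signals (temperature, pressure) concentrate their energy there. The window size, `keep` count, and function names below are illustrative assumptions, not this paper's design.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are frequency components)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0] /= np.sqrt(2.0)  # rescale DC row so the matrix is orthonormal
    return c

def compress(samples, keep=8):
    # Keep only the `keep` lowest-frequency DCT coefficients
    return dct_matrix(len(samples))[:keep] @ samples

def decompress(coeffs, n):
    # The DCT matrix is orthonormal, so its transpose inverts the transform;
    # missing high-frequency coefficients are implicitly treated as zero
    return dct_matrix(n)[:len(coeffs)].T @ coeffs

# Toy demo: a smooth temperature-like window, 64 samples -> 8 coefficients
t = np.linspace(0, 1, 64)
signal = 20.0 + 2.0 * np.sin(2 * np.pi * t)
coeffs = compress(signal, keep=8)
recon = decompress(coeffs, len(signal))
error = np.max(np.abs(signal - recon))  # small for smooth signals
```

An 8x size reduction with sub-degree error is typical for signals this smooth; noisy or fast-changing sensors need more coefficients or a different transform.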