Microsoft researchers have developed On-Policy Context Distillation (OPCD), a training method that permanently embeds ...
With reported 3x speed gains and limited degradation in output quality, the method targets one of the biggest pain points in production AI systems: latency at scale.
Researchers from the University of Maryland, Lawrence Livermore, Columbia and TogetherAI have developed a training technique that triples LLM inference speed without auxiliary models or infrastructure ...
Abstract: Internet Autonomous System (AS)-level topology comprises the AS topology structure and AS business relationships; it captures the essence of Internet inter-domain routing and is the basis for ...
NORMAN, Okla. – Song Fang, a researcher with the University of Oklahoma, has been awarded funding from the U.S. National Science Foundation to create training-free detection methods and novel ...
FreqExit is a dynamic inference framework for Visual AutoRegressive (VAR) models, which decode from coarse structures to fine details. Existing methods fail on VAR due to the absence of semantic ...
If the hyperscalers are masters of anything, it is driving scale up and driving costs down so that a new type of information technology becomes cheap enough to be widely deployed. The ...
In their study, Diana et al. introduce a novel method for spike inference from calcium imaging data using a Monte Carlo-based approach, emphasizing the quantification of uncertainties in spike time ...
Large language models (LLMs), with billions of parameters, power many AI-driven services across industries. However, their massive size and complex architectures make their computational costs during ...