Abstract: When object detection is carried out in settings with sparse and irregular data acquisition, conventional sequencing techniques that depend on continuous tracking or dense observations ...
Abstract: The parallel efficient global optimization (EGO) algorithm was developed to leverage the rapid advancements in high-performance computing. However, the conventional parallel EGO algorithm based ...
Abstract: Artificial deep neural networks (ADNNs) have become a cornerstone of modern machine learning, but they are not immune to challenges. One of the most significant problems plaguing ADNNs is ...
Two major sources of training data exist for post-training modern language models: on-policy (model-generated rollouts) data and off-policy (human or other-model demonstrations) data. In this paper, ...
Four algorithms are compared: Greedy, Epsilon-Greedy, Optimistic Greedy, and Gradient Bandit, evaluated over 1000 simulations with 2000 steps each. Performance metrics include average per-step reward ...
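A comparison of this kind can be sketched in a few lines. The following is a minimal illustration, not the setup used in the work above: the 10-armed Gaussian testbed, the sample-average value estimates, `epsilon=0.1`, and the optimistic initial value of `5.0` are all assumptions, and the names `run_bandit` and `compare` are hypothetical. The Gradient Bandit variant is left out to keep the sketch short.

```python
import random


def run_bandit(k=10, steps=2000, epsilon=0.1, initial_value=0.0, seed=0):
    """Run one simulation and return the average per-step reward.

    Assumed testbed: k arms with true means drawn from N(0, 1) and
    unit-variance Gaussian reward noise.
    """
    rng = random.Random(seed)
    true_means = [rng.gauss(0.0, 1.0) for _ in range(k)]
    estimates = [initial_value] * k  # current value estimate per arm
    counts = [0] * k                 # number of pulls per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)   # explore: uniformly random arm
        else:
            best = max(estimates)    # exploit: best estimate, ties broken at random
            arm = rng.choice([i for i, q in enumerate(estimates) if q == best])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the pulled arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps


def compare(n_sims=1000, steps=2000):
    """Average per-step reward of three strategies over n_sims simulations."""
    configs = {
        "greedy": {"epsilon": 0.0, "initial_value": 0.0},
        "epsilon_greedy": {"epsilon": 0.1, "initial_value": 0.0},
        "optimistic_greedy": {"epsilon": 0.0, "initial_value": 5.0},
    }
    return {
        name: sum(run_bandit(steps=steps, seed=s, **cfg) for s in range(n_sims)) / n_sims
        for name, cfg in configs.items()
    }


if __name__ == "__main__":
    # Fewer simulations than the 1000 quoted above, to keep the demo quick.
    for name, avg in compare(n_sims=50).items():
        print(f"{name}: {avg:.3f}")
```

Greedy and Optimistic Greedy differ here only in the initial estimates. With sample-average updates, the optimistic start mainly ensures that each arm is tried early on, since the first pull of an arm overwrites its initial value.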
Quantile regression (QR), an important statistical method, extends traditional regression analysis. In QR, various quantiles of the response variable are modeled as linear functions of the ...
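For orientation, quantile regression is commonly fitted by minimizing the check (pinball) loss. The sketch below shows that standard loss and checks, on a toy sample, that the constant minimizing it at level 0.5 is the median. Whether the work above uses this exact objective is not stated in the excerpt; the function names `pinball_loss` and `best_constant` are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean check loss at quantile level tau, with 0 < tau < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        residual = y - p
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += max(tau * residual, (tau - 1.0) * residual)
    return total / len(y_true)


def best_constant(y, candidates, tau):
    """Return the candidate constant prediction with the smallest pinball loss."""
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    # At tau = 0.5 the minimizer over the data points is the median.
    print(best_constant(data, data, 0.5))
```

In a linear QR model the constant prediction would be replaced by a linear function of the covariates, with the same loss minimized over the coefficients.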
Policy gradient methods have significantly advanced the reasoning capabilities of LLMs, particularly through RL. A key tool in stabilizing these methods is Kullback-Leibler (KL) regularization, which ...
Abstract: In this paper, we consider a more general bi-level optimization problem, where the inner objective function consists of three convex functions, one smooth and two non-smooth ...