Eval JavaScript - Search News

Developer-targeting campaign using malicious Next.js repositories

A developer-targeting campaign leveraged malicious Next.js repositories to trigger a covert RCE-to-C2 chain through standard ...

InfoWorld

How to choose the best LLM using R and vitals

Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models ...

GitHub

METR/eval-analysis-public

This repository contains the analysis code and data for METR's time horizon methodology, as described in "Measuring AI Ability to Complete Long Tasks". Layout: src/horizon/ (analysis code, installable ...

IEEE

CAST-Eval: A Domain-Specific Benchmark for Large Language Models in Civil Aviation Safety

Abstract: In this paper, we present CAST-Eval, a novel, comprehensive and domain-specific benchmark designed to assess the knowledge and reasoning capabilities of large language models (LLMs) in the ...

GitHub

XiaoMaColtAI

An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si… ...
