We evaluate DeepCode on the PaperBench benchmark (released by OpenAI), a rigorous testbed requiring AI agents to independently reproduce 20 ICML 2024 papers from scratch. The benchmark comprises 8,316 ...
Abstract: Our research focuses on the intersection of artificial intelligence (AI) and software development, particularly the role of AI models in automating code generation. With advancements in ...
ccswarm is a workflow automation framework for coordinating specialized AI agents using Claude Code CLI. It provides task delegation infrastructure, template-based scaffolding, and Git worktree ...
Abstract: Constant dimension codes (CDCs) are essential for error correction in random network coding. A fundamental problem of CDCs is to determine their maximal ...