The CoVar Zeitgeist: July, 2026

A curated list of the latest research in AI.

Check out the CoVar website!

Autonomy

World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and Applications

Provides a comprehensive taxonomy of world models, surveying diverse architectures, reasoning strategies, and applications to unify the field and guide future research directions.

Reinforcement Learning

Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

Introduces Model-Based Diffusion Policy Optimization (MBDPO), a framework that unifies search and policy optimization using diffusion processes to improve world model scalability and mitigate training misalignment.

AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning

AutoTool introduces a dynamic framework enabling LLM agents to select and integrate tools adaptively during complex reasoning, improving performance across diverse tasks and unseen toolsets.

Theory

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

Investigates why larger models learn rare tasks better than smaller ones, proposing that increased capacity reduces data-induced interference and allows for better resource allocation across diverse tasks.

VLMs

PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models

Introduces a multimodal diffusion language model and a benchmark for multi-region captioning. Experiments indicate competitive performance with a significant speedup.

Adversarial Methods

DarkLLM: Learning Language-Driven Adversarial Attacks with Large Language Models

DarkLLM trains an LLM to translate natural language instructions into visual adversarial perturbations, creating a unified, flexible framework for generating effective attacks against diverse foundation models.

Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack

Proposes Gradient Token Masking (GTM) to defend multimodal models against visual prompt injection by localizing and neutralizing critical image tokens using hidden-state gradient norms.
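To make the "localize, then neutralize" idea concrete, here is a minimal sketch, not the paper's implementation. It assumes per-token hidden-state gradients are already available as an array, ranks tokens by the L2 norm of those gradients, and zeroes out the top k. The function names, the choice of k, and zeroing as the neutralization step are all illustrative assumptions; the summary above does not specify how GTM scores or suppresses tokens.

```python
import numpy as np


def localize_tokens(grads, k):
    """Return the indices of the k tokens with the largest gradient norm.

    grads: array of shape (num_tokens, hidden_dim) holding the gradient of
    some loss with respect to each image token's hidden state (assumed to
    be computed elsewhere, e.g. by an autograd framework).
    """
    norms = np.linalg.norm(grads, axis=-1)   # one L2 norm per token
    top = np.argsort(norms)[::-1][:k]        # largest norms first
    return np.sort(top)


def neutralize_tokens(hidden, token_idx):
    """Return a copy of hidden with the selected token rows set to zero.

    Zeroing is an assumption made for this sketch; other suppression
    strategies are possible.
    """
    out = hidden.copy()
    out[token_idx] = 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(6, 4))   # 6 hypothetical image tokens
    grads = rng.normal(size=(6, 4))
    grads[3] *= 50.0                   # pretend token 3 carries the injection
    idx = localize_tokens(grads, k=1)
    cleaned = neutralize_tokens(hidden, idx)
    print("suppressed tokens:", idx.tolist())
    print("row 3 after masking:", cleaned[3])
```

In this toy run, the token whose gradient was scaled up is the one selected and zeroed, while the original array is left unmodified.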

Object Detection

RadarSim: Simulating Single-Chip Radar via Multimodal Neural Fields

RadarSim proposes a differentiable renderer that leverages camera data to generate high-resolution range-Doppler radar images, improving geometry reconstruction beyond the physical constraints of radar-only methods.