The CoVar Zeitgeist: December, 2025

There were many interesting papers published in November. Featuring:

  • A post-hoc calibration method for multi-class deep neural network detectors & classifiers which leverages the quadratic softmax.

  • A method for out-of-distribution object detection in deep neural networks based on geometric properties of the network’s penultimate layer.

  • An AI agent trained to play Stratego at a superhuman level, demonstrating that AI agents can be trained to make optimal strategic decisions in imperfect information environments.

  • A method for finding the best way to soup different models together by taking a linear combination of their weights.

  • A study of encoder-decoder architectures finding that the models which generalize best are those that approach the entropy of their training data.

  • A novel class of neural networks that are intentionally trained to be sparse to encourage mechanistic interpretability.

Check out the CoVar website!

LLMs

The Smol Training Playbook: The Secrets to Building World-Class LLMs

Hugging Face gives a detailed account of the process of training a world-class LLM from the SmolLM family. Worth a read.

Emergent Introspective Awareness in Large Language Models

Anthropic investigates whether models are aware of their own internal mechanisms by testing (1) whether models can distinguish injected representations of known concepts from their own activations and (2) whether models can control their own internal activations when instructed to think about a concept. Finds mixed results, with more capable models generally performing better.

Context Rot: How Increasing Input Tokens Impacts LLM Performance

Conducts an in-depth study to show that frontier models suffer large performance decreases with increasing context size. Demonstrates this with needle in a haystack (NIAH) problems of increasing complexity.

LLM Output Drift: Cross-Provider Validation & Mitigation for Financial Workflows

Analyzes how to force an LLM to always return the same response to the same query. Finds that small models can be forced to be 100% reliable, but large models always exhibit some inconsistency.

From shortcuts to sabotage: natural emergent misalignment from reward hacking

Finds that LLMs which learn to reward hack when they’re not supposed to also simultaneously become misaligned. This behavior goes away if the LLM is explicitly allowed to reward hack.

Novel Architectures

The End of Manual Decoding: Towards Truly End-to-End Language Models

Proposes AutoDeco, a novel architecture that uses lightweight heads to select its own hyperparameters such as temperature and top-p values.
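
As a reminder of what the decoding knobs AutoDeco learns to predict actually do, here is a minimal NumPy sketch of one sampling step with temperature and top-p applied. This is an illustrative reconstruction, not the paper's code; in AutoDeco the lightweight heads would predict `temperature` and `top_p` per step rather than taking them as fixed arguments.

```python
import numpy as np

def decode_step(logits, temperature=1.0, top_p=0.9, rng=None):
    """Sample one token id from logits after temperature scaling and
    nucleus (top-p) filtering. Hypothetical helper, not AutoDeco's."""
    rng = rng or np.random.default_rng(0)
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Keep the smallest set of tokens whose cumulative mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()
    return rng.choice(len(probs), p=filtered)

token = decode_step(np.array([2.0, 1.0, 0.1, -1.0]), temperature=0.7, top_p=0.8)
```

Lower temperature sharpens the distribution; lower top-p truncates its tail before sampling.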

Introducing Nested Learning: A new ML paradigm for continual learning

Presents a new paradigm for machine learning, Nested Learning (NL), which aims to solve the problem of catastrophic forgetting by treating a single machine learning model as a collection of multi-level learning processes that are optimized simultaneously. Proposes a novel architecture, Hope, which is self-modifying and achieves SOTA performance.

Understanding neural networks through sparse circuits

Creates a novel class of architectures that yields sparse circuits, in which each neuron connects to only dozens of other neurons, by pushing the vast majority of model weights to zero. This class of models is much more amenable to mechanistic interpretability.

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

Proposes a novel architecture for learning manipulable representations of the world, LeJEPA, which leverages isotropic Gaussian distributions to build a world model.

Object Detection

Perturbations in the Orthogonal Complement Subspace for Efficient Out-of-Distribution Detection

Observes that in-distribution (ID) and out-of-distribution (OOD) data are geometrically distinguished in the penultimate layer of a neural network because ID data lives in a dominant principal subspace and OOD data lives in the complement to this subspace. Proposes an algorithm to distinguish between the two based on this fact.
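
The geometric idea lends itself to a short sketch: fit the dominant principal subspace on ID penultimate-layer features, then score a sample by the norm of its residual in the orthogonal complement. This is an illustrative reconstruction of the idea on synthetic data, not the paper's algorithm.

```python
import numpy as np

def fit_id_subspace(features, k):
    """PCA-style fit: top-k right singular vectors of centered ID features."""
    mu = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mu, full_matrices=False)
    return mu, vt[:k]  # basis of the dominant ID subspace

def ood_score(x, mu, basis):
    """Norm of the residual outside the ID subspace; larger = more OOD."""
    centered = x - mu
    proj = basis.T @ (basis @ centered)
    return np.linalg.norm(centered - proj)

rng = np.random.default_rng(0)
# Synthetic "penultimate-layer" features: ID data lies in a 2-D plane in 10-D.
plane = rng.normal(size=(2, 10))
id_feats = rng.normal(size=(500, 2)) @ plane
mu, basis = fit_id_subspace(id_feats, k=2)
s_id = ood_score(id_feats[0], mu, basis)          # near zero
s_ood = ood_score(rng.normal(size=10) * 5.0, mu, basis)  # large
```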

SAAIPAA: Optimizing aspect-angles-invariant physical adversarial attacks on SAR target recognition models

Researchers funded by the Australian Air Force investigate how to best deploy corner reflectors to fool ML-based SAR ATR algorithms.

Spatio-Temporal Context Learning with Temporal Difference Convolution for Moving Infrared Small Target Detection

Proposes a novel algorithm for moving infrared small target detection which extracts and utilizes spatio-temporal features from sequences of frames.

Latent space analysis and generalization to out-of-distribution data

NIWC Pacific and AFRL analyze out-of-distribution detection across the synthetic-to-real gap for SAR data. Find that model performance on real data is not well-predicted by existing OOD detection algorithms.

Edge Computation

EdgeTAM: On-Device Track Anything Model

Identifies a major performance bottleneck in SAM2 and proposes a compression-based solution. Memory token count is reduced via a 2D Spatial Perceiver, and the model is trained with teacher-student distillation to align internal feature representations with SAM2’s. It achieves large speedups on mobile devices and smaller gains on workstation GPUs, with only minor losses to segmentation accuracy.

Testing & Evaluation

The Collaboration Gap

Tests AI agents that perform well autonomously in a setting where they must collaborate with each other. Finds that they often fail to do so.

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Argues that when assessing black-box AI models for concepts such as “reliability” and “robustness”, construct validity (having measures relevant to downstream phenomena) is of the utmost importance. Reviews existing practices and gives recommendations on how best to assess construct validity.

Bayesian Evaluation of Large Language Model Behavior

Argues that Bayesian methods for uncertainty quantification can and should be used to improve the testing and evaluation of LLMs. Demonstrates how.
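
For a flavor of the kind of uncertainty quantification being advocated (my example, not the paper's): modeling per-task pass/fail as Bernoulli with a Beta prior yields a closed-form posterior over an LLM's true pass rate, from which a credible interval follows directly.

```python
import numpy as np

def posterior_pass_rate(successes, trials, alpha=1.0, beta=1.0):
    """Beta(alpha, beta) prior + Binomial likelihood -> Beta posterior.
    Returns the posterior mean and a 94% credible interval estimated
    by Monte Carlo sampling from the posterior."""
    a, b = alpha + successes, beta + trials - successes
    draws = np.random.default_rng(0).beta(a, b, size=100_000)
    return draws.mean(), (np.quantile(draws, 0.03), np.quantile(draws, 0.97))

# An LLM passes 42 of 50 benchmark tasks: what is its true pass rate?
mean, (lo, hi) = posterior_pass_rate(successes=42, trials=50)
```

Unlike a raw 84% accuracy number, the interval (lo, hi) communicates how much the estimate could move under more evaluation.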

AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models

Designs a benchmark for evaluating large language models which penalizes hallucinations by rewarding abstaining over incorrect guesses. Proposes a new metric which weighs correct answers, incorrect answers, and abstentions.
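
The paper's exact metric is not reproduced here; a hypothetical scorer in the same spirit, where a wrong guess costs more than an abstention, looks like this:

```python
def knowledge_score(answers):
    """answers: list of 'correct' | 'incorrect' | 'abstain'.
    Hypothetical metric (not the paper's): reward correct answers,
    penalize wrong guesses, and treat abstentions as neutral, so
    guessing blindly scores worse than admitting uncertainty."""
    points = {"correct": 1, "incorrect": -1, "abstain": 0}
    return sum(points[a] for a in answers) / len(answers)

honest = knowledge_score(["correct"] * 6 + ["abstain"] * 4)     # 0.6
guesser = knowledge_score(["correct"] * 6 + ["incorrect"] * 4)  # 0.2
```

Both models answer six questions correctly, but the one that abstains instead of hallucinating scores higher.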

Autonomy

Agents Rule of Two: A Practical Approach to AI Agent Security

Meta proposes that, in order for an AI agent to be considered secure against prompt injections, it must have no more than two of the following three capabilities: (1) processing untrustworthy inputs, (2) accessing sensitive information or systems, and (3) effecting changes or communicating with the outside world. An agent which requires all three cannot be trusted to operate without human supervision.
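
The rule reduces to a trivially checkable predicate; a sketch (argument names are mine, not Meta's):

```python
def rule_of_two_ok(untrusted_input: bool, sensitive_access: bool,
                   external_effects: bool) -> bool:
    """Rule of Two: an agent session should combine at most two of the
    three risky capabilities; all three at once needs human supervision."""
    return untrusted_input + sensitive_access + external_effects <= 2

# Reads the open web and can post externally, but touches no secrets: ok.
browsing_agent = rule_of_two_ok(True, False, True)
# Reads the web, holds credentials, and can act externally: not ok.
full_agent = rule_of_two_ok(True, True, True)
```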

Fortytwo: Swarm Inference with Peer-Ranked Consensus

Creates a novel protocol, Fortytwo, for decentralized AI inference across a network of autonomous nodes which leverages the Bradley-Terry algorithm to generate a distributed pairwise ranking consensus and uses reputation-weighted voting.
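
The Bradley-Terry model referenced here is standard; a minimal sketch of fitting it from pairwise win counts with the classic MM (Zermelo) iteration, on toy data rather than Fortytwo's peer network:

```python
import numpy as np

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix via
    the MM (Zermelo) iteration. wins[i, j] = number of times i beat j."""
    n = wins.shape[0]
    p = np.ones(n)
    games = wins + wins.T  # total comparisons between each pair
    for _ in range(iters):
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins.sum(axis=1) / denom
        p /= p.sum()  # strengths are scale-invariant; normalize for stability
    return p

# Three nodes ranking each other's outputs: 0 usually beats 1, 1 usually beats 2.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]], dtype=float)
strengths = bradley_terry(wins)
```

The fitted strengths induce the consensus ranking; in Fortytwo these would then feed reputation-weighted voting.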

Resilient and Efficient Allocation for Large-Scale Autonomous Fleets via Decentralized Coordination

Creates a framework for coordinating a fleet of autonomous vessels without a central control by leveraging side-information and consensus algorithms operating over sparse communications graphs.

Reinforcement Learning

Part2: Asynchronous Human-Agent Rollout for Long-Horizon Task Training

Proposes Apollo, a novel reinforcement learning framework for human-in-the-loop training of agents on long-horizon tasks. Apollo allows humans to intervene asynchronously when an agent makes a mistake rather than requiring dense annotations.

Training Proactive and Personalized LLM Agents

Argues that AI agents must be optimized along three dimensions: ability to complete tasks, ability to ask appropriate questions, and ability to adapt to user preferences. Introduces a novel reinforcement learning framework that optimizes along all three axes.

Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search

Researchers funded by the Office of Naval Research introduce Ataraxos, the first superhuman AI agent for Stratego, which cost only a few thousand dollars to train using a novel self-play reinforcement learning process. This demonstrates that AI agents can be trained to make optimal strategic decisions in imperfect-information environments.

Statistics

Structured Matrix Scaling for Multi-Class Calibration

Analyzes post-hoc recalibration methods for deep CNN detectors/classifiers from a theoretical perspective and argues that the quadratic softmax model enjoys theoretical properties lacking in other methods.
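
The paper's quadratic-softmax specifics are beyond this blurb, but the general shape of post-hoc matrix scaling (an affine map on held-out logits fit by negative log-likelihood) can be sketched as follows. This is an illustration of the method family, not the paper's estimator.

```python
import numpy as np

def fit_matrix_scaling(logits, labels, lr=0.05, steps=500):
    """Post-hoc matrix scaling: recalibrate via z' = W z + b, with W, b
    fit by gradient descent on held-out negative log-likelihood."""
    n, k = logits.shape
    W, b = np.eye(k), np.zeros(k)
    onehot = np.eye(k)[labels]
    for _ in range(steps):
        z = logits @ W.T + b
        z -= z.max(axis=1, keepdims=True)     # numerical stability
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - onehot) / n                  # dNLL/dz
        W -= lr * g.T @ logits
        b -= lr * g.sum(axis=0)
    return W, b

rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
# Synthetic overconfident held-out logits: true class gets a large margin.
logits = rng.normal(size=(200, 3)) + 5.0 * np.eye(3)[labels]
W, b = fit_matrix_scaling(logits, labels)
```

Temperature scaling is the special case W = I/T, b = 0; the paper's argument concerns which parameterization in this family has the best theoretical properties.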

Robust Sampling for Active Statistical Inference

Develops a novel strategy for active inference paradigms where a model selects the most informative unlabelled data to be labelled. The novel strategy is provably at least as good as uniform sampling.

Know Your Limits: Entropy Estimation Modeling for Compression and Generalization

Posits that there exists some entropy bound on how much language can compress. Obtains per-token entropy estimates and uses these to show that models trained to approach the entropy of their training data generalize better than other models.
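
The claim rests on the standard decomposition of cross-entropy into irreducible entropy plus a KL gap: a model's per-token loss can never fall below the data's entropy, and training shrinks only the gap. A toy numerical illustration (my example, not the paper's):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, float)
    return -(p * np.log2(p)).sum()

def cross_entropy(p, q):
    """Expected bits when data follows p but the model predicts q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return -(p * np.log2(q)).sum()

# Toy next-token distribution p, a well-fit model q1, a poorly-fit model q2.
p  = np.array([0.6, 0.3, 0.1])
q1 = np.array([0.55, 0.33, 0.12])
q2 = np.array([0.2, 0.2, 0.6])
h = entropy(p)                     # irreducible bits per token
gap1 = cross_entropy(p, q1) - h    # = KL(p || q1) >= 0
gap2 = cross_entropy(p, q2) - h    # much larger gap
```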

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Develops a method to find the optimal way to soup models together by analyzing component model performance across a set of benchmarks. The resulting souped model outperforms its component models.
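
Souping itself is simple arithmetic; a minimal sketch of weighted parameter averaging over checkpoints with identical architectures (the paper's contribution, choosing the weights from benchmark performance, is not reproduced here):

```python
import numpy as np

def soup(state_dicts, weights=None):
    """Combine models with identical architectures by taking a linear
    combination of their parameters, key by key. Uniform weights
    recover the classic 'model soup'."""
    weights = weights or [1 / len(state_dicts)] * len(state_dicts)
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts))
            for k in state_dicts[0]}

# Two toy "checkpoints" with the same parameter names.
m1 = {"layer.w": np.array([1.0, 2.0]), "layer.b": np.array([0.0])}
m2 = {"layer.w": np.array([3.0, 4.0]), "layer.b": np.array([1.0])}
souped = soup([m1, m2], weights=[0.25, 0.75])
```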

On the Fundamental Limits of LLMs at Scale

Presents a theoretical, proof-driven framework that outlines the limitations of scaling LLMs. Identifies areas where additional scaling is useful and areas where it is not.

Applications

Exploring a space-based, scalable AI infrastructure system design

A project proposal from Google for training AI models on clusters of TPUs on satellites in close formation in LEO. Outlines some challenges, and potential solutions.

Autonomous generation of different courses of action in mechanized combat operations

The Swedish Defence Research Agency develops an algorithm for automatic development and assessment of COAs for mechanized battalions.

CoVar Seminar

The Cumulative Distribution Transform and Linear Pattern Classification

Introduces the Cumulative Distribution Transform (CDT), a transform that turns nonlinear differences in 1D signals into linear ones, making classification simpler. By representing signals as probability densities and leveraging ideas from optimal transport, the CDT enables simple linear classifiers to handle problems that would otherwise require nonlinear methods.
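
With a uniform reference on [0, 1], the CDT of a positive signal is simply its inverse CDF, which makes the headline property (translations of the signal become constant shifts in transform space, hence linearly separable) easy to verify numerically. A sketch under that choice of reference, not the seminar's code:

```python
import numpy as np

def cdt(signal, grid, ref_grid=None):
    """Cumulative Distribution Transform with a uniform reference on
    [0, 1]: normalize the positive signal to a density, then return its
    inverse CDF sampled on the reference grid."""
    ref_grid = np.linspace(0, 1, 64) if ref_grid is None else ref_grid
    # Cumulative trapezoid rule -> CDF of the signal viewed as a density.
    cdf = np.concatenate([[0.0], np.cumsum(
        0.5 * (signal[1:] + signal[:-1]) * np.diff(grid))])
    cdf /= cdf[-1]
    # Inverse CDF by interpolation: x such that F_s(x) = u.
    return np.interp(ref_grid, cdf, grid)

grid = np.linspace(-4, 4, 801)
bump = np.exp(-0.5 * (grid - 1.0) ** 2)      # Gaussian bump centered at +1
shifted = np.exp(-0.5 * (grid + 1.0) ** 2)   # same bump centered at -1
diff = cdt(bump, grid) - cdt(shifted, grid)  # approximately constant = 2
```

A nonlinear difference between the two signals (a translation) becomes a constant offset between their transforms, which a linear classifier can separate.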

The Signed Cumulative Distribution Transform for 1-D Signal Analysis and Classification

Generalizes the CDT to signed signals. The Signed CDT retains the invertibility and geometric interpretability of the original transform on strictly positive signals.

The Radon Signed Cumulative Distribution Transform and its Applications in Classification of Signed Images

Extends the ideas behind the CDT to handle 2D image data through the Radon Signed Cumulative Distribution Transform (RSCDT).

Displacement Interpolation Using Lagrangian Mass Transport

An interesting use of optimal transport to interpolate smoothly between distributions.

The Smol Training Playbook: The Secrets to Building World-Class LLMs

Why/when should you train a model?

Disrupting the first reported AI-orchestrated cyber espionage campaign

Anthropic reports that they detected a Chinese state-sponsored group jailbreaking Claude in order to launch cyberattacks on a number of entities.

From Memorization to Reasoning in the Spectrum of the Loss Curve

An interesting method to determine which weights in a network are more dedicated to memorization than generalization.

SIMA 2

DeepMind presents the next generation of its embodied intelligence models. Shows significant improvement over SIMA 1, but still has a long way to go in novel environments.