The CoVar Zeitgeist: April, 2024

A curated list of the latest research in AI/ML.

LLMs

Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?

Observes that LLMs are really good at multiple choice questions, and investigates how good they are if they just receive the answers and not the questions. Unreasonably good as it turns out. Begs the question: have the LLMs seen it in training or are they picking up some weird underlying structure?

LONG-FORM FACTUALITY IN LARGE LANGUAGE MODELS

LLMs can do factchecking by breaking claims down into factual statements and evaluating each claim with a multi-step reasoning process and the ability to query Google Search. Claims to be better and faster than crowdsourced human factchecking.

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Every piece of text has some implicit reasoning contained inside itself. This paper forces LLMs to explicitly implicitly reason when generating new text, and improves performance by doing so.

Stealing Part of a Production Language Model

Develops a method that, for less than $20, enables you to steal an LLM’s embeddings using a web API. Notified the owners and it’s been patched so you can’t do this anymore.

Chronos: Learning the Language of Time Series

Uses LLMs for time series prediction. Works pretty well, especially in a zero-shot context. The intuition is that both LLMs and time series are sequentially ordered.

VLMs

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models

Large vision language models (LVLMs) can see an increase in performance if you prompt them to focus in on an area of interest. Maybe they get confused by large images?

MATHVERSE: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Investigates whether math diagrams improve multi-modal LLM performance on math questions. Finds that they do not, and in fact sometimes performance decreases.

How Far Are We from Intelligent Visual Deductive Reasoning?

Investigates how well VLMs perform at visual reasoning, and the answer is poorly: LLMs tend not to reason very well when presented with visual instead of textual evidence.

Object Detection

Point2Building: Reconstructing Buildings from Airborne LiDAR Point Clouds

Proposes a method to autoregressivly 3D meshes from LiDAR point clouds of buildings.

LDSF: Lightweight Dual-Stream Framework for SAR Target Recognition by Coupling Local Electromagnetic Scattering Features and Global Visual Features

National University of Defense Technology in China implements a method for incorporating EM backscattering into algorithms for ATR with SAR data. Demonstrates method on MSTAR.

Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery

Few-shot detection in remote sensing data using vision transformers. Classes are represented by prototypes in some embedding space - the model can then learn and fit these prototypes.

NIGHT - Non-Line-of-Sight Imaging from Indirect Time of Flight Data

Can you sense an object behind a corner and out of line of sight? This paper proposes a method to do just that using (1) Time of Flight sensors and (2) opposing walls as mirrors.

Autonomy

Toward Autonomous Cooperation in Heterogeneous Nanosatellite Constellations Using Dynamic Graph Neural Networks

How to coordinate EO satellites to maximize observations using dynamic graph neural nets. More constrained, but similar to drone swarm type problems.

Control and Automation for Industrial Production Storage Zone: Generation of Optimal Route Using Image Processing

Plans optimal routes for multiple agents based off of imagery. Describes the entire pipeline.

Collaborative AI Teaming in Unknown Environments via Active Goal Deduction

How to get AI agents that don’t know each other to work together in the presence of both (1) goal uncertainty and (2) imperfect info about other agents? The paper proposes that agent attempt to learn what the other agents’ preferences are and, based off of that, act so as to further your own goals. Tests in a Starcraft II environment.

Theory

Out-of-Distribution Detection Should Use Conformal Prediction (and Vice-versa?)

Finds that traditional methods of evaluating out-of-distribution detections may be too optimistic, and conformal prediction should be used instead.

Do CLIPs Always Generalize Better than ImageNet Models?

Constructs a dataset with lots of spurious correlations to evaluate CLIP and ImageNet style models for zero-shot prediction. Found that ImageNets did better in these circumstances.

Deep Neural Networks Tend To Extrapolate Predictably

As data gets OOD, neural nets tend to produce solutions which default to the solution which, when treated as constant, minimizes loss over the training set.

Scalable Bayesian inference for the generalized linear mixed model

Stochastic gradient descent MCMC for Bayesian GLMMs. Significantly faster than Gibbs sampling, but still slow compared to frequentist methods.

PLANT-CAPTURE METHODS FOR ESTIMATING POPULATION SIZE FROM UNCERTAIN PLANT CAPTURES

Develops methods for counting how large a population is based on a capture-recapture model. Similar to, but meaningfully different than, the German tank problem.

Active Statistical Inference

Proposes an active learning for choosing which datapoints to label next. Could be useful for situations where we have more datapoints than we can hope to label.

Computational Efficiency

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Introduces a pruning method for LLMs which maintains SOTA performancs across a range of benchmarks while pruning quite significantly. Prunes by measuring the influence of each level and eliminating the less influential ones.

Maxwell’s Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

Proposes a method for training a neural net that controls the number of dead neurons, leading to network sparsity.

The Unreasonable Ineffectiveness of the Deeper Layers

Investigates pruning LLMs and finds you can prune up to half of all layers without much degradation in performance.

Knowledge Graphs

Counterfactual Reasoning with Knowledge Graph Embeddings

Explores counterfactual reasoning on knowledge graphs.

Logistics

A Multi-population Integrated Approach for Capacitated Location Routing

Searches for the optimal way to transport objects from a set of depots to a population of users.

Applications

A communication-efficient, online changepoint detection method for monitoring distributed sensor networks

Investigates how to do changepoint detection from an array of distributed sensors while minimizing communication costs.

Estimating the household secondary attack rate with the Incomplete Chain Binomial model

Discrete-time SIR model for infectious diseases, but explicitly modelling the spread within each household using an Incomplete Chain Binomial model.

Swarm Characteristics Classification Using Neural Networks

An analysis of drones swarms from the Naval Postgraduate School. Finds that supervised neural nets can analyze and predict drone swarm behavior very well.

Velox: Meta’s Unified Execution Engine

Meta demonstrates a novel method to store, access, and use data of very type. Already in use internally at Meta.

New Models

Introducing DBRX: A New State-of-the-Art Open LLM

A new SOTA open source LLM that claims to be better and more efficient than LLaMa2-70B, Mixtral, and Grok-1.