The CoVar Zeitgeist: January, 2024

A curated list of the latest research in AI/ML.

LLMs

Are Emergent Abilities of Large Language Models a Mirage?

Argues that apparent emergent abilities as model scale increases are an artifact of the choice of metric rather than any underlying change in model behavior.

The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”

Finds that an LLM that has memorized relationships phrased in one direction cannot natively reverse them. For instance, it may memorize that Tom Cruise’s mother is Mary Lee Pfeiffer yet fail to answer who Mary Lee Pfeiffer’s son is, because the forward phrasing is far more common in the training data.

Scalable Extraction of Training Data from (Production) Language Models

Discovers an exploit: if you ask ChatGPT to repeat a word forever, it eventually starts regurgitating memorized training data. Doing this is now against OpenAI’s terms of service. See Also

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

Finds black-box jailbreaks for LLMs via persona modulation. Getting GPT-4 to give you a recipe for meth is very doable, and this is not a GPT-specific problem but a general LLM problem. This has big implications for LLMs operating at the Secret and Top Secret levels.

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Proposes a method for efficient LLM inference when the model does not fit in DRAM, keeping parameters on flash memory. Reduces data transfer by reusing recently activated neurons and by reading larger contiguous chunks from flash.
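
A rough illustration of the windowing idea (not Apple’s implementation; the class and arrays below are invented for the sketch): keep weight rows for recently activated neurons cached in RAM, and fetch only the missing rows from storage in one bundled read.

```python
import numpy as np

class NeuronCache:
    """Toy sketch: cache weight rows for recently activated neurons so that
    only rows missing from RAM trigger a (slow) read from flash, and those
    reads are batched into a single bundled request."""
    def __init__(self, flash_weights):
        self.flash = flash_weights   # stands in for weights stored on flash
        self.cache = {}

    def fetch(self, active_neurons):
        missing = sorted(set(active_neurons) - self.cache.keys())
        if missing:
            rows = self.flash[np.array(missing)]   # one large bundled read
            self.cache.update(zip(missing, rows))
        return np.stack([self.cache[i] for i in active_neurons])

cache = NeuronCache(np.random.randn(4096, 64))   # 4096 neurons, 64-dim rows
print(cache.fetch([3, 17, 3, 99]).shape)         # (4, 64)
```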

VLMs

Sequential Modeling Enables Scalable Learning for Large Vision Models

Builds a Large Vision Model that can perform a wide variety of tasks without using any language data. Creative. Also interesting.

Object Detection

Reconstructing Hands in 3D with Transformers

Implements a transformer that can look at pictures of hands and output a 3D mesh.

QuickQuakeBuildings: Post-earthquake SAR-Optical Dataset for Quick Damaged-building Detection

Introduces a satellite/high-altitude SAR and optical dataset for assessing which buildings were damaged by earthquakes, combining binary classification and anomaly detection methods.

Benchmarking Deep Learning Classifiers for SAR Automatic Target Recognition

Coauthors from the DEVCOM Army Research Lab. Analyzes SAR classifiers in terms of classification accuracy, runtime performance (inference throughput), and analytical complexity (number of parameters, number of layers, model size, and number of operations).
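
As a rough illustration of the analytical metrics such a benchmark reports, here is a PyTorch sketch (the model and input shape are placeholders, not the paper’s setup) that counts parameters and layers and times inference throughput:

```python
import time
import torch
import torchvision

# Placeholder model and input size; the paper benchmarks SAR-specific classifiers.
model = torchvision.models.resnet18(num_classes=10).eval()
x = torch.randn(32, 3, 128, 128)   # batch of dummy image chips

n_params = sum(p.numel() for p in model.parameters())
n_layers = sum(1 for _ in model.modules())
size_mb = sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6

with torch.no_grad():
    for _ in range(3):                     # warm-up
        model(x)
    start = time.perf_counter()
    for _ in range(20):
        model(x)
    elapsed = time.perf_counter() - start

throughput = 20 * x.shape[0] / elapsed     # images per second
print(f"{n_params=} {n_layers=} size={size_mb:.1f}MB throughput={throughput:.1f} img/s")
```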

Autonomy

Vision-Language Models as a Source of Rewards

Uses vision-language models from the CLIP family to generate dense rewards for use in training reinforcement learning algorithms.
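
A minimal sketch of the underlying idea, using an off-the-shelf Hugging Face CLIP checkpoint rather than the paper’s exact models: embed the current observation and a text description of the goal, and use their cosine similarity as a dense reward.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_reward(frame: Image.Image, goal: str) -> float:
    """Cosine similarity between an observation and a text goal, usable as a dense reward."""
    inputs = processor(text=[goal], images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

frame = Image.new("RGB", (224, 224))   # stands in for an environment observation
print(clip_reward(frame, "the agent has picked up the blue key"))
```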

Using Surprise Index for Competency Assessment in Autonomous Decision-Making

Proposes a surprise index for assessing the competency of autonomous decision-making systems, and evaluates it on space-maneuver scenarios.
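
The paper’s exact index isn’t reproduced here, but surprise is commonly operationalized as the negative log-likelihood of an observation under the agent’s predictive model; a toy numpy sketch under that assumption:

```python
import numpy as np

def surprise(observation, predicted_mean, predicted_cov):
    """Negative log-likelihood of an observation under a Gaussian prediction;
    one common notion of surprise (the paper's exact index may differ)."""
    diff = observation - predicted_mean
    k = len(diff)
    mahal = diff @ np.linalg.inv(predicted_cov) @ diff
    logdet = np.linalg.slogdet(predicted_cov)[1]
    return 0.5 * (mahal + logdet + k * np.log(2 * np.pi))

# Persistently high surprise suggests the autonomous system's internal model
# is a poor match for the environment it is operating in.
print(surprise(np.array([1.2, 0.8]), np.zeros(2), np.eye(2)))
```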

Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Lets LLMs play StarCraft II by building a StarCraft II-to-text interface and a Chain of Summarization pipeline that let the LLM develop strategies and interact with the game.

Scaling Opponent Shaping to High Dimensional Games

Proposes a method for shaping opponents’ behaviors in multi-agent games so as to produce better outcomes for all players.

Gaussian Splatting

Splatter Image: Ultra-Fast Single-View 3D Reconstruction

Proposes a Gaussian Splatting method that maps a single input image to a “splatter image” encoding a 3D reconstruction, which can then be fed to a Gaussian splatting renderer. From that small number of channels, novel views of the target can be rendered.

iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching

A Gaussian Splatting implementation that also solves the 6D camera pose estimation problem.

Theory

Graph Convolutions Enrich the Self-Attention in Transformers!

Claims that we are reaching the limits of self-attention as originally designed. Represents self-attention as a graph filter and redesigns it from a graph signal processing perspective, improving performance.
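
A toy sketch of the general idea (not the paper’s specific filter design): treat the softmax attention matrix as a normalized graph adjacency and apply a low-order polynomial graph filter to the values instead of plain attention.

```python
import torch

def graph_filter_attention(Q, K, V, w=(0.5, 1.0, 0.5)):
    """Standard attention computes A @ V with A = softmax(QK^T / sqrt(d)).
    Here A is treated as a graph adjacency and a polynomial graph filter
    w0*I + w1*A + w2*A^2 is applied to V instead (illustrative only)."""
    d = Q.shape[-1]
    A = torch.softmax(Q @ K.transpose(-2, -1) / d**0.5, dim=-1)
    I = torch.eye(A.shape[-1])
    H = w[0] * I + w[1] * A + w[2] * (A @ A)
    return H @ V

Q = K = V = torch.randn(8, 16)                 # 8 tokens, 16-dim head
print(graph_filter_attention(Q, K, V).shape)   # torch.Size([8, 16])
```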

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

Interprets learning rate as temperature, and proposes a method for varying the learning rate in a DNN on a layer-by-layer basis. Significantly outperforms existing SGD methods.
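
Mechanically, layer-wise learning rates reduce to per-layer parameter groups in the optimizer; the sketch below uses an invented depth-based scale where the paper instead derives its per-layer “temperature” from the spectral statistics of each layer’s weights.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 64), nn.ReLU(),
                      nn.Linear(64, 10))

base_lr = 0.1
param_groups = []
for i, layer in enumerate(m for m in model if isinstance(m, nn.Linear)):
    # Placeholder scale; the paper computes a per-layer "temperature" from the
    # spectral statistics of each layer's weight matrix, not from its depth.
    scale = 1.0 / (1 + i)
    param_groups.append({"params": layer.parameters(), "lr": base_lr * scale})

optimizer = torch.optim.SGD(param_groups, momentum=0.9)
print([group["lr"] for group in optimizer.param_groups])
```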

Understanding the Detrimental Class-level Effects of Data Augmentation

Analyzes how data augmentation can hurt accuracy on individual classes even while improving average accuracy across classes.
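
The measurement itself is simple; a small numpy sketch of per-class versus average accuracy, the quantities the paper compares with and without augmentation:

```python
import numpy as np

def class_accuracies(y_true, y_pred, n_classes):
    """Per-class accuracy and the average over classes; augmentation can raise
    the average while lowering accuracy on specific classes."""
    accs = []
    for c in range(n_classes):
        mask = y_true == c
        accs.append(float((y_pred[mask] == c).mean()) if mask.any() else np.nan)
    return np.array(accs), np.nanmean(accs)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])
print(class_accuracies(y_true, y_pred, 3))   # per-class [1.0, 0.5, 1.0], mean ~0.83
```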

Deep Internal Learning: Deep Learning from a Single Input

Review paper. Covers methods for deep internal learning, i.e., training a model from a very small number of inputs, with a focus on computer vision.

Can a Transformer Represent a Kalman Filter?

Yes, an intuitive result. Short paper, focused on theory.
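
For reference, the object being represented is just the standard predict/update recursion below (a generic linear Kalman filter, not code from the paper); the result says a transformer can emulate this map from observations to state estimates.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update step of a standard linear Kalman filter."""
    x_pred = F @ x                       # predict state
    P_pred = F @ P @ F.T + Q             # predict covariance
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# 1D constant-state toy example: noisy measurements of a value near 1.0.
x, P = np.array([0.0]), np.eye(1)
for z in [1.1, 0.9, 1.05]:
    x, P = kalman_step(x, P, np.array([z]), np.eye(1), np.eye(1),
                       0.01 * np.eye(1), 0.1 * np.eye(1))
print(x)
```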

A Mathematical Perspective on Transformers

New mathematical perspective on transformers. Worth a read.

Knowledge Graphs

Beyond Transduction: A Survey on Inductive, Few Shot, and Zero Shot Link Prediction in Knowledge Graphs

Review paper on link prediction in knowledge graphs, covering inductive, few-shot, and zero-shot settings.

NestE: Modeling Nested Relational Structures for Knowledge Graph Reasoning

Extends knowledge graph methods to nested triples, i.e., triples of triples, e.g., ((BarackObama, holds position, President), succeeded by, (DonaldTrump, holds position, President)).
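
The data structure itself is easy to picture; the paper’s example written as nested Python tuples (relation names lightly normalized):

```python
# An atomic fact is a (head, relation, tail) triple.
fact_1 = ("BarackObama", "holds_position", "President")
fact_2 = ("DonaldTrump", "holds_position", "President")

# A nested triple treats whole triples as entities in a higher-level triple.
nested_fact = (fact_1, "succeeded_by", fact_2)

# NestE learns embeddings for atomic entities/relations and for these nested
# structures, enabling reasoning over facts about facts.
print(nested_fact)
```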

SAT-Based Algorithms for Regular Graph Pattern Matching

Proposes regular graph patterns (ReGaPs) for improved graph matching (isomorphisms, approximations, subsets, etc.), along with a Boolean satisfiability (SAT) encoding and a simplification technique.

Applications

Human mobility is well described by closed-form gravity-like models learned automatically from data

Finds that simple, closed-form gravity-like models learned automatically from data describe human mobility better than either classic gravity models or deep learning models.
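
For context, a gravity-like model says flow between two places grows with their populations and decays with distance; a toy sketch of fitting the exponents by log-linear regression on synthetic data (the paper learns its closed forms automatically, not necessarily this way):

```python
import numpy as np

# Generic gravity-like form: T_ij ~ k * m_i^a * m_j^b / d_ij^c
rng = np.random.default_rng(0)
m_i = rng.uniform(1e4, 1e6, 200)                 # origin "populations"
m_j = rng.uniform(1e4, 1e6, 200)                 # destination "populations"
d_ij = rng.uniform(1.0, 500.0, 200)              # distances
flows = 1e-3 * m_i**0.8 * m_j**0.7 / d_ij**2.0   # synthetic "observed" flows

# Log-linear least squares recovers (log k, a, b, c).
X = np.column_stack([np.ones_like(d_ij), np.log(m_i), np.log(m_j), -np.log(d_ij)])
coef, *_ = np.linalg.lstsq(X, np.log(flows), rcond=None)
print(coef)   # approximately [log(1e-3), 0.8, 0.7, 2.0]
```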

Do Bayesian Neural Networks Improve Weapon System Predictive Maintenance?

From the Naval Surface Warfare Center. Uses Bayesian Neural Nets to estimate time to failure for highly reliable weapons systems.

New Models

Mixtral 8x7B Explained

Explains Mixtral 8x7B, a new mixture-of-experts model (an architecture GPT-4 is rumored to use). Instead of one big dense model of roughly 56B parameters, it mixes eight 7B expert models; because the experts share the non-expert layers, the total ends up smaller than a naive 8 x 7B, saving a bit of VRAM, and it works better.
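
A minimal sketch of the sparse mixture-of-experts layer that gives Mixtral its name: a router sends each token to its top two of eight expert MLPs (a simplified stand-in for Mixtral’s actual block, with made-up sizes).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy top-2-of-8 mixture-of-experts feed-forward layer (made-up sizes)."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                             # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                   # only chosen experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

print(SparseMoE()(torch.randn(5, 64)).shape)   # torch.Size([5, 64])
```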

Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers

StripedHyena is a new model that is competitive with Transformers despite having a different architecture. It is based on signal-processing-inspired sequence models, which means that sometimes you use an FFT.
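
The FFT remark refers to computing very long convolutions over the sequence in the frequency domain, where they cost O(L log L) instead of O(L^2); a toy sketch of that primitive (not StripedHyena’s actual operator):

```python
import torch

def fft_long_conv(x, kernel):
    """Convolve a length-L sequence with a length-L filter via FFT in
    O(L log L); a building block of signal-processing-style sequence models
    (StripedHyena's actual blocks are more involved)."""
    L = x.shape[-1]
    n = 2 * L   # zero-pad to avoid circular wrap-around
    y = torch.fft.irfft(torch.fft.rfft(x, n=n) * torch.fft.rfft(kernel, n=n), n=n)
    return y[..., :L]

x = torch.randn(1, 1024)                # (batch, sequence length)
kernel = torch.randn(1, 1024)           # implicit long convolution filter
print(fft_long_conv(x, kernel).shape)   # torch.Size([1, 1024])
```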

Efficient SAM

A new, faster implementation of SAM (the Segment Anything Model).

General Object Foundation Model for Images and Videos at Scale

A new foundation model, GLEE, for object-level tasks in images and videos. Open source.