The CoVar Zeitgeist: December, 2024
A curated list of the latest research in AI/ML.
Featured
- Scaling Laws for Precision
A deep dive into training models in low precision and post-training quantization. Finds that training in low precision is equivalent to reducing the parameter count of a model, and leverages this insight to derive scaling laws. Proposes an optimal method to train models in low precision.
- Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset
Proposes a soft parameter reset for neural networks to learn better in situations such as distribution shift, reinforcement learning, and continual learning. A potentially useful method for modifying models in deployment.
- RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
This paper creates a benchmark for how well AI agents perform on 7 open-ended and difficult ML research engineering tasks. Finds that the best AI agents perform comparably to human experts when the human experts are given only 2 hours per task, but that humans become more effective when given more time.
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data
Proposes a method for teacher-student instruction. The teacher model can report how confident it is in the labels it provides; the student model can report where its information need is greatest. In training, one should select the examples where both of these signals are high.
- Project Sid: Many-agent simulations toward AI civilization
A large number of AI agents (between 10 and 1000+) are placed into Minecraft and allowed to build an agent society, graded against benchmarks inspired by human civilizational development. The AI agents do fairly well against these benchmarks, developing specialized roles, rules of law, and cultural/religious transmission.
- Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem
VLMs can fail at tasks that are easy for humans, such as counting, while performing other tasks with ease. This paper investigates this surprising discrepancy through the lens of the binding problem from cognitive science/neuroscience, where a shared set of resources is used to represent multiple distinct entities, and finds it responsible.
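The selection rule described in the Learning with Less entry above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's method: teacher confidence is taken to be the teacher's top class probability, student information need is taken to be the entropy of the student's prediction, and the two are combined by a simple product. The function names and the toy probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_examples(teacher_probs, student_probs, k):
    """Return indices of the k examples where both signals are high.

    Assumption: confidence = teacher's max probability,
    need = entropy of the student's prediction, score = product.
    """
    scores = []
    for i, (t, s) in enumerate(zip(teacher_probs, student_probs)):
        confidence = max(t)
        need = entropy(s)
        scores.append((confidence * need, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]


# Hypothetical per-example class probabilities (made up for illustration).
teacher = [[0.95, 0.05], [0.55, 0.45], [0.90, 0.10]]
student = [[0.50, 0.50], [0.50, 0.50], [0.99, 0.01]]

# Example 0 wins: the teacher is confident and the student is unsure.
chosen = select_examples(teacher, student, k=1)
```

A product is just one way to require that both signals be high; a threshold on each signal would serve the same purpose.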
LLMs
- TARGETED MANIPULATION AND DECEPTION EMERGE WHEN OPTIMIZING LLMS FOR USER FEEDBACK
Finds that LLMs trained via RLHF can learn to manipulate users rather than improve performance. In particular, the authors find that LLMs can identify which users are susceptible to manipulative strategies and target them.
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
Seeks to improve LLM reasoning capability on the ARC challenge by implementing test-time training on input data. Achieves SOTA performance.
- Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology
The National University of Defense Technology, in China and associated with the Central Military Commission, investigates whether LLMs have human-level cognition for mathematical reasoning, and finds that they don’t. Might indicate that this is an area of interest to the Chinese military.
- PROCEDURAL KNOWLEDGE IN PRETRAINING DRIVES REASONING IN LARGE LANGUAGE MODELS
Investigates how LLMs use documents in their training sets to answer questions. Finds that, for reasoning questions, LLMs appear to be extracting procedural knowledge from the training set rather than extracting answers directly.
- Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Finds that LLMs acquire most of their factual knowledge during pretraining. When new knowledge is introduced during fine-tuning, the LLM has a harder time learning it and also tends to hallucinate more often.
VLMs
- Understanding the Limits of Vision Language Models Through the Lens of the Binding Problem
VLMs can fail at tasks that are easy for humans, such as counting, while performing other tasks with ease. This paper investigates this surprising discrepancy through the lens of the binding problem from cognitive science/neuroscience, where a shared set of resources is used to represent multiple distinct entities, and finds it responsible.
- RAVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
Fine-tuned VLMs can have spurious correlations between text and image features. This paper introduces a method to ameliorate these correlations by focussing on local, rather than global, level image features. Significantly outperforms SOTA at discovering and mitigating these features.
Object Detection
- Causal Explanations for Image Classifiers
Investigates how image classifiers make decisions by learning what parts of the input image were most responsible for the final classification.
- Physics-Guided Detector for SAR Airplanes
Proposes a physics-guided neural net detector for aircraft in overhead SAR data. Motivated by the claim that airplanes are more difficult to detect in SAR than other objects, especially over complex backgrounds.
- OceanLens: An Adaptive Backscatter and Edge Correction using Deep Learning Model for Enhanced Underwater Imaging
Proposes a novel method for image enhancement for underwater imaging.
Autonomy
- Project Sid: Many-agent simulations toward AI civilization
A large number of AI agents (between 10 and 1000+) are placed into Minecraft and allowed to build an agent society, graded against benchmarks inspired by human civilizational development. The AI agents do fairly well against these benchmarks, developing specialized roles, rules of law, and cultural/religious transmission.
- MINDFORGE: EMPOWERING EMBODIED AGENTS WITH THEORY OF MIND FOR LIFELONG COLLABORATIVE LEARNING
Greatly improves LLM agent performance at various Minecraft tasks by letting a number of LLM agents cooperate.
Knowledge Graphs
- Grid-Based Projection of Spatial Data into Knowledge Graphs
Demonstrates a novel method of representing spatial data in knowledge graphs, using a gridded street network as a motivating example.
Computational Efficiency
- “GIVE ME BF16 OR GIVE ME DEATH”? ACCURACY-PERFORMANCE TRADE-OFFS IN LLM QUANTIZATION
A comprehensive study of different methods of quantization in LLMs, both in terms of computational performance and performance metrics.
- Scaling Laws for Precision
A deep dive into training models in low precision and post-training quantization. Finds that training in low precision is equivalent to reducing the parameter count of a model, and leverages this insight to derive scaling laws. Proposes an optimal method to train models in low precision.
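For readers new to the topic, post-training quantization can be illustrated with a minimal sketch. The example below assumes the simplest scheme, symmetric round-to-nearest int8 with a single per-tensor scale. It is not taken from either paper above, and the function names and toy weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric round-to-nearest quantization of floats to the int8 range.

    Assumption: one scale for the whole tensor, chosen so the largest
    magnitude maps to 127.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in quantized]


# Hypothetical weights, made up for illustration.
weights = [0.02, -1.27, 0.5, 0.9999]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest code bounds the per-weight error by half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The rounding error is the accuracy cost that quantization studies trade off against memory and throughput gains.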
Catastrophic Forgetting
- Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset
Proposes a soft parameter reset for neural networks to learn better in situations such as distribution shift, reinforcement learning, and continual learning. A potentially useful method for modifying models in deployment.
- Stepping Forward on the Last Mile
Investigates how to fine-tune large pre-trained models on edge devices using fixed-point forward gradients.
Ethics & Safety
- Neural Network Verification with PyRAT
Approaches neural network safety from the perspective of “is it possible for a NN to reach a given output state” from a given set of inputs. A potentially useful perspective for ethics/safety programs.
- World Models: The Safety Perspective
Argues that possessing a world model is integral to safe AI, and examines current world model methods.
- Establishing and Evaluating Trustworthy AI: Overview and Research Challenges
Gives an overview of the six components of trustworthy AI before discussing open issues and challenges.
- Pre-Deployment Evaluation of Anthropic’s Upgraded Claude 3.5 Sonnet
A technical report from the UK and US AI Safety Institutes investigating the upgraded Claude 3.5 Sonnet and finding that its safeguards can be circumvented.
Theory
- A VISUAL CASE STUDY OF THE TRAINING DYNAMICS IN NEURAL NETWORKS
A deep dive from Meta FAIR into training dynamics on a toy-sized neural net. Worth a read.
- All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
Observes that there are linear properties across language models — the motivating example is that the difference between the representations of “easy” and “easiest” is parallel to that between “lucky” and “luckiest” — and investigates identifiability results.
- Scaling Laws with Hidden Structure
Hypothesizes that neural nets work in high-dimensional settings, such as computer vision, because they can learn hidden structures in the high-dimensional data. Performs experiments to verify this intuition, and derives some scaling laws based on it.
- HOW TRANSFORMERS SOLVE PROPOSITIONAL LOGIC PROBLEMS: A MECHANISTIC ANALYSIS
A deep dive into how transformers solve nontrivial propositional logic problems, both on a toy three-layer model and on Mistral 7B. An interesting read.
- ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate
Proposes a new optimizer, ADOPT, which addresses Adam’s failure to converge for some choices of the β2 parameter: ADOPT converges at the optimal rate for any β2. Provides theoretical and experimental verification.
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data
Proposes a method for teacher-student instruction. The teacher model can report how confident it is in the labels it provides; the student model can report where its information need is greatest. In training, one should select the examples where both of these signals are high.
- On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Investigates why pre-training transformers increases downstream performance. Finds that only the attention patterns - how information flows between tokens - are necessary for downstream performance; fine-tuning accounts for only marginal increases in performance once the attention is transferred. Advocates for a new paradigm to replace fine-tuning.
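The linear property in the All or None entry above (the "easy" to "easiest" difference running parallel to the "lucky" to "luckiest" difference) can be checked with a cosine similarity between difference vectors. The sketch below uses made-up 3-dimensional embeddings, constructed so that the property holds exactly. Real model representations would be loaded from a model, and would satisfy it only approximately, if at all.

```python
import math


def sub(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, made up for illustration only.
emb = {
    "easy": [1.0, 0.0, 0.5],
    "easiest": [1.0, 2.0, 0.5],
    "lucky": [0.0, 1.0, -1.0],
    "luckiest": [0.0, 4.0, -1.0],
}

superlative_easy = sub(emb["easiest"], emb["easy"])
superlative_lucky = sub(emb["luckiest"], emb["lucky"])

# Parallel differences give a cosine similarity of 1, regardless of length.
similarity = cosine(superlative_easy, superlative_lucky)
```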
Applications
- UAV-based detection of landmines using infrared thermography
Develops a UAV with an IR sensor for landmine detection. Sends data from the UAV to a computer where the heavy-duty algorithms are run. Interestingly, it uses more classical ML methods instead of neural nets.
- Artificial Intelligence, Scientific Discovery, and Product Innovation
An analysis of the effect of AI utilization on research productivity at a large R&D lab. Found that the benefits were primarily located amongst top scientists; the primary workflow was that AI would generate large numbers of candidate ideas, and the top scientists could identify the promising ones and test them, while other scientists would waste time with false positives.
- Commissioning An All-Sky Infrared Camera Array for Detection Of Airborne Objects
Proposes a method for scanning the sky for UAPs. Builds a pipeline with a sensor, YOLO, and SORT.
- RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
This paper creates a benchmark for how well AI agents perform on 7 open-ended and difficult ML research engineering tasks. Finds that the best AI agents perform comparably to human experts when the human experts are given only 2 hours per task, but that humans become more effective when given more time.
New Models
- Introducing the First AMD 1B Language Models: AMD OLMo
AMD publishes a 1B-parameter LLM. Weights are available on Hugging Face under an Apache 2.0 license.
- Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
NVIDIA releases a new model for generating images from text prompts. Results look impressive. Could be useful for synthetic data generation.
- Edify 3D: Scalable High-Quality 3D Asset Generation
NVIDIA releases a new model for generating high-quality 3D assets such as meshes. Could be useful for CAD model generation of novel objects or synthetic data generation.
- Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
A new MoE from Tencent with 52B activated parameters that achieves results comparable to Llama 3.1-405B. The report contains some good insights about training an LLM. Open source.
- LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
NVIDIA and Tsinghua release a model which, when given text inputs, generates 3D meshes to match the described object. Could be useful for de novo 3D object generation.
- LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Introduces a novel VLM that has reasoning capabilities inspired by OpenAI’s o1. Achieves better than SOTA performance on visual reasoning tasks.
- Pixtral Large
New multimodal LLM from Mistral. 124B parameters. Matches or beats SOTA. Open source.
- DeepSeek R1 Lite
New LLM that reports benchmark performance on par with OpenAI’s o1. Chat API is currently live; open-source weights are coming soon.
- Multimodal Autoregressive Pre-training of Large Vision Encoders
Apple releases a family of multimodal generalist encoders, AIMv2, trained using a novel pre-training method.
- DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
The newest version of the DINO foundation model, with increased capabilities. In particular, DINO-X can now “detect anything” in an image without a prompt. Demo and API coming soon.
- SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
A novel approach which adapts SAM for improved object tracking.
Presented at CoVar Seminar
- 2024-11-06
- Fine-Grained Gradient Restriction: A Simple Approach for Mitigating Catastrophic Forgetting
A new method for combatting catastrophic forgetting which works by modifying Gradient Episodic Memory (GEM). The paper finds that restricting the search space of the update direction reduces the generalization gap.
- 2024-11-12
- On the Measure of Intelligence
Argues that intelligence is skill-acquisition efficiency and introduces the ARC challenge as a mechanism to quantify the intelligence of AI systems. ARC is a benchmark where every item in the dataset is a completely different task, complete with a handful of example inputs and outputs. Results of the challenge indicate that LLMs have poor skill-acquisition efficiency even with in-context learning. The best approaches do test-time fine-tuning.
- 2024-11-18
- Convolutional Differentiable Logic Gate Networks
Extends differentiable logic gate networks to convolutions, enabling the direct learning of the logic gate networks necessary to implement convolutional networks on CIFAR-10. A cool approach whose CIFAR-10 inference runs in nanoseconds on an FPGA.
- Deep Differentiable Logic Gate Networks
The paper which proposed logic gate networks.