The CoVar Zeitgeist: May, 2024¶
A curated list of the latest research in AI/ML.
Featured¶
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
Can LLMs act as Machine Learning Engineers and conduct effective ML experimentation when presented with a dataset? This paper investigates, with mixed results.
- Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Investigates model collapse, where generative models degrade when trained recursively on their own outputs. Part of the reason this occurs is that newly generated data replaces old real data; if generated data instead supplements, rather than replaces, the old real data, model collapse does not occur (see the sketch at the end of this list).
- TransformerFAM: Feedback attention is working memory
Introduces a feedback loop into the transformer model so that it can attend to its own latent representations. The authors claim this functions like working memory and allows the model to process indefinitely long sequences.
- Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos
Recovers consistent object meshes from a monocular video of an object.
- KGExplainer: Towards Exploring Connected Subgraph Explanations for Knowledge Graph Completion
Proposes a method to explain knowledge graph completions done with knowledge graph embeddings by investigating connected subgraphs. Makes intuitive sense and seems to improve performance in practice.
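Re the model collapse result above: the accumulate-versus-replace mechanism is easy to see in a toy setting. The sketch below is our own illustration, not the paper's experiments; it repeatedly fits a Gaussian to its own samples, either replacing the data each generation or accumulating it, and the fitted variance typically collapses in the first regime while remaining stable in the second.

```python
# Toy illustration of model collapse (not the paper's experiments): fit a Gaussian
# to its own samples over many generations, either REPLACING the data each time or
# ACCUMULATING it alongside the original real data.
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, scale=1.0, size=100)  # the original "real" data, N(0, 1)

def final_variance(generations: int, accumulate: bool) -> float:
    """Repeatedly refit a Gaussian on data drawn from the previous fit."""
    data = real.copy()
    mu, sigma = data.mean(), data.std()
    for _ in range(generations):
        synthetic = rng.normal(mu, sigma, size=100)  # sample the current fitted model
        # Either add the synthetic data to the pool or throw the old data away.
        data = np.concatenate([data, synthetic]) if accumulate else synthetic
        mu, sigma = data.mean(), data.std()          # refit on the new dataset
    return sigma ** 2

# Replacing typically drives the variance toward zero; accumulating keeps it near 1.
print("variance after 500 generations, replacing:   ", final_variance(500, accumulate=False))
print("variance after 500 generations, accumulating:", final_variance(500, accumulate=True))
```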
LLMs¶
- Automated Social Science: Language Models as Scientist and Subjects
This paper has LLMs role-play as human agents in simulated situations to test social science hypotheses in silico. Finds that LLM role-play agents can reproduce results from the social science literature that the models claim not to know when asked directly.
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
Can LLMs act as Machine Learning Engineers and conduct effective ML experimentation when presented with a dataset? This paper investigates, with mixed results.
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
A deep dive into how best to fine-tune LLMs on preference data in a variety of situations, arguing for the use of suboptimal, on-policy data.
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Proposes that LLMs be trained to prioritize some instructions over others, instead of treating all instructions equally. This can help with alignment concerns.
- Why do small language models underperform? Studying LM Saturation via the Softmax Bottleneck
Investigates why smaller LLMs experience performance drops and plateaus during training. The answer is that the hidden dimension of smaller LLMs is too small to capture the output distribution they are targeting, so they hit the softmax bottleneck (a toy illustration of the rank limit follows this list).
- Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models
Chain-of-thought may outperform other methods because it provides LLMs with more computation rather than because it increases reasoning capabilities. To test this, the authors give LLMs meaningless filler tokens and demonstrate that models can use these much as they would chain-of-thought tokens, but only when trained in a very specific manner.
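Re the softmax bottleneck item above: the core rank argument fits in a few lines of numpy. This is our own toy illustration (not the paper's code), with arbitrary values for the vocabulary size V, hidden size d, and number of contexts: the logits a model can emit live in a subspace of dimension at most d, while a generic matrix of target next-token distributions has much higher rank.

```python
# Toy illustration of the softmax bottleneck (our own sketch, not the paper's code):
# the logit matrix a language model produces over a vocabulary of size V has rank at
# most d (the hidden size), so a small d limits which distributions it can represent.
import numpy as np

rng = np.random.default_rng(0)
V, d, n_contexts = 1000, 16, 256       # vocab size, hidden size, number of contexts

H = rng.normal(size=(n_contexts, d))   # hidden states, one per context
W = rng.normal(size=(V, d))            # unembedding / output projection
logits = H @ W.T                       # (n_contexts, V) matrix of logits

print("rank of the logit matrix:", np.linalg.matrix_rank(logits))   # at most d = 16

# A matrix of "true" next-token log-probabilities is generally of much higher rank,
# so it cannot be matched exactly once d is too small -- the softmax bottleneck.
target_logprobs = np.log(rng.dirichlet(np.ones(V), size=n_contexts))
print("rank of the target log-prob matrix:", np.linalg.matrix_rank(target_logprobs))
```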
Object Detection¶
- Improving Detection in Aerial Images by Capturing Inter-Object Relationships
Objects in aerial imagery tend to be spatially correlated, but existing overhead ATR methods don't take this into account. This paper does, by putting a transformer on top of a traditional two-stage detector to reason over regions of interest.
- DiffDet4SAR: Diffusion-based Aircraft Target Detection Network for SAR Images
ConvNets and transformers for overhead sensing in SAR are limited by varying target sizes, the spikiness of SAR data, and general noise. The paper attempts to ameliorate these problems by (1) using a denoising diffusion process and (2) using scattering feature enhancement to model the SAR data. Seems to improve results.
- Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos
Recovers consistent object meshes from a monocular video of an object.
- Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Investigates how to fuse multiple modalities in remote sensing: distinct embedding layers for each sensor feed a shared encoder, and decoding happens on a per-sensor level (a minimal sketch of this pattern follows this list).
- A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task
Uses the Univariate Marginal Distribution Algorithm (UMDA) to select the optimal Landsat bands for overhead deforestation monitoring.
- LiDAR-Guided Cross-Attention Fusion for Hyperspectral Band Selection and Image Classification
Uses LiDAR to guide selection of the best hyperspectral bands with attention-based encoders, then performs image classification on the selected bands.
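Re the multisensor geospatial foundation model item above: below is a minimal sketch of the per-sensor-embedding / shared-encoder / per-sensor-decoder pattern as we read it from the abstract, not the authors' architecture or code. The sensor names, patch size, dimensions, and reconstruction heads are our own placeholder choices.

```python
# A minimal sketch of the fusion pattern described above (our own reading, not the
# authors' code): one embedding layer per sensor, a shared encoder over the
# concatenated tokens, and one lightweight decoder per sensor.
import torch
import torch.nn as nn

class MultiSensorFusion(nn.Module):
    def __init__(self, sensor_channels: dict[str, int], dim: int = 256, depth: int = 4):
        super().__init__()
        # Distinct patch-embedding layer for each sensor (e.g. optical, SAR, ...)
        self.embed = nn.ModuleDict({
            name: nn.Conv2d(ch, dim, kernel_size=16, stride=16)
            for name, ch in sensor_channels.items()
        })
        # Shared transformer encoder over the tokens from all sensors
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Per-sensor decoders (placeholder: simple patch reconstruction heads)
        self.decode = nn.ModuleDict({
            name: nn.Linear(dim, ch * 16 * 16) for name, ch in sensor_channels.items()
        })

    def forward(self, inputs: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        tokens, counts = [], {}
        for name, x in inputs.items():                          # x: (B, C, H, W)
            t = self.embed[name](x).flatten(2).transpose(1, 2)  # (B, N, dim)
            counts[name] = t.shape[1]
            tokens.append(t)
        fused = self.encoder(torch.cat(tokens, dim=1))          # shared encoding
        outputs, start = {}, 0
        for name, n in counts.items():                          # decode per sensor
            outputs[name] = self.decode[name](fused[:, start:start + n])
            start += n
        return outputs

model = MultiSensorFusion({"optical": 3, "sar": 1})
out = model({"optical": torch.randn(2, 3, 64, 64), "sar": torch.randn(2, 1, 64, 64)})
print({k: tuple(v.shape) for k, v in out.items()})  # token-wise reconstructions per sensor
```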
Autonomy¶
- Laser Learning Environment: A new environment for coordination-critical multi-agent tasks
Introduces a new environment for multi-agent reinforcement learning. One problem they encounter and highlight is that agents can get stuck in regions of the state space.
- Advancing Forest Fire Prevention: Deep Reinforcement Learning for Effective Firebreak Placement
Applies deep reinforcement learning to satellite imagery to discover optimal firebreak placements for forest fire prevention. We could implement a similar approach to find optimal spots for, e.g., fortifications.
- Learn to Tour: Operator Design For Solution Feasibility Mapping in Pickup-and-delivery Traveling Salesman Problem
Uses reinforcement learning for the pickup-and-delivery traveling salesman problem. Could be an interesting route-finding algorithm for autonomous vehicles.
- A survey of air combat behavior modeling using machine learning
Norwegian Defence researchers analyze how well current reinforcement learning methods produce in silico agents for the simulation of aerial combat.
Theory¶
- Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
Investigates model collapse, where generative models degrade when trained recursively on their own outputs. Part of the reason this occurs is that newly generated data replaces old real data; if generated data instead supplements, rather than replaces, the old real data, model collapse does not occur.
- Variational Stochastic Gradient Descent for Deep Neural Networks
Proposes a new optimization method, Variational Stochastic Gradient Descent (VSGD), which generalizes both standard SGD and Adam and outperforms them on the image classification examples in the paper.
- The Illusion of State in State-Space Models
Finds that state-space models with finitely many layers have no advantage over transformers at state tracking: SSMs are similarly limited at keeping track of entities in narratives, playing chess, or evaluating code.
- TransformerFAM: Feedback attention is working memory
Introduces a feedback loop into the transformer model so that it can attend to its own latent representations. The authors claim this functions like working memory and allows the model to process indefinitely long sequences.
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Uses a compressive memory to store past input tokens as parameters that can be updated, which in theory enables the handling and processing of unboundedly long inputs.
- An exactly solvable model for emergence and scaling laws
Explicitly models where emergence and scaling behavior begin during neural net training, in terms of training time, training data, and model size, for two-layer NNs.
- On the Learnability of Out-of-distribution Detection
Investigates OOD detection, proving when it is theoretically possible and when it is impossible.
- Hellinger-UCB: A Novel Algorithm for Stochastic Multi-Armed Bandit Problem and Cold Start Problem in Recommender System
Proposes a new multi-armed bandit algorithm, an upper confidence bound (UCB) variant built from the Hellinger distance, with applications to cold-start scenarios in recommender systems (a generic UCB sketch follows this list).
- Estimating the Number of Components in Finite Mixture Models via Variational Approximation
An ELBO-based method for estimating the number of components in finite mixture models.
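Re the Hellinger-UCB item above: for readers unfamiliar with the UCB family, below is a sketch of the standard UCB1 algorithm on Bernoulli arms. This is a generic illustration only, not the paper's algorithm; the paper's contribution is to replace the exploration bonus with one derived from the Hellinger distance, which is not shown here.

```python
# Generic UCB1 for a stochastic multi-armed bandit (an illustration of the UCB family
# the paper builds on; NOT the paper's Hellinger-based algorithm).
import math
import random

def ucb1(arm_means: list[float], horizon: int, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    k = len(arm_means)
    counts, sums = [0] * k, [0.0] * k
    pulls = []
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                        # pull each arm once to initialize
        else:
            # Index = empirical mean + exploration bonus that shrinks with pulls.
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0   # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls

pulls = ucb1([0.3, 0.5, 0.7], horizon=5000)
print("fraction of pulls on the best arm:", pulls.count(2) / len(pulls))
```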
Computational Efficiency¶
- GCV-Turbo: End-to-end Acceleration of GNN-based Computer Vision Tasks on FPGA
From the DEVCOM Army Research Office. Investigates how to deploy CNNs and GNNs for computer vision tasks on FPGAs.
Knowledge Graphs¶
- FLawN-T5: An Empirical Examination of Effective Instruction Tuning Data Mixtures for Legal Reasoning
Claims that legal reasoning models tend to perform poorly because there isn't a proper legal reasoning dataset. This paper introduces one, fine-tunes a model on it, and demonstrates much better performance.
- Chain event graphs for assessing activity-level propositions in forensic science in relation to drug traces on banknotes
Implements legal reasoning by turning arguments into graphical models, assigning probabilities to edges, and turning the crank (a toy example of the crank-turning follows this list).
- KGExplainer: Towards Exploring Connected Subgraph Explanations for Knowledge Graph Completion
Proposes a method to explain knowledge graph completions done with knowledge graph embeddings by investigating connected subgraphs.
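Re the chain event graphs item above: "turning the crank" just means propagating probabilities through the model. Below is a toy Bayes-rule illustration with made-up numbers, not the paper's chain event graph machinery, showing how a prior on a proposition and conditional probabilities for the evidence yield a posterior and a likelihood ratio.

```python
# Toy illustration of "turning the crank" (our own example with made-up numbers,
# NOT the paper's chain event graph machinery): prior + conditional probabilities
# of the evidence, then Bayes' rule.
prior_h = 0.05                 # hypothetical prior: proposition H (the activity occurred)
p_evidence_given_h = 0.90      # hypothetical: P(drug traces on notes | H)
p_evidence_given_not_h = 0.20  # hypothetical: P(drug traces on notes | not H)

evidence = prior_h * p_evidence_given_h + (1 - prior_h) * p_evidence_given_not_h
posterior_h = prior_h * p_evidence_given_h / evidence
likelihood_ratio = p_evidence_given_h / p_evidence_given_not_h

print(f"likelihood ratio of the evidence: {likelihood_ratio:.2f}")
print(f"posterior probability of H:       {posterior_h:.3f}")
```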
Applications¶
- Mapping the Increasing Use of LLMs in Scientific Papers
Trawls arXiv, looking at nearly a million papers, to investigate how much of it is LLM-generated. Finds that 17.5 percent of CS papers are LLM-generated.
New Models¶
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
DeepMind proposes a new LLM. It doesn't use global attention, instead combining local attention with linear recurrences.
- Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
A series of multimodal foundation models. Does not seem to be open source.
- Llama 3
Meta’s new LLM.
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
New LLM from Microsoft that is small enough to run natively on an iPhone 14 but achieves comparable results to GPT-4. Most of the penalty it pays for its small size takes the form of less factual knowledge.
- Capabilities of Gemini Models in Medicine
Google releases Med-Gemini, which is Gemini fine-tuned for the medical domain.