The GAE Implementation Of RLax Is Wrong

Franz Srambical

RLax does not adhere to the original GAE formula when computing the last advantage of a trajectory.
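For reference, the original formulation (Schulman et al., 2015) defines the advantage as a discounted sum of TD residuals, computed in practice by a backward recursion:

\[
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
\qquad
\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\,\delta_{t+l}
\quad\Longleftrightarrow\quad
\hat{A}_t = \delta_t + \gamma\lambda\,\hat{A}_{t+1}.
\]

For a rollout truncated at step \(T\), the recursion bottoms out at \(\hat{A}_{T-1} = \delta_{T-1}\); it is this last step where the post locates the discrepancy.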

PPO Is Secretly Using Monte Carlo Advantage Estimation In LLM Post-Training

Franz Srambical

When using PPO in LLM post-training, the typical hyperparameter settings turn Generalized Advantage Estimation into Monte Carlo advantage estimation.
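The reduction itself is a standard identity, independent of the post: with \(\lambda = 1\), the sum of discounted TD residuals telescopes, and every intermediate value estimate cancels:

\[
\hat{A}_t = \sum_{l=0}^{T-t-1} \gamma^{l}\,\delta_{t+l}
= \sum_{l=0}^{T-t-1} \gamma^{l} r_{t+l} + \gamma^{T-t} V(s_T) - V(s_t)
= G_t - V(s_t),
\]

where the last equality uses \(V(s_T) = 0\) at termination. The value function survives only as a baseline subtracted from the Monte Carlo return \(G_t\), which is precisely Monte Carlo advantage estimation.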

NNs Do Not Generalize OOD

Franz Srambical, Mihir Mahajan

Neural networks are mean-seeking. They work well when you run inference on data points that lie near the mean of their training data. Otherwise, they fail embarrassingly.
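A hypothetical toy experiment (ours, purely illustrative, not from the post) that makes the claim concrete: fit a small MLP to \(\sin(x)\) on \(x \in [-\pi, \pi]\), then query it far outside that range.

```python
import jax
import jax.numpy as jnp

def init(key):
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (1, 64)) * 0.5, "b1": jnp.zeros(64),
        "w2": jax.random.normal(k2, (64, 1)) * 0.5, "b2": jnp.zeros(1),
    }

def mlp(params, x):
    h = jnp.tanh(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

def loss(params, x, y):
    return jnp.mean((mlp(params, x) - y) ** 2)

@jax.jit
def step(params, x, y, lr=1e-2):
    # Plain full-batch gradient descent, enough for a toy problem.
    grads = jax.grad(loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

x = jnp.linspace(-jnp.pi, jnp.pi, 256).reshape(-1, 1)
y = jnp.sin(x)
params = init(jax.random.PRNGKey(0))
for _ in range(10_000):
    params = step(params, x, y)

print(mlp(params, jnp.array([[0.5]])))         # in-distribution: approaches sin(0.5)
print(mlp(params, jnp.array([[3 * jnp.pi]])))  # OOD: bears no relation to sin(3*pi) = 0
```

Because the tanh units saturate, the network extrapolates to a near-constant value outside the training range, regardless of what the true function does there.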

Going Beyond the Causal Mask in Language Modeling

Franz Srambical

Although the causal mask is used ubiquitously in large-scale language modeling, its necessity is seldom questioned in the literature. Why do we really need the causal mask?
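For context, a minimal sketch in JAX of the mechanism being questioned (the function name is ours): position \(t\) may only attend to positions \(\le t\).

```python
import jax
import jax.numpy as jnp

def causal_attention_weights(scores):
    """Apply the standard causal mask to raw attention logits of shape (T, T)."""
    T = scores.shape[-1]
    mask = jnp.tril(jnp.ones((T, T), dtype=bool))  # position t sees positions <= t
    masked = jnp.where(mask, scores, -jnp.inf)     # future positions get zero weight
    return jax.nn.softmax(masked, axis=-1)

weights = causal_attention_weights(jax.random.normal(jax.random.PRNGKey(0), (4, 4)))
print(weights)  # upper triangle is zero: no token attends to the future
```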

ACT: Adaptive Compute Transformer

Mihir Mahajan, Franz Srambical

Large language models exhibit remarkable reasoning capabilities with scale. However, a fundamental flaw of current-generation transformer-based language models is their uniform allocation of compute per token.