We introduce AGI-CAST, a continually growing dataset of unlabeled screen recordings of long-horizon AGI research.
Internet-scale pre-training, preference modeling, and reinforcement learning using verification signals offer a compelling pathway for language models to attain human-level performance.
A longstanding goal of AGI research is automating the process of conducting research itself. While a long line of work has tackled the capabilities needed for research automation individually (coding, ideation, exploration, planning), we argue that research automation does not warrant special treatment compared to other types of knowledge work, and that behaviour-cloning from screen recordings is a natural way to bootstrap models for knowledge work in general.
We introduce AGI-CAST, a dataset of unlabeled screen recordings of AGI research, intended to facilitate research on behaviour-cloning from screen recordings. We go beyond crowd-code, our previous work on crowd-sourcing a dataset of IDE interactions, by capturing not only the IDE but also browser use, note-taking, and paper exploration. Whereas crowd-code is intended as a low-threshold crowd-sourcing effort, AGI-CAST captures entire working days of researchers at p(doom), with all their idiosyncrasies and nuances. Although we only started recording recently, the entire up-to-date dataset is available as a playlist on YouTube and will be updated continuously.
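Since the dataset is distributed as a continuously updated YouTube playlist, one way to keep a local mirror in sync is the archive feature of a downloader such as yt-dlp. The sketch below is a minimal example, not an official tool; the playlist URL and output layout are placeholders.

```python
import subprocess

# Placeholder: substitute the actual AGI-CAST playlist URL.
PLAYLIST_URL = "https://www.youtube.com/playlist?list=PLACEHOLDER"


def build_sync_command(playlist_url: str, out_dir: str = "agi-cast") -> list[str]:
    """Build a yt-dlp invocation that mirrors a playlist incrementally.

    --download-archive records the IDs of already-fetched videos, so
    rerunning the same command only downloads newly added recordings.
    """
    return [
        "yt-dlp",
        "--download-archive", f"{out_dir}/archive.txt",
        "--output", f"{out_dir}/%(upload_date)s_%(title)s.%(ext)s",
        playlist_url,
    ]


def sync(playlist_url: str = PLAYLIST_URL) -> None:
    # Invokes yt-dlp; requires it to be installed and on PATH.
    subprocess.run(build_sync_command(playlist_url), check=True)
```

Rerunning `sync()` on a schedule (e.g. via cron) keeps a local copy current as new recordings are appended to the playlist.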
All uncaptured data is lost data. AGI-CAST represents the first step towards capturing and openly releasing the longest-horizon data imaginable. We invite the community to follow our lead and openly release screen recordings of their own research.
FS worked on all aspects of this post. The dataset exclusively captures the work of the authors of this post.