We introduce AGI-CAST, a continually growing dataset of unlabeled screen recordings of long-horizon AGI research.
Internet-scale pre-training, preference modeling, and reinforcement learning using verification signals offer a compelling pathway for language models to attain human-level performance.
A longstanding goal of AGI research is automating the process of conducting research itself. While a long line of work has tackled the capabilities needed for research automation individually (coding, ideation, exploration, planning), we argue that research automation does not warrant special treatment compared to other types of knowledge work, and that behaviour-cloning from screen recordings is a natural way to bootstrap models for knowledge work in general.
We introduce AGI-CAST, a dataset of unlabeled screen recordings of AGI research, intended to facilitate research on behaviour-cloning from screen recordings. We go beyond crowd-code, our previous work on crowd-sourcing a dataset of IDE interactions, by capturing not only the IDE but also browser use, note-taking, and paper exploration. Whereas crowd-code is intended as a low-threshold crowd-sourcing effort, AGI-CAST captures entire working days of researchers at p(doom), with all their idiosyncrasies and nuances. Although we only started recording recently, the entire up-to-date dataset is available as a playlist on YouTube and will be updated continuously.
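Since the dataset is distributed as a continuously updated YouTube playlist, one way to keep a local mirror in sync is the archive feature of a downloader such as yt-dlp. The sketch below is a minimal example, not an official tool; the playlist URL and output layout are placeholders.

```python
import subprocess

# Placeholder: substitute the actual AGI-CAST playlist URL.
PLAYLIST_URL = "https://www.youtube.com/playlist?list=PLACEHOLDER"


def build_sync_command(playlist_url: str, out_dir: str = "agi-cast") -> list[str]:
    """Build a yt-dlp invocation that mirrors a playlist incrementally.

    --download-archive records the IDs of already-fetched videos, so
    rerunning the same command only downloads newly added recordings.
    """
    return [
        "yt-dlp",
        "--download-archive", f"{out_dir}/archive.txt",
        "--output", f"{out_dir}/%(upload_date)s_%(title)s.%(ext)s",
        playlist_url,
    ]


def sync(playlist_url: str = PLAYLIST_URL) -> None:
    # Invokes yt-dlp; requires it to be installed and on PATH.
    subprocess.run(build_sync_command(playlist_url), check=True)
```

Rerunning `sync()` on a schedule (e.g. via cron) keeps a local copy current as new recordings are appended to the playlist.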
All uncaptured data is lost data. AGI-CAST represents the first step towards capturing and openly releasing the longest-horizon data imaginable. We invite the community to follow our lead and openly release screen recordings of their own research.
FS worked on all aspects of this post. The dataset exclusively captures the work of the authors of this post.