Open call · № 05

Member of Technical Staff, RL.

Munich · full-time · remote

We use RL on computer-use tasks to train models for long-horizon work. The interface consists of streaming visual observations and low-level actions. Rewards come from task completion or from rubric-based generative reward models.

This role owns the RL stack: training infrastructure, environment selection, algorithms, capability retention, evals, and stability at long horizons.

You will design training procedures, build evals for failure modes, run experiments, and publish the results.

Apply

Email franz@pdoom.org with five bullet points giving evidence of exceptional ability.