I’m Mo, a Research Engineer on the RL Engineering team at Google DeepMind. I earned my master’s at Mila and have interned at Google Research, ServiceNow Research, and EPFL. I also co-founded Zette AI. Before that, I studied at Sharif University of Technology, where as an undergrad I co-founded Shenasa, a multidisciplinary student community focused on cognitive science.
My research goal is to build generally intelligent machines that can plan under uncertainty in complex environments. I’m particularly interested in developing machines that learn from experience, form rich internal models of the world, simulate possible futures, and choose actions with foresight. My current focus is on Genie.
My first paper on world models asked a simple question: if an agent could recall better, could it make better decisions? We focused on the memory bottleneck in world models and introduced R2I, an RL agent with improved memory. R2I not only achieves superhuman performance on memory-heavy tasks but also remains competitive across diverse domains.
I also explored world models in the context of LLMs, asking: do larger models, with stronger overall performance, develop richer internal representations of the world? Our experiments suggest they do, showing that larger models are notably more resilient to misleading in-context cues, thanks to their ability to integrate prompt information with a stronger underlying world model.
Not a world model paper per se, but closely related in spirit: this work examines how a foundation-model agent perceives the world and acts on that perception. Through mechanistic interpretability, we discovered patterns in how the VPT agent links recent observations to decisions, as well as scenarios where its perception led to misaligned behavior, such as killing villagers.
In the same spirit of agents learning through interaction with a world or its proxy, this work frames SVG generation as an RL loop: the model writes code, renders it, and optimizes for visual fidelity, learning how its actions shape outcomes without requiring differentiable rendering.
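The loop is simple enough to caricature in a few lines. The sketch below is purely illustrative and assumes nothing about the paper’s actual model or objective: a toy disc renderer stands in for the SVG renderer, a softmax over candidate radii stands in for the code-writing policy, and a REINFORCE update shows how the policy can improve from the scalar visual reward alone, with no gradients flowing through the renderer.

```python
# Illustrative sketch only: render(), visual_reward(), and the toy
# action space are my assumptions, not the paper's implementation.
import math
import random

SIZE = 32
TARGET_RADIUS = 9

def render(radius):
    """Rasterize a centered disc; stands in for a non-differentiable SVG renderer."""
    c = SIZE // 2
    return {(x, y) for x in range(SIZE) for y in range(SIZE)
            if (x - c) ** 2 + (y - c) ** 2 <= radius ** 2}

def visual_reward(img, target):
    """Intersection-over-union: the scalar visual-fidelity signal."""
    return len(img & target) / len(img | target)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

target_img = render(TARGET_RADIUS)
actions = list(range(1, SIZE // 2))   # candidate radii the "policy" can emit
logits = [0.0] * len(actions)         # softmax policy parameters
baseline, lr = 0.0, 1.0

for step in range(2000):
    probs = softmax(logits)
    i = random.choices(range(len(actions)), weights=probs)[0]
    reward = visual_reward(render(actions[i]), target_img)  # act, render, score
    adv = reward - baseline
    # REINFORCE for softmax logits: d log pi(a) / d logit_j = 1{j=a} - pi(j).
    # Only the scalar reward is needed; the renderer is never differentiated.
    for j in range(len(logits)):
        logits[j] += lr * adv * ((1.0 if j == i else 0.0) - probs[j])
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average variance reduction

best = actions[max(range(len(actions)), key=lambda j: logits[j])]
print(f"learned radius {best}, target {TARGET_RADIUS}")
```

The point of the toy is the shape of the loop, not the scale: because the reward is computed on the rendered output, the same score-function trick works whether the action is one radius or a full SVG program.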
This one is from an earlier chapter of my work, when I was particularly interested in applying causality to enhance decision-making systems. We explored how explicitly modeling causal structure can lead to more reliable policies. In the context of autonomous driving, this approach helped train policies that avoided common failure modes such as inertia and collisions.