Hello! I am a 2nd-year Ph.D. student in Computer Science at Johns Hopkins University, advised by Tianmin Shu. I’m also a student researcher at Meta FAIR, mentored by Jason Weston. My research is funded by the Amazon AI PhD Fellowship.
I received bachelor’s degrees in Honors Computer Science and Mathematics from NYU Courant (2020–2024), where the CS Department recognized me as its most promising student. I was a research intern at MIT (2023), advised by Josh Tenenbaum. I received an Outstanding Paper Award at ACL 2024 for my work on multimodal Theory of Mind.
I’m developing AI models and agents with advanced social intelligence (ASI). In particular:
- I lead MMToM-QA (ACL'24 Outstanding Paper) and AutoToM (NeurIPS'25 Spotlight), pushing the frontier of machine Theory of Mind (ToM)—teaching AI systems to understand people's minds (e.g., goals, beliefs) from their behavior.
- We bridge the deliberate human-like reasoning of agent models with the adaptability of language models, advancing ToM by grounding it in cognitive theories, leveraging multi-modal, multi-agent, and multi-perspective data, scaling to long-context, complex scenarios, and learning with zero mental annotations.
News
- [Feb, 2026] Co-organizing CVPR 2026 Workshop on “Rediscovering Intelligence: Can AI Still Learn from Humans?”.
- [Oct, 2025] Invited talks on “Reinforcement Learning from Human Interaction” at Google DeepMind, Meta TBD Lab, WashU, and Northwestern.
- [Oct, 2025] Excited to be awarded the Amazon AI PhD Fellowship!
- [Sep, 2025] AutoToM is accepted by NeurIPS 2025 as a Spotlight Presentation!
- [June, 2025] Lead-organized RSS 2025 Workshop on Continual Robot Learning from Humans.
- [June, 2025] I was selected as a Notable Reviewer for ICLR 2025.
- [Jan, 2025] MuMA-ToM is accepted by AAAI 2025 as an Oral Presentation!
- [Aug, 2024] MMToM-QA won the Outstanding Paper Award at ACL 2024! It was covered by Futurity, Synced, and JHU News.
Recent Publications
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
Chuanyang Jin, Binze Li, Haopeng Xie, Cathy Mengying Fang, Tianjian Li, Shayne Longpre, Hongxiang Gu, Maximillian Chen, Tianmin Shu
Tech Report
We introduce ThoughtTrace, the first large-scale dataset that pairs real-world multi-turn human-AI conversations with users' self-reported thoughts. Our analysis shows that these thoughts are distinct from messages, difficult for frontier LLMs to infer, and provide actionable signals for user-behavior prediction and model alignment, opening new directions for user modeling, model training, and evaluation.

The Era of Real-World Human Interaction: RL from User Conversations
Chuanyang Jin, Jing Xu, Bo Liu, Leitian Tao, Olga Golovneva, Tianmin Shu, Wenting Zhao, Xian Li, Jason Weston
Tech Report / 🔍 Invited Talks at Google DeepMind, Meta TBD Lab / ⭐️ Paper of the Week by Hugging Face, DAIR.AI, TuringPost
We posit that to achieve continual model improvement and multifaceted alignment, future models must learn from natural human interaction. We introduce Reinforcement Learning from Human Interaction (RLHI), a paradigm that learns directly from in-the-wild user conversations, leveraging organic replies and long-term history as learning signals. Trained on WildChat, RLHI outperforms RLHF in personalization and instruction-following, and similar feedback enhances performance on reasoning benchmarks.

SPICE: Self-Play In Corpus Environments Improves Reasoning
Bo Liu, Chuanyang Jin, Seungone Kim, Weizhe Yuan, Wenting Zhao, Ilia Kulikov, Xian Li, Sainbayar Sukhbaatar, Jack Lanchantin, Jason Weston
Tech Report / 📰 Featured in VentureBeat
SPICE is a reinforcement learning framework in which a single model improves itself by playing two roles: a Challenger that creates tasks from corpora, and a Reasoner that solves them. By grounding this self-play in corpora, SPICE mitigates hallucination and lack of diversity, significantly outperforming standard (ungrounded) self-play across reasoning benchmarks.

MindZero: Learning Online Mental Reasoning With Zero Annotations
Shunchi Zhang*, Jin Lu*, Chuanyang Jin*, Yichao Zhou*, Zhining Zhang, Tianmin Shu
ICML 2026 / HCAIR@ICLR 2026 (Oral)
We introduce MindZero, a self-supervised reinforcement learning framework that trains multimodal LLMs for efficient and robust online Theory-of-Mind reasoning. During training, the model is rewarded for generating mental state hypotheses that best explain observed human actions. Across four experimental settings, MindZero significantly outperforms LLMs and model-based methods in accuracy and efficiency.

AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling
Zhining Zhang*, Chuanyang Jin*†, Mung Yao Jia*, Shunchi Zhang*, Tianmin Shu (†: project lead)
NeurIPS 2025 (Spotlight)
AutoToM is an automated agent modeling method for scalable, robust, and interpretable mental inference. Given a ToM problem, AutoToM proposes a minimal agent model, performs Bayesian inverse planning based on the model, and iteratively refines the model until it can confidently infer the target mental state. It achieves SOTA on five benchmarks, produces human-like confidence estimates, and supports embodied decision-making.

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Chuanyang Jin*, Qiushi Sun*, Kanzhi Cheng*, Zichen Ding*, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu
ACL 2025 / ⭐️ Hugging Face Daily Papers Top-1
We introduce OS-Genesis, a manual-free data pipeline for synthesizing GUI agent trajectories. It enables agents to actively explore web and mobile environments through stepwise interactions, then derive meaningful low- and high-level task instructions from observed interactions and state changes.

MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Haojun Shi*, Suyu Ye*, Xinyu Fang, Chuanyang Jin, Leyla Isik, Yen-Ling Kuo, Tianmin Shu
AAAI 2025 (Oral) / ⭐️ Featured as a CVPR 2026 Challenge
MuMA-ToM evaluates Theory of Mind reasoning in embodied multi-agent interactions, revealing that current multimodal LLMs significantly lag behind human performance. To bridge this gap, we propose LIMP, a method that combines language models with inverse multi-agent planning to achieve superior results.

MMToM-QA: Multimodal Theory of Mind Question Answering
Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua Tenenbaum, Tianmin Shu
ACL 2024 (Outstanding Paper Award) / 🔍 Invited Talk at University of Washington / 📰 Featured in Futurity, Synced, ...
Can machines understand people's minds from multimodal inputs? We introduce a comprehensive benchmark, MMToM-QA, and highlight key limitations in current multimodal LLMs. We then propose a novel method that combines the flexibility of LLMs with the robustness of Bayesian inverse planning, achieving promising results.

Feel free to check out my undergrad projects. A mountain of gratitude to those who have kindly mentored and inspired me with their vision and passion!
Selected Honors & Awards
- Amazon AI PhD Fellowship, 2025
- Notable Reviewer Award, ICLR 2025
- Outstanding Paper Award, ACL 2024
- Presidential Honors Scholar and Summa cum Laude, New York University, 2024
- Computer Science Prize for the Most Promising Student, New York University (1 person/year), 2023
- Dean’s Undergraduate Research Fund, New York University, 2023
- COMAP International Scholarship Award (Top 0.1%), 2022
- MAA Award in Mathematical Contest in Modeling (Top 0.1%), 2022
- Bronze Medal of Shing-Tung Yau Computer Science Award (Top 1%), 2019
- Finalist of FIRST Robotics Competition World Championship (Top 0.2%), 2019
- NFLS Outstanding Student Leader Award and Zhou Enlai Scholarship (Top 1%), 2018
- First Prize of Chinese Mathematical Olympiad (Top 0.1%), 2018
- Champion of International Regions Mathematics League, 2018
Services