About Me
I am a first-year CS PhD student at the University of Chicago, advised by Prof. Chenhao Tan. I am a member of the Chicago Human+AI (CHAI) lab and affiliated with the broader Communication and Intelligence (C&I) group. I also work closely with Prof. Hao Peng at the University of Illinois at Urbana-Champaign (UIUC).
Previously, I received my Bachelor’s degree in Artificial Intelligence from Fudan University in 2025. During my undergraduate studies, I was fortunate to intern at UIUC with Prof. Hao Peng and Prof. Jiaqi W. Ma, and at Shanghai Jiao Tong University with Prof. Dequan Wang.
Research Interests
I am broadly interested in training data and algorithms for large language models (LLMs). My current research focuses on:
- Extending the generalization scope of LLMs’ reasoning and thinking behaviors. I believe that genuinely thoughtful reasoning should be robust enough to transfer across diverse domains (e.g., philosophical and social science writing) and formalisms (e.g., causal reasoning, agentic reasoning, continual learning), yet the current math-centered post-training paradigm leaves LLMs far from this goal.
- Evaluating and improving LLMs’ complex, composite, yet underexplored real-world capabilities. I am especially interested in training LLMs to (1) proactively explore and discover, (2) leverage ambiguity in strategic communication, and (3) balance accurate reasoning with controllable creativity.
- Data foundations throughout the lifecycle of language-centered AI. I am fascinated by the role of data in shaping model behaviors, with a long-standing interest in data curation, selection, attribution, and data-efficient supervision paradigms, especially, but not exclusively, in the language modality.
Selected Publications
* denotes equal contributions, and † denotes equal advising.
- Executable Counterfactuals: Improving LLMs’ Causal Reasoning Through Code
Aniket Vashishtha*, Qirun Dai*, Hongyuan Mei, Amit Sharma†, Chenhao Tan†, Hao Peng†
ICLR 2026; NeurIPS 2025 Workshop on FoRLM
[paper] [code]
- The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang, Qirun Dai, Hao Peng
NeurIPS 2025 (Spotlight)
[paper]
- Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
Qirun Dai, Dylan Zhang, Jiaqi W. Ma, Hao Peng
Findings of EMNLP 2025; ICLR 2025 Workshop on DATA-FM
[paper]
News
[01/2026] One paper accepted by ICLR 2026! Check out how Executable Counterfactuals fills the gap in counterfactual reasoning evaluation by operationalizing abduction with verifiable supervision.
[12/2025] Attending NeurIPS 2025 to present my work: GRAPE (Spotlight) and Executable Counterfactuals (FoRLM Workshop).
[09/2025] Officially started my CS PhD as a proud member of the CHAI lab.
[09/2025] One paper accepted by EMNLP 2025 (Findings), and two papers by NeurIPS 2025 (Spotlight & Workshop)!
[06/2025] Officially graduated from Fudan University and received my Bachelor’s degree.
[04/2025] Attending ICLR 2025 to present my work, Balanced and Influential Data Selection (BIDS), at the 2nd DATA-FM Workshop.
