About Me
I am a first-year CS PhD student at the University of Chicago, advised by Prof. Chenhao Tan. I am a member of the Chicago Human+AI (CHAI) lab, and affiliated with the broader Communication and Intelligence (C&I) group. I am also working closely with Prof. Hao Peng at the University of Illinois at Urbana-Champaign (UIUC).
Previously, I received my Bachelor’s degree in Artificial Intelligence from Fudan University in 2025. During my undergraduate studies, I was fortunate to intern at UIUC with Prof. Hao Peng and Prof. Jiaqi W. Ma, and at Shanghai Jiao Tong University with Prof. Dequan Wang.
Research Interests
I am broadly interested in post-training data and algorithms for large language models (LLMs). My current research focuses on the following topics:
Extending the generalization scope of reasoning and thinking behaviors of LLMs. I believe that genuinely thoughtful reasoning should be a robust behavior that transfers across diverse domains (e.g., philosophical and social science writing) and formalisms (e.g., Bayesian and causal reasoning), but the current math-centered post-training paradigm struggles to achieve this goal.
Evaluating and advancing complex, composite real-world capabilities of LLMs. I am especially interested in training LLMs to (1) proactively explore and discover, (2) leverage ambiguity in strategic communication, and (3) balance accurate reasoning with controllable creativity.
In the past, I have mainly worked on data-centric aspects of post-training, including data selection, data attribution, and data-efficient supervision paradigms. I still enjoy following the latest advances in these data-centric techniques, and am always excited to apply them toward my evolving research goals.
Selected Publications
- Executable Counterfactuals: Improving LLMs’ Causal Reasoning Through Code
Aniket Vashishtha*, Qirun Dai*, Hongyuan Mei, Amit Sharma, Chenhao Tan, Hao Peng
NeurIPS 2025 Workshop on Foundations of Reasoning in Language Models
[paper] [code]
- The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang, Qirun Dai, Hao Peng
NeurIPS 2025 (Spotlight)
[paper]
- Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
Qirun Dai, Dylan Zhang, Jiaqi W. Ma, Hao Peng
Findings of EMNLP 2025; ICLR 2025 Workshop on DATA-FM
[paper]
News
[12/2025] Attending NeurIPS 2025 to present my work, GRAPE (Spotlight) and Executable Counterfactuals (FoRLM Workshop).
[09/2025] Officially started my CS PhD studies as a member of the CHAI lab.
[09/2025] One paper accepted by EMNLP 2025 (Findings), and two papers by NeurIPS 2025 (Spotlight & Workshop)!
[06/2025] Officially graduated from Fudan University and received my Bachelor’s degree.
[04/2025] Attending ICLR 2025 to present my work, Balanced and Influential Data Selection (BIDS), at the 2nd DATA-FM Workshop.
