Run-Ze Fan

PhD Student
Manning College of Information & Computer Sciences
University of Massachusetts Amherst

Email: runze.fan(at)icloud(dot)com

runzefan(at)umass(dot)edu

Profile

I am a first-year PhD student in the Manning College of Information & Computer Sciences at the University of Massachusetts Amherst, advised by Prof. Hamed Zamani. Before that, I was a research assistant at the Generative AI Research Lab (GAIR), where I explored generative AI and was fortunate to work with Prof. Pengfei Liu. I received my M.S. degree in Computer Technology from the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), in 2024, supervised by Prof. Jiafeng Guo, and my B.E. degree in Computer Science and Technology from Shanghai Maritime University in 2021.

Research Interests: My primary research interests include natural language processing, large language models, and machine learning. Specifically, my current research focuses on:

LLM Pre-training, Post-training, and Evaluation
Data Science and Engineering
LLMs for Science (especially Mathematics)
Digital Agents

I am happy to collaborate and/or answer questions about my research. If you are interested in research collaboration or have any inquiries about my experience, please send me an email.

News

2025.07: MegaScience has been published. Check out our datasets and models.
2025.04: Cognition Engineering has been published.
2024.12: PC-Agent has been published.
2024.09: OlympicArena has been accepted by NeurIPS 2024 Datasets and Benchmarks.
2024.09: ReAlign has been accepted by EMNLP 2024 Findings.
2024.07: I am happy to have contributed to the 1st Workshop on Data Contamination (CONDA).
2024.04: BenBench has been published.

Selected Publications | Full Publications

(* indicates equal contribution)

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
Run-Ze Fan*, Zengzhi Wang*, Pengfei Liu
arXiv, 2025
[PDF] [Abstract] [Bib] [Code] [HuggingFace (Datasets & Models)] [Evaluation System] [Featured by AK] [Ritvik Rastogi's Medium] [量子位] [Poster]

@article{fan2025megascience,
  title={MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning},
  author={Fan, Run-Ze and Wang, Zengzhi and Liu, Pengfei},
  year={2025},
  journal={arXiv preprint arXiv:2507.16812},
  url={https://arxiv.org/abs/2507.16812}
}

Benchmarking Benchmark Leakage in Large Language Models
Ruijie Xu*, Zengzhi Wang*, Run-Ze Fan*, Pengfei Liu.
arXiv, 2024
[PDF] [Abstract] [Bib] [Code] [Page] [HuggingFace Demo]

@article{xu2024benchmarking,
      title={Benchmarking Benchmark Leakage in Large Language Models},
      author={Xu, Ruijie and Wang, Zengzhi and Fan, Run-Ze and Liu, Pengfei},
      year={2024},
      journal={arXiv preprint arXiv:2404.18824},
      url={https://arxiv.org/abs/2404.18824}
}

Reformatted Alignment
Run-Ze Fan, Xuefeng Li, Haoyang Zou, Junlong Li, Shwai He, Ethan Chern, Jiewen Hu, Pengfei Liu.
EMNLP, 2024, Findings
[PDF] [Abstract] [Bib] [Code] [Page] [Featured by AK] [量子位]

@article{fan2024reformatted,
      title={Reformatted Alignment},
      author={Fan, Run-Ze and Li, Xuefeng and Zou, Haoyang and Li, Junlong and He, Shwai and Chern, Ethan and Hu, Jiewen and Liu, Pengfei},
      year={2024},
      journal={arXiv preprint arXiv:2402.12219},
      url={https://arxiv.org/abs/2402.12219}
}

Deep Research: A Systematic Survey
Zhengliang Shi, Yiqun Chen, Haitao Li, Weiwei Sun, Shiyu Ni, Yougang Lyu, Run-Ze Fan, Bowen Jin, Yixuan Weng, Minjun Zhu, Qiujie Xie, Xinyu Guo, Qu Yang, Jiayi Wu, Jujia Zhao, Xiaqiang Tang, Xinbei Ma, Cunxiang Wang, Jiaxin Mao, Qingyao Ai, Jen-Tse Huang, Wenxuan Wang, Yue Zhang, Yiming Yang, Zhaopeng Tu, Zhaochun Ren
arXiv, 2025
[PDF] [Abstract] [Bib] [Github] [机器之心]

@article{shi2025deep,
  title={Deep Research: A Systematic Survey},
  author={Zhengliang Shi and Yiqun Chen and Haitao Li and Weiwei Sun and Shiyu Ni and Yougang Lyu and Run-Ze Fan and Bowen Jin and Yixuan Weng and Minjun Zhu and Qiujie Xie and Xinyu Guo and Qu Yang and Jiayi Wu and Jujia Zhao and Xiaqiang Tang and Xinbei Ma and Cunxiang Wang and Jiaxin Mao and Qingyao Ai and Jen-Tse Huang and Wenxuan Wang and Yue Zhang and Yiming Yang and Zhaopeng Tu and Zhaochun Ren},
  journal={arXiv preprint arXiv:2512.02038},
  year={2025},
  url={https://arxiv.org/abs/2512.02038}
}

Generative AI Act II: Test Time Scaling Drives Cognition Engineering
Shijie Xia, Yiwei Qin, Xuefeng Li, Yan Ma, Run-Ze Fan, Steffi Chern, Haoyang Zou, Fan Zhou, Xiangkun Hu, Jiahe Jin, Yanheng He, Yixin Ye, Yixiu Liu, Pengfei Liu
arXiv, 2025
[PDF] [Abstract] [Bib] [Code] [Page] [机器之心]

@article{xia2025generativeaiactii,
      title={Generative AI Act II: Test Time Scaling Drives Cognition Engineering},
      author={Shijie Xia and Yiwei Qin and Xuefeng Li and Yan Ma and Run-Ze Fan and Steffi Chern and Haoyang Zou and Fan Zhou and Xiangkun Hu and Jiahe Jin and Yanheng He and Yixin Ye and Yixiu Liu and Pengfei Liu},
      year={2025},
      journal={arXiv preprint arXiv:2504.13828},
      url={https://arxiv.org/abs/2504.13828},
}

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
Yanheng He*, Jiahe Jin*, Shijie Xia, Jiadi Su, Runze Fan, Haoyang Zou, Xiangkun Hu, Pengfei Liu.
arXiv, 2024
[PDF] [Abstract] [Bib] [Code] [Page] [机器之心]

Imagine a world where AI can handle your work while you sleep - organizing your research materials, drafting a report, or creating a presentation you need for tomorrow. However, while current digital agents can perform simple tasks, they are far from capable of handling the complex real-world work that humans routinely perform. We present PC Agent, an AI system that demonstrates a crucial step toward this vision through human cognition transfer. Our key insight is that the path from executing simple "tasks" to handling complex "work" lies in efficiently capturing and learning from human cognitive processes during computer use. To validate this hypothesis, we introduce three key innovations: (1) PC Tracker, a lightweight infrastructure that efficiently collects high-quality human-computer interaction trajectories with complete cognitive context; (2) a two-stage cognition completion pipeline that transforms raw interaction data into rich cognitive trajectories by completing action semantics and thought processes; and (3) a multi-agent system combining a planning agent for decision-making with a grounding agent for robust visual grounding. Our preliminary experiments in PowerPoint presentation creation reveal that complex digital work capabilities can be achieved with a small amount of high-quality cognitive data - PC Agent, trained on just 133 cognitive trajectories, can handle sophisticated work scenarios involving up to 50 steps across multiple applications. This demonstrates the data efficiency of our approach, highlighting that the key to training capable digital agents lies in collecting human cognitive data. By open-sourcing our complete framework, including the data collection infrastructure and cognition completion methods, we aim to lower the barriers for the research community to develop truly capable digital agents.

@article{he2024pcagentsleepai,
      title={PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World},
      author={Yanheng He and Jiahe Jin and Shijie Xia and Jiadi Su and Runze Fan and Haoyang Zou and Xiangkun Hu and Pengfei Liu},
      year={2024},
      journal={arXiv preprint arXiv:2412.17589},
      url={https://arxiv.org/abs/2412.17589}
}

Data Contamination Report from the 2024 CONDA Shared Task
Oscar Sainz, Iker García-Ferrero, Alon Jacovi, Jon Ander Campos, Yanai Elazar, Eneko Agirre, Yoav Goldberg, Wei-Lin Chen, Jenny Chim, Leshem Choshen, Luca D'Amico-Wong, Melissa Dell, Run-Ze Fan, Shahriar Golchin, Yucheng Li, Pengfei Liu, Bhavish Pahwa, Ameya Prabhu, Suryansh Sharma, Emily Silcock, Kateryna Solonko, David Stap, Mihai Surdeanu, Yu-Min Tseng, Vishaal Udandarao, Zengzhi Wang, Ruijie Xu, Jinglin Yang.
ACL 2024 The 1st Workshop on Data Contamination (CONDA)
[PDF] [Abstract] [Bib] [Page] [Data Contamination Database]

@inproceedings{sainz-etal-2024-data,
    title = "Data Contamination Report from the 2024 {CONDA} Shared Task",
    author = "Sainz, Oscar  and  Garc{\'\i}a-Ferrero, Iker  and  Jacovi, Alon  and  Ander Campos, Jon  and  Elazar, Yanai  and  Agirre, Eneko  and  Goldberg, Yoav  and  Chen, Wei-Lin  and  Chim, Jenny  and  Choshen, Leshem  and  D{'}Amico-Wong, Luca  and  Dell, Melissa  and  Fan, Run-Ze  and  Golchin, Shahriar  and  Li, Yucheng  and  Liu, Pengfei  and  Pahwa, Bhavish  and  Prabhu, Ameya  and  Sharma, Suryansh  and  Silcock, Emily  and  Solonko, Kateryna  and  Stap, David  and  Surdeanu, Mihai  and  Tseng, Yu-Min  and  Udandarao, Vishaal  and  Wang, Zengzhi  and  Xu, Ruijie  and  Yang, Jinglin",
    editor = "Sainz, Oscar  and  Garc{\'\i}a Ferrero, Iker  and  Agirre, Eneko  and  Ander Campos, Jon  and  Jacovi, Alon  and  Elazar, Yanai  and  Goldberg, Yoav",
    booktitle = "Proceedings of the 1st Workshop on Data Contamination (CONDA)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.conda-1.4",
    pages = "41--56",
}

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu.
NeurIPS 2024 Datasets and Benchmarks
[PDF] [Abstract] [Bib] [Code] [Page] [Featured by AK] [机器之心]

The evolution of Artificial Intelligence (AI) has been significantly accelerated by advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), gradually showcasing potential cognitive reasoning abilities in problem-solving and scientific discovery (i.e., AI4Science) once exclusive to human intellect. To comprehensively evaluate current models' performance in cognitive reasoning abilities, we introduce OlympicArena, which includes 11,163 bilingual problems across both text-only and interleaved text-image modalities. These challenges encompass a wide range of disciplines spanning seven fields and 62 international Olympic competitions, rigorously examined for data leakage. We argue that the challenges in Olympic competition problems are ideal for evaluating AI's cognitive reasoning due to their complexity and interdisciplinary nature, which are essential for tackling complex scientific challenges and facilitating discoveries. Beyond evaluating performance across various disciplines using answer-only criteria, we conduct detailed experiments and analyses from multiple perspectives. We delve into the models' cognitive reasoning abilities, their performance across different modalities, and their outcomes in process-level evaluations, which are vital for tasks requiring complex reasoning with lengthy solutions. Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy, illustrating current AI limitations in complex reasoning and multimodal integration. Through the OlympicArena, we aim to advance AI towards superintelligence, equipping it to address more complex challenges in science and beyond. We also provide a comprehensive set of resources to support AI research, including a benchmark dataset, an open-source annotation platform, a detailed evaluation tool, and a leaderboard with automatic submission features.

@article{huang2024olympicarena,
      title={OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI},
      author={Zhen Huang and Zengzhi Wang and Shijie Xia and Xuefeng Li and Haoyang Zou and Ruijie Xu and Run-Ze Fan and Lyumanshan Ye and Ethan Chern and Yixin Ye and Yikai Zhang and Yuqing Yang and Ting Wu and Binjie Wang and Shichao Sun and Yang Xiao and Yiyuan Li and Fan Zhou and Steffi Chern and Yiwei Qin and Yan Ma and Jiadi Su and Yixiu Liu and Yuxiang Zheng and Shaoting Zhang and Dahua Lin and Yu Qiao and Pengfei Liu},
      year={2024},
      journal={arXiv preprint arXiv:2406.12753},
      url={https://arxiv.org/abs/2406.12753}
}

RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation
Run-Ze Fan, Yixing Fan, Jiangui Chen, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng.
ECIR, 2024
[PDF] [Abstract] [Bib] [Code]

@inproceedings{fan2024right,
  title={RIGHT: Retrieval-Augmented Generation for Mainstream Hashtag Recommendation},
  author={Fan, Run-Ze and Fan, Yixing and Chen, Jiangui and Guo, Jiafeng and Zhang, Ruqing and Cheng, Xueqi},
  booktitle={European Conference on Information Retrieval},
  pages={39--55},
  year={2024},
  organization={Springer},
  url={https://link.springer.com/chapter/10.1007/978-3-031-56027-9_3}
}

Generative Judge for Evaluating Alignment
Junlong Li, Shichao Sun, Weizhe Yuan, Run-Ze Fan, Hai Zhao, Pengfei Liu.
ICLR, 2024
[PDF] [Abstract] [Bib] [Code] [Page] [机器之心]

@article{li2023generative,
  title={Generative Judge for Evaluating Alignment},
  author={Li, Junlong and Sun, Shichao and Yuan, Weizhe and Fan, Run-Ze and Zhao, Hai and Liu, Pengfei},
  journal={arXiv preprint arXiv:2310.05470},
  year={2023},
  url={https://arxiv.org/abs/2310.05470}
}

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts
Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao.
EMNLP, 2023 (Oral)
[PDF] [Abstract] [Bib] [Code]

@inproceedings{he-etal-2023-merging,
    title = "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts",
    author = "He, Shwai and Fan, Run-Ze and Ding, Liang and Shen, Li and Zhou, Tianyi and Tao, Dacheng",
    editor = "Bouamor, Houda and Pino, Juan and Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.907",
    doi = "10.18653/v1/2023.emnlp-main.907",
    pages = "14685--14691",
    abstract = "Scaling the size of language models usually leads to remarkable advancements in NLP tasks. But it often comes with a price of growing computational cost. Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly if increasing the number of activated experts, limiting its practical utility. Can we retain the advantages of adding more experts without substantially increasing the computational costs? In this paper, we first demonstrate the superiority of selecting multiple experts and then propose a computation-efficient approach called \textbf{Merging Experts into One} (MEO), which reduces the computation cost to that of a single expert. Extensive experiments show that MEO significantly improves computational efficiency, e.g., FLOPS drops from 72.0G of vanilla MoE to 28.6G (MEO). Moreover, we propose a token-level attention block that further enhances the efficiency and performance of token-level MEO, e.g., 83.3{\%} (MEO) vs. 82.6{\%} (vanilla MoE) average score on the GLUE benchmark. Our code will be released upon acceptance. Code will be released at: \url{https://github.com/Shwai-He/MEO}.",
}

MerA: Merging Pretrained Adapters For Few-Shot Learning
Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao.
arXiv, 2023
[PDF] [Abstract] [Bib]

@article{he2023mera,
  title={MerA: Merging Pretrained Adapters For Few-Shot Learning},
  author={He, Shwai and Fan, Run-Ze and Ding, Liang and Shen, Li and Zhou, Tianyi and Tao, Dacheng},
  journal={arXiv preprint arXiv:2308.15982},
  year={2023},
  url={https://arxiv.org/abs/2308.15982}
}

Talks

2025/09: MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning. @Shanghai AI Laboratory. [Slides]
2024/10: Reformatted Alignment. @NICE. [Slides]
2024/10: Reformatted Alignment. @AITIME. [Slides]

Research Experience

Shanghai Jiao Tong University, 2023.05 - 2025.06
Generative AI Research Lab (GAIR)
Research assistant, supervised by Prof. Pengfei Liu.
JD Explore Academy, 2022.12 - 2023.05
Research intern, supervised by Dr. Liang Ding.

Education Experience

University of Massachusetts Amherst, 2025.09 - Present
Manning College of Information & Computer Sciences
Ph.D. in Computer Science, supervised by Prof. Hamed Zamani.
University of Chinese Academy of Sciences, 2021.09 - 2024.06
Institute of Computing Technology
M.S. in Computer Science and Technology, supervised by Prof. Jiafeng Guo.
Shanghai Maritime University, 2017.09 - 2021.06
B.E. in Computer Science and Technology

Blogs

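2022-04-08: A New SOTA in Information Extraction! The First Structured Generative Pre-trained Model for Information Extraction, Unifying the Four Major Information Extraction Tasks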

Selected Honors & Awards

2026: Jim Gray Scholarship in Computer Science, University of Massachusetts Amherst
2024: Excellent Master's Graduation Thesis, Institute of Computing Technology
2021: Excellent Bachelor's Graduation Thesis, Shanghai Maritime University
2021: Excellent Graduate, Shanghai Maritime University
2019, 2020, 2021: First Class Scholarship, Shanghai Maritime University

Academic Service

Reviewer:
- ICLR (2025, 2026), NeurIPS (2025), ICML (2026)
- EMNLP (2023), ARR (Feb 2024), EMNLP Industry Track (2023, 2024, 2025), NAACL Industry Track (2025), EACL Industry Track (2026)

Miscellaneous

Powerlifting (3+ years): At a body weight of 70 kg, my personal records (PRs) are: Bench Press (85 kg), Squat (110 kg), and Deadlift (155 kg).
