Houyi Li (李厚意)

Google Scholar
Email
Twitter
I was previously the Pretrain Team Leader at StepFun, reporting to Xiangyu Zhang. I led pre-training for the Step3 (321B) model, served as the end-to-end lead for Step2-mini (pre-training and post-training), and contributed to the model architecture design of Step3.5-flash (198B). I also built StepFun's pre-training team from the ground up.

Prior to StepFun, I was the LLM Lead at Alibaba International in 2023, reporting to Kaifu Zhang. I managed the full model lifecycle, from pre-training and post-training to driving the core LLM technology for commercial initiatives such as Aidge. I led the team that delivered the multilingual large models Marco-7B and Marco-13B and pioneered their path to commercial viability.

My earlier experience at Taobao and Kuaishou focused on personalized recommendation and advertising. I specialized in deploying deep learning at scale in consumer-facing (ToC) products to drive measurable business impact and boost profitability. During this period, I also developed the classic personalized retrieval algorithm PDN (paper).

After graduating from Xidian University, I began my career on Ant Group's Infrastructure (Infra) team, where I spent four years focusing on AI Infra. A key achievement from this period was building the distributed training framework GraphTheta from scratch, one of the earliest frameworks in China to support thousand-machine parallel training.

✨ News

[2026.04] Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? was accepted as an ICLR 2026 Oral. I gave the oral presentation in Brazil on April 25, 2026. Camera ready / Poster / Slides

[2025.12] Predictable Scale (Part II) -- Farseer: A Refined Scaling Law in LLMs was accepted as a NeurIPS 2025 Spotlight. Camera ready / Poster

[2025.07] Step3 (321B-A38B) is open-sourced. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning.

[2025.06] More than 400 language models are open-sourced, a very large sweep of models up to ~10^10 parameters and ~10^11 tokens, to advance research on scaling laws.

[2025.04] We released nearly 4,000 models trained under a wide range of hyper-parameters and settings. This is the largest open-source project on optimal hyper-parameters for LLM pre-training.

[2025.01] Step2-mini, an LLM with extremely fast inference and very low cost, is now available on the open platform, priced at only 1 RMB per million tokens. (introduction in Chinese)

📝 Selected Publications

Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
Houyi Li*, Ka Man Lo*, Ziqi Wang, Zili Wang, Wenzhen Zheng, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
ICLR 2026 Oral
PDF | Models | Homepage | Poster | Slides

Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models
Houyi Li*, Wenzhen Zheng*, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
NeurIPS 2025 Spotlight
PDF | Github | Models | WandB | Homepage | Poster

Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
Houyi Li*, Wenzhen Zheng*, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Zhenyu Ding, Haoying Wang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
PDF | Github | Models | WandB | Homepage | Media (Chinese)

Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
The StepFun Team
(Houyi Li: Core Model Architecture Contributor)
(Technical Report)
PDF | Github | Huggingface | Media (Chinese)

Multi-matrix Factorization Attention
Jingcheng Hu*, Houyi Li*, Yinmin Zhang, Zili Wang, Shuigeng Zhou, Xiangyu Zhang, Heung-Yeung Shum, Daxin Jiang
(* = Equal Contribution)
ACL 2025
PDF | Media (Chinese)

Path-based Deep Network for Candidate Item Matching in Recommenders
Houyi Li, Zhihong Chen, Chenliang Li, Rong Xiao, Hongbo Deng, Peng Zhang, Yongchao Liu, Haihong Tang
SIGIR 2021
PDF | Media (Chinese)

GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy
Yongchao Liu*, Houyi Li*, Guowei Zhang, Xintan Zeng, Yongyong Li, Bin Huang, Peng Zhang, Zhao Li, Xiaowei Zhu, Changhua He, Wenguang Chen
(* = Equal Contribution)
PDF

📞 Contact

Please feel free to contact me via email (left) if you are interested in our papers or my experience, or if you have any research questions I might be able to help with.
Ant Group
2016 - 2020
Alibaba Group
2020 - 2021 & 2023 - 2024
Kuaishou Group
2021 - 2023
StepFun
2024 - Present