✨ News
[2026.04] Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? was accepted as an Oral at ICLR 2026.
[2025.12]
Predictable Scale (Part II) -- Farseer: A Refined Scaling Law in LLMs was accepted as a
[2025.07] Step-3 (321B-A38B) is open-sourced. It is designed end-to-end to minimize decoding cost while delivering top-tier performance in vision–language reasoning.
[2025.06] More than 400 language models are open-sourced: a very large sweep of models with up to ~10^10 parameters and ~10^11 training tokens, to advance research on scaling laws.
[2025.04] We released almost 4,000 models trained under a wide range of hyper-parameters and settings. This is the largest open-source project on optimal hyper-parameters for LLM pre-training.
[2025.01] Step2-mini, an LLM with extremely fast inference and very low cost, is now available on the open platform. The price is only 1 RMB per one million tokens. (introduction in Chinese)
📝 Selected Publications
Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
Houyi Li*, Ka Man Lo*, Ziqi Wang, Zili Wang, Wenzhen Zheng, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
ICLR 2026 Oral
PDF
Models
Homepage
Poster
Slides
Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models
Houyi Li*, Wenzhen Zheng*, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
NeurIPS 2025 Spotlight
PDF
GitHub
Models
WandB
Homepage
Poster
Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
Houyi Li*, Wenzhen Zheng*, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Zhenyu Ding, Haoying Wang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
PDF
GitHub
Models
WandB
Homepage
Media (Chinese)
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
The StepFun Team
(Houyi Li: Core Model Architecture Contributor)
(Technical Report)
PDF
GitHub
Hugging Face
Media (Chinese)
Multi-matrix Factorization Attention
Jingcheng Hu*, Houyi Li*, Yinmin Zhang, Zili Wang, Shuigeng Zhou, Xiangyu Zhang, Heung-Yeung Shum, Daxin Jiang
(* = Equal Contribution)
ACL 2025
PDF
Media (Chinese)
Path-based Deep Network for Candidate Item Matching in Recommenders
Houyi Li, Zhihong Chen, Chenliang Li, Rong Xiao, Hongbo Deng, Peng Zhang, Yongchao Liu, Haihong Tang
SIGIR 2021
PDF
Media (Chinese)
GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy
Yongchao Liu*, Houyi Li*, Guowei Zhang, Xintan Zeng, Yongyong Li, Bin Huang, Peng Zhang, Zhao Li, Xiaowei Zhu, Changhua He, Wenguang Chen
(* = Equal Contribution)
PDF
📞 Contact
Please feel free to contact me via my email (on the left) if you are interested in our papers or my experience, or if you have any research questions I might be able to help with.