✨ News
[2026.04] Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? was accepted as an Oral at ICLR 2026.
[2025.12]
Predictable Scale (Part II) -- Farseer: A Refined Scaling Law in LLMs was accepted as a
[2025.07] Step-3 (321B-A38B) is open-sourced. It is designed end-to-end to minimize decoding cost while delivering top-tier performance in vision–language reasoning.
[2025.06] More than 400 language models are open-sourced: a very large sweep of models with up to ~10^10 parameters and ~10^11 training tokens, to advance research on scaling laws.
[2025.04] We released almost 4,000 models trained under a wide range of hyper-parameters and settings. This is the largest open-source project on optimal hyper-parameters for LLM pre-training.
[2025.01] Step2-mini, an LLM with extremely fast inference and very low cost, is now available on the open platform. The price is only 1 RMB per one million tokens. (introduction in Chinese)
📝 Selected Publications
Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
Houyi Li*, Ka Man Lo*, Ziqi Wang, Zili Wang, Wenzhen Zheng, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
ICLR 2026 Oral
PDF
Models
Homepage
Poster
Slides
Predictable Scale: Part II, Farseer: A Refined Scaling Law in Large Language Models
Houyi Li*, Wenzhen Zheng*, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
NeurIPS 2025 Spotlight
PDF
GitHub
Models
WandB
Homepage
Poster
Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
Houyi Li*, Wenzhen Zheng*, Qiufeng Wang, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Zhenyu Ding, Haoying Wang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
(* = Equal Contribution)
PDF
GitHub
Models
WandB
Homepage
Media (Chinese)
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
The StepFun Team
(Houyi Li: Core Model Architecture Contributor)
(Technical Report)
PDF
GitHub
Hugging Face
Media (Chinese)
Multi-matrix Factorization Attention
Jingcheng Hu*, Houyi Li*, Yinmin Zhang, Zili Wang, Shuigeng Zhou, Xiangyu Zhang, Heung-Yeung Shum, Daxin Jiang
(* = Equal Contribution)
ACL 2025
PDF
Media (Chinese)
Path-based Deep Network for Candidate Item Matching in Recommenders
Houyi Li, Zhihong Chen, Chenliang Li, Rong Xiao, Hongbo Deng, Peng Zhang, Yongchao Liu, Haihong Tang
SIGIR 2021
PDF
Media (Chinese)
GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy
Yongchao Liu*, Houyi Li*, Guowei Zhang, Xintan Zeng, Yongyong Li, Bin Huang, Peng Zhang, Zhao Li, Xiaowei Zhu, Changhua He, Wenguang Chen
(* = Equal Contribution)
PDF
📞 Contact
Please feel free to contact me via my email (on the left) if you are interested in our papers or my experience, or if you have any research questions I might be able to help with.