I am a second-year PhD student (2024-2028) at City University of Hong Kong under the supervision of Prof. Dapeng Oliver Wu. I received my Bachelor's and Master's degrees from Beijing Institute of Technology (2017-2024) under the supervision of Prof. Chi Harold Liu.

I currently work on LLM reasoning, post-training, and deep reinforcement learning. I have published several papers at top international conferences (e.g., ACL, NeurIPS, KDD, INFOCOM) and in journals (e.g., ToN, TC, JSAC).

Contact: If you’re interested in discussing or collaborating, please feel free to email me at hao.wang@my.cityu.edu.hk.

🔥 News

2026.04: Our paper “CurioSFT: Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models” has been accepted to the ACL 2026 Main Conference as an Oral paper.

📝 Publications

🤖 LLM Post-Training

CurioSFT: Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models, Hao Wang, Hao Gu, Hongming Piao, Kaixiong Gong, Yuxiao Ye, Xiangyu Yue, Sirui Han, Yike Guo, Dapeng Wu, ACL 2026 (Oral), 2026.
DEEPMED: Building a Medical DeepResearch Agent via Multi-hop Med-Search Data and Turn-Controlled Agentic Training & Inference, Zihan Wang, Hao Wang (Equal Contribution), Shi Feng, Xiaocui Yang, Daling Wang, Yiqun Zhang, Jinghao Lin, Haihua Yang, Xiaozhong Ji, Findings of ACL 2026, 2026.
QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch, Hao Gu, Hao Wang, Jiacheng Liu, Lujun Li, Qiyuan Zhu, Bei Liu, Binxing Xu, Lei Wang, Xintong Yang, Sida Lin, Sirui Han, Yike Guo, Findings of ACL 2026, 2026.
Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs, Binxing Xu, Hao Gu, Lujun Li, Hao Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Xintong Yang, Chao Li, Sirui Han, Yike Guo, ACL 2026, 2026.
A^3E: Towards Compositional Model Editing, Hongming Piao, Hao Wang, Dapeng Wu, Ying Wei, NeurIPS 2025, 2025.
GTPO and GRPO-S: Token and Sequence-Level Reward Shaping with Policy Entropy, Hongze Tan, Zihan Wang, Jianfei Pan, Jinghao Lin, Hao Wang, Yifan Wu, Tao Chen, Zhihang Zheng, Zhihao Tang, Haihua Yang, ICML 2026, 2026.

🚖 Recommender System

HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning, Hao Wang, Bo Tang, Chi Harold Liu, Shangqin Mao, Jiahong Zhou, Zipeng Dai, Yaqi Sun, Qianlong Xie, Xingxing Wang, Dong Wang, IEEE Transactions on Computers, 2024.

🙋‍♂️ Mobile Crowdsensing

Multi-Task-Oriented Emergency-Aware UAV Crowdsensing: A Hierarchical Multi-Agent Deep Reinforcement Learning Approach, Chen Fang, Chi Harold Liu, Hao Wang, Guangpeng Qi, Zhongyi Liu, Dapeng Wu, IEEE Journal on Selected Areas in Communications (JSAC), 2026.
Indoor Periodic Fingerprint Collections by Vehicular Crowdsensing via Primal-Dual Multi-Agent Deep Reinforcement Learning, Haoming Yang, Qiran Zhao, Hao Wang, Chi Harold Liu, Guozheng Li, Guoren Wang, Jian Tang, Dapeng Wu, IEEE Journal on Selected Areas in Communications (JSAC), 2024.
QoI-Aware Mobile Crowdsensing for Metaverse by Multi-Agent Deep Reinforcement Learning, Yuxiao Ye, Hao Wang (Equal Contribution), Chi Harold Liu, Zipeng Dai, Guozheng Li, Guoren Wang, Jian Tang, IEEE Journal on Selected Areas in Communications (JSAC), 2024.
Ensuring Threshold AoI for UAV-assisted Mobile Crowdsensing by Multi-Agent Deep Reinforcement Learning with Transformer, Hao Wang, Chi Harold Liu, Haoming Yang, Guoren Wang, Kin K Leung, IEEE Transactions on Networking (ToN), 2023.
Energy-Efficient 3D Vehicular Crowdsourcing for Disaster Response by Distributed Deep Reinforcement Learning, Hao Wang, Chi Harold Liu, Zipeng Dai, Jian Tang, Guoren Wang, ACM SIGKDD 2021 (Best Paper Award Runner-Up).
Mobile Crowdsensing for Data Freshness: A Deep Reinforcement Learning Approach, Zipeng Dai, Hao Wang, Chi Harold Liu, Rui Han, Jian Tang, Guoren Wang, IEEE INFOCOM 2021 (Oral).

🎖 Honors and Awards

2024.05 Hong Kong PhD Fellowship Scheme (HKPFS, 350 awardees per year)
2022.10 National Scholarship (Top 1%)
2021.08 Best Paper Award Runner-Up, Applied Data Science Track, KDD 2021 (1/238)
2021.06 Xu Teli Scholarship for Undergraduates (the most prestigious scholarship at BIT, awarded to 10 undergraduates annually)

📖 Education

2024.09 - Now, PhD, City University of Hong Kong, Hong Kong.
2021.09 - 2024.06, Master, Beijing Institute of Technology, Beijing.
2017.09 - 2021.06, Undergraduate, Beijing Institute of Technology, Beijing.
2014.09 - 2017.06, Tsinghua University High School, Beijing.

💻 Internships

2023.08 - 2023.12, Tencent AI Lab, Shenzhen. (Mentor: Haobo Fu)
2022.11 - 2023.04, Meituan Advertising Group, Beijing. (Mentor: Bo Tang)

🤝 Partner Links

Zipeng Dai, my senior during my master’s studies, ENFP, helpful and energetic. You can always trust him to come up with a doable solution ^_^.
Yuxiao Ye, a talented player and research genius, is currently seeking a PhD position for Fall 2025.

Design and source code inspired by Yi Ren’s awesome template.

Hao Wang (王昊)