Research Vision

Applying advanced AI to automate software implementation, testing, and program verification

LLM4SE · LLM4FM · Code Agents · Benchmarking

I am a Research Assistant Professor in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology (HKUST). I will join the Department of Computing at Imperial College London as an Assistant Professor in late 2026. I received my PhD from the Department of Computer Science and Engineering at HKUST, under the supervision of Prof. Shing-Chi Cheung in the CASTLE lab.

My research focuses on applying advanced AI techniques to automate software implementation, testing, and program verification, with the goal of producing software that is reliable by construction. My research interests lie at the intersection of Software Engineering (SE) and Large Language Models (LLMs), with an emphasis on LLM4SE, LLM4FM (formal methods), and LLM evaluation/benchmarking. I have 30 publications at top conferences and journals, including ICSE, FSE, ASE, TOSEM, CAV, USENIX Security, and AAAI. I serve as a program committee member for top conferences such as ICSE, FSE, ASE, and ISSTA, and as a reviewer for TOSEM, TSE, and EMSE. My doctoral dissertation, "Towards Automatic Testing and Fault Localization in Natural Language Processing Systems", received the 🏆 ACM SIGSOFT Outstanding Doctoral Dissertation Award for 2025. I also received the 🏆 2025 Young Scientist Award in Engineering Science (one awardee per year in Hong Kong).

🏆 ACM SIGSOFT Outstanding Dissertation Award 2025 (only 1-2 recipients worldwide per year)
🏆 Young Scientist Award in Engineering Science 2025 (only 1 recipient per year in Hong Kong)
ICSE · FSE · ASE · CAV · ACL · ICML · AAAI · USENIX Security · TOSEM · TSE

🔥 Open Positions — Imperial College London

I'm looking for highly self-motivated students who enjoy building agentic systems and solving hard problems. Let's do something interesting and impactful!

Looking for:
  • 2–3 fully funded PhD students
  • Self-funded / visiting PhD students
  • Remote RAs/interns
AI for Software Engineering · AI for Formal Verification · Coding & Testing Agents

📩 Send CV + representative works + research plan to: jialuncao [at] ust [dot] hk

News
2026.06.25 🎉 Two papers accepted by ISSTA 2026! Congrats!
2026.06.20 🎉 Two papers accepted by ASE Tool&Dataset 2026! Congrats!
2026.05.02 🎉 Two papers accepted by ICML 2026! SWE-ABS and Position: Code Benchmarks.
2026.04 Joining Imperial College London as Assistant Professor in late 2026!
2026.02.21 Skills-4-SE released — 180+ Claude Skills for SE. [Website]
2026.01.08 🎉 Code Translation via Pseudocode accepted by TOSEM 2026. Congrats to Songqiang!
2025.12.17 🎉 Paper accepted by ICSE 2026. Congrats to Ruiyang!
2025.12.13 🏆 Honored with Young Scientist Award in Engineering Science. [News]
2025.05.15 🎉 From Informal to Formal and CruxEval-X accepted by ACL 2025 Main!
2025.03.08 HuggingFace repo reached 1.6k+ downloads. Social media reached 37k+ reads.
Visiting & Program Experience
2024.10 Honored to visit Prof. Michael Pradel at the University of Stuttgart.
2024.03 Honored to visit Prof. Pinjia He at the Chinese University of Hong Kong, Shenzhen.
Research Topics
01

LLM Benchmark

Rigorous evaluation of LLMs for code generation, reasoning, and SE tasks.

02

LLM for SE

Natural Language

Write a function that takes a list of integers and returns the second largest unique element. Return None if it doesn't exist.

LLM
Generated Code
def second_largest(nums):
    unique = list(set(nums))
    if len(unique) < 2:
        return None
    unique.sort(reverse=True)
    return unique[1]
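As a quick sanity check of the generated code above, here is a minimal sketch that exercises it on a few edge cases. The test inputs are hypothetical and chosen only for illustration; they are not part of any benchmark.

```python
def second_largest(nums):
    """Return the second largest unique element, or None if it doesn't exist."""
    unique = list(set(nums))
    if len(unique) < 2:
        return None
    unique.sort(reverse=True)
    return unique[1]


# Hypothetical (input, expected) pairs, assumed for illustration only.
checks = [
    ([3, 1, 4, 4, 2], 3),   # duplicates of the maximum count once
    ([7, 7, 7], None),      # only one unique value
    ([], None),             # empty input
    ([-5, -2, -9], -5),     # negative numbers
]
for nums, expected in checks:
    assert second_largest(nums) == expected
```

Checks like these cover the "unique" and "doesn't exist" clauses of the natural-language requirement, which are the parts most easily missed.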
03

LLM for Formal Methods

C Code
int abs_val(int x) {
    if (x < 0)
        return -x;
    return x;
}
LLM
C + ACSL Spec
#include <limits.h>  /* provides INT_MIN */

/*@ requires x > INT_MIN;
  @ ensures \result >= 0;
  @ ensures \result == x
  @      || \result == -x;
  @*/
int abs_val(int x) {
    if (x < 0)
        return -x;
    return x;
}
04

SE for AI

[Animation: a neural network (Input, Hidden 1, Hidden 2, Output) being scanned for bugs]
LLM Benchmark (7+)
code generation · code reasoning · data contamination · multilingual · adversarial
LLM for SE (8+)
code translation · bug reproduction · program repair · embedded · API docs
LLM for Formal Methods (5+)
theorem proving · specification · model checking · TLA+ · formal proofs
SE for AI (8+)
NLP testing · DL testing · fault localization · metamorphic · security
ICML '26

Position: Code Benchmarks Should Prioritize Rigor, Reliability, and Reproducibility

Jialun Cao, Yuk-Kit Chan*, Zixuan Ling*, Wenxuan Wang†, Shuqing Li, Mingwei Liu, Ruixi Qiao, Yuting Han, Chaozheng Wang, Boxi Yu, Pinjia He, Shuai Wang, Zibin Zheng, Michael R. Lyu, Shing-Chi Cheung
ACL '25

From Informal to Formal — Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs

Jialun Cao, Yaojie Lu, Meiziniu Li, Haoyang Ma, Haokun Li, Mengda He, Cheng Wen, Le Sun, Hongyu Zhang, Shengchao Qin, Shing-Chi Cheung, Cong Tian
1.6k+ HuggingFace downloads · 37k+ social media reads · 2k+ forwards
CAV '24

Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification

Cheng Wen, Jialun Cao (Corresponding), Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, Cong Tian
ICSE '22

DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs

Jialun Cao, Meiziniu Li, Xiao Chen, Ming Wen, Yongqiang Tian, Bo Wu, Shing-chi Cheung
Publications
30+
Google Scholar Citations
2026: 11 papers
ICML · ICSE · FSE · ACL · FM · TOSEM
[C26] Ruiyang Xu, Jialun Cao (Co-1st), Mingyuan Wu, Wenliang Zhong, Yaojie Lu, Ben He, Xianpei Han, Shing-Chi Cheung, Le Sun. EmbedAgent: Benchmarking Large Language Models in Embedded System Development. In ICSE 2026. [Paper]
[J5] Songqiang Chen, Congying Xu, Jingyi Chen, Jialun Cao (Corresponding), Jiarong Wu, Shing-Chi Cheung. Can Emulating Semantic Translation Help LLMs with Code Translation? A Study Based on Pseudocode. In TOSEM 2026. [Paper]
[C27] Qiming Zhu, Jialun Cao, Xuanang Chen, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, Shing-Chi Cheung. Across Programming Language Silos: A Study on Cross-Lingual Retrieval-augmented Code Generation. In ACL Findings 2026. [Paper]
[C28] Zhiyong Chen, Jialun Cao, Chang Xu, Shing-Chi Cheung. ModelWisdom: An Integrated Toolkit for TLA+ Model Visualization, Digest and Repair. In FM 2026 Tool track. [Paper] [Website]
[C29] Junyi Wang, Jialun Cao, Zhongxin Liu. iCoRe: Iterative Correlation-Aware Retriever for Bug Reproduction. FSE 2026. [Paper]
[C30] Jialun Cao, Yuk-Kit Chan*, Zixuan Ling*, Wenxuan Wang†, Shuqing Li, Mingwei Liu, Ruixi Qiao, Yuting Han, Chaozheng Wang, Boxi Yu, Pinjia He, Shuai Wang, Zibin Zheng, Michael R. Lyu, Shing-Chi Cheung. Position: Code Benchmarks Should Prioritize Rigor, Reliability, and Reproducibility. ICML Position 2026. [Paper]
[C31] Boxi Yu, Yang Cao, Yuzhong Zhang, Liting Lin, Junjielong Xu, Zhiqing Zhong, Qinghua Xu, Guancheng Wang, Jialun Cao, Shing-Chi Cheung, Pinjia He, Lionel Briand. SWE-ABS: Adversarial Benchmark Strengthening Exposes Inflated Success Rates on Test-based Benchmark. ICML 2026. [Paper]
[J6] Jingyi Chen, Songqiang Chen, Jialun Cao (Corresponding), Jiasi Shen, Shing-Chi Cheung. When Retrieval Augmentation Meets API Documentation: Can LLMs Code with Less-Common Libraries? TOSEM 2026. [Paper]
[Pre8] Zhiyong Chen, Jialun Cao, Jiarong Wu, Chang Xu, Shing-Chi Cheung. Can Large Language Models Model Programs Formally? arXiv. [Paper]
[Pre9] Dong Xu, Jialun Cao, Guozhao Mo, Junjie Hu, Cheng Wen, Hongyu Lin, Xianpei Han, Shengchao Qin, Cong Tian, Shing-Chi Cheung, Le Sun, Yaojie Lu. LiveFMBench: Unveiling the Power and Limits of Agentic Workflows in Specification Generation. arXiv. [Paper] [HF]
[Pre10] Yuanyi Wang, Yifan Yang, Su Lu, Yanggan Gu, Pengkai Wang, Wenjun Wang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Jialun Cao, Shing-Chi Cheung, Hongxia Yang. Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training. arXiv. [Paper]
2025: 16 papers
ACL · FSE · AAAI · ASE · TOSEM
[C19] Jialun Cao, Songqiang Chen, Wuqi Zhang, Hau Ching Lo, Yeting Li, Shing-Chi Cheung. CodeCleaner: Elevating Standards with A Robust Data Contamination Mitigation Toolkit. Internetware 2025. [Paper] [Code]
[J3] Jialun Cao, Meiziniu Li, Ming Wen, Shing-chi Cheung. A Study on Prompt Design, Advantages and Limitations of ChatGPT for Deep Learning Program Repair. ASEJ. [arXiv] [Official]
[C20] Jialun Cao, Yaojie Lu, Meiziniu Li, Haoyang Ma, Haokun Li, Mengda He, Cheng Wen, Le Sun, Hongyu Zhang, Shengchao Qin, Shing-Chi Cheung, Cong Tian. From Informal to Formal — Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs. ACL Main 2025. [Paper] [HF] [Media]
[C21] Xiao Chen, Hengcheng Zhu, Jialun Cao (Corresponding), Ming Wen, Shing-Chi Cheung. SemBIC: Semantic-aware Identification of Bug-inducing Commits. FSE 2025. [Paper]
[C22] Qiming Zhu, Jialun Cao (Co-1st), Yaojie Lu, Hongyu Lin, Xianpei Han, Ben He, Le Sun, Shing-Chi Cheung. DomainEval: An Auto-Constructed Benchmark for Multi-Domain Code Generation. AAAI 2025. [Paper] [Leaderboard] [Code]
[C23] Ruiyang Xu, Jialun Cao (Co-1st), Yaojie Lu, Ming Wen, Hongyu Lin, Xianpei Han, Ben He, Shing-Chi Cheung, Le Sun. CruxEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution. ACL Main 2025. [Paper] [Leaderboard] [Code]
[C24] Mengyang Wu, Yuzhi Zhao, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu. ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI 2025. [Paper]
[C25] Xingchu Chen, Chengwei Liu, Jialun Cao, Yang Xiao, Xinyue Cai, Yeting Li, Jingyi Shi, Tianqi Sun, Haiming Chen, Wei Huo. Vulnerability-Affected Versions Identification: How Far Are We? ASE 2025.
[J4] Meiziniu Li, Dongze Li, Jianmeng Liu, Jialun Cao, Yongqiang Tian, Shing-Chi Cheung. Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries. TOSEM 2025. [Paper]
[Pre1] Jialun Cao, Yuk-Kit Chan*, Zixuan Ling*, Wenxuan Wang†, Shuqing Li, Mingwei Liu, Ruixi Qiao, Yuting Han, Chaozheng Wang, Boxi Yu, Pinjia He, Shuai Wang, Zibin Zheng, Michael R. Lyu, Shing-Chi Cheung. How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs. arXiv. [Paper]
[Pre2] Jialun Cao, Wuqi Zhang, Shing-Chi Cheung. Concerned with Data Contamination? Assessing Countermeasures in Code Language Model. arXiv. [Paper]
[Pre3] Jiarong Wu, Songqiang Chen, Jialun Cao (Corresponding), Hau Ching Lo, Shing-Chi Cheung. Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval. arXiv. [Paper]
[Pre4] Dekun Dai, MingWei Liu, Anji Li, Jialun Cao, Yanlin Wang, Chong Wang, Xin Peng, Zibin Zheng. FeedbackEval: A Benchmark for Evaluating Large Language Models in Feedback-Driven Code Repair Tasks. arXiv. [Paper]
[Pre5] Xiaolei Li, Jialun Cao, Yepang Liu, Shing-Chi Cheung, Hailong Wang. ReuseDroid: A VLM-empowered Android UI Test Migrator Boosted by Active Feedback. arXiv. [Paper]
[Pre6] Le Deng, Zhonghao Jiang, Jialun Cao, Michael Pradel, Zhongxin Liu. NoCode-bench: A Benchmark for NL-Driven Feature Addition. arXiv. [Paper] [Leaderboard] [HF]
[Pre7] Dongze Li, Songqiang Chen, Jialun Cao (Corresponding), Shing-Chi Cheung. What Builds Effective In-Context Examples for Code Generation? arXiv. [Paper]
2024: 6 papers
ASE · CAV · APSEC · TASE
[C13] Jialun Cao, Zhiyong Chen*, Jiarong Wu, Shing-chi Cheung, Chang Xu. JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models. ASE 2024. [Paper] [Leaderboard] [Code]
🏆 [C14] Distinguished Paper Award. Zongze Jiang, Ming Wen, Jialun Cao, Xuanhua Shi, Hai Jin. Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation. ASE 2024. [Paper] [Code]
[C15] Congying Xu, Songqiang Chen, Jiarong Wu, Valerio Terragni, Shing-chi Cheung, Hengcheng Zhu, Jialun Cao (Corresponding). MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing. ASE 2024. [Paper]
[C16] Cheng Wen, Jialun Cao (Corresponding), Jie Su, Zhiwu Xu, Shengchao Qin, Mengda He, Haokun Li, Shing-Chi Cheung, Cong Tian. Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification. CAV 2024. [Paper] [Homepage]
[C17] Bo Yang, Jiawei Hu, Jialun Cao (Corresponding). SDEFL: A Lightweight Fault Detection and Localization Method for Deep Neural Networks. APSEC 2024.
[C18] Kunpeng Jian, Yanyan Zou, Yeting Li, Jialun Cao, Menghao Li, Jian Sun, Jingyi Shi, Wei Huo. Fuzzing for Stateful Protocol Implementations: Are We There Yet? TASE 2024.
2023: 3 papers
FSE · TOSEM
[C11] Jialun Cao, Yaojie Lu, Ming Wen, Shing-Chi Cheung. Testing Coreference Resolution Systems without Labeled Test Sets. ESEC/FSE 2023. [Paper] [Code]
[C12] Xiaohu Du, Xiao Chen, Jialun Cao, Ming Wen, Shing-Chi Cheung, Hai Jin. Understanding the Bug Characteristics and Fix Strategies of Federated Learning Systems. ESEC/FSE 2023. [Paper]
[J2] Meiziniu Li, Jialun Cao, Yongqiang Tian, Tsz On Li, Ming Wen, Shing-Chi Cheung. COMET: Coverage-guided Model Generation For Deep Learning Library Testing. TOSEM. [Paper] [Code]
2022: 3 papers
ICSE · USENIX Sec · TOSEM
[J1] Jialun Cao, Meiziniu Li, Yeting Li, Ming Wen, Shing-chi Cheung. SemMT: A Semantic-Based Testing Approach for Machine Translation Systems. TOSEM. [Paper] [Code]
[C9] Jialun Cao, Meiziniu Li, Xiao Chen, Ming Wen, Yongqiang Tian, Bo Wu, Shing-chi Cheung. DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs. ICSE 2022. [Paper] [Code]
[C10] Yeting Li, Yecheng Sun, Zhiwu Xu, Jialun Cao, Yuekang Li, Rongchen Li, Haiming Chen, Shing-Chi Cheung, Yang Liu, Yang Xiao. RegexScalpel: Regular Expression Denial of Service (ReDoS) Defense by Localize-and-Fix. USENIX Security 2022. [Paper]
2021: 2 papers
ICSE · USENIX Sec
[C7] Yeting Li, Zixuan Chen, Jialun Cao, Zhiwu Xu, Qiancheng Peng, Haiming Chen, Liyuan Chen, Shing-Chi Cheung. ReDoSHunter: A Combined Static and Dynamic Approach for Regular Expression DoS Detection. USENIX Security 2021. [Paper]
[C8] Yeting Li, Shuaimin Li, Zhiwu Xu, Jialun Cao, Zixuan Chen, Yun Hu, Haiming Chen, Shing-Chi Cheung. TransRegex: Multi-modal Regular Expression Synthesis by Generate-and-Repair. ICSE 2021. [Paper]
2020: 2 papers
ICDE · ASE
[C5] Yeting Li, Jialun Cao, Haiming Chen, Tingjian Ge, Zhiwu Xu, Qiancheng Peng. FlashSchema: Achieving High Quality XML Schemas with Powerful Inference Algorithms and Large-scale Schema Data. ICDE 2020. [Paper]
[C6] Yeting Li, Zhiwu Xu, Jialun Cao, Haiming Chen, Tingjian Ge, Shing-Chi Cheung. FlashRegex: Deducing Anti-ReDoS Regexes from Examples. ASE 2020. [Paper]
2019: 2 papers
ICCD · DASFAA
[C3] Yongjian Li, Jialun Cao (1st student author), Jun Pang. A Learning-Based Framework for Automatic Parameterized Verification. ICCD 2019. [Paper]
[C4] Yeting Li, Xiaolan Zhang, Jialun Cao, Haiming Chen, Chong Gao. Learning k-Occurrence Regular Expressions with Interleaving. DASFAA 2019. [Paper]
2018: 2 papers
ASE · QRS
[C1] Jialun Cao, Yongjian Li, Jun Pang. L-CMP: an automatic learning-based parameterized verification tool. ASE 2018 Demo. [Paper] [Code] [Video]
[C2] Yongjian Li, Jialun Cao (1st student author), Kaiqiang Duan. An automatic parameterized verification of FLASH cache coherence protocol. QRS 2018. [Paper]
Teaching
2025 Spr Instructor in COMP 1021 — Introduction to Computer Science. [Materials]
2023 Fall Teaching Assistant in COMP 1021 — Introduction to Computer Science.
2020 Fall Teaching Assistant in COMP 3021 — Java Programming.
2020 Spr Teaching Assistant in COMP 3021 — Java Programming.
Honors and Awards
  • 2025 — ACM SIGSOFT Outstanding Doctoral Dissertation Award (1~2 worldwide/yr)
  • 2024 — Rising Stars Women in Engineering Workshop (Shortlisted)
  • 2024 — Hong Kong Postgraduate Studentship
  • 2024 — ACM SIGSOFT CAPS Travel Grant (ASE 2024)
  • 2023 — ACM SIGSOFT CAPS Travel Grant (ESEC/FSE 2023)
  • 2019-2023 — Huawei Fellowship Scholarship
  • 2017 — China National Scholarship (Postgraduate, Rank 1/106, Top 1%)
  • 2014 — China National Scholarship (Undergraduate, Rank 1/52, Top 2%)
Working Experience
2024.08 - now
Research Assistant Professor
Department of Computer Science and Engineering, HKUST
2024.04 - 2024.07
Postdoctoral Fellow
HKUST, working with Prof. Shing-Chi Cheung
2023.07 - 2023.09
Intern at Huawei (Hong Kong) — Fermat Lab
2022.09 - 2023.01
Intern at Huawei — Trusted Software Engineering and Open Source Laboratory
2021.09 - 2022.08
Intern at Huawei — Theory Lab
Service
Program Committee Member
Session Chair
Reviewer
TOSEM · TSE · EMSE · JASE · JSME · TKDD · TMC
Invited Talks
2025.04 Exploring Code Generation and Reasoning Capabilities of LLMs. Peking University. [Link]
2025.01 From Benchmarks to Practice. Huawei AI R&D Seminar. [Recording]
2025.01 From Requirement to Formal Specification via LLMs. Zhejiang University.
2024.11 Is LLM a Rescue for Code Generation & Reasoning? CCF China Open Source Conference.
2024.10 Automatic Testing and Verification using LLMs. Xidian University.
2024.08 Trusted Architecture of Intelligent CPS. Fudan University. [Link]
2024.08 Can AI be a Panacea for Software Reliability? University College London.
2024.07 Data Contamination in Code LMs. IEEE Cloud & AI Symposium.
2024.05 From Requirement to Formal Specification via LLMs. CCF FM Seminar. [Recording]
2023.12 Crafting Future: A Dancer's Leap into CS. ChinaSoft Women Scholars Forum.
2023.12 Data Contamination in the Era of LLMs. ChinaSoft AIGC Forum.
Copyright © 2026 Jialun Cao