Jialun CAO received her PhD from the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology (HKUST), under the supervision of Prof. Shing-Chi Cheung in the CASTLE lab. She is now a Research Assistant Professor at HKUST.
Her research interests lie at the intersection of Software Engineering (SE) and Large Language Models (LLMs), with an emphasis on LLM4SE and LLM evaluation. She has published more than 20 papers in top conferences and journals, including ICSE, FSE, ASE, TOSEM, CAV, USENIX Security, and AAAI. She serves on the program committees of conferences such as ICSE, FSE, ASE, SANER, Internetware, and APSEC, and reviews for top journals including TOSEM, TSE, and EMSE.
News
- 2025.03.31: Our paper “CodeCleaner: Elevating Standards with A Robust Data Contamination Mitigation Toolkit” has been accepted to Internetware 2025. Congrats!
- 2025.03.08: Our Hugging Face repo has reached 1.6k+ downloads. Contributions are welcome: [Link]. The Chinese article has reached 37k+ reads and 2k+ shares. Full paper: [Paper]
- 2025.02.12: Our paper “SemBIC: Semantic-aware Identification of Bug-inducing Commits” has been accepted to FSE 2025. Congrats to Xiao!
- 2025.02.07: Honored to be featured on the CSE Department News page in the article “Dr. Jialun Cao Receives ACM SIGSOFT Outstanding Dissertation Award” [News]
- 2025.02.02: Our paper “A Study on Prompt Design, Advantages and Limitations of ChatGPT for Deep Learning Program Repair” has been accepted to JASE 2025. Finally!
- 2025.01.23: I am honored to receive the prestigious ACM SIGSOFT 2025 Outstanding Dissertation Award! Only one or two recipients are selected worldwide each year.
- 2024.12.10: Our paper “DomainEval: An Auto-Constructed Benchmark for Multi-Domain Code Generation” has been accepted to AAAI 2025. Congrats to Qiming!
- 2024.12.10: Our paper “ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation” has been accepted to AAAI 2025. Congrats to Mengyang!
Visiting & Program Experience
- 2024.12. I am honored to be mentored by Prof. Paola Ricaurte Quijano at Harvard University in the 5th cohort of the Asia Pacific Women in Leadership (APWiL) Mentoring Program.
- 2024.10. I was honored to visit Prof. Michael Pradel at the University of Stuttgart.
- 2024.03. I was honored to visit Prof. Pinjia He at the Chinese University of Hong Kong, Shenzhen.
Publications
2025
- Jialun Cao, Songqiang Chen, Wuqi Zhang, Hau Ching Lo, Shing-Chi Cheung. CodeCleaner: Elevating Standards with A Robust Data Contamination Mitigation Toolkit. In Internetware 2025. [Paper] [GitHub]
- Jialun Cao, Meiziniu Li, Ming Wen, Shing-Chi Cheung. A Study on Prompt Design, Advantages and Limitations of ChatGPT for Deep Learning Program Repair. In Journal of Automated Software Engineering (JASE 2025). [Paper]
- Xiao Chen, Hengcheng Zhu, Jialun Cao (Corresponding), Ming Wen, Shing-Chi Cheung (Corresponding). SemBIC: Semantic-aware Identification of Bug-inducing Commits. In the ACM International Conference on the Foundations of Software Engineering (FSE 2025). [Paper]
- Qiming Zhu, Jialun Cao (Co-1st), Yaojie Lu, Hongyu Lin, Xianpei Han, Ben He, Le Sun, Shing-Chi Cheung. DomainEval: An Auto-Constructed Benchmark for Multi-Domain Code Generation. In the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025). [Paper] [Leaderboard] [GitHub]
- Mengyang Wu, Yuzhi Zhao, Jialun Cao, Mingjie Xu, Zhongming Jiang, Xuehui Wang, Qinbin Li, Guangneng Hu, Shengchao Qin, Chi-Wing Fu. ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. In the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025). [Paper]
- Jialun Cao, Yuk-Kit Chan, Zixuan Ling, Wenxuan Wang†, Shuqing Li, Mingwei Liu, Ruixi Qiao, Yuting Han, Chaozheng Wang, Boxi Yu, Pinjia He, Shuai Wang, Zibin Zheng, Michael R. Lyu, Shing-Chi Cheung. How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs. In arXiv 2025. [Paper]
- Jialun Cao, Yaojie Lu, Meiziniu Li, Haoyang Ma, Haokun Li, Mengda He, Cheng Wen, Le Sun, Hongyu Zhang, Shengchao Qin, Shing-Chi Cheung, Cong Tian. From Informal to Formal – Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs. In arXiv 2025. [Paper] [Hugging Face]
- Jialun Cao, Wuqi Zhang, Shing-Chi Cheung. Concerned with Data Contamination? Assessing Countermeasures in Code Language Model. In arXiv. [Paper]
- Meiziniu Li, Dongze Li, Jianmeng Liu, Jialun Cao, Yongqiang Tian, Shing-Chi Cheung. DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis. In arXiv. [Paper]
- Jingyi Chen, Songqiang Chen, Jialun Cao (Corresponding), Jiasi Shen (Corresponding), Shing-Chi Cheung. When LLMs Meet API Documentation: Can Retrieval Augmentation Aid Code Generation Just as It Helps Developers? In arXiv. [Paper]
2024
- Jialun Cao, Zhiyong Chen*, Jiarong Wu, Shing-Chi Cheung, Chang Xu. JavaBench: A Benchmark of Object-Oriented Code Generation for Evaluating Large Language Models. In the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024). [Paper] [Leaderboard] [GitHub]
- Distinguished Paper Award. Zongze Jiang, Ming Wen, Jialun Cao, Xuanhua Shi, Hai Jin. Towards Understanding the Effectiveness of Large Language Models on Directed Test Input Generation. In the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024). [Paper] [GitHub]
- Congying Xu, Songqiang Chen, Jiarong Wu, Valerio Terragni, Shing-Chi Cheung (Corresponding), Hengcheng Zhu, Jialun Cao (Corresponding). MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing. In the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024). [Paper]
- Cheng Wen, Jialun Cao (Corresponding), Jie Su, Zhiwu Xu, Shengchao Qin (Corresponding), Mengda He, Haokun Li, Shing-Chi Cheung, Cong Tian. Enchanting Program Specification Synthesis by Large Language Models using Static Analysis and Program Verification. In the 36th International Conference on Computer Aided Verification (CAV 2024). [Paper] [Homepage]
- Bo Yang, Jiawei Hu, Jialun Cao (Corresponding). SDEFL: A Lightweight Fault Detection and Localization Method for Deep Neural Networks. In the 31st Asia-Pacific Software Engineering Conference (APSEC 2024).
- Kunpeng Jian, Yanyan Zou, Yeting Li, Jialun Cao, Menghao Li, Jian Sun, Jingyi Shi, Wei Huo. Fuzzing for Stateful Protocol Implementations: Are We There Yet? In the 18th Theoretical Aspects of Software Engineering Conference (TASE 2024).
- Ruiyang Xu, Jialun Cao (Co-1st), Yaojie Lu, Hongyu Lin, Xianpei Han, Ben He, Shing-Chi Cheung, Le Sun. CruxEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution. In arXiv. [Paper] [Leaderboard] [GitHub]
2023
- Jialun Cao, Yaojie Lu, Ming Wen, Shing-Chi Cheung. Testing Coreference Resolution Systems without Labeled Test Sets. In the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). [Paper] [GitHub]
- Xiaohu Du, Xiao Chen, Jialun Cao, Ming Wen, Shing-Chi Cheung, Hai Jin. Understanding the Bug Characteristics and Fix Strategies of Federated Learning Systems. In the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). [Paper]
- Meiziniu Li, Jialun Cao, Yongqiang Tian, Tsz On Li, Ming Wen, Shing-Chi Cheung. COMET: Coverage-guided Model Generation For Deep Learning Library Testing. In ACM Transactions on Software Engineering and Methodology (TOSEM). [Paper] [GitHub]
2022
- Jialun Cao, Meiziniu Li, Yeting Li, Ming Wen, Shing-Chi Cheung. SemMT: A Semantic-Based Testing Approach for Machine Translation Systems. In ACM Transactions on Software Engineering and Methodology (TOSEM). [Paper] [GitHub]
- Jialun Cao, Meiziniu Li, Xiao Chen, Ming Wen, Yongqiang Tian, Bo Wu, Shing-Chi Cheung. DeepFD: Automated Fault Diagnosis and Localization for Deep Learning Programs. In Proceedings of the 44th International Conference on Software Engineering (ICSE 2022). [Paper] [GitHub]
- Yeting Li, Yecheng Sun, Zhiwu Xu, Jialun Cao, Yuekang Li, Rongchen Li, Haiming Chen, Shing-Chi Cheung, Yang Liu, Yang Xiao. RegexScalpel: Regular Expression Denial of Service (ReDoS) Defense by Localize-and-Fix. In the 31st USENIX Security Symposium. [Paper]
2021
- Yeting Li, Zixuan Chen, Jialun Cao, Zhiwu Xu, Qiancheng Peng, Haiming Chen, Liyuan Chen, Shing-Chi Cheung. ReDoSHunter: A Combined Static and Dynamic Approach for Regular Expression DoS Detection. In the 30th USENIX Security Symposium. [Paper]
- Yeting Li, Shuaimin Li, Zhiwu Xu, Jialun Cao, Zixuan Chen, Yun Hu, Haiming Chen, Shing-Chi Cheung. TransRegex: Multi-modal Regular Expression Synthesis by Generate-and-Repair. In the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE 2021). [Paper]
2020
- Yeting Li, Jialun Cao, Haiming Chen, Tingjian Ge, Zhiwu Xu, Qiancheng Peng. FlashSchema: Achieving High Quality XML Schemas with Powerful Inference Algorithms and Large-scale Schema Data. In the IEEE 36th International Conference on Data Engineering (ICDE 2020). [Paper]
- Yeting Li, Zhiwu Xu, Jialun Cao, Haiming Chen, Tingjian Ge, Shing-Chi Cheung. FlashRegex: Deducing Anti-ReDoS Regexes from Examples. In the Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 2020). [Paper]
2019
- Yongjian Li, Jialun Cao (1st student author), Jun Pang. A Learning-Based Framework for Automatic Parameterized Verification. In 2019 IEEE 37th International Conference on Computer Design (ICCD 2019). [Paper]
- Yeting Li, Xiaolan Zhang, Jialun Cao, Haiming Chen, Chong Gao. Learning k-Occurrence Regular Expressions with Interleaving. In the International Conference on Database Systems for Advanced Applications (DASFAA 2019). [Paper]
2018
- Jialun Cao, Yongjian Li, Jun Pang. L-CMP: an automatic learning-based parameterized verification tool. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE 2018 Demo). [Paper] [GitHub] [Video]
- Yongjian Li, Jialun Cao (1st student author), Kaiqiang Duan. An automatic parameterized verification of FLASH cache coherence protocol. In 2018 IEEE International Conference on Software Quality, Reliability and Security (QRS 2018). [Paper]
Teaching
- 2025 Spring, Instructor in COMP 1021 - Introduction to Computer Science. Course materials: [Link]
- 2023 Fall, Teaching Assistant in COMP 1021 - Introduction to Computer Science.
- 2020 Fall, Teaching Assistant in COMP 3021 - Java Programming.
- 2020 Spring, Teaching Assistant in COMP 3021 - Java Programming.
Honors and Awards
- 2025 ACM SIGSOFT Outstanding Doctoral Dissertation Award (only one or two recipients worldwide per year)
- 2024 Shortlisted Participant for the Rising Stars Women in Engineering Workshop at the Asian Deans’ Forum
- 2024 Hong Kong Postgraduate Studentship
- 2024 ACM SIGSOFT CAPS Travel Grant (ASE 2024)
- 2023 ACM SIGSOFT CAPS Travel Grant (ESEC/FSE 2023)
- 2019 - 2023: Huawei Fellowship Scholarship
- 2017: China National Scholarship (Postgraduate, Rank 1/106, Top 1%)
- 2014: China National Scholarship (Undergraduate, Rank 1/52, Top 2%)
Education
- 2019.09 - 2024.03, Ph.D., The Hong Kong University of Science and Technology.
- 2016.09 - 2019.06, M.S., State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences.
- 2012.09 - 2016.06, B.S., Shandong University.
Working Experience
- 2024.08 - now, Research Assistant Professor at the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology
- 2024.04 - 2024.07, Postdoctoral Fellow at HKUST, working with Prof. Shing-Chi Cheung
- 2023.07 - 2023.09, Intern at Huawei (Hong Kong) - Fermat Lab.
- 2022.09 - 2023.01, Intern at Huawei - Trusted Software Engineering and Open Source Laboratory.
- 2021.09 - 2022.08, Intern at Huawei - Theory Lab.
Service
Award committee member:
- ACM SIGSOFT Outstanding Research Award 2025
- ACM SIGSOFT Distinguished Service Award 2025
- ACM SIGSOFT Influential Educator Award 2025
Program committee member
- ICSE 2026 research track
- ICSE 2025 research track
- ASE 2025 research track
- FSE 2025 research track
- ISSRE 2024 research track
- Workshop on Responsible AI Engineering 2025 (RAIE)
- APSEC 2025 Technical Track
- Internetware 2025 Research Track
- CAIN 2025 Research and Experience Papers track
- Forge 2025 Research track and Data and Benchmarking track
- SANER 2024 Short Papers and Posters track
- APSEC 2024 Technical Track
- Internetware 2024 Research track
- AIware 2024 Main track
- Forge 2024 Research track
Session chair
- ASE 2024 - Session Chair of Code generation 3
- ASE 2024 - Session Chair of Testing 1
- Internetware 2024 - Session Chair of Session 6: Code Generation and Transformation
Reviewer
- ACM Transactions on Software Engineering and Methodology (TOSEM)
- IEEE Transactions on Software Engineering (TSE)
- Empirical Software Engineering (EMSE)
- Journal of Automated Software Engineering (JASE)
- Journal of Software: Evolution and Process (JSME)
- ACM Transactions on Knowledge Discovery from Data (TKDD)
- IEEE Transactions on Mobile Computing (TMC)
Invited Talks
- 2025.01, From Benchmarks to Practice: A Preliminary Study on the Code Capabilities of Large Language Models. At the Next Era of AI-Assisted R&D Seminar, hosted by Huawei, Hong Kong. [Recording]
- 2025.01, From Requirement to Formal Specification and Model Generation via Large Language Models. At Zhejiang University (online).
- 2024.11, Is Large Language Model a Rescue for Code Generation & Code Reasoning? At the Trusted Large Language Model Evaluation and Open-Source Technology Forum, CCF China Open Source Conference.
- 2024.10, Exploring Automatic Testing and Verification for Software Programs using Large Language Models. At the High Trust Software Engineering Technology Laboratory, Guangzhou Research Institute, Xidian University.
- 2024.08, Trusted Architecture of Intelligent CPS Systems. At the Micro-Forum of Intelligent Software Development, hosted by Fudan University. [Link]
- 2024.08, Can AI be a Panacea for Software Reliability? Exploring Automatic Testing and Verification for Software Programs. At the Software Systems Engineering Group, University College London.
- 2024.07, Concerned with Data Contamination? Assessing Countermeasures in Code Language Model. At the IEEE World Congress on Services, Cloud & AI Symposium.
- 2024.05, From Requirement to Formal Specification and Model Generation via Large Language Models. At the 2024 CCF Formal Methods Committee Strategic Seminar, themed “The Convergence of Formal Methods and Artificial Intelligence: Opportunities and Challenges”, hosted by the China Computer Federation (CCF). [Recording] [News]
- 2023.12, Crafting Future: A Dancer’s Leap into Computer Science. At the 1st Forum for Women Scholars in Software Engineering, hosted by ChinaSoft.
- 2023.12, A Study on the Problem of Data Contamination in the Era of Large Language Models. At the Forum on the New Paradigm of Software Engineering under AIGC, hosted by ChinaSoft.