I’m Lunyiu Nie, a computer science PhD student at UT Austin, advised by Prof. Swarat Chaudhuri. My research interests lie mainly at the intersection of Programming Languages and Natural Language Processing.

I completed my Master’s degree in computer science at Tsinghua University, where I was advised by Prof. Jidong Zhai and also received advice from Prof. Juanzi Li. I obtained my Bachelor’s degree at the Chinese University of Hong Kong, where I was advised by Prof. Helen Meng and Prof. Wai Lam.

🔥 News

  • 2024.05: Our work “Online Cascade Learning” has been accepted to ICML 2024. 🎉
  • 2023.08: Started my PhD journey! Nice to see you all in Austin! 🙌

📝 Publications

Online Cascade Learning for Efficient Inference over Streams

Lunyiu Nie, Zhimin Ding, Erdong Hu, Christopher Jermaine, Swarat Chaudhuri

  • In ICML 2024 (Poster) [🏠Page] [🖇️Code]
  • TLDR: This work presents an online cascade learning framework for efficient LLM inference over streaming queries. A theoretical analysis gives the algorithm a no-regret performance guarantee, and experiments on several benchmarks confirm its effectiveness: the cascade achieves accuracy comparable to the LLM while saving up to 90% of inference costs. A minimal sketch of the cascade idea follows below.
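
The core mechanism is a cascade of increasingly expensive models with confidence-based deferral. Here is a minimal sketch of that idea; the stage wrapper and thresholds are hypothetical stand-ins, not the paper’s actual API.

```python
# Minimal sketch of confidence-based deferral in a model cascade.
# `CascadeStage.predict` is a hypothetical stand-in for any model wrapper.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class CascadeStage:
    predict: Callable[[str], Tuple[str, float]]  # returns (label, confidence)
    threshold: float                             # defer downstream below this

def cascade_infer(stages: List[CascadeStage], query: str) -> str:
    """Try cheap stages first; defer to the next, more expensive stage
    whenever the current stage's confidence falls below its threshold."""
    for stage in stages[:-1]:
        label, conf = stage.predict(query)
        if conf >= stage.threshold:
            return label                         # cheap model is confident enough
    label, _ = stages[-1].predict(query)         # final stage (the LLM) always answers
    return label
```

In the full framework, the cheaper stages are also updated online to imitate the LLM’s outputs on the queries they defer, which is where the no-regret analysis comes in.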

Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing

Lunyiu Nie*, Jiuding Sun*, Yanling Wang, Lun Du, Han Shi, Dongmei Zhang, Lei Hou, Juanzi Li, Jidong Zhai

  • In AAAI 2023 (Long Paper, Oral) [🖇️Code]
  • TLDR: Current PLM-based neural semantic parsers often suffer from hallucination because they neglect logical form structures, and they lack intrinsic interpretability. To alleviate these problems, we propose a novel hierarchical decoder architecture and two intermediate supervision tasks that explicitly guide the PLMs to attend to structural information during fine-tuning (see the sketch below). By probing the PLMs’ inner-layer outputs, our work also provides a novel testbed for interpreting the intermediate process of logical form generation.
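
To illustrate the intermediate-supervision idea, here is a hedged sketch of an auxiliary head that predicts anchor labels from an intermediate decoder layer alongside the usual generation loss. It assumes a Hugging Face-style encoder-decoder PLM; the class name, probed layer, and 0.5 loss weight are illustrative, not the paper’s exact setup.

```python
import torch
import torch.nn as nn

class AnchorSupervisedParser(nn.Module):
    """Illustrative multi-task wrapper: generation loss plus an auxiliary
    'semantic anchor' loss computed on an intermediate decoder layer."""
    def __init__(self, plm, hidden_size: int, num_anchor_labels: int):
        super().__init__()
        self.plm = plm                                    # encoder-decoder PLM (e.g., T5)
        self.anchor_head = nn.Linear(hidden_size, num_anchor_labels)

    def forward(self, inputs, labels, anchor_labels):
        out = self.plm(**inputs, labels=labels, output_hidden_states=True)
        # Probe a middle decoder layer and predict anchor labels from it.
        mid = out.decoder_hidden_states[len(out.decoder_hidden_states) // 2]
        anchor_logits = self.anchor_head(mid)             # (batch, seq, num_labels)
        anchor_loss = nn.functional.cross_entropy(
            anchor_logits.flatten(0, 1), anchor_labels.flatten())
        return out.loss + 0.5 * anchor_loss               # weighted multi-task objective
```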

GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation

Lunyiu Nie, Shulin Cao, Jiaxin Shi, Qi Tian, Lei Hou, Juanzi Li, Jidong Zhai

  • In EMNLP 2022 (Long Paper) [🖇️Code]
  • TLDR: We propose GraphQ IR, a novel intermediate representation that bridges the semantic gap between natural language and graph query languages (e.g., SPARQL, Cypher, Lambda-DCS, KoPL). With GraphQ IR as a middleware, we also implement a transpiler that translates among multiple graph query languages (sketched below). Experiments show that our approach consistently achieves SOTA performance on the Overnight, KQA Pro, GrailQA, and MetaQA-Cypher semantic parsing benchmarks, with promising generalizability under non-IID (compositional generalization and zero-shot) and low-resource settings.
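
The middleware pattern is simple to picture: a single parser maps natural language to the IR, and one small compiler per target maps the IR onward. The toy IR grammar below is invented for illustration; it is not GraphQ IR itself.

```python
# Toy IR-as-middleware sketch: one IR, one compiler per target language.
# The IR syntax here is made up for illustration; it is not GraphQ IR.
def compile_ir(ir: str, target: str) -> str:
    # Toy IR: "find entity where <relation> is <value>"
    _, _, _, rel, _, value = ir.split(maxsplit=5)
    if target == "sparql":
        return f'SELECT ?e WHERE {{ ?e :{rel} "{value}" . }}'
    if target == "cypher":
        return f'MATCH (e {{{rel}: "{value}"}}) RETURN e'
    raise ValueError(f"unsupported target: {target}")

print(compile_ir("find entity where population is 1000000", "cypher"))
# MATCH (e {population: "1000000"}) RETURN e
```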

KQA Pro: A Dataset with Explicit Compositional Programs for Complex Question Answering over Knowledge Base

Shulin Cao, Jiaxin Shi, Liangming Pan, Lunyiu Nie, Yutong Xiang, Lei Hou, Juanzi Li, Hanwang Zhang, Bin He

  • In ACL 2022 (Long Paper) [🖇️Dataset]
  • TLDR: We introduce KQA Pro, a dataset for complex KBQA with ~120K diverse natural language questions, which is currently also the largest parallel corpus for the NLQ-to-SPARQL task. The questions are challenging, requiring complex reasoning capabilities including compositional reasoning, multi-hop reasoning, quantitative comparison, and set operations.

Code Structure Guided Transformer for Source Code Summarization

Shuzheng Gao, Cuiyun Gao, Yulan He, Jichuan Zeng, Lunyiu Nie, Xin Xia, Michael Lyu

  • In ACM Transactions on Software Engineering and Methodology (TOSEM) (2022) [🖇️Code]
  • TLDR: We propose SG-Trans to incorporate code structural features into the Transformer framework. To capture the hierarchical characteristics of code, we inject local symbolic information (e.g., code tokens) and global syntactic structure (e.g., data flow) into the self-attention module as an inductive bias; a sketch of the mechanism follows below.
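
One common way to realize such a structural inductive bias, shown here as a hedged sketch rather than SG-Trans’s exact formulation, is to add a structure-derived bias matrix to the attention logits before the softmax.

```python
import torch

def structure_biased_attention(q, k, v, struct_bias):
    """q, k, v: (batch, heads, seq, dim). struct_bias: (seq, seq), e.g. 0 for
    token pairs linked by AST/data-flow edges and a large negative value
    otherwise, so attention concentrates on structurally related tokens."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # standard scaled dot-product
    scores = scores + struct_bias                # inject the structural prior
    return torch.softmax(scores, dim=-1) @ v
```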

CoreGen: Contextualized Code Representation Learning for Commit Message Generation

Lunyiu Nie, Cuiyun Gao, Zhicong Zhong, Wai Lam, Yang Liu, Zenglin Xu

  • In Neurocomputing (2021) [🖇️Code]
  • TLDR: We propose a novel code representation learning method named CoreGen that exploits the contextual information behind code changes via self-supervised learning (one plausible objective is sketched below). Evaluation on the commit message generation benchmark shows that our model outperforms the baselines with a BLEU-4 improvement of more than 28.18%.
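
As a hedged illustration of self-supervision over code changes, and not CoreGen’s exact objective, one can mask tokens that differ between the pre- and post-change code and train the model to recover them:

```python
import random

def mask_changed_tokens(before, after, mask="<MASK>"):
    """Mask a random half of the tokens in `after` that differ from `before`,
    yielding (masked_input, targets) for a denoising-style objective."""
    changed = [i for i, tok in enumerate(after)
               if i >= len(before) or tok != before[i]]
    picked = set(random.sample(changed, max(1, len(changed) // 2))) if changed else set()
    masked = [mask if i in picked else tok for i, tok in enumerate(after)]
    targets = [after[i] for i in sorted(picked)]
    return masked, targets
```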

Unstructured Knowledge Access in Task-oriented Dialog Modeling using Language Inference, Knowledge Retrieval and Knowledge-Integrative Response Generation

Mudit Chaudhary, Borislav Dzodzo, Sida Huang, Chun Hei Lo, Mingzhi Lyu, Lunyiu Nie, Jinbo Xing, Tianhua Zhang, Xiaoying Zhang, Jingyan Zhou, Hong Cheng, Wai Lam, Helen Meng

  • In AAAI-21 DSTC9 Workshop (2021), Finalist of the 9th Dialog System Technology Challenge (DSTC9) Track 1. [🖇️Code]
  • TLDR: We propose a pipeline framework for task-oriented dialog modeling with unstructured knowledge access. Our methods significantly improve the performance of dialog systems and generate high-quality responses, achieving at least a 58.77% improvement in BLEU-4 over the baseline.

ATOM: Commit Message Generation Based on Abstract Syntax Tree and Hybrid Reranking

Shangqing Liu, Cuiyun Gao, Sen Chen, Lunyiu Nie, Yang Liu

  • In IEEE Transactions on Software Engineering (TSE) (2020)
  • TLDR: We develop a novel commit message generation model that explicitly incorporates the abstract syntax tree to represent code changes and integrates both retrieved and generated messages through hybrid reranking (sketched below). Experimental results demonstrate that our approach improves over the state-of-the-art models by 30.72% in terms of BLEU-4.
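
The reranking step itself is easy to picture. In this hedged sketch, `score` stands in for the paper’s learned ranking module, which is not reproduced here:

```python
def hybrid_rerank(diff, generated, retrieved, score):
    """Pool generated and retrieved commit messages for a code diff and keep
    the candidate the scorer likes best. score(diff, msg) -> float."""
    candidates = generated + retrieved   # merge candidates from both sources
    return max(candidates, key=lambda msg: score(diff, msg))
```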

🏆 Honors and Awards

  • 2022 Tsinghua Outstanding Thesis Award & Outstanding Graduate Award (Ranked 1st in the THU CS Department).
  • 2022 Tsinghua Special Class Scholarship (highest award for Hong Kong students, <0.5%).
  • 2022 Microsoft Research Asia - Award of Excellence.
  • 2021 Tsinghua Comprehensive Excellence Scholarship.
  • 2018-20 CUHK Faculty Dean’s List (Top 10%).
  • 2019-20 CUHK Wu Yee Sun College Master’s List (Ranked 1st in major).
  • 2017-18 Department Academic Excellence Award.
  • 2017 HKSAR Government Scholarship.

📖 Education

  • 2023.08 - Now, The University of Texas at Austin, Ph.D. Student in Computer Science.
  • 2020.09 - 2023.06, Tsinghua University, M.Sc. in Computer Science and Technology.
  • 2015.08 - 2020.07, The Chinese University of Hong Kong, B.B.A. (Honours) in Professional Accountancy, Second Major in Computer Science.

💻 Internships

  • 2022.06 - 2022.09, Microsoft Research Asia.
  • 2021.07 - 2022.03, Alibaba Cloud.
  • 2018.07 - 2018.08, Institute of Computing Technology, Chinese Academy of Sciences.