Mukul Singh
Senior Researcher at Microsoft Research
PROSE Group, Redmond, USA
I am a Senior Researcher at Microsoft Research in the PROSE (Program Synthesis using Examples) group. My research focuses on Agentic AI, Large Language Models, AI for Code, and Program Synthesis.
My work powers features in GitHub Copilot, Excel Copilot, Visual Studio, and Power Query, impacting over 1 billion users monthly. I develop code generation systems using inference-based, agentic, and fine-tuned frameworks, and have pioneered research on diffusion models for structured code generation.
I completed my B.Tech in Computer Science and Engineering at IIT Delhi, with a Minor in Artificial Intelligence.
Research Interests
- Large Language Models for Code
- Agentic AI Systems
- Program Synthesis
- Diffusion Models for Structured Generation
- Table Understanding & Formatting
News
- Dec 2025 Invited speaker at Dagstuhl Seminar on Diffusion Models for Code Generation
- Dec 2025 Paper accepted at NeurIPS 2025: "Diffusion is a code repair operator and generator"
- Dec 2025 Serving as Senior Area Chair for ACL 2026
- Sep 2025 Serving as Senior Area Chair for EMNLP 2025, ACL 2025, and NAACL 2025
- Aug 2025 Promoted to Senior Researcher at Microsoft Research
- May 2025 Keynote speaker at AIWARE conference co-hosted with ICSE 2025 in Seoul, South Korea
- Apr 2025 Invited guest at Dagstuhl Seminar on Automated Programming and Program Repair
- Nov 2024 Two papers accepted at EMNLP 2024: RAR and One-to-many testing for code generation
- Apr 2024 Received Best Paper Award at ICSE 2024
Publications
Full list on Google Scholar
Training Emergent Joint Associations: A Reinforcement Learning Approach to Creative Thinking in Language Models
Do Code Models Suffer from the Dunning-Kruger Effect?
Collaboration and Conflict between Humans and Language Models through the Lens of Game Theory
Tabularis Formatus: Predictive Formatting for Tables
Diffusion is a code repair operator and generator
Execution-guided within-prompt search for programming-by-example
TeCoFeS: Text Column Featurization using Semantic Analysis
RAR: Retrieval Augmented Retrieval for Code Generation in Low Resource Languages
One-to-many testing for code generation from (just) natural language
Tabularis Revilio: Converting Text to Tables
Semantically Aligned Question and Code Generation for Automated Insight Generation
Learning Email Folder Classification Rules by Demonstration
Learning from mistakes: Example based personalized online learner
CODEFUSION: A Pre-trained Diffusion Model for Code Generation
FORMAT5: Multi-modal Table Formatting with Natural Language and Examples
Example based Automated Spreadsheet Formatting Rule Generation
EMFORE: Online Learning of Email Folder Classification Rules
DATAVINCI: Learning to Repair String Data
INSTRUCTEXCEL: A Benchmark for Natural Language Instruction in Excel
Target Similarity Tuning Meets the Real World
Learning what to teach models for code generation
An empirical study of validating synthetic data for formula generation
CORNET: Learning Formatting Rules for Tabular Data by Example
Personalized UI suggestions for low-code automation platforms
From Words to Code: Harnessing Data for Program Synthesis from Natural Language
A Genetic Algorithm and RNN-LSTM Model for Remaining Battery Capacity Prediction
Multi-Objective Genetic Algorithm Based Deep Learning Model for Automated COVID-19 Detection
Deep Learning Model Based CO2 Emissions Prediction using Vehicle Telematics Data
Vehicle Telematics: Application of Machine Learning and Big Data at Edge
Transfer Learning based Ensemble SVM for Automated COVID-19 Detection using lung CT Scan Image Data
Patents
- Execution Guided Within-Prompt Search With Large Language Models
- System and Method for Intelligent Chart Design Recommendations in Spreadsheet Applications Using Large Language Models
- Multistep retrieval for grammar based code generation
- Dynamic Refinement of Language Model Evaluation Checklists
- Tabular Data Augmentation Through Semantic Text Analysis and Labeling
- Recognizing tables in unstructured text
- Vision chain-of-thought reasoning for multimodal language models
- Code generation system using pre-trained diffusion model
- Database rule generation for tabular task automation suggestion and ranking
- Learning Syntactic and Semantic String Repairs
- Target Property Selection Techniques for Learning What to Teach Language Models for Code Generation
- Automated recommendation and ranking of personalized user interface actions in task and data automation platforms
- Automated system for battery health and capacity prediction using charge and voltage statistics
Experience
Researcher
Jul 2022 – Present · Microsoft Research, PROSE Group · Redmond, USA
Research on code generation, diffusion models, table understanding, and UI automation. Work powers GitHub Copilot, Excel Copilot, and Visual Studio.
Research Intern
Jun 2021 – May 2022 · Microsoft Research · Redmond, USA (Remote)
Developed neurosymbolic AI for automatic spreadsheet formatting. Led to publications in VLDB and SIGMOD.
AI Research Intern
May 2022 – Jul 2022 · Meta (Facebook) · London, UK
Designed autoencoder-based anomaly detection for manufacturing defect identification.
AI Research Intern
May 2021 – Jul 2021 · American Express · Gurgaon, India
Deep learning for email subject line optimization and open rate prediction.
Research Intern
Dec 2019 – Jun 2020 · Robert Bosch GmbH · Bangalore, India
Vehicle reaction time estimation using driving pattern data for chassis health evaluation.
Education
B.Tech in Computer Science and Engineering
2018 – 2022 · Indian Institute of Technology Delhi
CGPA: 9.2/10.0
Minor in Artificial Intelligence
2018 – 2022 · Indian Institute of Technology Delhi
CGPA: 9.6/10.0
Academic Service
Senior Area Chair
ACL 2026, EMNLP 2025, ACL 2025, NAACL 2025
Area Chair
EMNLP 2024, ICSE 2024, CIKM 2024, ACL 2024
Program Committee
ACL, EMNLP, AAAI, NeurIPS
Journal Reviewer
IEEE TIV, IEEE ITS, SIGMOD, OOPSLA
Awards & Achievements
- Dagstuhl Invited Guest – Automated Programming and Program Repair Seminar, Germany (2025)
- ICSE Keynote Speaker – AIWARE conference co-hosted with ICSE 2025, Seoul, South Korea (2025)
- ICSE Best Paper Award – International Conference on Software Engineering, Lisbon, Portugal (2024)
- SMU Invited Talk – Research keynote at Singapore Management University (2024)
- NeurIPS Best Reviewer Award – Table Representation Learning Workshop (2023)
- VLDB Invited Speaker – Tabular Data Analysis Workshop, Canada (2023)
- Sustainability Research & Leadership Award – AI for vehicle automation, Tata Power (2022)
- Young Researcher Award – Bachelor thesis, NSS & Govt. of India (2021)
- TIDE Fellowship – Tech Entrepreneurship Award & Incubation by MeitY, Govt. of India (2020)
- Merit Scholarship – Full merit scholarship for scholastic performance, IIT Delhi (2018)
- KVPY Fellowship – Kishore Vaigyanik Protsahan Yojana, IISc Bangalore & Govt. of India (2017)
- InPhO Merit Certificate – Outstanding performance in the Indian National Physics Olympiad (2016)