
Program-aided Language Models

We are joined by Aman Madaan and Shuyan Zhou, both PhD students at the Language Technologies Institute at Carnegie Mellon University. They join us to discuss their recently published paper, PAL: Program-aided Language Models.

Aman and Shuyan started by sharing how the application of LLMs has evolved. They contrasted the performance of LLMs on arithmetic tasks with their performance on coding tasks. Aman introduced PAL and explained how it helps LLMs improve at arithmetic tasks, sharing examples of the tasks PAL was tested on. Shuyan discussed how PAL's performance was evaluated using BIG-Bench Hard tasks.

They discussed the kinds of mistakes LLMs tend to make and how PAL circumvents these limitations. They also discussed how these developments in LLMs can improve kids' learning.
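
To make the idea concrete, here is a minimal sketch of the program-aided pattern: the model writes its reasoning as a small program, and a Python interpreter, not the model, does the arithmetic. The word problem, the `GENERATED_PROGRAM` string, and the `run_generated_program` helper are all hypothetical and hand-written for illustration; they are not taken from the paper or from real model output.

```python
# Illustrative problem (an assumption, not from the episode):
# "A bakery baked 200 loaves. It sold 93 in the morning and 39 in the
#  afternoon, and 6 loaves were returned. How many loaves are left?"

# Hand-written stand-in for what a model might generate in a PAL-style setup:
# the reasoning steps are expressed as code rather than computed in text.
GENERATED_PROGRAM = """
def solution():
    loaves_baked = 200
    sold_morning = 93
    sold_afternoon = 39
    returned = 6
    return loaves_baked - sold_morning - sold_afternoon + returned
"""


def run_generated_program(program: str):
    """Execute generated code in a fresh namespace and return solution()."""
    namespace = {}
    # exec is acceptable for a demo; untrusted model output needs sandboxing.
    exec(program, namespace)
    return namespace["solution"]()


answer = run_generated_program(GENERATED_PROGRAM)
print(answer)  # 74
```

Because the interpreter computes the final number, an arithmetic slip of the kind a model can make when working purely in text does not affect the answer, as long as the generated program itself is correct.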

Wrapping up, Aman discussed CoCoGen, a project that enables NLP tasks to be converted to graphs. Shuyan and Aman also shared their next research steps.

Follow Shuyan on Twitter @shuyanzhxyc. Follow Aman @aman_madaan.

Further reading includes a paper mentioned in the episode, which solves high-school math problems using text (chain of thought) and code (PAL).

Shuyan Zhou

Shuyan Zhou is a final-year Ph.D. student at the Language Technologies Institute at Carnegie Mellon University, advised by Prof. Graham Neubig. She studies autonomous AI agents and aims to create agents that can perform tedious tasks so that everyone can focus on more creative and interesting work. Before joining CMU, she received her B.S. degree from the Department of Computer Science and Technology at Harbin Institute of Technology.

Aman Madaan

Aman is a PhD candidate at Carnegie Mellon University. His current work aims to improve the reasoning capabilities of large language models, with a focus on leveraging feedback and the synergy between code generation and natural language reasoning. In the past, he has developed several techniques that are now part of standard few-shot reasoning workflows, including memory-augmented prompting, using code to improve structured generation and reasoning, and self-refinement of language model outputs.