Researchers at MIT have developed a new assistant called CodeSteer that significantly enhances the problem-solving capabilities of large language models (LLMs) by guiding them in switching between text and code generation. This breakthrough addresses a common limitation of LLMs, which excel at understanding textual content but often struggle with basic mathematical problems and algorithmic tasks. The results indicate that integrating CodeSteer can improve accuracy on symbolic tasks by over 30 percent.
Bridging Text and Code
Large language models are designed primarily for textual reasoning, which makes them prone to errors on queries that are better solved numerically or algorithmically. Asked to compare the numbers 9.11 and 9.9, for instance, an LLM reasoning in text may conclude that 9.11 is larger because 11 exceeds 9, whereas a few lines of executed code settle the question immediately. To counter this, CodeSteer, itself a smaller LLM, acts as a coach that directs the larger model on when to answer in text and when to generate code.
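As a minimal illustration (not drawn from the paper), the comparison becomes trivial once it is evaluated as code rather than reasoned about as text:

```python
# Minimal illustration: evaluating the comparison numerically removes the ambiguity
# that trips up textual reasoning ("11 is bigger than 9, so 9.11 must be bigger than 9.9").
a, b = 9.11, 9.9
print(a > b)      # False
print(max(a, b))  # 9.9 is the larger number
```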
According to Chuchu Fan, an associate professor of aeronautics and astronautics and principal investigator in the MIT Laboratory for Information and Decision Systems (LIDS), “We want to enable LLMs to select the right tools and methods.” CodeSteer reviews the responses of the larger model, suggesting adjustments until the correct answer is achieved.
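The coaching behavior described here lends itself to a simple iterative loop. The sketch below is a hypothetical illustration of that idea, not the authors' implementation; the callables `solver_model` and `steer_model` and the textual protocol between them are assumptions made for clarity:

```python
# Hypothetical sketch of the coach-and-solver loop, not the authors' code.
# `solver_model` and `steer_model` stand in for the large model and for CodeSteer.

def solve_with_coach(question, solver_model, steer_model, max_rounds=5):
    """The smaller coach model decides whether text or code is appropriate,
    reviews the solver's answer, and refines its guidance until satisfied."""
    guidance = "Decide whether this question is best answered in text or by generating code."
    answer = None
    for _ in range(max_rounds):
        plan = steer_model(f"{guidance}\nQuestion: {question}")      # e.g. "generate code"
        answer = solver_model(f"{plan}\nQuestion: {question}")       # solver follows the plan
        verdict = steer_model(f"Question: {question}\nAnswer: {answer}\nIs this correct?")
        if verdict.strip().lower().startswith("correct"):
            return answer
        guidance = verdict  # feed the critique back as refined guidance for the next round
    return answer
```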
The research team, which includes graduate students from LIDS and the University of Illinois at Urbana-Champaign, has prepared its findings for presentation at the International Conference on Machine Learning. The researchers found that augmenting an LLM with CodeSteer improves performance on complex tasks such as planning robot paths in unpredictable environments and optimizing international supply chains.
Methodology and Results
The researchers also found that LLMs often default to simpler, less effective code when faced with symbolic calculations. CodeSteer tackles this by prompting the model toward more thorough coding approaches, so that the generated code actually addresses the task at hand. To test the method, the team assembled SymBench, a dataset of 37 complex symbolic tasks.
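To make the contrast concrete, consider a small puzzle in the spirit of such symbolic tasks (the task and code below are illustrative assumptions, not drawn from SymBench): combine four numbers with arithmetic operators to reach 24. A quick textual guess often fails, while a short exhaustive search in code is reliable:

```python
# Illustrative example of a symbolic task where exhaustive search in code beats a quick
# textual guess: combine four numbers with +, -, *, / to reach a target value.
# This task and code are assumptions for illustration, not taken from SymBench.
from itertools import permutations, product

def reach_target(nums, target=24):
    ops = ['+', '-', '*', '/']
    for a, b, c, d in permutations(nums):
        for o1, o2, o3 in product(ops, repeat=3):
            # Left-to-right grouping only, kept minimal for illustration.
            expr = f"(({a} {o1} {b}) {o2} {c}) {o3} {d}"
            try:
                if abs(eval(expr) - target) < 1e-9:
                    return expr
            except ZeroDivisionError:
                continue
    return None

print(reach_target([1, 2, 3, 4]))  # prints one valid expression, e.g. "((1 + 2) + 3) * 4"
```

The puzzle itself is incidental; the point is that a coach like CodeSteer is meant to steer the solver toward this kind of systematic code rather than a shortcut heuristic.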
In experiments, CodeSteer outperformed nine baseline methods, increasing average accuracy from 53.3 percent to 86.4 percent, and maintained high performance even on previously unseen tasks. This innovation allows a general-purpose model equipped with CodeSteer to achieve better accuracy than state-of-the-art models designed specifically for complex reasoning.
“By augmenting an LLM with the ability to smartly use coding, we can take a model that is already very strong and improve its performance even more,” said Yongchao Chen, a graduate student involved in the study.
The research has received support from the U.S. Office of Naval Research and the MIT-IBM Watson AI Lab, highlighting its potential impact on various applications where LLMs currently fall short.
Experts in the field have praised the approach, with Jinsung Yoon, a staff research scientist at Google Cloud AI, noting that the method enables LLMs to achieve significant performance improvements without requiring direct fine-tuning. “This research represents a substantial contribution that promises to significantly enhance the application of LLMs to a diverse range of tasks,” Yoon added.
As the team continues to refine CodeSteer, they aim to streamline its prompting process and explore the possibility of developing a unified model that seamlessly integrates both textual reasoning and code generation capabilities. This research could pave the way for more robust AI applications in complex real-world scenarios, marking a significant step forward in the evolution of large language models.
