Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Tags: llm, research paper

Author: Santosh Sawant

Published: June 7, 2024

Recently, prompting methods such as Chain-of-Thought (CoT), Tree-of-Thoughts (ToT), and Graph-of-Thoughts (GoT) have been instrumental in improving the reasoning performance of LLMs. These methods can be broadly divided into two categories:

  1. Single-query reasoning: these methods focus on prompt engineering, and their reasoning process completes within a single query. For example, CoT appends "Let's think step by step" to the input query so the model produces rationales that increase reasoning accuracy (see the sketch after this list).

  2. Multi-query reasoning: these methods leverage multiple LLM queries to elicit different plausible reasoning paths, decomposing a complex problem into a series of simpler sub-problems. Examples include Least-to-Most prompting, ToT, and GoT.
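As a concrete illustration of the single-query pattern, here is a minimal Python sketch of zero-shot CoT prompting. The `call_llm` wrapper is a hypothetical stand-in for any chat/completion API; only the prompt pattern itself comes from the CoT literature.

```python
# Minimal sketch of single-query (zero-shot CoT) prompting.
# `call_llm` is a hypothetical placeholder for any LLM completion API.

def call_llm(prompt: str) -> str:
    """Stand-in for a real provider call (e.g., a chat-completion endpoint)."""
    return "<model response>"  # replace with an actual API call

def cot_answer(question: str) -> str:
    # A single query: the zero-shot CoT trigger is appended to the input,
    # prompting the model to produce intermediate rationales before answering.
    prompt = f"Q: {question}\nA: Let's think step by step."
    return call_llm(prompt)

print(cot_answer("If a train travels 60 km in 45 minutes, what is its speed in km/h?"))
```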

However, both single-query and multi-query reasoning processes are limited by their hand-designed exemplars and reasoning structures, and neither derives general, high-level guidelines or thoughts from previously completed tasks.

To address these limitations, researchers have proposed Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning framework aimed at enhancing the reasoning accuracy, efficiency, and robustness of LLMs across various tasks. BoT maintains a meta-buffer, a lightweight library housing a series of universal high-level thoughts (thought-templates), which are distilled from different problem-solving processes and can be shared across tasks. For each new problem, a relevant thought-template is retrieved and instantiated with a task-specific reasoning structure for efficient thought-augmented reasoning. To guarantee the scalability and stability of BoT, a buffer-manager dynamically updates the meta-buffer, effectively enhancing its capacity as more tasks are solved.
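To make the retrieve-instantiate-update loop concrete, below is a minimal Python sketch of it. All names here (`ThoughtTemplate`, `MetaBuffer`, the toy word-overlap similarity) are illustrative assumptions, not the authors' implementation; the paper retrieves templates by embedding similarity, and the official code is linked at the end.

```python
# Illustrative sketch of the BoT loop: retrieve a thought-template from the
# meta-buffer, instantiate it for the task, and let a buffer-manager add newly
# distilled templates. All names are hypothetical, not the paper's actual API.

from dataclasses import dataclass, field

@dataclass
class ThoughtTemplate:
    name: str
    description: str          # high-level thought distilled from past tasks
    reasoning_structure: str  # prompt skeleton with a {problem} slot

@dataclass
class MetaBuffer:
    templates: list = field(default_factory=list)

    def retrieve(self, problem: str) -> ThoughtTemplate:
        # Toy similarity: word overlap between the problem and each template's
        # description. The paper uses embedding similarity instead.
        words = set(problem.lower().split())
        return max(self.templates,
                   key=lambda t: len(words & set(t.description.lower().split())))

    def update(self, new_template: ThoughtTemplate) -> None:
        # Buffer-manager role: add a template only if it is novel, so the
        # meta-buffer grows as more tasks are solved without duplication.
        if all(t.name != new_template.name for t in self.templates):
            self.templates.append(new_template)

buffer = MetaBuffer([
    ThoughtTemplate(
        name="arithmetic-search",
        description="combine given numbers with + - * / to reach a target value",
        reasoning_structure="Try operator combinations systematically: {problem}",
    ),
])

problem = "Use the numbers 4 6 8 2 and + - * / to reach the target value 24"
template = buffer.retrieve(problem)
prompt = template.reasoning_structure.format(problem=problem)
print(prompt)  # feed this thought-augmented prompt to the LLM
```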

In extensive experiments on 10 challenging reasoning-intensive tasks, BoT has shown significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes, and 51% on Checkmate-in-One. Further analysis demonstrates BoT's superior generalization ability and robustness, while it requires on average only 12% of the cost of multi-query prompting methods (e.g., tree/graph of thoughts). Notably, Llama3-8B + BoT has the potential to surpass the Llama3-70B model on reasoning-intensive tasks.

Paper : https://arxiv.org/pdf/2406.04271

Code : https://github.com/YangLing0818/buffer-of-thought-llm