MIRAI: Evaluating LLM Agents for Event Forecasting

llm
research paper
Author: Santosh Sawant

Published: July 2, 2024

Recent advances in Large Language Models (LLMs) have enabled LLM agents to autonomously collect information about the world and reason over it to solve complex problems such as predicting international events. Despite this growing interest, there is no rigorous benchmark of LLM agents’ forecasting capability and reliability.

To address this gap, researchers from UCLA and Caltech have introduced MIRAI, a novel benchmark designed to systematically evaluate LLM agents as temporal forecasters in the context of international events. The MIRAI benchmark consists of 991,759 GDELT event records, corresponding to 59,161 unique events and 296,630 unique news articles. The test set contains 705 query-answer pairs, each asking the agent to forecast an event between two countries at a given timestamp, along with a balanced test subset of 100 pairs.

Consider how an LLM agent interacts with the multi-source environment, using the ReAct strategy, to forecast a query event. The framework consists of three main steps: (1) Think: the agent analyzes the current status and plans the next action based on the query and the provided API specifications. (2) Act: the agent generates a single function call or a code block to retrieve and analyze relevant data from the database. (3) Execute: the Python interpreter runs the generated code against the API implementation and database and produces observations. These steps are repeated until the agent reaches a final forecast for the future relation.
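The Think / Act / Execute loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MIRAI's actual implementation: the names `react_forecast`, `execute`, `toy_policy`, and `count_events`, the country codes, and the returned count are all hypothetical. A hard-coded stub policy stands in for the LLM so the example runs on its own.

```python
import contextlib
import io
from dataclasses import dataclass


@dataclass
class Step:
    thought: str      # the agent's reasoning for this step (Think)
    action: str       # Python source the agent wants to run (Act)
    observation: str  # captured output of running the action (Execute)


def execute(code: str, namespace: dict) -> str:
    """Run agent-generated code and return what it printed, or the error."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # a real system would sandbox this call
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def react_forecast(query, policy, api, max_steps=5):
    """Iterate Think -> Act -> Execute until the policy returns a forecast."""
    history = []
    namespace = dict(api)  # expose the (hypothetical) API functions to the code
    for _ in range(max_steps):
        # Think + Act: the policy sees the query and all prior observations.
        thought, action, final_answer = policy(query, history)
        if final_answer is not None:
            return final_answer, history
        # Execute: run the generated code and record the observation.
        observation = execute(action, namespace)
        history.append(Step(thought, action, observation))
    return None, history  # step budget exhausted without a forecast


# Stub standing in for the LLM: one data lookup, then a final answer.
def toy_policy(query, history):
    if not history:
        return "Look up past events.", "print(count_events('AUS', 'CHN'))", None
    return "Enough evidence.", "", f"forecast based on {history[-1].observation} events"


toy_api = {"count_events": lambda a, b: 3}  # hypothetical API with made-up data
answer, trace = react_forecast("AUS-CHN relation?", toy_policy, toy_api)
```

Keeping the policy as a plain callable separates the loop from the model behind it, so the same loop could drive any LLM that returns a thought, an action, and an optional final answer.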

MIRAI comprehensively evaluates the agents’ capabilities along three dimensions: 1) autonomously sourcing and integrating critical information from large global databases; 2) writing code using domain-specific APIs and libraries for tool use; and 3) jointly reasoning over historical knowledge in diverse formats and from different times to accurately predict future events.

Through comprehensive benchmarking, MIRAI strives to assess the capabilities of LLM agents in forecasting international events, thereby contributing to the development of more accurate and trustworthy models for international relations analysis.

Paper: https://arxiv.org/pdf/2407.01231