Source: Kensho Blog

We’re expanding S&P AI Benchmarks to cover Long-document QA

New benchmark assesses how well AI systems analyze complex financial documents hundreds of pages long.

We’re excited to announce the expansion of S&P AI Benchmarks by Kensho with the addition of a new evaluation set focused on long-form document Question Answering (QA). Building on the success of our existing offering, this latest benchmark assesses how well large language models (LLMs) can correctly answer a natural language query by analyzing relevant information from extremely long and complex documents, such as SEC filings.

Financial professionals often need to quickly find specific, relevant information in documents that are hundreds of pages long. But most LLMs struggle to complete this kind of task effectively because of context window limitations, which restrict the size and length of the input. Model developers are working to solve this challenge in several ways, such as expanding the context window, splitting a query into multiple inferences, or, most commonly, combining the LLM with a Retrieval Augmented Generation (RAG) system. With RAG, teams can connect their LLM to additional data sources stored in a vector database, which can include complex, long-form documents and PDFs.

By assessing an LLM’s ability to query long-form documents, S&P Global’s new benchmark can help financial professionals understand the relative performance of LLM applications and corresponding RAG architectures for their specific use cases. For example, the new benchmark can help research analysts and wealth managers make informed decisions about which model architectures to leverage for financial workflows involving long and complex documents.

This offering is the latest evolution of S&P Global’s legacy of financial benchmarking dating back to 1860, from credit ratings and market indices to commodity price assessments, and now AI.
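The retrieval step at the heart of a RAG system can be sketched in a few lines. This is an illustrative toy, not Kensho's implementation: chunks of a long filing are indexed, and the chunks most similar to the query are retrieved and passed to the LLM as context. A bag-of-words similarity stands in for a real embedding model and vector database.

```python
# Toy sketch of RAG retrieval: rank document chunks by similarity to a query.
# In practice, embed() would call an embedding model, and the ranking would
# be done by a vector database rather than a linear scan.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

# Hypothetical chunks from a long SEC filing.
chunks = [
    "Item 1A. Risk Factors: the company faces interest rate risk.",
    "Item 7. Management's discussion of revenue growth drivers.",
    "Item 8. Consolidated financial statements and notes.",
]
top = retrieve("what are the main risk factors", chunks, k=1)
```

The retrieved chunks, rather than the entire document, are what get placed in the LLM's context window, which is how RAG sidesteps the input-length limits described above.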
“We are thrilled to introduce this new evaluation solution as part of our S&P AI Benchmarks series,” said Bhavesh Dayalji, Chief AI Officer at S&P Global. “Our vision is to continue to expand our offering and add new benchmarks based on our customers’ most important use cases, providing financial professionals with a reliable tool to enhance their decision-making processes.”

The new evaluation set, comprising 225 questions, was developed in collaboration with experts across S&P Global to ensure accuracy and reliability. Model builders and teams implementing GenAI in the financial sphere can confidently benchmark their LLMs using the secure submission process, which requires sharing only model outputs, not the model itself. Scores are displayed on a leaderboard, allowing users to compare the performance of various LLMs on these specific criteria.

To learn more about S&P AI Benchmarks, visit https://benchmarks.kensho.com/.

We’re expanding S&P AI Benchmarks to cover Long-document QA was originally published in Kensho Blog on Medium.
