Measuring the stubbornness of LLMs
How much LLMs prefer to hallucinate vs answer from RAG data
Original paper: https://arxiv.org/pdf/2404.10198.pdf
When building a RAG app, you want your LLM to be faithful to the RAG data: you want it to answer from the data you feed it rather than from its internal knowledge, which opens the door to hallucinations.
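The RAG setup described above can be sketched as a prompt that injects retrieved data. This is a minimal illustration, not the paper's exact prompt; `build_prompt` is a hypothetical helper, and the no-context branch is what exposes the model's internal knowledge:

```python
from typing import Optional

def build_prompt(question: str, context: Optional[str] = None) -> str:
    """Prepend retrieved context (D2) when available; without it the
    model must answer from its internal knowledge (D1)."""
    if context is None:
        return f"Answer concisely: {question}"
    return (
        "Answer using ONLY the context below.\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )
```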
This paper attempts to quantify the faithfulness (or seen from a different angle: stubbornness) of LLMs, and it has some interesting results.
It appears that LLMs behave quasi-predictably when faced with external data that does not correspond to their internal knowledge.
In other words, if the LLM is trained on D1 data, and you feed it D2 data (via RAG) then:
- if D2 is too different from D1, the LLM sticks with D1
- if D1 and D2 are close, the LLM likely picks D2.
Measuring LLM faithfulness (or stubbornness)
The idea of this paper is to quantify "too different" and "close" above. For that, you need to estimate two quantities:
- Prior probability: probability of an answer from D1 (we call it prior_strength)
- RAG preference rate: probability of an answer from D2 (we call it faithfulness_rag)
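The two quantities above can be estimated empirically by sampling the model repeatedly with and without context. The sketch below is an assumption about how such a measurement could be coded, not the paper's implementation; `sample_answer` is a hypothetical stand-in for one stochastic LLM call:

```python
from collections import Counter

def estimate_rates(sample_answer, question, context, rag_answer, n=100):
    """Estimate prior_strength and faithfulness_rag by sampling.

    sample_answer(question, context) -> str is a hypothetical stand-in
    for one stochastic LLM call (context=None means no RAG data).
    """
    # prior_strength: how often the model repeats its modal answer
    # when given no context (confidence in D1).
    no_ctx = [sample_answer(question, None) for _ in range(n)]
    _, count = Counter(no_ctx).most_common(1)[0]
    prior_strength = count / n

    # faithfulness_rag: how often the model adopts the RAG answer
    # when the (possibly conflicting) context D2 is supplied.
    with_ctx = [sample_answer(question, context) for _ in range(n)]
    faithfulness_rag = sum(a == rag_answer for a in with_ctx) / n
    return prior_strength, faithfulness_rag
```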
Once you have calculated these values for enough question/answer pairs, you can draw the red line below, which approximates the function:
faithfulness_rag = f(prior_strength)
[Figure: faithfulness_rag plotted against prior_strength, with the fitted red line]
The results are quite expected, from a high level:
- The likelihood that the LLM adheres to D2 is inversely correlated with the model’s confidence in its answer without context.
- Similarly, LLMs increasingly revert to their prior (D1) as D2 is progressively modified with unrealistic values.
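The second point rests on perturbation experiments: a fact in the context is replaced with increasingly unrealistic values, and faithfulness is re-measured at each level. A minimal sketch of that idea, with a hypothetical `perturb` helper (the paper's actual perturbations are more varied):

```python
def perturb(value: float, factor: float) -> float:
    """Scale a numeric fact (e.g. a dosage) by `factor`; larger
    factors push the context D2 further from the model's prior D1."""
    return value * factor

# Increasingly unrealistic variants of a baseline value of 100.0;
# faithfulness_rag would be re-estimated at each level.
levels = [perturb(100.0, f) for f in (1.5, 3, 10, 100)]
```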
The slope measures the “stubbornness” of the LLM: a steeper negative slope (e.g. -0.45) means it is harder to convince the LLM to answer from the RAG data (D2) when it is even slightly confident in its internal knowledge (D1).
Hope this was useful!