cs.CL

What do language models actually know versus what we ask them?

Luca Giordano, Simon Razniewski

May 26, 2026

Current knowledge benchmarks suffer from availability bias: they test only the knowledge that benchmark designers explicitly ask about. This work introduces BeQu, a 10,000-entity benchmark that evaluates what LLMs surface on their own when prompted broadly (e.g., "Tell me everything about M.L. King") rather than how they answer narrow trivia questions. Shifting from predefined retrieval to open-ended elicitation is intended to give a more honest picture of model knowledge, and the paper analyzes how reasoning effort, scale, and domain affect what gets expressed.
Published as "Beyond Questions: Evaluating What Large Language Models (Actually) Know", arXiv:2605.26937.