What do language models actually know, versus what we ask them about?

Current knowledge benchmarks suffer from availability bias: they test only the knowledge that benchmark designers explicitly ask about. This work introduces BeQu, a 10,000-entity benchmark that evaluates what LLMs naturally surface when prompted broadly (e.g., "Tell me everything about M.L. King") rather than how well they answer narrow trivia questions. The shift from predefined retrieval to open-ended elicitation reveals a more honest picture of model knowledge, and the accompanying analysis examines how reasoning effort, scale, and domain affect what gets expressed.