# AbsenceBench: Language Models Can't Tell What's Missing

A paper introducing [AbsenceBench](https://wiki.g15e.com/pages/AbsenceBench.txt). https://arxiv.org/abs/2506.11440

## Abstract

> [Large language models](https://wiki.g15e.com/pages/Large%20language%20model.txt) (LLMs) are increasingly capable of processing long inputs and locating specific information within them, as evidenced by their performance on the [Needle in a Haystack (NIAH) test](https://wiki.g15e.com/pages/The%20Needle%20In%20a%20Haystack%20Test.txt). However, while models excel at recalling surprising information, they still struggle to identify clearly omitted information. We introduce [AbsenceBench](https://wiki.g15e.com/pages/AbsenceBench.txt) to assess LLMs' capacity to detect missing information across three domains: numerical sequences, poetry, and GitHub pull requests. AbsenceBench asks models to identify which pieces of a document were deliberately removed, given access to both the original and edited contexts. Despite the apparent straightforwardness of these tasks, our experiments reveal that even state-of-the-art models like Claude-3.7-Sonnet achieve only 69.6% F1-score with a modest average context length of 5K tokens. Our analysis suggests this poor performance stems from a fundamental limitation: Transformer attention mechanisms cannot easily attend to "gaps" in documents since these absences don't correspond to any specific keys that can be attended to. Overall, our results and analysis provide a case study of the close proximity of tasks where models are already superhuman (NIAH) and tasks where models break down unexpectedly (AbsenceBench).
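The task setup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual harness: it assumes omissions are whole lines (the benchmark removes pieces such as poem lines or numbers from sequences) and scores a hypothetical model's predictions with set-level F1.

```python
def make_instance(original_lines, removed_indices):
    """Build an edited document by deliberately removing some lines.

    Returns the edited document and the gold list of removed lines.
    """
    removed = set(removed_indices)
    edited = [line for i, line in enumerate(original_lines) if i not in removed]
    gold = [original_lines[i] for i in sorted(removed)]
    return edited, gold

def f1_score(predicted, gold):
    """F1 over the set of lines the model claims are missing vs. the truly removed ones."""
    pred, ref = set(predicted), set(gold)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    precision = tp / len(pred)
    recall = tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example (hypothetical data): remove two lines from a short poem.
original = ["roses are red", "violets are blue", "sugar is sweet", "and so are you"]
edited, gold = make_instance(original, removed_indices=[1, 3])
# A model shown both `original` and `edited` that spots only one omission:
prediction = ["violets are blue"]
print(round(f1_score(prediction, gold), 2))  # → 0.67
```

The model sees both the original and the edited context, so in principle the task is a simple diff; the paper's point is that models nonetheless perform far below that ceiling.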