AI · Bearish · arXiv – CS AI · 11h ago · 7/10
NeedleChain: Measuring Intact Context Comprehension Capability of Large Language Models
Researchers introduce NeedleChain, a benchmark that reveals significant limitations in how well large language models such as GPT-4o integrate query-relevant information across a context. The study demonstrates that current context-understanding evaluations overestimate LLM capabilities because their test contexts include query-irrelevant content, and it proposes RoPE contraction as a training-free improvement strategy.
🧠 GPT-4