AI · Bullish · arXiv – CS AI · 10h ago · 7/10
LLM Jaggedness Unlocks Scientific Creativity
Researchers introduce SciAidanBench, a benchmark showing that LLM capability improvements are uneven across tasks and domains, a phenomenon they term 'jaggedness.' Evaluating 19 models from 8 providers, they find that stronger models do not uniformly excel at scientific creativity, and that this unevenness can be exploited through ensemble methods to achieve better performance.