KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning
Researchers introduce KITE, a novel example selection method for in-context learning in large language models that uses information theory and kernel methods to choose task-specific examples from a prompt bank. The approach addresses limitations of existing nearest-neighbor methods by improving diversity and generalization, demonstrating measurable improvements across classification tasks in label-scarce scenarios.
KITE targets example selection for in-context learning, a technique that has become central to adapting LLMs without fine-tuning. The research tackles a practical constraint: because LLMs have limited context windows, only a few examples fit in a prompt, so the quality of the selected examples directly affects model performance on a user query. Existing retrieval methods such as KATE pick the nearest neighbors of the query in embedding space, an approach that generalizes poorly in high-dimensional spaces and tends to return redundant examples that lack diversity.
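A minimal sketch of the nearest-neighbor baseline described above, assuming cosine similarity over toy two-dimensional embeddings. The vectors and the `knn_select` helper are made up for illustration and are not taken from KATE's code; the point is only to show how pure similarity ranking can fill the prompt with near-duplicates.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_select(query, bank, k):
    """Return indices of the k bank embeddings most similar to the query."""
    ranked = sorted(range(len(bank)),
                    key=lambda i: cosine(query, bank[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical prompt bank: indices 0-2 are near-duplicates, index 3 is distinct.
bank = [
    [1.00, 0.10],
    [0.99, 0.11],
    [0.98, 0.12],
    [0.60, 0.80],
]
query = [1.0, 0.2]

selected = knn_select(query, bank, k=3)
# All three slots go to the near-duplicates; the distinct example is left out.
print(sorted(selected))
```

With a budget of three examples, the selection consists entirely of the near-duplicate cluster, which is the redundancy problem the article attributes to nearest-neighbor retrieval.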
The key idea is to frame example selection as a query-specific optimization problem grounded in information theory rather than in general-purpose learning theory: the goal is an accurate prediction for the query at hand. The authors model the LLM as a linear function over embeddings and derive a submodular surrogate objective, which lets a simple greedy algorithm select examples with a mathematical approximation guarantee. The kernel trick lets the method work in high-dimensional feature spaces without constructing them explicitly, while design-based regularization explicitly encourages diversity among the selected examples.
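The sketch below illustrates the general shape of such a procedure: greedy selection under a kernelized, query-specific score with a diversity bonus. It is not the paper's objective, which this summary does not spell out. As an assumed stand-in, the score is the kernel-ridge variance explained at the query plus a log-determinant term of the kind used in optimal experimental design; the RBF kernel, the parameters `lam` and `mu`, and all function names are hypothetical choices for illustration.

```python
import math

def rbf(u, v, gamma=1.0):
    """RBF kernel: a similarity computed without an explicit feature map."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def objective(S, query, bank, lam=0.1, mu=0.1):
    """Assumed surrogate: relevance to the query plus mu times a diversity bonus."""
    if not S:
        return 0.0
    K = [[rbf(bank[i], bank[j]) + (lam if i == j else 0.0) for j in S] for i in S]
    kq = [rbf(bank[i], query) for i in S]
    L = cholesky(K)
    # Forward substitution solves L z = kq, so z.z equals kq^T (K_S + lam I)^-1 kq.
    z = []
    for i in range(len(S)):
        z.append((kq[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    relevance = sum(v * v for v in z)
    # log det(I + K_S / lam), obtained from the Cholesky factor of K_S + lam I.
    diversity = 2.0 * sum(math.log(L[i][i]) for i in range(len(S))) \
        - len(S) * math.log(lam)
    return relevance + mu * diversity

def greedy_select(query, bank, k, **params):
    """Repeatedly add the example with the largest marginal gain in the objective."""
    S = []
    for _ in range(k):
        rest = [i for i in range(len(bank)) if i not in S]
        best = max(rest, key=lambda i: objective(S + [i], query, bank, **params))
        S.append(best)
    return S

# Hypothetical prompt bank: indices 0-2 are near-duplicates, index 3 is distinct.
bank = [[1.00, 0.10], [0.99, 0.11], [0.98, 0.12], [0.60, 0.80]]
query = [1.0, 0.2]

without_bonus = greedy_select(query, bank, k=2, mu=0.0)
with_bonus = greedy_select(query, bank, k=2, mu=0.1)
print(without_bonus, with_bonus)
```

On this toy bank, turning the diversity weight off yields two near-duplicates, while a modest weight makes the second pick the distinct example. Greedy selection is the natural fit here because, for monotone submodular objectives, it comes with a known approximation guarantee; whether this particular stand-in score satisfies those conditions is not claimed.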
This work matters because in-context learning has become a dominant way to deploy LLMs across diverse tasks without retraining. Better example selection translates directly into better model outputs and potentially lower computational overhead. Organizations building LLM applications, from customer support to data labeling, stand to benefit from improved few-shot performance, especially when labeled data is scarce or proprietary.
The empirical validation covers classification tasks, which suggests practical applicability in that setting. Future work will likely explore scaling to other task types and investigate whether these principles extend to reasoning tasks or multimodal inputs. The methodology's foundation in information theory positions it as a potential standard for example selection in production LLM systems.
- KITE uses information theory and kernel methods to select diverse, query-specific examples for in-context learning, outperforming traditional nearest-neighbor approaches
- The method frames example selection as a query-specific optimization problem with mathematical approximation guarantees through submodular optimization
- Kernel trick integration enables operation in high-dimensional embedding spaces while design-based regularization promotes diversity in selected examples
- Empirical results demonstrate significant improvements over standard retrieval methods across classification tasks in label-scarce scenarios
- The approach addresses fundamental limitations of existing methods like KATE that suffer from poor generalization and redundancy in high-dimensional spaces