AI · Bullish · arXiv – CS AI · 10h ago · 7/10
Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models
Researchers propose a cost-effective proxy-model framework in which smaller, efficient models approximate the explanations that interpretability methods produce for expensive Large Language Models (LLMs), achieving over 90% fidelity at just 11% of the computational cost. The framework includes verification mechanisms and is demonstrated on two practical applications, prompt compression and data cleaning, making interpretability tools viable for real-world LLM development.