RPRA: Predicting an LLM-Judge for Efficient but Performant Inference
Researchers propose RPRA (Reason-Predict-Reason-Answer/Act), a framework enabling smaller language models to predict how a larger LLM judge would evaluate their outputs before responding. By routing simple queries to smaller models and complex ones to larger models, the approach reduces computational costs while maintaining output quality, with fine-tuned smaller models achieving up to 55% accuracy improvements.