AI · Neutral · arXiv — CS AI · 1d ago · 4/10
🧠
Perturbation: A simple and efficient adversarial tracer for representation learning in language models
Researchers propose a new method called 'perturbation' for understanding how language models learn representations: fine-tune a model on adversarial examples, then measure how the resulting changes propagate to other examples. Without relying on geometric assumptions, the approach reveals that trained language models develop structured linguistic abstractions, offering insight into how AI systems generalize language understanding.
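The core idea (fine-tune on one example, trace how other examples' representations shift) can be sketched with a toy linear model. This is a minimal illustration under assumed details, not the paper's implementation; the names `represent`, `finetune_step`, and `trace_perturbation` are invented here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def represent(W, x):
    """Toy 'representation': a linear map of the input vector."""
    return W @ x

def finetune_step(W, x_adv, y_adv, lr=0.1):
    """One gradient step on 0.5*||W x_adv - y_adv||^2 (stand-in for fine-tuning)."""
    grad = np.outer(W @ x_adv - y_adv, x_adv)
    return W - lr * grad

def trace_perturbation(W, x_adv, y_adv, probes):
    """Measure how far each probe example's representation moves after the step."""
    W_new = finetune_step(W, x_adv, y_adv)
    return [float(np.linalg.norm(represent(W_new, x) - represent(W, x)))
            for x in probes]

W = rng.normal(size=(4, 8))
x_adv = rng.normal(size=8)          # the adversarial example we fine-tune on
y_adv = np.zeros(4)                 # its target output

# Probe 1 overlaps with the adversarial input; probe 2 is made orthogonal to it,
# so its representation should barely move under the rank-one update.
v = rng.normal(size=8)
probe_near = x_adv + 0.1 * rng.normal(size=8)
probe_far = v - (v @ x_adv) / (x_adv @ x_adv) * x_adv

shifts = trace_perturbation(W, x_adv, y_adv, [probe_near, probe_far])
print(shifts)  # the overlapping probe shifts far more than the orthogonal one
```

In this toy setting the update is rank one, so the shift of a probe is proportional to its overlap with the adversarial input: changes "spread" only to related examples, which is the kind of structure the tracing method is meant to expose in real models.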