arXiv · CS AI · 5h ago
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
Researchers found that narrowly finetuning a large language model leaves detectable traces in its activations: the differences between the base and finetuned models' activations reveal information about the finetuning domain. The study shows that these biases can be read to infer what data the model was finetuned on, and suggests mixing pretraining data into the finetuning set to reduce the traces.
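The core idea can be illustrated with a toy sketch. Here the "activations" of a base and a finetuned model are simulated as numpy vectors rather than real LLM hidden states, and the narrow-finetuning trace is modeled as a fixed bias along a hypothetical "domain direction"; all names and magnitudes are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64        # hidden dimension (illustrative)
n = 1000      # number of generic probe prompts (illustrative)

# Base-model activations on generic prompts: zero-mean noise.
base_acts = rng.normal(size=(n, d))

# Hypothetical finetuned-model activations: same kind of noise plus a
# fixed shift along a "domain direction" -- the readable trace that
# narrow finetuning is claimed to leave behind.
domain_direction = rng.normal(size=d)
domain_direction /= np.linalg.norm(domain_direction)
tuned_acts = rng.normal(size=(n, d)) + domain_direction

# Recover the trace as the mean activation difference across prompts,
# then compare it to the true (here, known) domain direction.
trace = (tuned_acts - base_acts).mean(axis=0)
trace /= np.linalg.norm(trace)

cos_sim = float(trace @ domain_direction)
print(f"cosine(trace, domain direction) = {cos_sim:.2f}")
```

In this simplified setting the mean activation difference aligns closely with the planted domain direction, which is the sense in which such traces are "clearly readable"; the mitigation the paper suggests (mixing in pretraining data) would shrink the consistent shift that makes this average informative.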