#model-attribution News & Analysis

4 articles tagged with #model-attribution. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 25 · 7/10

Small edits, large models: How Wikipedia advocacy shapes LLM values

A research study shows that a small group of Wikipedia editors advocating for animal welfare has measurably shaped how large language models discuss the topic: their edits appear in 68% of the most relevant documents for animal welfare queries. Using data attribution techniques, the researchers traced the influence of 125 edits across 115 pages and found the effect was specific to animal welfare topics rather than general company discussion. The result shows how concentrated editorial efforts on widely used training sources can influence AI system behavior.

🏢 Perplexity · 🧠 Llama

AI · Bearish · arXiv – CS AI · May 27 · 7/10

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

Researchers identify a critical vulnerability in retrieval-augmented generation systems where language models produce faithful-looking outputs from memory rather than retrieved context, making it impossible to verify source attribution through output analysis alone. They propose Computational Reality Monitoring (CRM), a technique that detects internal representational differences to identify when models rely on pretraining data versus external evidence.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework

Researchers introduce LAVA, a hierarchical framework that uses convolutional autoencoders to detect audio deepfakes and identify their source generation models with over 95% accuracy. The system addresses a critical gap in deepfake attribution by moving beyond detection to pinpoint which specific AI model created a given piece of fraudulent audio.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From

Researchers have developed SeedPrints, a novel fingerprinting method that identifies Large Language Models based on their random initialization seed rather than post-training characteristics. This approach enables model attribution and provenance verification from inception through full pretraining, addressing limitations of existing methods that only work reliably after fine-tuning.