AI · Neutral · arXiv – CS AI · 6h ago · 6/10
Addressing Labelled Data Scarcity: Taxonomy-Agnostic Annotation of PII Values in HTTP Traffic using LLMs
Researchers propose using Large Language Models (LLMs) to automatically detect and annotate Personally Identifiable Information (PII) values in HTTP traffic, without relying on a fixed taxonomy or large manually labeled datasets. The approach combines deterministic preprocessing with LLM-based classification and includes a synthetic traffic generator for evaluation. The stated aim is flexible privacy auditing across multiple PII domains.
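To make the "deterministic preprocessing, then LLM classification" idea concrete, here is a minimal sketch of what such a front end could look like. It is not the paper's implementation: the function names (`extract_candidates`, `flatten_json`, `build_prompt`), the choice of fields to extract, and the prompt wording are all assumptions made for illustration, and the actual LLM call is left out.

```python
import json
from urllib.parse import urlsplit, parse_qsl


def flatten_json(node, prefix=""):
    """Yield (path, value) pairs for every leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten_json(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten_json(value, f"{prefix}[{index}]")
    else:
        yield prefix, str(node)


def extract_candidates(url, headers, body=""):
    """Hypothetical preprocessing step: flatten one HTTP request into
    (location, key, value) triples without any model involvement."""
    candidates = []
    # Query-string parameters.
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        candidates.append(("query", key, value))
    # Request headers.
    for key, value in headers.items():
        candidates.append(("header", key, value))
    # Body: only JSON and URL-encoded forms are handled in this sketch.
    content_type = headers.get("Content-Type", "")
    if "application/json" in content_type and body:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            payload = None
        if payload is not None:
            candidates.extend(("body", k, v) for k, v in flatten_json(payload))
    elif "application/x-www-form-urlencoded" in content_type:
        for key, value in parse_qsl(body, keep_blank_values=True):
            candidates.append(("body", key, value))
    return candidates


def build_prompt(candidates):
    """Assumed prompt format: ask for free-form labels instead of a fixed taxonomy."""
    lines = [
        f"{i}. [{location}] {key} = {value}"
        for i, (location, key, value) in enumerate(candidates)
    ]
    return (
        "For each numbered value below, state whether it is PII and, if so, "
        "give a free-form category label.\n" + "\n".join(lines)
    )


request_candidates = extract_candidates(
    "https://example.com/signup?ref=abc",
    {"Content-Type": "application/json"},
    '{"user": {"email": "a@example.com", "tags": ["x"]}}',
)
prompt = build_prompt(request_candidates)
```

Keeping extraction deterministic means the model only has to label values that were already isolated, and every label can be traced back to a specific location in the request.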