12 articles tagged with #gpt-2. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bearish · arXiv – CS AI · 4d ago · 7/10
🧠 Researchers have developed EZ-MIA, a training-free membership inference attack that dramatically improves detection of memorized data in fine-tuned language models by analyzing probability shifts at error positions. The method achieves 3.8x higher detection rates than previous approaches on GPT-2 and demonstrates that privacy risks in fine-tuned models are substantially greater than previously understood.
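The core idea is easy to sketch: a sample looks like training data if the fine-tuned model gains unusually much probability mass, relative to the base model, at the positions the base model got wrong. The snippet below is a toy illustration of that probability-shift test, not the paper's exact EZ-MIA procedure; the function names and the threshold value are assumptions.

```python
# Toy probability-shift membership test (illustrative; not the paper's exact
# EZ-MIA algorithm -- names and the threshold are assumptions).

def membership_score(base_probs, tuned_probs, base_preds, actual_tokens):
    """Mean probability gain at 'error positions' -- positions where the
    base model's top prediction differs from the actual next token."""
    gains = [
        t - b
        for b, t, pred, tok in zip(base_probs, tuned_probs, base_preds, actual_tokens)
        if pred != tok  # error position for the base model
    ]
    return sum(gains) / len(gains) if gains else 0.0

def looks_memorized(base_probs, tuned_probs, base_preds, actual_tokens,
                    threshold=0.3):
    """Flag a sample whose gain at error positions exceeds the threshold."""
    return membership_score(base_probs, tuned_probs, base_preds, actual_tokens) > threshold
```

In a real attack, the per-token probabilities would come from scoring the candidate text under both the base and the fine-tuned checkpoint.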
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠 Researchers propose Self-Correcting Discrete Diffusion (SCDD), a new AI model that improves upon existing discrete diffusion models by reformulating self-correction with explicit state transitions. The method enables more efficient parallel decoding while maintaining generation quality, demonstrating improvements at GPT-2 scale.
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10
🧠 Researchers developed GPUTOK, a GPU-accelerated tokenizer for large language models that processes text significantly faster than existing CPU-based solutions. The optimized version shows 1.7x speed improvement over tiktoken and 7.6x over HuggingFace's GPT-2 tokenizer while maintaining output quality.
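Speedup claims like "1.7x over tiktoken" reduce to a ratio of throughputs on identical input. A minimal harness for reproducing such numbers might look like the sketch below, with whitespace splitting as a toy stand-in; a real comparison would plug in `tiktoken`, the HuggingFace tokenizer, and GPUTOK.

```python
import time

def throughput_mb_s(tokenize, text, repeats=5):
    """Best-of-N tokenizer throughput, in MB of UTF-8 input per second."""
    nbytes = len(text.encode("utf-8"))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        tokenize(text)  # result discarded; we only time the call
        best = min(best, time.perf_counter() - t0)
    return nbytes / best / 1e6

# Toy stand-in tokenizer: whitespace splitting. A real benchmark would time
# e.g. tiktoken's GPT-2 encoding and the GPU tokenizer on the same corpus.
sample = "the quick brown fox jumps over the lazy dog " * 10_000
rate = throughput_mb_s(str.split, sample)
```

The reported "Nx faster" figure is then simply `throughput_a / throughput_b`, measured on the same text and hardware.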
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠 Researchers developed new activation functions for deep neural networks based on polynomial and trigonometric orthonormal bases that can successfully train models like GPT-2 and ConvNeXt. The work addresses gradient problems common with polynomial activations and shows these networks can be interpreted as multivariate polynomial mappings.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠 Researchers have discovered that transformer models, despite different training runs producing different weights, converge to the same compact 'algorithmic cores': low-dimensional subspaces essential for task performance. The study shows these invariant structures persist across different scales and training runs, suggesting transformer computations are organized around shared algorithmic patterns rather than implementation-specific details.
AI · Bullish · OpenAI News · May 9 · 7/10
🧠 Researchers used GPT-4 to automatically generate explanations for how individual neurons behave in large language models and to evaluate the quality of those explanations. They have released a comprehensive dataset containing explanations and quality scores for every neuron in GPT-2, advancing AI interpretability research.
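Explanation quality in this line of work is typically scored by having a model simulate the neuron's activations from the explanation alone, then correlating simulated with real activations. A minimal scorer might use Pearson correlation (a standard choice here); the toy data in the test is made up.

```python
def explanation_score(real_acts, simulated_acts):
    """Pearson correlation between a neuron's real activations and the
    activations simulated from a natural-language explanation of it."""
    n = len(real_acts)
    mr = sum(real_acts) / n
    ms = sum(simulated_acts) / n
    cov = sum((r - mr) * (s - ms) for r, s in zip(real_acts, simulated_acts))
    vr = sum((r - mr) ** 2 for r in real_acts) ** 0.5
    vs = sum((s - ms) ** 2 for s in simulated_acts) ** 0.5
    return cov / (vr * vs) if vr and vs else 0.0
```

A score near 1.0 means the explanation predicts when the neuron fires; near 0 means it explains nothing.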
AI · Neutral · OpenAI News · Nov 5 · 7/10
🧠 OpenAI has released the largest version of GPT-2 with 1.5 billion parameters, completing their staged release process. The release includes code and model weights to help detect GPT-2 outputs and serves as a test case for responsible AI model publication.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers propose MUXQ, a new quantization technique for large language models that addresses activation outliers through low-rank decomposition. The method enables efficient INT8 quantization while maintaining accuracy close to FP16, making it suitable for edge device deployment with NPU-based hardware.
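The outlier problem these methods target is easy to demonstrate: one large activation inflates the quantization scale and destroys precision for everything else, which is why methods like MUXQ carve outliers into a separate higher-precision path. The sketch below shows the effect with plain symmetric INT8 quantization; it illustrates the motivation only, not MUXQ's low-rank decomposition itself.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: scale = max|v| / 127."""
    scale = (max(abs(v) for v in values) / 127.0) or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

def quant_error(values):
    """Worst-case round-trip error for this tensor."""
    q, s = quantize_int8(values)
    return max(abs(v - d) for v, d in zip(values, dequantize(q, s)))

# One activation outlier blows up the scale -- and the error on everything
# else; handling it separately (here: simply excluding it) restores precision.
acts = [0.01 * i for i in range(100)] + [50.0]   # one large outlier
err_with_outlier = quant_error(acts)
err_without = quant_error(acts[:-1])
```

Outlier-aware schemes keep the INT8 fast path for the bulk of the tensor while routing the few outlier channels (or, in MUXQ's case, a low-rank component) through higher precision.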
AI · Neutral · arXiv – CS AI · Mar 12 · 6/10
🧠 Researchers developed a pipeline to translate AI model internal mechanisms into human-understandable explanations, testing on GPT-2 Small. The study identified six attention heads responsible for 61.4% of model performance on a specific task, with LLM-generated explanations outperforming template-based approaches by 64%.
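A claim like "six heads account for 61.4% of performance" typically comes from ablation: knock out one attention head at a time and measure the task-score drop. A minimal sketch of that attribution loop, with a toy score function standing in for a real model evaluation:

```python
def head_contributions(score_fn, heads):
    """Single-head ablation: the contribution of head h is the task-score
    drop when h is removed while all other heads stay active."""
    full = score_fn(frozenset(heads))
    return {h: full - score_fn(frozenset(heads) - {h}) for h in heads}

# Toy score function: each head independently adds a fixed amount of accuracy.
# (Real attribution is messier -- heads interact -- which is why such studies
# also report the joint effect of ablating the whole identified set.)
weights = {"L0H3": 0.30, "L5H1": 0.20, "L9H6": 0.11}
score = lambda active: sum(weights[h] for h in active)
contrib = head_contributions(score, weights)
```

The head names and weights here are invented for illustration; in practice `score_fn` would rerun the model on the task with the given heads mean-ablated or zeroed.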
AI · Neutral · OpenAI News · Sep 19 · 6/10
🧠 OpenAI successfully fine-tuned a 774M parameter GPT-2 model using human feedback for tasks like summarization and text continuation. The research revealed challenges where human labelers' preferences didn't align with developers' intentions, with summarization models learning to copy text wholesale rather than generate original summaries.
AI · Neutral · OpenAI News · Aug 20 · 6/10
🧠 OpenAI released the 774 million parameter GPT-2 language model, completing their staged release approach that began with smaller models earlier in the year. The release includes an open-source legal agreement for model-sharing partnerships and a technical report on coordinating AI research publication norms.
AI · Bullish · OpenAI News · Apr 25 · 6/10
🧠 OpenAI has created MuseNet, a deep neural network capable of generating 4-minute musical compositions using 10 different instruments and combining various musical styles from country to classical to rock. The system uses the same transformer technology as GPT-2, learning musical patterns through unsupervised training on hundreds of thousands of MIDI files rather than explicit musical programming.