20 articles tagged with #bert. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers reproduced and analyzed severe accuracy degradation in BERT transformer models when applying post-training quantization, showing validation accuracy drops from 89.66% to 54.33%. The study found that structured activation outliers intensify with model depth, with mixed precision quantization being the most effective mitigation strategy.
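As a concrete illustration of the setup this summary describes (not the paper's code), here is a minimal PyTorch sketch of INT8 post-training quantization on a BERT classifier, plus a mixed-precision variant that leaves some layers in FP32; the checkpoint and the choice of which layers to exempt are assumptions.

```python
# Hedged sketch: naive INT8 post-training quantization of a BERT classifier,
# the regime in which structured activation outliers can collapse accuracy.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Quantize every Linear layer to INT8 (dynamic weight quantization).
int8_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Mixed-precision mitigation in the paper's spirit: quantize only shallower
# encoder layers and keep the deeper, outlier-prone ones in FP32. Exempting
# the last four layers is a hypothetical choice, not the paper's split.
fp32_prefixes = tuple(f"bert.encoder.layer.{i}" for i in range(8, 12))
qconfig_spec = {
    name: torch.quantization.default_dynamic_qconfig
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear) and not name.startswith(fp32_prefixes)
}
mixed_model = torch.quantization.quantize_dynamic(model, qconfig_spec)
```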
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10
🧠Researchers propose Self-Correcting Discrete Diffusion (SCDD), a new AI model that improves upon existing discrete diffusion models by reformulating self-correction with explicit state transitions. The method enables more efficient parallel decoding while maintaining generation quality, demonstrating improvements at GPT-2 scale.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers propose a semantic bootstrapping framework that transfers knowledge from large language models into interpretable symbolic Tsetlin Machines, enabling text classification systems to achieve BERT-comparable performance while remaining fully transparent and computationally efficient without runtime LLM dependencies.
AI · Bullish · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce BERT-as-a-Judge, a lightweight alternative to LLM-based evaluation that scores generative model outputs more accurately than lexical approaches at a fraction of the compute cost of an LLM judge. Across 36 models and 15 tasks, the study shows that lexical evaluation techniques correlate poorly with human judgment, establishing a practical middle ground between rigid rule-based metrics and expensive LLM-judge evaluation.
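For intuition (the paper's judge model is not reproduced here), an encoder-as-judge can be sketched as a small cross-encoder that emits a scalar score for a (reference, candidate) pair; the checkpoint below is an illustrative pair-scoring stand-in, not the paper's judge.

```python
# Hedged sketch of a BERT-as-a-judge style scorer: a small cross-encoder
# returns one scalar quality score per (reference, candidate) pair.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NAME = "cross-encoder/stsb-roberta-base"  # assumption: any pair-regression encoder
tokenizer = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForSequenceClassification.from_pretrained(NAME)
model.eval()

def judge(reference: str, candidate: str) -> float:
    """Scalar score for how well the candidate output matches the reference."""
    batch = tokenizer(reference, candidate, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**batch).logits.squeeze().item()

print(judge("The cat sat on the mat.", "A cat is sitting on a mat."))
```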
AI · Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers propose a new metric to assess consistency of AI model explanations across similar inputs, implementing it on BERT models for sentiment analysis. The framework uses cosine similarity of SHAP values to detect inconsistent reasoning patterns and biased feature reliance, providing more robust evaluation of model behavior.
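The core computation is simple enough to sketch: flatten the SHAP attribution vectors for two similar inputs and take their cosine similarity, with low values flagging inconsistent reasoning. The paper's aggregation over many input pairs is not reproduced here, and the sketch assumes the two inputs share a feature alignment.

```python
# Minimal sketch of explanation consistency as cosine similarity of SHAP
# attribution vectors for two similar inputs (same length assumed).
import numpy as np

def explanation_consistency(shap_a: np.ndarray, shap_b: np.ndarray) -> float:
    """Cosine similarity between two flattened SHAP attribution vectors."""
    a, b = shap_a.ravel(), shap_b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Toy example: near-identical attributions for two paraphrased inputs.
x = np.array([0.50, -0.20, 0.10, 0.00])
y = np.array([0.48, -0.25, 0.12, 0.01])
print(explanation_consistency(x, y))  # close to 1.0 -> consistent reasoning
```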
AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers released MALINT, the first human-annotated English dataset for detecting disinformation and its malicious intent, developed with expert fact-checkers. The study benchmarked 12 language models and introduced intent-based inoculation techniques that improved zero-shot disinformation detection across six datasets, five LLMs, and seven languages.
🧠 Llama
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10
🧠Researchers introduce dLLM, an open-source framework that unifies core components of diffusion language modeling including training, inference, and evaluation. The framework enables users to reproduce, finetune, and deploy large diffusion language models like LLaDA and Dream while providing tools to build smaller models from scratch with accessible compute resources.
AI · Neutral · Lil'Log (Lilian Weng) · Jan 27 · 6/10
🧠This article presents an updated and expanded version of a comprehensive guide to Transformer architecture improvements, building upon a 2020 post. The new version is twice the length and includes recent developments in Transformer models, providing detailed technical notations and covering both encoder-decoder and simplified architectures like BERT and GPT.
🏢 OpenAI
AI · Bullish · Lil'Log (Lilian Weng) · Jan 31 · 6/10
🧠This article discusses the evolution of generalized language models including BERT, GPT, and other major pre-trained models that achieved state-of-the-art results on various NLP tasks. The piece covers the breakthrough progress in 2018 with large-scale unsupervised pre-training approaches that don't require labeled data, similar to how ImageNet helped computer vision.
🏢 OpenAI
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers have created CzechTopic, a new benchmark dataset for evaluating AI models' ability to identify specific topics within historical Czech documents. The study compared various large language models and BERT-based models, finding significant performance variations with the strongest models approaching human-level accuracy in topic detection.
AI · Bullish · arXiv – CS AI · Mar 3 · 5/10
🧠Researchers developed a Noise Removal model to improve precision in clinical entity extraction using BERT-based Named Entity Recognition systems. The model uses advanced features like Probability Density Maps to identify weak vs strong predictions, reducing false positives by 50-90% in clinical NER applications.
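The summary does not spell out the Probability Density Map machinery, so below is only a stand-in for the weak-versus-strong idea: drop entity spans whose mean token confidence falls under a threshold. All names and numbers are hypothetical.

```python
# Hypothetical confidence filter capturing the weak-vs-strong distinction:
# spans with low average token probability are treated as noise and removed.
from dataclasses import dataclass

@dataclass
class EntitySpan:
    text: str
    label: str
    token_probs: list[float]  # per-token confidences from the NER model

def filter_weak_predictions(spans: list[EntitySpan],
                            threshold: float = 0.85) -> list[EntitySpan]:
    """Keep spans whose mean token confidence clears the threshold."""
    return [s for s in spans
            if sum(s.token_probs) / len(s.token_probs) >= threshold]

spans = [
    EntitySpan("metformin", "DRUG", [0.97, 0.95]),  # strong -> kept
    EntitySpan("daily", "DOSAGE", [0.52, 0.48]),    # weak  -> filtered out
]
print(filter_weak_predictions(spans))
```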
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10
🧠Researchers developed a machine learning framework to predict which clinical trials are likely to have high dosing error rates before the trials begin. The system analyzed 42,112 clinical trials and achieved 86.2% accuracy using a combination of structured data and text analysis, enabling proactive risk management in clinical research.
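The structured-plus-text pattern the summary describes has a standard scikit-learn shape, sketched below; the column names, toy data, and classifier choice are illustrative assumptions, not the paper's pipeline.

```python
# Hedged sketch: TF-IDF over protocol text joined with scaled structured
# features, feeding one classifier that predicts dosing-error risk.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

trials = pd.DataFrame({
    "protocol_text": ["Titrate dose weekly by weight band.",
                      "Single fixed oral dose at enrollment."],
    "n_sites": [120, 8],
    "n_arms": [4, 2],
    "high_error_rate": [1, 0],  # label: likely high dosing-error rate
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "protocol_text"),
    ("num", StandardScaler(), ["n_sites", "n_arms"]),
])
model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(trials.drop(columns="high_error_rate"), trials["high_error_rate"])
print(model.predict_proba(trials.drop(columns="high_error_rate"))[:, 1])
```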
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10
🧠Researchers developed a hybrid AI model combining BanglaBERT and stacked LSTM networks to detect multiple types of cyberbullying in Bangla text simultaneously. The approach addresses limitations in existing single-label classification methods by recognizing that comments can contain overlapping forms of abuse like threats, hate speech, and harassment.
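A plausible shape for the hybrid head (dimensions and label count are assumptions; the paper's architecture details are not given in this summary): encoder hidden states feed a stacked bidirectional LSTM, and independent sigmoid outputs let one comment carry several abuse labels at once.

```python
# Illustrative multi-label head: BERT-style hidden states -> stacked BiLSTM
# -> one sigmoid per abuse type, so labels can overlap (threat + hate + ...).
import torch
import torch.nn as nn

class HybridMultiLabelHead(nn.Module):
    def __init__(self, hidden_size: int = 768, n_labels: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(hidden_size, 256, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 256, n_labels)

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        # encoder_states: (batch, seq_len, hidden), e.g. from BanglaBERT
        out, _ = self.lstm(encoder_states)
        return torch.sigmoid(self.classifier(out[:, -1]))  # last timestep

head = HybridMultiLabelHead()
fake_states = torch.randn(2, 32, 768)  # stand-in for encoder outputs
print(head(fake_states).shape)         # torch.Size([2, 4]) -> per-label probs
```

Training such a head would use a per-label binary cross-entropy rather than a softmax, which is what makes overlapping labels possible.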
AI · Neutral · Hugging Face Blog · Dec 19 · 5/10
🧠The title announces ModernBERT as a successor to BERT, a widely used language model in AI applications. However, the article body appears to be empty, preventing detailed analysis of the technical improvements or their implications.
AI · Neutral · Hugging Face Blog · Jan 19 · 4/10
🧠The article appears to be about fine-tuning W2V2-Bert (Wav2Vec2-BERT) for automatic speech recognition in low-resource languages using Hugging Face Transformers. However, the article body is empty, preventing detailed analysis of the technical implementation or methodology.
AI · Bullish · Hugging Face Blog · Mar 16 · 4/10
🧠The article appears to focus on optimizing BERT model inference using Hugging Face Transformers library with AWS Inferentia chips. This represents a technical advancement in AI model deployment and performance optimization on specialized hardware.
AI · Neutral · Hugging Face Blog · Nov 4 · 4/10
🧠This appears to be a technical article about optimizing BERT model inference performance on CPU architectures, part of a series on scaling transformer models. The article likely covers implementation strategies and performance improvements for running large language models efficiently on CPU hardware.
AI · Neutral · Hugging Face Blog · Aug 22 · 3/10
🧠The article appears to be about pre-training BERT language models using Hugging Face Transformers framework with Habana Gaudi processors. However, the article body is empty, making it impossible to provide detailed analysis of the content or methodology discussed.
AI · Neutral · Hugging Face Blog · Mar 2 · 3/10
🧠The article appears to be about BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art natural language processing model. However, the article body is empty, preventing detailed analysis of the content or implications.
AI · Neutral · Hugging Face Blog · Apr 20 · 1/10
🧠The article appears to be incomplete or missing content, containing only a title about scaling BERT inference on CPU systems. Without the article body, no meaningful analysis can be provided about the technical implementation or performance improvements discussed.