#transformer-models News & Analysis

66 articles tagged with #transformer-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

Mind the Noise: Sensitivity of Transformer-based Interaction-Aware Trajectory Prediction Models to Noisy Data

Researchers demonstrate that transformer-based trajectory prediction models used in autonomous vehicles experience severe accuracy degradation when exposed to noisy real-world sensor data, with prediction accuracy declining by up to 3.9x under realistic noise conditions. The findings highlight a critical gap between idealized training environments and actual deployment scenarios, signaling the need for robust noise mitigation strategies in autonomous vehicle systems.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

Researchers propose nD-RoPE, a generalized extension of Rotary Position Embedding (RoPE) for high-dimensional data that addresses limitations in existing Transformer position encoding methods. The innovation treats positions and frequencies as coupled n-dimensional vectors rather than independent rotations, enabling better cross-dimensional interactions and directional balance across images, videos, and point clouds.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning

Researchers introduce Rotate2Think, a training-free method that improves language model reasoning by applying geometric transformations to embedding space. The technique identifies that input and reasoning embeddings occupy distinct directional regions and uses orthogonal rotation to geometrically prime the model before generating reasoning traces, showing consistent accuracy improvements across 30 of 32 tested model-benchmark configurations.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks

Researchers demonstrate that parameter-efficient fine-tuning (PEFT) methods like adapters and LoRA can achieve competitive performance on instance segmentation tasks while training only 1-6% of model parameters, compared to 40-55% in traditional fine-tuning. The findings highlight that context-specific optimization is crucial, with 2-3 adapters per transformer block providing optimal efficiency gains.
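As a rough illustration of why LoRA trains so few parameters, the sketch below estimates the trainable share for a single weight matrix. The layer size and rank are illustrative assumptions, not values from the paper, and a whole-model figure such as the 1-6% cited above also depends on which layers receive adapters.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters added by a rank-`rank` LoRA update B @ A on a d_out x d_in weight."""
    # A has shape (rank, d_in) and B has shape (d_out, rank); the base weight stays frozen.
    return rank * d_in + d_out * rank


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Trainable share relative to the frozen base weight plus the LoRA factors."""
    added = lora_trainable_params(d_out, d_in, rank)
    return added / (d_out * d_in + added)


# Hypothetical 768 x 768 projection with rank 8 (illustrative values only).
added = lora_trainable_params(768, 768, 8)
share = trainable_fraction(768, 768, 8)
print(added, f"{share:.2%}")
```

Raising the rank or adapting more layers increases the trainable share, which is one reason the reported fraction spans a range rather than a single value.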

AI · Bullish · arXiv – CS AI · May 29 · 7/10

LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs

Researchers introduce Logit-aware Final-block Quantization (LFQ), a technique that improves low-bit quantization of large language models by optimizing the final transformer block to preserve token probability distributions. This advancement addresses quality degradation in generative tasks while maintaining efficiency gains critical for deploying scaled LLMs.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Max-Window Scale Estimation for Near-Lossless HiF8 W8A8 Quantization-Aware Training

Researchers develop a systematic approach to quantization-aware training for large language models using 8-bit floating-point formats, identifying and solving two critical failure modes (amax saturation and catastrophic forgetting) that do not surface in standard training metrics. Their solution achieves near-lossless performance with only 0.43% degradation on benchmark tasks, advancing practical LLM deployment efficiency.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

On Semantic Loss Fine-Tuning Approach for Preventing Model Collapse in Causal Reasoning

Researchers demonstrate that standard fine-tuning of transformer models on causal reasoning tasks causes catastrophic collapse where models learn trivial solutions while appearing accurate. They propose a semantic loss function with graph-based constraints that prevents collapse and achieves stable, context-dependent causal reasoning with 42.7% improvement over baseline models.

AI · Bullish · arXiv – CS AI · May 4 · 7/10

Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts

Researchers introduce Sentra-Guard, a real-time defense system that detects and mitigates jailbreak and prompt injection attacks on large language models with 99.96% accuracy. The multilingual framework combines FAISS-indexed semantic embeddings with fine-tuned transformers and human-in-the-loop feedback, significantly outperforming existing defenses like LlamaGuard-2 and OpenAI Moderation.

🏢 OpenAI

AI · Bearish · arXiv – CS AI · May 1 · 7/10

The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms

Researchers challenge the assumption that multi-agent AI systems benefit from the 'Wisdom of the Crowd' by demonstrating the Inverse-Wisdom Law: adding more logical agents to swarms can paradoxically increase the stability of errors rather than improve accuracy. Through 36 experiments across major benchmarks, the study reveals that architectural tribalism causes agents to prioritize internal agreement over external truth, with system integrity ultimately determined by the synthesizer's logic rather than individual agent quality.

🧠 GPT-5 · 🧠 Claude · 🧠 Sonnet

AI · Bearish · arXiv – CS AI · Apr 7 · 7/10

Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

Researchers present a new framework for AI safety that identifies a 57-token predictive window for detecting potential failures in large language models. The study found that only one out of seven tested models showed predictive signals before committing to problematic outputs, while factual hallucinations produced no detectable warning signs.

AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

Researchers studied weight-space model merging for multilingual machine translation and found it significantly degrades performance when target languages differ. Analysis reveals that fine-tuning redistributes rather than sharpens language selectivity in neural networks, increasing representational divergence in higher layers that govern text generation.

AI · Bearish · arXiv – CS AI · Mar 12 · 7/10

Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models

Researchers have developed 'Amnesia,' a lightweight adversarial attack that bypasses safety mechanisms in open-weight Large Language Models by manipulating internal transformer states. The attack enables generation of harmful content without requiring fine-tuning or additional training, highlighting vulnerabilities in current LLM safety measures.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

Researchers introduce ZipMap, a new AI model for 3D reconstruction that achieves linear-time processing while maintaining accuracy comparable to slower quadratic-time methods. The system can reconstruct over 700 frames in under 10 seconds on a single H100 GPU, making it more than 20x faster than current state-of-the-art approaches like VGGT.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

How Do LLMs Use Their Depth?

New research reveals that large language models use a "Guess-then-Refine" framework, starting with high-frequency token predictions in early layers and refining them with contextual information in deeper layers. The study provides detailed insights into layer-wise computation dynamics through multiple-choice tasks, fact recall analysis, and part-of-speech predictions.

AI · Neutral · arXiv – CS AI · Feb 27 · 7/10

Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning

Researchers developed Compositional-ARC, a dataset to test AI models' ability to systematically generalize abstract spatial reasoning tasks. A small 5.7M parameter transformer model trained with meta-learning outperformed large language models like GPT-4o and Gemini 2.0 Flash on novel geometric transformation combinations.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

TopoCast: A Topological Fidelity Framework for Evaluating Transformer-Based Time Series Forecasting

Researchers introduce TopoCast, a topology-based evaluation framework for time series forecasting that moves beyond traditional error metrics to assess structural fidelity in deep learning models. The framework uses persistent homology to detect phase shifts, oscillatory distortions, and timing errors that conventional metrics like MSE overlook, revealing that models with similar numerical accuracy can exhibit substantially different structural quality.

AI · Neutral · arXiv – CS AI · Jun 25 · 5/10

Spam and Sentiment Detection in Arabic Tweets Using MARBERT Model

Researchers developed a sentiment analysis model using MARBERT to classify Arabic tweets for Saudi Telecom Company (STC), training on 24,513 tweets across five sentiment categories. The study addresses a significant gap in NLP research by applying advanced transformer-based models to Arabic social media data, enabling improved customer service through automated sentiment detection.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Short-Term Electricity Demand Forecasting for New England Using a Hybrid Transformer-XGBoost Framework with Weather, Calendar, and COVID-19 Indicators

Researchers developed a hybrid machine learning model combining Transformers and XGBoost to forecast short-term electricity demand in New England, incorporating weather, calendar, and COVID-19 data. While the hybrid approach marginally outperformed a baseline model (2.05% MAPE vs 2.21%), statistical testing revealed the improvement is not significant, and an ablation study exposed how COVID-19 features caused overfitting to pandemic-era behavioral patterns that no longer applied.
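For readers unfamiliar with the metric, the sketch below shows how a MAPE figure such as those quoted above is typically computed. The demand values are made up for illustration and do not come from the study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and the same length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up hourly demand values in MW (illustrative only).
actual = [100.0, 200.0, 400.0]
forecast = [98.0, 204.0, 392.0]
score = mape(actual, forecast)
print(f"{score:.2f}%")
```

Because MAPE averages relative errors, a gap as small as 2.05% versus 2.21% can fall within sampling noise, which is consistent with the significance test described in the summary.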

AI · Neutral · arXiv – CS AI · Jun 23 · 5/10

Transcribing Bengali Text with Regional Dialects to IPA using District Guided Tokens

Researchers have developed a District Guided Tokens (DGT) technique to improve Bengali text-to-IPA transcription by incorporating regional dialect information, with the ByT5 model achieving superior performance on a new dataset spanning six Bangladeshi districts. This advancement addresses the phonological complexity of Bengali dialects and demonstrates the importance of regional context in natural language processing systems.

AI · Neutral · arXiv – CS AI · Jun 19 · 5/10

A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition

Researchers developed improved Automatic Speech Recognition (ASR) models for Quranic recitation using pretrained Transformer architectures (Wav2Vec2.0, HuBERT, XLS-R), achieving a word error rate of 8% compared with a 16.3% baseline. The study demonstrates that domain-specific fine-tuning on 870+ hours of professional and user-recited Quranic audio, combined with Arabic text labels without diacritics, significantly improves transcription accuracy while reducing training time by 71%.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

One Lens, Many Worlds: A Capability-Typed Interface for World-Model Interpretability

Researchers introduce WorldModelLens, an open-source interpretability framework that unifies analysis across diverse world model architectures (recurrent state-space models, token-based transformers, and joint-embedding systems) through a standardized capability-typed interface. The tool enables researchers to apply interpretability methods once rather than reimplementing them for each model architecture, addressing fragmentation in AI model analysis tooling.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

Researchers introduce RKSC, a training-free inference framework that optimizes multi-step LLM reasoning by sharing KV cache across similar branches and implementing early exit mechanisms. The system achieves 3x average speedup over baseline methods with minimal error rates, advancing efficiency in large language model inference without requiring model retraining.

AI · Bullish · arXiv – CS AI · Jun 10 · 6/10

UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation

UPLOTS is a unified pre-trained language model that generates constrained time-series data across multiple domains using a single transformer backbone guided by learned prompts. The framework addresses scalability limitations of existing domain-specific approaches by internalizing diverse temporal structures and enabling conditional generation with precise pattern control.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Customer Churn Prediction on Structured Data Using FT-Transformer and Stacking Ensembles

Researchers propose a hybrid machine learning architecture combining FT-Transformer neural networks with XGBoost gradient boosting to predict customer churn in banking and subscription services. The ensemble method achieves superior performance metrics (62.10% F1, 0.861 AUC-ROC) compared to baseline models while addressing critical challenges in class imbalance and probability calibration.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

TeamHerald@CHIPSAL 2026: Hate Speech Detection and Sentiment Analysis of Nepali Memes using Transformer-based Architectures and Ensemble Learning

Researchers presented a study on detecting hate speech and analyzing sentiment in Nepali-language memes using transformer-based machine learning models and ensemble learning techniques. The work addresses challenges specific to Nepali text analysis, including code-mixing and limited baseline datasets, demonstrating that soft voting ensemble strategies outperform standalone models for multi-class sentiment tasks by 15.8% in Macro F1-score.
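Soft voting, as mentioned above, generally means averaging each model's class probabilities and picking the class with the highest average. The sketch below is a minimal, generic version of that idea; the three models and their probabilities are hypothetical and are not taken from the paper.

```python
def soft_vote(prob_rows):
    """Average per-model class probabilities; return (winning class index, averaged probs)."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    averaged = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    # Pick the class whose averaged probability is largest.
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Three hypothetical models scoring one meme over three sentiment classes.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
winner, averaged = soft_vote(model_probs)
print(winner, [round(p, 3) for p in averaged])
```

Unlike hard (majority) voting, this approach lets a confident model outweigh several uncertain ones, which may explain part of the gain over standalone models.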
