20 articles tagged with #transformer-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bearish · arXiv – CS AI · Apr 7 · 7/10
🧠Researchers present a new framework for AI safety that identifies a 57-token predictive window for detecting potential failures in large language models. The study found that only one out of seven tested models showed predictive signals before committing to problematic outputs, while factual hallucinations produced no detectable warning signs.
AI · Neutral · arXiv – CS AI · Apr 6 · 7/10
🧠Researchers studied weight-space model merging for multilingual machine translation and found it significantly degrades performance when target languages differ. Analysis reveals that fine-tuning redistributes rather than sharpens language selectivity in neural networks, increasing representational divergence in higher layers that govern text generation.
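Weight-space merging of this kind is typically a per-parameter interpolation between checkpoints. A minimal sketch of the idea (plain lists stand in for tensors, and `merge_weights` is an illustrative name, not the paper's code):

```python
def merge_weights(state_a, state_b, alpha=0.5):
    # elementwise interpolation per parameter "tensor";
    # hypothetical sketch of weight-space merging, not the paper's method
    return {k: [alpha * x + (1 - alpha) * y for x, y in zip(state_a[k], state_b[k])]
            for k in state_a}

# two toy checkpoints with a single parameter vector each
a = {"w": [1.0, 3.0]}
b = {"w": [3.0, 1.0]}
merged = merge_weights(a, b)  # alpha=0.5 averages the two
```

The paper's finding is that this kind of averaging can destructively interfere with the language-selective directions each fine-tuned model has developed, which is why merged multilingual models degrade.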
AI · Bearish · arXiv – CS AI · Mar 12 · 7/10
🧠Researchers have developed 'Amnesia,' a lightweight adversarial attack that bypasses safety mechanisms in open-weight Large Language Models by manipulating internal transformer states. The attack enables generation of harmful content without requiring fine-tuning or additional training, highlighting vulnerabilities in current LLM safety measures.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce ZipMap, a new AI model for 3D reconstruction that achieves linear-time processing while maintaining accuracy comparable to slower quadratic-time methods. The system can reconstruct over 700 frames in under 10 seconds on a single H100 GPU, making it more than 20x faster than current state-of-the-art approaches like VGGT.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10
🧠New research reveals that large language models use a "Guess-then-Refine" framework, starting with high-frequency token predictions in early layers and refining them with contextual information in deeper layers. The study provides detailed insights into layer-wise computation dynamics through multiple-choice tasks, fact recall analysis, and part-of-speech predictions.
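The described dynamic can be caricatured in a few lines: early layers lean on token frequency, deeper layers weight contextual fit more heavily. All names and numbers below are invented for illustration, not taken from the study:

```python
# toy "guess-then-refine": a frequency prior dominates early,
# a context score dominates late (purely illustrative values)
freq        = {"the": 0.9, "transformer": 0.4, "attractor": 0.1}
context_fit = {"the": 0.1, "transformer": 0.8, "attractor": 0.3}

def predict(layer, num_layers=12):
    w = layer / num_layers  # context weight grows with depth
    scores = {t: (1 - w) * freq[t] + w * context_fit[t] for t in freq}
    return max(scores, key=scores.get)

early = predict(1)    # frequency-dominated "guess"
late = predict(11)    # context-refined prediction
```

The paper's actual analysis probes real intermediate-layer predictions across multiple-choice, fact-recall, and part-of-speech tasks; this sketch only mimics the shape of the claimed transition.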
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers developed Compositional-ARC, a dataset to test AI models' ability to systematically generalize abstract spatial reasoning tasks. A small 5.7M parameter transformer model trained with meta-learning outperformed large language models like GPT-4o and Gemini 2.0 Flash on novel geometric transformation combinations.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠Researchers demonstrate that large language models develop attractor-like geometric patterns in their activation space when processing identity documents describing persistent agents. Experiments on Llama 3.1 and Gemma 2 show paraphrased identity descriptions cluster significantly tighter than structural controls, suggesting LLMs encode semantic agent identity as stable attractors independent of linguistic variation.
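The clustering claim reduces to comparing the mean pairwise cosine similarity of paraphrase activations against that of controls. A toy illustration (the vectors are made up, not real Llama/Gemma activations):

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

def mean_pairwise(vecs):
    # average cosine similarity over all unordered pairs: a tightness measure
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs)

# hypothetical activations: paraphrases of one identity cluster tightly,
# structural controls point in unrelated directions
paraphrases = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [1.0, 1.0, 0.2]]
controls = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

A significantly higher mean pairwise similarity for paraphrases than for controls is the paper's "attractor" signature, here in miniature.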
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce Diffusion-CAM, a novel interpretability method designed specifically for diffusion-based Multimodal Large Language Models (dMLLMs). Unlike existing visualization techniques optimized for sequential models, this approach accounts for the parallel denoising process inherent to diffusion architectures, achieving superior localization accuracy and visual fidelity in model explanations.
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers propose an attribution-driven approach to make encoder-based Large Language Models more transparent and trustworthy for network intrusion detection in Software-Defined Networks. By analyzing which traffic features drive model decisions, the study demonstrates that LLMs learn legitimate attack behavior patterns, addressing a critical barrier to deploying AI security tools in sensitive environments.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose OxyGen, a unified KV cache management system for Vision-Language-Action Models that enables efficient multi-task parallelism in embodied AI agents. The system achieves up to 3.7x speedup by sharing computational resources across tasks and eliminating redundant processing of shared observations.
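The core idea of eliminating redundant processing of shared observations can be sketched as a memoized observation encoder. The names and caching scheme below are hypothetical, not OxyGen's actual design:

```python
# toy sketch: concurrent tasks that see the same observation reuse one
# cached "KV encoding" instead of each recomputing it
cache = {}
calls = 0

def encode_observation(obs):
    global calls
    if obs not in cache:
        calls += 1                   # the expensive encoder runs once per observation
        cache[obs] = f"kv({obs})"    # placeholder for a real KV-cache entry
    return cache[obs]

# three tasks processing the same camera frame in parallel
for task in ["navigate", "grasp", "report"]:
    encode_observation("camera_frame_42")
```

The reported 3.7x speedup comes from this kind of sharing at the KV-cache level, plus scheduling, across much larger Vision-Language-Action workloads.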
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers introduce RAZOR, a new framework for efficiently removing sensitive information from AI models like CLIP and Stable Diffusion without requiring full retraining. The method selectively edits specific layers and attention heads in transformer models to achieve targeted 'unlearning' while preserving overall performance.
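Targeted unlearning by editing specific heads amounts to zeroing (or re-fitting) a small subset of parameters while leaving the rest intact. A toy sketch, where the head-selection step that RAZOR automates is simply hard-coded:

```python
def ablate_heads(head_weights, heads_to_remove):
    """Zero the parameters of targeted attention heads; others are untouched.
    Hypothetical illustration, not RAZOR's actual editing procedure."""
    return {h: [0.0] * len(w) if h in heads_to_remove else w
            for h, w in head_weights.items()}

# four toy heads with small weight vectors
heads = {0: [0.2, -0.1], 1: [0.5, 0.4], 2: [-0.3, 0.8], 3: [0.1, 0.1]}
edited = ablate_heads(heads, heads_to_remove={1, 2})
```

The hard part in practice is identifying *which* layers and heads carry the sensitive concept so that everything else, and hence overall performance, is preserved.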
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers introduce a new framework showing that emotional tone in text systematically affects how large language models process and reason over information. They developed AURA-QA, an emotionally balanced dataset, and proposed emotional regularization techniques that improve reading comprehension performance across multiple benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers developed ST-Lite, a training-free KV cache compression framework that accelerates GUI agents by 2.45x while using only 10-20% of the cache budget. The solution addresses memory and latency constraints in Vision-Language Models for autonomous GUI interactions through specialized attention pattern optimization.
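Training-free KV-cache compression of this kind typically keeps only the most-attended entries within a fixed budget. A minimal sketch (the scoring and names are hypothetical, not ST-Lite's actual attention-pattern optimization):

```python
def compress_kv(cache, attn_scores, budget=0.2):
    """Keep only the top-scoring fraction of cache entries, preserving order.
    Hypothetical sketch of budget-constrained KV compression."""
    k = max(1, int(len(cache) * budget))
    keep = sorted(range(len(cache)), key=lambda i: attn_scores[i], reverse=True)[:k]
    return [cache[i] for i in sorted(keep)]

# ten toy cache entries with made-up accumulated attention scores
cache = ["kv0", "kv1", "kv2", "kv3", "kv4", "kv5", "kv6", "kv7", "kv8", "kv9"]
scores = [0.01, 0.30, 0.02, 0.25, 0.01, 0.05, 0.20, 0.02, 0.10, 0.04]
compressed = compress_kv(cache, scores, budget=0.2)  # 10-20% budget, as reported
```

Dropping 80-90% of entries this way is what lets the GUI agent fit its memory budget; the 2.45x speedup follows from the smaller attention computation.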
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers have developed CaptionFool, a universal adversarial attack that can manipulate AI image captioning models by modifying just 1.2% of image patches. The attack achieves 94-96% success rates in forcing models to generate arbitrary captions, including offensive content that can bypass content moderation systems.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠New theoretical research analyzes how Large Language Models learn during pretraining versus post-training phases, revealing that balanced pretraining data creates latent capabilities activated later, while supervised fine-tuning works best on small, challenging datasets and reinforcement learning requires large-scale data that isn't overly difficult.
AI · Bullish · arXiv – CS AI · Feb 27 · 5/10
🧠Researchers developed a multimodal AI framework using transformer-based large language models to analyze the critical first three seconds of video advertisements. The system combines visual, auditory, and textual analysis to predict ad performance metrics and optimize video advertising strategies.
AI · Bullish · Hugging Face Blog · Jun 3 · 6/10
🧠The article discusses optimizing GPU efficiency by co-locating the vLLM inference engine with training in TRL (Transformer Reinforcement Learning), Hugging Face's library for RL fine-tuning of language models. This approach aims to maximize GPU utilization and reduce computational waste during online reinforcement learning, where generation and training otherwise compete for the same hardware.
AI · Bullish · arXiv – CS AI · Mar 3 · 5/10
🧠Researchers developed a Noise Removal model to improve precision in clinical entity extraction with BERT-based Named Entity Recognition systems. The model uses features such as Probability Density Maps to distinguish weak from strong predictions, reducing false positives by 50-90% in clinical NER applications.
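At its simplest, separating weak from strong predictions means filtering extracted entities by a confidence threshold. A hypothetical sketch of that idea (the paper's Probability Density Maps are a richer signal than the single score used here):

```python
def filter_entities(predictions, threshold=0.8):
    # discard low-confidence spans to cut false positives;
    # illustrative confidence filter, not the paper's actual model
    return [p for p in predictions if p["score"] >= threshold]

# made-up clinical NER outputs with confidence scores
preds = [
    {"text": "aspirin", "label": "DRUG", "score": 0.97},
    {"text": "daily", "label": "DOSAGE", "score": 0.41},   # weak -> dropped
    {"text": "80 mg", "label": "DOSAGE", "score": 0.88},
]
strong = filter_entities(preds)
```

The reported 50-90% false-positive reduction comes from learning where in the score distribution the noisy predictions concentrate, rather than from a fixed cutoff like this one.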
AI · Neutral · Hugging Face Blog · Apr 12 · 5/10
🧠The article appears to be missing its body content, with only the title indicating a partnership between Habana Labs and Hugging Face to accelerate transformer model training. Without the full article content, specific details about the collaboration's scope, timeline, and technical implementations cannot be analyzed.
AI · Neutral · Hugging Face Blog · Nov 4 · 4/10
🧠This appears to be a technical article about optimizing BERT model inference performance on CPU architectures, part of a series on scaling transformer models. The article likely covers implementation strategies and performance improvements for running large language models efficiently on CPU hardware.