#transformer-models News & Analysis

66 articles tagged with #transformer-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Cheap Reward Hacking Detection

Researchers have developed a lightweight transformer-based method to detect reward hacking in AI systems that operates at a fraction of the cost of existing approaches. The technique achieves comparable performance to LLM-based judges while demonstrating superior true positive rates, suggesting efficient alternatives to expensive AI evaluation methods are feasible.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking

RETROSPECT introduces a modular retrosynthesis system combining a Transformer-based proposal model with LambdaMART reranking to improve chemical synthesis prediction. The system achieves 55% top-1 accuracy on the USPTO-50K benchmark, demonstrating that decomposing retrosynthesis into proposal generation and learned selection improves both ranking quality and candidate diversity.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Multi-Granularity Reasoning for Natural Language Inference

Researchers propose Multi-Granularity Reasoning Network (MGRN), a novel approach to Natural Language Inference that processes semantic information across multiple hierarchical levels rather than relying solely on final-layer transformer representations. The framework demonstrates improved performance on NLI benchmarks by explicitly separating lexical, phrasal, and contextual semantic features.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

MASF: A Multi-Model Adaptive Selection Framework for Abstractive Text summarization

Researchers propose MASF, a Multi-Model Adaptive Selection Framework that combines multiple fine-tuned transformer models with automatic evaluation metrics to improve abstractive text summarization quality. The framework achieves a BERTScore of 88.63% on the CNN/DailyMail dataset, outperforming several large language models including GPT3-D2 and Falcon-7b.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

A Unified Geometric Space for Topological Alignment Between Transformer-Based Models and Human Brain Networks

Researchers have developed a framework for comparing Transformer-based AI models by mapping their internal attention topology onto human brain networks, analyzing 151 models across vision, language, and multimodal domains. The study reveals an arc-shaped distribution of topological alignment with human cognition: models trained for semantic abstraction align with higher-order brain networks, while detail-focused models align with low-level networks. Alignment scores, however, correlate only weakly with standard performance metrics.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

AEyeDE: An Attention-Based Attribution Framework for AI-Generated Text Detection

Researchers introduce AEyeDE, an attention-based attribution framework that detects AI-generated text by analyzing transformer model attention patterns rather than surface-level linguistic features. The method uses a lightweight CNN trained on attention maps from a proxy model and demonstrates strong performance across multiple settings, suggesting attention structures provide a reliable signal for distinguishing human from AI authorship.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

UF-AMA: A unified framework for cross-domain emotion recognition via adaptive multimodal alignment

Researchers introduce UF-AMA, a unified framework for cross-domain emotion recognition using multimodal physiological signals like EEG and eye-tracking data. The model employs adaptive alignment mechanisms and multi-level domain adaptation to achieve state-of-the-art performance in cross-subject and cross-session emotion recognition tasks.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

MURMUR: An Efficient Inference System for Long-Form ASR

Researchers introduce Murmur, an inference system that optimizes long-form automatic speech recognition by balancing accuracy and latency through a two-level approach: intermediate chunk sizes at the inter-chunk level and attention sparsity exploitation at the intra-chunk level. The system achieves 4.2x latency reduction while maintaining single-pass accuracy on benchmark tests.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

JenBridge: Adaptive Long-Form Video Soundtracking across Scene Transitions

JenBridge is a new AI framework for generating long-form video soundtracks that maintain coherence across scene transitions using transformer-based generative models and LLM-directed transition selection. The system combines text-audio pretraining with video-domain adaptation and introduces the LVS Benchmark for evaluating soundtrack quality and transition naturalness.

AI · Bearish · arXiv – CS AI · Jun 2 · 6/10

Vision Language Models Cannot Reason About Physical Transformation

Researchers demonstrate that Vision Language Models systematically fail to understand physical transformations, revealing fundamental gaps in how these AI systems reason about dynamic environments. Using ConservationBench to test 112 VLMs on conservation principles, the study shows that models perform near chance levels regardless of prompting strategy or temporal resolution, indicating they lack genuine comprehension of invariant physical properties rather than simply lacking training data.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

Researchers propose a novel framework for controlling symbolic music generation in Transformer models through activation steering, enabling fine-grained control over musical attributes like pitch and duration without retraining. The approach uses latent space analysis and orthogonalization techniques to independently manipulate multiple attributes while reducing interference and maintaining generation quality.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models

Researchers demonstrate that GPT-4o-generated paraphrases can improve sign language translation by augmenting training data while keeping video inputs unchanged. Testing across three sign language datasets reveals modest gains on PHOENIX14T (9.56 to 10.33 BLEU-4) but exposes fundamental limitations when data is sparse or highly controlled.

🧠 GPT-4

AI · Neutral · arXiv – CS AI · May 29 · 6/10

FHRFormer: A Self-Supervised Masked Transformer Framework for Fetal Heart Rate Time-Series Inpainting and Forecasting

Researchers propose FHRFormer, a masked transformer-based autoencoder that reconstructs missing fetal heart rate data from wearable monitors using self-supervised learning. The method addresses signal dropout caused by sensor displacement and positional changes, preserving spectral characteristics better than traditional interpolation while enabling both data inpainting and forecasting for improved fetal risk assessment.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

A comparative study of transformer-based embeddings for topic coherence

A study comparing seven transformer-based language models of varying sizes (22M to 13B parameters) on topic modeling tasks found that model size has negligible impact on topic quality. This suggests smaller, more efficient models can match larger models' performance for topic coherence applications, potentially reducing computational costs without sacrificing output quality.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Specialty-Specific Medical Language Model for Immune-Mediated Diseases

Researchers developed a specialized Named Entity Recognition model for identifying disease-related clinical entities in immunology and infectious disease texts, achieving an F1 score of 0.89 with a transformer-based architecture and clinical embeddings. The model outperforms general-purpose NLP systems and LLMs in extracting granular biomedical concepts from unstructured medical narratives, enabling improved cohort identification and clinical decision support.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Generic Interpretation Approach for Transformer Models Incorporating Heterogenous Attention Structures

Researchers propose a new interpretation method for Transformer models with heterogeneous attention structures, which process information from multiple sources. The work addresses the growing need to understand complex AI systems, particularly as they integrate diverse data modalities and support increasingly sophisticated agent applications.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

LiDDA: Data Driven Attribution at LinkedIn

LinkedIn researchers introduced LiDDA, a transformer-based machine learning approach for data-driven attribution that assigns conversion credits to marketing interactions across member-level data, aggregate data, and external macro factors. The framework has been implemented at scale at LinkedIn and demonstrates significant business impact, with methodologies applicable to the broader marketing and ad tech industries.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Auditable Decision Models with Learned Abstention and Real-Time Steering

Researchers introduce EvaluatorDPT, a decision-control model that predicts YES, NO, or TBD (to-be-determined) for high-stakes AI applications where uncertainty exists. The system learns deferral as an explicit outcome rather than hiding uncertainty in forced predictions, achieving 82.6% accuracy with auditable, policy-governed decision routing that can be inspected and controlled at inference time.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

CmIVTP: Cross-modal Interaction-based Vessel Trajectory Prediction for Maritime Intelligence

Researchers introduce CmIVTP, a cross-modal AI framework that combines AIS and CCTV data to improve maritime vessel trajectory prediction. The system uses a transformer-based architecture with attention mechanisms to model vessel-environment interactions, addressing the limitations of single-source data in maritime navigation systems.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

PathISE: Learning Informative Path Supervision for Knowledge Graph Question Answering

PathISE is a framework that enables knowledge graph question-answering systems to learn effective supervision signals from answer-level labels alone, eliminating the need for expensive intermediate annotations. By using a transformer-based estimator to identify informative relation paths and distilling them into LLM path generators, the approach achieves performance competitive with the state of the art while reducing the resources required for training.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

CLEF: EEG Foundation Model for Learning Clinical Semantics

Researchers introduce CLEF, a foundation model for clinical EEG interpretation that processes full-length brain signal sessions alongside patient records and neurologist reports. The model achieves 74% mean AUROC across 234 clinical tasks, substantially outperforming prior EEG foundation models by integrating long-context signal analysis with clinically grounded embeddings.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

LLM Translation of Compiler Intermediate Representation

Researchers introduce IRIS-14B, a 14-billion-parameter LLM fine-tuned to translate compiler intermediate representations between GCC's GIMPLE and LLVM IR, achieving up to 44 percentage points higher accuracy than existing state-of-the-art models. The approach demonstrates how LLMs can function as interoperability layers in hybrid compiler architectures, enabling cross-toolchain workflows without modifying existing compiler infrastructure.

AI · Neutral · arXiv – CS AI · May 12 · 5/10

KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation

Researchers introduce KANMultiSign, a neural network framework that converts sign language notation into pose animations using Kolmogorov-Arnold Networks integrated with Transformers. The system achieves improved accuracy with fewer parameters across multiple sign languages, demonstrating that multi-scale supervision is the key driver of performance gains.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

PAMPOS: Causal Transformer-based Trajectory Prediction for Attack-Agnostic Misbehavior Detection in V2X Networks

Researchers present PAMPOS, a causal transformer-based system that detects misbehavior in Vehicle-to-Everything (V2X) networks by identifying deviations from learned normal driving patterns, achieving up to 98% AUC without requiring labeled attack data during training. This unsupervised approach addresses a critical security gap where cryptographic mechanisms alone cannot prevent insider falsification attacks in connected vehicle systems.

AI · Neutral · arXiv – CS AI · May 7 · 6/10

Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation

Researchers report that large language models develop distinct hierarchical processing stages (Local, Intermediate, Global) determined by architecture family rather than model size. Using information theory, they show that Llama and Qwen models exhibit markedly different brittleness patterns across layers, with architectural design, not scaling, as the primary driver of model behavior.

🧠 Llama
