#feature-engineering News & Analysis

18 articles tagged with #feature-engineering. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Token Factory: Efficiently Integrating Diverse Signals into Large Recommendation Models

Researchers introduce Token Factory, a framework that converts traditional recommendation signals into efficient 'soft tokens' for Large Recommendation Models, enabling better feature integration without excessive computational overhead or prompt bloat. The approach demonstrates practical improvements in production-scale recommendation systems by compressing heterogeneous inputs while maintaining or enhancing model performance.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Bridging Expert Knowledge and Automated Feature Engineering via Self-Evolution

Researchers introduce FEST, a machine learning system that automatically engineers interpretable features from unstructured text and images while aligning with expert knowledge. The method outperforms existing approaches across brand compliance, content moderation, and clinical tasks, and the team releases BrandGuide, a new dataset of 1M+ assets with expert-designed features for systematic evaluation.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction

Eureka is an LLM-driven framework that automates feature engineering for machine learning by treating feature design as a code generation problem. The system combines expert agents, chain-of-thought reasoning, and reinforcement learning to generate and refine features iteratively, demonstrating a 16% improvement in cloud resource demand prediction at Alibaba Cloud.

AI · Neutral · arXiv – CS AI · May 11 · 7/10

Does Your Neural Network Extrapolate? Feature Engineering as Identifiability Bias for OOD Generalization

Researchers demonstrate that neural networks fail at out-of-distribution (OOD) generalization not because of insufficient training data, but because the choice of feature representation determines which extrapolation patterns a model can learn. The same architecture, reaching identical in-distribution loss, can differ by 520x out of distribution depending on how features are encoded. The authors conclude that correct feature engineering is necessary but not sufficient without appropriate model class constraints.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3

MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Researchers introduce MedFeat, a new AI framework that uses Large Language Models for healthcare feature engineering in clinical tabular predictions. The system incorporates model awareness and domain knowledge to discover clinically meaningful features that outperform traditional approaches and demonstrate robustness across different hospital settings.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Cross-Dataset, Age, and Gender Generalization: A Comprehensive Analysis of Fine-Tuning Strategies for Low-Resource Children's ASR

Researchers have developed improved acoustic modeling techniques for recognizing dysarthric speech in children, achieving a 4.65% relative improvement in word recognition and 4.63% in sentence recognition using Factorized Time Delay Neural Networks. The study demonstrates that strategic selection of acoustic features, particularly pitch characteristics, significantly enhances performance on low-resource speech recognition tasks.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Systematic Study of Dysarthric Speech Recognition: Spectral Features and Acoustic Models

Researchers report improvements in dysarthric speech recognition from systematically combining acoustic features with the Factorized Time Delay Neural Network (F-TDNN) model, demonstrating a 4.65% relative improvement in word recognition and 4.63% in sentence recognition. The study identifies pitch features as particularly effective for handling the acoustic variability characteristic of impaired speech, advancing accessibility technology for individuals with speech disorders.

AI · Neutral · arXiv – CS AI · Jun 10 · 5/10

Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks

This study evaluates machine learning approaches for distinguishing asthma from COPD using pulmonary sound analysis, comparing MFCC matrices, log-mel spectrograms, and VAR models with CNN and GRU networks. MFCC representations with adaptive-length windowing achieved the best performance (F1-score 0.877), while sophisticated fusion strategies and data augmentation unexpectedly degraded results, emphasizing the importance of authentic clinical data.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

LATTEArena: An Evaluation Framework for LLM-powered Tabular Feature Engineering (Extended Version)

Researchers introduce LATTEArena, a standardized evaluation framework for comparing LLM-powered tabular feature engineering methods. The framework decomposes 15 representative techniques into reusable components and reveals that Tree-of-Thought combined with Monte Carlo Tree Search offers optimal cost-effectiveness, while RPN and Code formats excel at different task types.

🏢 Meta

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

A Hierarchical Feature Engineering Framework for Automated Classification of Phonotraumatic and Non-Phonotraumatic Vocal Hyperfunction

Researchers developed a hierarchical feature engineering framework to classify vocal hyperfunction subtypes using non-invasive neck-surface acceleration monitoring. The machine learning approach achieved 89.1% AUC for phonotraumatic cases and 72.8% for non-phonotraumatic cases, with coupling features proving crucial for distinguishing both conditions from healthy controls.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Which Anatomy Matters Under Limited Labels? A Data-Efficient Anatomy-Aware Benchmark for Cardiac Pathology Prediction

Researchers present an anatomy-aware benchmark demonstrating that in low-data medical imaging scenarios, effective representation of clinically meaningful cardiac structures outperforms model complexity for pathology prediction. The study uses cardiac MRI segmentation data to show that simpler classifiers with better anatomical feature engineering achieve superior results compared to more complex models with generic representations.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

DAStatFormer: A Hybrid Multibranch Transformer with Statistical Feature Integration for DAS-Based Pattern Recognitions

Researchers introduce DAStatFormer, a hybrid Transformer model for Distributed Acoustic Sensing (DAS) event classification that extracts 24 statistical features per channel instead of processing raw signals. It achieves 99.4% accuracy on benchmark datasets while significantly reducing computational requirements compared to existing deep learning approaches.
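
As a rough illustration of replacing raw signals with per-channel statistics, the sketch below summarizes each channel with six basic descriptors. The summary does not list the paper's 24 features, so the feature set, the function names `channel_stats` and `featurize`, and the toy recording are all assumptions made for this example.

```python
import math
import statistics


def channel_stats(signal):
    """Summarize one channel's raw samples with a few basic statistics.

    Illustrative feature set only; not the 24 features used by DAStatFormer.
    """
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),  # population standard deviation
        "min": min(signal),
        "max": max(signal),
        "rms": math.sqrt(sum(x * x for x in signal) / len(signal)),
        "peak_to_peak": max(signal) - min(signal),
    }


def featurize(recording):
    """Map a list of channels (each a list of samples) to one feature row per channel."""
    return [list(channel_stats(channel).values()) for channel in recording]


# Hypothetical two-channel recording with four samples per channel.
recording = [[0.0, 1.0, 0.0, -1.0], [2.0, 2.0, 2.0, 2.0]]
features = featurize(recording)
```

Each channel collapses to a fixed-length row regardless of how many samples it holds, which is where the reduction in input size for a downstream model would come from.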

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Evolving Features vs Evolving Entire Trees with GP for Interpretable Survival Analysis

Researchers propose using genetic programming to evolve interpretable feature sets and tree structures for survival analysis models, demonstrating improved predictive performance while maintaining shallow, explainable decision trees. The approach addresses the fundamental trade-off between accuracy and interpretability in medical survival prediction by optimizing both feature construction and tree logic simultaneously.

AI · Neutral · arXiv – CS AI · May 28 · 5/10

REED: Post-Training Representation Editing for Cross-Domain Linguistic Steganalysis

Researchers propose REED, a post-training representation editing method that improves linguistic steganalysis detection across different domains without modifying model architecture or updating parameters. The technique uses domain-offset vectors and source-domain cover-to-stego directions to adapt detectors to unseen domains with different vocabularies and writing styles.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies

Researchers demonstrate that large language models can extract predictive features from financial news with valid intermediate signals (Information Coefficient >0.15), yet these features fail to improve reinforcement learning trading agents during macroeconomic shocks. The findings reveal a critical gap between feature-level validity and downstream policy robustness, suggesting that valid signals alone cannot guarantee trading performance under distribution shifts.
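
For readers unfamiliar with the metric, the Information Coefficient is commonly computed as the rank correlation between a signal and the returns that follow it. The sketch below uses the Spearman form with only the standard library; the function names and sample values are hypothetical, and the paper may define or compute its IC differently.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def information_coefficient(signal, forward_returns):
    """Spearman rank correlation between a signal and subsequent returns."""
    return pearson(ranks(signal), ranks(forward_returns))


# Hypothetical values: a signal that orders the forward returns perfectly.
ic_aligned = information_coefficient([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4])
ic_inverted = information_coefficient([1, 2, 3, 4], [0.4, 0.3, 0.2, 0.1])
```

An IC like this measures only how well the feature ranks outcomes on the evaluated sample, which is consistent with the article's point that a valid feature-level signal need not carry over to policy performance under distribution shift.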

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 17

Exploring Robust Intrusion Detection: A Benchmark Study of Feature Transferability in IoT Botnet Attack Detection

Researchers conducted a benchmark study on IoT botnet intrusion detection systems, finding that models trained on one network domain suffer significant performance degradation when applied to different environments. The study evaluated three feature sets across four IoT datasets and provided guidelines for improving cross-domain robustness through better feature engineering and algorithm selection.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation

Researchers propose a new multi-agent reinforcement learning framework that uses three cooperative agents with attention mechanisms to automate feature transformation for machine learning models. The approach addresses key limitations in existing automated feature engineering methods, including dynamic feature expansion instability and insufficient agent cooperation.

AI · Neutral · arXiv – CS AI · Feb 27 · 3/10 · 6

Predicting Tennis Serve directions with Machine Learning

Researchers developed a machine learning method to predict professional tennis players' first serve directions, achieving 49% accuracy for male players and 44% for female players. The study provides evidence that top players use mixed-strategy serving decisions and suggests contextual information plays a larger role in tennis strategy than previously understood.