#feature-selection News & Analysis

18 articles tagged with #feature-selection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 23 · 7/10

🧠

Worse than Random: The Importance of a Baseline for Unsupervised Feature Selection

A research paper challenges the credibility of unsupervised feature selection methods by demonstrating that many state-of-the-art approaches perform no better than random selection. The study calls for establishing random feature selection as a mandatory baseline in future research to ensure genuine methodological improvements.
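A minimal sketch of what a random-selection baseline could look like, assuming a toy dataset and a stand-in evaluation. The function names, the leave-one-out 1-nearest-neighbour score, and the variance-based selector are all illustrative assumptions, not the paper's protocol. Labels are used only to score subsets, never to select them.

```python
import random
import statistics

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out 1-nearest-neighbour accuracy using only `features`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_dist = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += y[best_j] == y[i]
    return correct / len(X)

def random_baseline(X, y, k, trials, rng):
    """Mean and standard deviation of the score over random k-feature subsets."""
    n_features = len(X[0])
    scores = [
        loo_1nn_accuracy(X, y, rng.sample(range(n_features), k))
        for _ in range(trials)
    ]
    return statistics.mean(scores), statistics.stdev(scores)

def variance_selector(X, k):
    """Toy unsupervised selector (an assumption): keep the k highest-variance features."""
    n_features = len(X[0])
    variances = [statistics.pvariance([row[f] for row in X]) for f in range(n_features)]
    return sorted(range(n_features), key=lambda f: variances[f], reverse=True)[:k]

rng = random.Random(0)
# Hypothetical data: 60 samples, 20 features; only features 0-2 carry class signal.
y = [i % 2 for i in range(60)]
X = [[rng.gauss(4.0 * label if f < 3 else 0.0, 1.0) for f in range(20)] for label in y]

selected = variance_selector(X, 3)
selector_score = loo_1nn_accuracy(X, y, selected)
baseline_mean, baseline_std = random_baseline(X, y, 3, 30, rng)
print(f"selector: {selector_score:.2f}  random: {baseline_mean:.2f} +/- {baseline_std:.2f}")
```

A selector is only worth reporting if its score clears the random mean by a margin that is meaningful relative to the random spread.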

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

🧠

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

Researchers demonstrate that efficient LLM benchmarking can be substantially improved by treating it as a multiple regression problem with kernel ridge regression and applying minimum redundancy maximum relevance (mRMR) feature selection. The approach achieves lower prediction errors and faster computation than existing methods while maintaining consistency across different data splits.
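The mRMR idea mentioned above can be sketched as a greedy loop: at each step, pick the feature with the highest relevance to the target minus its average redundancy with the features already chosen. This sketch uses absolute Pearson correlation as a stand-in for mutual information and omits the kernel ridge regression step entirely. The data and names are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def mrmr(columns, target, k):
    """Greedy mRMR: maximise relevance minus mean redundancy with the selected set."""
    relevance = [abs(pearson(col, target)) for col in columns]
    selected, remaining = [], set(range(len(columns)))

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(abs(pearson(columns[f], columns[s])) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [rng.gauss(0, 1) for _ in range(200)]
# Hypothetical columns: 0 and 1 are near-duplicates of signal a, 2 tracks signal b, 3 is noise.
columns = [
    [x + 0.1 * rng.gauss(0, 1) for x in a],
    [x + 0.1 * rng.gauss(0, 1) for x in a],
    [x + 0.1 * rng.gauss(0, 1) for x in b],
    [rng.gauss(0, 1) for _ in range(200)],
]
target = [x + z for x, z in zip(a, b)]
chosen = mrmr(columns, target, 2)
print(chosen)
```

The redundancy term is what stops the loop from picking both near-duplicate columns, even though each is individually as relevant as column 2.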

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

🧠

Neural Additive and Basis Models with Feature Selection and Interactions

Researchers propose enhanced neural additive and basis models (NAM/NBM) that incorporate feature selection mechanisms to improve computational efficiency and interpretability of deep neural networks. The advancement enables these models to handle high-dimensional datasets and capture feature interactions while reducing training costs and model sizes compared to traditional approaches.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

🧠

You Only Train Once: Differentiable Subset Selection for Omics Data

Researchers introduce YOTO, an end-to-end machine learning framework that simultaneously selects compact gene subsets and performs prediction tasks in single-cell transcriptomic analysis. The differentiable architecture enforces sparsity and uses multi-task learning to improve biomarker discovery while outperforming existing feature selection methods.

AI · Neutral · arXiv – CS AI · Jun 3 · 6/10

🧠

Do Real-World Datasets Contain Natural Experiments? An Empirical Study Using Causal Feature Selection

Researchers investigate whether real-world datasets contain natural experiments—events that create implicit interventions affecting some groups but not others—and propose using causal discovery methods to detect and leverage them for improved model performance. Their empirical study across synthetic and real-world datasets suggests that natural experiments do exist in practice and can enhance downstream machine learning outcomes when treated as interventional rather than observational data.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

Estimating Mutual Information between Time Series and Temporal Event Sequences Across Diverse Analysis Tasks

Researchers propose a nonparametric mutual information estimator that quantifies dependence between continuous time series and discrete temporal event sequences without requiring data transformation or ad hoc discretization. The method addresses limitations in existing approaches through latent event clustering and continuous-discrete duality modeling, offering robust applications across causality analysis, pattern discovery, and feature selection tasks.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

🧠

Implicit Regularization for Multi-label Feature Selection

Researchers propose a novel feature selection method for multi-label learning using implicit regularization and label embedding instead of traditional sparse penalization techniques. The approach leverages Hadamard product parameterization to reduce bias and potentially enable benign overfitting, showing promise on benchmark datasets.
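To illustrate the general idea of a Hadamard product parameterization, the sketch below writes the weights as an elementwise product `w = u * v` and runs plain gradient descent from a small initialization, with no explicit sparsity penalty. It is a single-target linear regression on hypothetical data, not the paper's multi-label method. The initialization scale, learning rate, and step count are assumptions chosen for the toy problem.

```python
import random

def mse(X, y, w):
    """Mean squared error of the linear model w on (X, y)."""
    return sum((sum(wj * xj for wj, xj in zip(w, row)) - yi) ** 2
               for row, yi in zip(X, y)) / len(X)

def fit_hadamard(X, y, alpha=0.1, lr=0.01, steps=2000):
    """Gradient descent on u and v where the model weights are w = u * v (elementwise)."""
    n, d = len(X), len(X[0])
    u = [alpha] * d  # small initial scale
    v = [0.0] * d    # starting at zero lets each weight take either sign
    for _ in range(steps):
        w = [ui * vi for ui, vi in zip(u, v)]
        residual = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                    for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(residual, X)) / n
                  for j in range(d)]
        # Chain rule: dL/du = dL/dw * v and dL/dv = dL/dw * u (both use the old u, v).
        u, v = (
            [ui - lr * g * vi for ui, vi, g in zip(u, v, grad_w)],
            [vi - lr * g * ui for ui, vi, g in zip(u, v, grad_w)],
        )
    return [ui * vi for ui, vi in zip(u, v)]

rng = random.Random(0)
n, d = 30, 40  # fewer samples than features, so many weight vectors fit the data
X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
w_true = [2.0, -1.5] + [0.0] * (d - 2)  # hypothetical sparse ground truth
y = [sum(wj * xj for wj, xj in zip(w_true, row)) for row in X]

w_hat = fit_hadamard(X, y)
ranked = sorted(range(d), key=lambda j: abs(w_hat[j]), reverse=True)
print("largest weights at features:", ranked[:2])
```

Features can then be ranked by the magnitude of the learned product, which is how the sketch reads off a selection.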

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

🧠

Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines

A new study challenges recent findings that dismissed Sparse Autoencoders (SAEs) as ineffective for steering Large Language Models, demonstrating that SAEs can match LoRA baseline performance when combined with a supervised feature selection pipeline. The research suggests that high sparsity constraints may not be necessary for effective model steering based on interpretability.

AI · Neutral · arXiv – CS AI · Jun 1 · 5/10

🧠

Feature-Optimized Vision for Adaptive 3D Scene Reconstruction

Researchers propose an adaptive feature-selection system for 3D scene reconstruction that intelligently prioritizes visual data based on texture, repeatability, and geometric utility rather than using fixed thresholds. The method demonstrates improved reconstruction quality and computational efficiency across diverse scene types compared to baseline approaches, offering a modular enhancement for both classical and neural reconstruction pipelines.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

🧠

The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction

A comprehensive study of Markov boundaries in tabular prediction reveals that while oracle boundaries significantly improve model performance, practical causal discovery methods fail to recover them cost-effectively. The research identifies fundamental misalignments between optimizing for structural recovery and optimizing for predictive performance, suggesting that prediction-focused feature selection requires different approaches than the theory assumes.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

🧠

Bayesian Gated Non-Negative Contrastive Learning

Researchers propose BayesNCL, a new machine learning approach that improves the interpretability of self-supervised learning models by using probabilistic gating to filter out task-irrelevant features. The method achieves a 142.1% improvement in semantic consistency on ImageNet-100 while maintaining downstream task performance, addressing a fundamental limitation in how contrastive learning models process information.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

🧠

Sequential Feature Selection for Efficient Landslide Segmentation from Multi-Spectral Data

Researchers present a Sequential Forward Floating Selection (SFFS) framework for identifying the minimal set of satellite imagery channels needed for accurate landslide detection, demonstrating that 8 carefully selected channels match or exceed the performance of models using 30 channels. The work addresses computational efficiency and model interpretability in Earth observation machine learning by moving beyond conventional approaches that simply include all available data.
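A compact sketch of Sequential Forward Floating Selection, assuming a generic `score(subset)` callback. Each forward step adds the single best feature; the floating step then drops features for as long as doing so beats the best subset of that smaller size seen so far. The toy integer score below is hypothetical and is built so that an early pick (feature 0) becomes redundant once features 1 and 2 are both present.

```python
def sffs(n_features, score, k):
    """Sequential Forward Floating Selection for a target subset size k."""
    selected = []
    best = {}  # subset size -> (score, subset) for the best subset seen at that size
    while len(selected) < k:
        # Forward step: add the feature that maximises the score.
        candidates = [f for f in range(n_features) if f not in selected]
        to_add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [to_add]
        current = score(selected)
        if len(selected) not in best or current > best[len(selected)][0]:
            best[len(selected)] = (current, list(selected))
        # Floating step: remove features while that strictly improves on the
        # best known subset of the smaller size.
        while len(selected) > 2:
            to_drop = max(selected,
                          key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != to_drop]
            reduced_score = score(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, list(reduced))
            else:
                break
    return best[k][1]

def toy_score(subset):
    """Hypothetical channel utility, in integer points."""
    s = set(subset)
    total = 0
    if 0 in s:
        total += 50
    total += 30 * len(s & {1, 2})
    if {1, 2} <= s:
        total += 60            # channels 1 and 2 are far more useful together
        if 0 in s:
            total -= 50        # channel 0 adds nothing once 1 and 2 are present
    total += len(s & {3, 4})   # weak channels worth one point each
    return total

result = sffs(5, toy_score, 3)
print(sorted(result), toy_score(result))
```

Plain forward selection on the same score would stop at features 0, 1, and 2; the floating step is what allows the early pick to be revisited.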

AI · Neutral · arXiv – CS AI · May 12 · 5/10

🧠

Novel GPU Boruta algorithms for feature selection from high-dimensional data

Researchers have developed GPU-accelerated versions of the Boruta feature selection algorithm, significantly improving computational efficiency for processing large-scale datasets while maintaining accuracy comparable to the original CPU-based method. The two variants—Boruta-Permut and Boruta-TreeImp—demonstrate that GPU acceleration offers a cost-effective solution for machine learning workflows on high-dimensional data.
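For readers unfamiliar with Boruta, its core idea is the shadow-feature comparison: each real feature must beat the best importance achieved by permuted copies of the features. The CPU-only sketch below illustrates just that idea under strong simplifications. It uses absolute correlation as a stand-in for random-forest importance and a fixed hit-rate cutoff in place of Boruta's statistical test. It is not either of the GPU variants, and all names and data are hypothetical.

```python
import math
import random

def abs_corr(a, b):
    """Absolute Pearson correlation (0.0 if either list is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return abs(cov) / math.sqrt(var_a * var_b)

def shadow_feature_selection(columns, target, rounds, rng, min_hit_rate=0.9):
    """Keep features whose importance beats the best shadow in most rounds."""
    # Stand-in importance (an assumption): real Boruta refits a forest every round.
    importance = [abs_corr(col, target) for col in columns]
    hits = [0] * len(columns)
    for _ in range(rounds):
        # Shadow features: shuffled copies that keep each column's distribution
        # but break any relationship with the target.
        shadows = [rng.sample(col, len(col)) for col in columns]
        threshold = max(abs_corr(shadow, target) for shadow in shadows)
        for f, value in enumerate(importance):
            if value > threshold:
                hits[f] += 1
    return [f for f, h in enumerate(hits) if h >= min_hit_rate * rounds]

rng = random.Random(0)
n = 200
# Hypothetical data: features 0 and 1 drive the target, features 2-5 are noise.
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(6)]
target = [columns[0][i] + columns[1][i] + 0.1 * rng.gauss(0, 1) for i in range(n)]

confirmed = shadow_feature_selection(columns, target, rounds=20, rng=rng)
print(confirmed)
```

Because every round re-permutes the shadows, the inner loop is embarrassingly parallel across features and rounds.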

AI · Neutral · arXiv – CS AI · May 9 · 6/10

🧠

CatNet: Controlling the False Discovery Rate in LSTM with SHAP Feature Importance and Gaussian Mirrors

Researchers introduce CatNet, an algorithm that controls False Discovery Rate (FDR) in LSTM neural networks by combining SHAP feature importance derivatives with a Gaussian Mirror statistical approach. The method addresses overfitting and model interpretability challenges in time-series deep learning through improved feature selection and a novel kernel-based independence measure.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

🧠

A Lightweight IDS for Early APT Detection Using a Novel Feature Selection Method

Researchers developed a lightweight intrusion detection system using XGBoost and explainable AI to detect Advanced Persistent Threats (APTs) at early stages. The system reduced the required features from 77 to just 4 while maintaining 97% precision and 100% recall.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 5

🧠

Beyond False Discovery Rate: A Stepdown Group SLOPE Approach for Grouped Variable Selection

Researchers introduce Group Stepdown SLOPE, a new statistical method for high-dimensional feature selection that improves upon existing frameworks by controlling multiple error metrics and exploiting group structure in data. The method provides better statistical power while maintaining strict error control in machine learning applications.

AI · Neutral · arXiv – CS AI · Mar 2 · 4/10 · 6

🧠

Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search

Researchers propose a new framework for feature selection that uses permutation-invariant embedding and reinforcement learning to address limitations in current methods. The approach combines an encoder-decoder paradigm to preserve feature relationships without order bias and employs policy-based RL to explore embedding spaces without convexity assumptions.

AI · Bullish · arXiv – CS AI · Mar 2 · 4/10 · 6

🧠

Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection

Researchers have developed a new framework for privacy-preserving feature selection that uses permutation-invariant representation learning and federated learning techniques. The approach addresses data imbalance and privacy constraints in distributed scenarios while improving computational efficiency and downstream task performance.