AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce QD-LLM, a framework that evolves lightweight prompt embeddings (~32K parameters) to steer frozen large language models toward diverse outputs without fine-tuning. The approach outperforms existing quality-diversity optimization methods by 46.4% in coverage and demonstrates practical applications in test generation and training data improvement.
🧠 Llama
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose GLoRA, a gauge-aware federated learning framework that improves parameter-efficient adaptation of large language models by aggregating semantic updates rather than raw LoRA factors. The method addresses a fundamental mathematical limitation in existing federated LoRA systems and demonstrates consistent performance improvements across heterogeneous client scenarios.
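A commonly noted issue with federated LoRA is that averaging the low-rank factors B and A separately is not the same as averaging the products BA that actually modify the weights. Whether this is exactly the limitation GLoRA targets is an assumption; the summary does not spell it out. A minimal pure-Python sketch, with two hypothetical clients and made-up rank-1 factors:

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean(mats):
    """Element-wise average of same-shaped matrices."""
    n = len(mats)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*mats)]

# Hypothetical rank-1 LoRA factors from two clients (B is 2x1, A is 1x2).
B1, A1 = [[1.0], [0.0]], [[1.0, 0.0]]
B2, A2 = [[0.0], [1.0]], [[0.0, 1.0]]

# Averaging the weight updates themselves.
avg_of_products = mean([matmul(B1, A1), matmul(B2, A2)])

# Averaging the raw factors first, then multiplying.
product_of_avgs = matmul(mean([B1, B2]), mean([A1, A2]))

# The same update can be written with rescaled factors (2*B, A/2),
# so the raw factors are not unique even when the update is.
rescaled_product = matmul([[2.0], [0.0]], [[0.5, 0.0]])

print(avg_of_products)   # average of the two updates
print(product_of_avgs)   # differs from the line above
print(rescaled_product == matmul(B1, A1))
```

Because many factor pairs encode the same update, a server that averages raw factors can produce a result that depends on how each client happened to parameterize its update, which is the kind of ambiguity "gauge-aware" aggregation would need to remove.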
AI · Bullish · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose gated QKAN-FWP, a quantum-inspired machine learning framework that combines Fast Weight Programmers with quantum-inspired Kolmogorov-Arnold Networks using single-qubit circuits. The model achieves superior performance on time-series forecasting tasks with 12.5k parameters while maintaining compatibility with current NISQ quantum processors, demonstrating practical viability for near-term quantum computing applications.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce Causal Energy Minimization (CEM), a theoretical framework that reinterprets Transformer layer architecture through energy-based optimization principles. The approach derives weight-tied attention and gated MLPs as gradient updates on energy functions, revealing new design spaces for parameter-efficient Transformer variants that maintain baseline performance at hundred-million-parameter scales.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers propose a novel application of neural operators (NOs) for finite-dimensional function interpolation, demonstrating they can outperform standard neural networks while using significantly fewer parameters. The approach is validated on synthetic benchmarks and applied to nuclear mass prediction, achieving competitive accuracy with high parameter efficiency.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers have developed Von Neumann Networks (VNNs), a novel neural network architecture inspired by John von Neumann's mid-20th century cellular automata model, demonstrating superior parameter efficiency and performance on basic tasks compared to traditional deep learning approaches. The framework extends neural operators through Green's functions on cellular topologies and proves computational universality, potentially opening new architectural paradigms for both software and hardware design.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers present a layer-wise analysis of Supervised Fine-Tuning (SFT) in large language models, revealing that middle layers remain stable during training while final layers exhibit high sensitivity. They introduce Mid-Block Efficient Tuning, a targeted approach that selectively updates intermediate layers and achieves up to 10.2% performance gains over standard LoRA on benchmarks with significantly reduced parameter overhead.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose an SVD-based orthogonal subspace projection method for continual machine unlearning that prevents interference between sequential deletion tasks in neural networks. The approach maintains model performance on retained data while effectively removing influence of unlearned data, addressing a critical limitation of naive LoRA fusion methods.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose a novel hybrid fine-tuning method for Large Language Models that combines full parameter updates with Parameter-Efficient Fine-Tuning (PEFT) modules using zeroth-order and first-order optimization. The approach addresses computational constraints of full fine-tuning while overcoming PEFT's limitations in knowledge acquisition, backed by theoretical convergence analysis and empirical validation across multiple tasks.
AI · Bullish · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce Instance-Adaptive VAE (IA-VAE), a new framework that uses hypernetworks to generate input-specific parameter modulations for variational autoencoders, reducing the amortization gap while maintaining computational efficiency. The approach demonstrates improved posterior approximation accuracy on synthetic data and consistently better ELBO performance on image benchmarks compared to standard VAEs.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠AdapterTune introduces a method for efficiently fine-tuning Vision Transformers using zero-initialized low-rank adapters, so that training starts exactly at the pretrained function and avoids optimization instability. The technique achieves a 14.9-point accuracy improvement over head-only transfer while using only 0.92% of the parameters needed for full fine-tuning.
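The "starts at the pretrained function" property can be illustrated with a small pure-Python sketch. It assumes the usual LoRA-style convention of zeroing the up-projection while randomly initializing the down-projection; the summary does not say which factor AdapterTune zeroes, and the layer width, rank, and input below are hypothetical.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

d, r = 4, 2  # hypothetical layer width and adapter rank
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen pretrained weight
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # down-projection, random init
B = [[0.0] * r for _ in range(d)]                            # up-projection, zero init (assumed)

def adapted(x):
    """Frozen layer plus low-rank residual: W x + B (A x)."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + dl for b, dl in zip(base, delta)]

x = [1.0, -2.0, 0.5, 3.0]
print(adapted(x) == matvec(W, x))  # the adapter contributes nothing at initialization
```

With one factor at zero the residual branch is exactly zero on the first forward pass, so early training steps perturb the pretrained model gradually rather than starting from a random offset.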
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers developed a hybrid model combining Mamba-2 state space operators with Transformer blocks for recursive reasoning, achieving a 2% improvement in pass@2 performance on ARC-AGI-1 tasks with only 6.83M parameters. The study demonstrates that Mamba-2 operators can preserve reasoning capabilities while improving solution candidate coverage in tiny neural networks.
AI · Bullish · arXiv – CS AI · Mar 4 · 5/10 · 3
🧠Researchers developed GLoRIA, a parameter-efficient framework for automatic speech recognition that adapts to regional dialects using location metadata. The system achieves state-of-the-art performance while updating less than 10% of model parameters and demonstrates strong generalization to unseen dialects.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers introduce Polynomial Surrogate Training (PST) to enable differentiable ternary logic gate networks, reducing parameters by 2,187x while maintaining performance. The method extends beyond binary logic gates to ternary systems with an UNKNOWN state for uncertainty handling, training 2-3x faster than binary networks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠Researchers introduced Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs), a new neural network architecture that solves reasoning problems like Sudoku and ARC-AGI more efficiently than existing models. SE-RRMs achieve competitive performance with only 2 million parameters and can generalize across different puzzle sizes without requiring extensive data augmentation.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 18
🧠Researchers propose QKAN-LSTM, a quantum-inspired neural network that integrates quantum variational activation functions into LSTM architecture for sequential modeling. The model achieves superior predictive accuracy with 79% fewer parameters than classical LSTMs while remaining executable on classical hardware.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers have developed LilMoo, a 0.6-billion parameter Hindi language model trained from scratch using a transparent, reproducible pipeline optimized for limited compute environments. The model outperforms similarly sized multilingual baselines like Qwen2.5-0.5B and Qwen3-0.6B, demonstrating that language-specific pretraining can rival larger multilingual models.
AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 2
🧠Researchers developed CDD (Contamination Detection via output Distribution) to identify data contamination in small language models by measuring output peakedness. The study found that CDD works only when fine-tuning produces verbatim memorization; with parameter-efficient methods such as low-rank adaptation, which avoid memorization, its detection accuracy falls to chance level.
AI · Bullish · Hugging Face Blog · Feb 10 · 5/10 · 4
🧠The article discusses parameter-efficient fine-tuning methods using Hugging Face's PEFT library. PEFT enables efficient adaptation of large language models by updating only a small subset of parameters rather than full model retraining.
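To give a sense of scale for "a small subset of parameters", here is a back-of-the-envelope sketch using low-rank adaptation (LoRA), one of the methods the PEFT library provides. The layer shape and rank are hypothetical values chosen for illustration, not figures from the article.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters in a rank-`rank` adapter for a d_out x d_in weight.

    The adapter consists of two factors, B (d_out x rank) and A (rank x d_in).
    """
    return rank * (d_out + d_in)

# Hypothetical projection layer and adapter rank.
d_out, d_in, rank = 4096, 4096, 8

full = d_out * d_in                               # parameters updated by full fine-tuning
lora = lora_trainable_params(d_out, d_in, rank)   # parameters updated by the adapter
fraction = lora / full

print(full, lora, f"{fraction:.4%}")
```

For this single hypothetical layer the adapter trains 65,536 parameters against 16,777,216 for the full matrix, or about 0.39%. The overall fraction for a real model depends on which layers receive adapters and on the chosen rank.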