9 articles tagged with #gradient-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv · CS AI · Mar 5 · 7/10
🧠 Researchers developed HAP (Heterogeneity-Aware Adaptive Pre-ranking), a new framework for recommender systems that addresses gradient conflicts in training by separating easy and hard samples. The system has been deployed in Toutiao's production environment for 9 months, achieving a 0.4% improvement in user engagement at no additional computational cost.
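The summary doesn't give HAP's exact mechanism, but the core idea it names — split a batch into easy and hard samples, then resolve conflicting gradients between the two groups — can be sketched generically. The quantile split and the PCGrad-style projection below are my assumptions, not the paper's published rule:

```python
import numpy as np

def split_easy_hard(losses, q=0.5):
    """Partition sample indices by per-sample loss quantile
    (a hypothetical heuristic for 'easy' vs 'hard')."""
    thr = np.quantile(losses, q)
    easy = np.where(losses <= thr)[0]
    hard = np.where(losses > thr)[0]
    return easy, hard

def resolve_conflict(g_easy, g_hard):
    """If the two group gradients conflict (negative dot product),
    project g_hard onto the plane normal to g_easy before summing
    (PCGrad-style projection; an assumption, not HAP's method)."""
    dot = float(g_easy @ g_hard)
    if dot < 0:
        g_hard = g_hard - dot / float(g_easy @ g_easy) * g_easy
    return g_easy + g_hard
```

After the projection, the combined update can no longer move against the easy-sample direction, which is one standard way such conflicts are neutralized.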
AI · Bullish · arXiv · CS AI · Mar 3 · 7/10 · 4
🧠 Researchers developed new activation functions for deep neural networks based on polynomial and trigonometric orthonormal bases that can successfully train models like GPT-2 and ConvNeXt. The work addresses gradient problems common with polynomial activations and shows these networks can be interpreted as multivariate polynomial mappings.
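One plausible instance of such an activation is a truncated Chebyshev expansion; the specific basis, coefficients, and input squashing below are illustrative assumptions, not the paper's construction. Keeping the basis argument in [-1, 1] is one simple way to sidestep the gradient blow-up of raw polynomials:

```python
import numpy as np

def chebyshev_activation(x, coeffs):
    """Activation as a linear combination of Chebyshev polynomials T_k
    (one example of an orthonormal polynomial basis).
    tanh squashes inputs into [-1, 1] so the recurrence stays bounded."""
    t = np.tanh(x)                          # keep basis arguments in [-1, 1]
    T_prev, T_curr = np.ones_like(t), t     # T_0(t) = 1, T_1(t) = t
    out = coeffs[0] * T_prev + coeffs[1] * T_curr
    for c in coeffs[2:]:
        # Chebyshev recurrence: T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)
        T_prev, T_curr = T_curr, 2 * t * T_curr - T_prev
        out = out + c * T_curr
    return out
```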
AI · Bullish · arXiv · CS AI · Mar 3 · 7/10 · 2
🧠 Researchers propose GradientStabilizer, a new technique to address training instability in deep learning by replacing gradient magnitude with statistically stabilized estimates while preserving direction. The method outperforms gradient clipping across multiple AI training scenarios including LLM pre-training, reinforcement learning, and computer vision tasks.
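The stated idea — keep the gradient's direction, swap its magnitude for a stabilized estimate — can be sketched with an exponential moving average of the norm. The EMA is my assumed estimator; the paper's actual statistical estimate isn't given in the summary:

```python
import numpy as np

class NormStabilizer:
    """Rescale each gradient to a running estimate of the gradient norm,
    preserving its direction (a sketch of the stated idea, with an EMA
    standing in for the paper's unspecified estimator)."""
    def __init__(self, beta=0.99):
        self.beta = beta
        self.ema_norm = None

    def __call__(self, grad):
        norm = float(np.linalg.norm(grad))
        if self.ema_norm is None:
            self.ema_norm = norm            # initialize on first step
        else:
            self.ema_norm = self.beta * self.ema_norm + (1 - self.beta) * norm
        if norm == 0.0:
            return grad                     # nothing to rescale
        return grad * (self.ema_norm / norm)  # same direction, stabilized magnitude
```

Unlike clipping, which only truncates outlier norms, this rescales every step toward the running estimate, which is one reading of "statistically stabilized".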
AI · Bullish · arXiv · CS AI · Mar 3 · 7/10 · 4
🧠 Researchers introduce Uni-X, a novel architecture for unified multimodal AI models that addresses gradient conflicts between vision and text processing. The X-shaped design uses modality-specific processing at input/output layers while sharing middle layers, achieving superior efficiency and matching 7B parameter models with only 3B parameters.
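The X-shaped routing described above — separate towers at the ends, a shared trunk in the middle — can be sketched structurally. The layer counts, sizes, and single-matrix "towers" are illustrative assumptions, nothing like the paper's 3B configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden width

# Modality-specific input/output projections around a shared trunk (the X shape):
W_in  = {"vision": rng.normal(size=(D, D)), "text": rng.normal(size=(D, D))}
W_mid = rng.normal(size=(D, D))   # shared middle layer(s)
W_out = {"vision": rng.normal(size=(D, D)), "text": rng.normal(size=(D, D))}

def uni_x_forward(x, modality):
    """Route through the modality's own input tower, the shared trunk,
    then the modality's own output tower."""
    h = np.tanh(x @ W_in[modality])   # modality-specific encode
    h = np.tanh(h @ W_mid)            # shared representation
    return h @ W_out[modality]        # modality-specific decode
```

Because only `W_mid` is shared, vision and text gradients meet only in the trunk, which is where the claimed reduction in cross-modal gradient conflict would apply.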
AI · Bearish · arXiv · CS AI · Mar 3 · 7/10 · 3
🧠 Researchers have developed a new 'untargeted jailbreak attack' (UJA) that can compromise AI safety systems in large language models with a success rate above 80% using only 100 optimization iterations. This gradient-based attack expands the search space by maximizing an unsafety probability without fixing a target response, outperforming existing attacks by over 30%.
AI · Bullish · arXiv · CS AI · Mar 5 · 5/10
🧠 Researchers propose JPmHC (Jacobian-spectrum Preserving manifold-constrained Hyper-Connections), a new deep learning framework that improves upon existing Hyper-Connections by replacing identity skips with trainable linear mixers while controlling gradient conditioning. The framework addresses training instability and memory overhead issues in current deep learning architectures through constrained optimization on specific mathematical manifolds.
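A minimal sketch of the named idea: replace the identity skip `y = x + f(x)` with a trainable mixer `y = x @ M + f(x)`, and constrain `M` to a manifold that keeps the skip path's Jacobian well-conditioned. Projecting onto the orthogonal manifold via QR is my assumed constraint; the paper's specific manifold is not given in the summary:

```python
import numpy as np

def orthogonalize(M):
    """Project a mixer matrix onto the orthogonal manifold via QR,
    pinning the skip path's singular values at 1 (an assumed,
    simple choice of manifold constraint)."""
    Q, R = np.linalg.qr(M)
    return Q * np.sign(np.diag(R))   # fix QR's column-sign ambiguity

def mixer_skip_block(x, M, f):
    """Residual block with a trainable linear mixer replacing
    the identity skip: y = x @ orth(M) + f(x)."""
    return x @ orthogonalize(M) + f(x)
```

With an orthogonal mixer, the skip path neither shrinks nor amplifies gradients, which is the conditioning property the "Jacobian-spectrum preserving" name points at.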
AI · Bullish · arXiv · CS AI · Mar 3 · 6/10 · 8
🧠 Researchers introduced GOME, an AI agent that uses gradient-based optimization instead of tree search for machine learning engineering tasks, achieving a 35.1% success rate on MLE-Bench. The study shows gradient-based approaches overtake tree search as AI reasoning capabilities improve, suggesting the method will become more effective as LLMs advance.
AI · Bullish · arXiv · CS AI · Mar 2 · 6/10 · 10
🧠 Researchers have developed a new quantum machine learning optimization technique using ternary encodings that significantly improves frequency tuning efficiency. The method achieves 22.8% better performance than existing approaches while requiring exponentially fewer encoding gates than traditional fixed-frequency methods.
AI · Bullish · arXiv · CS AI · Mar 2 · 5/10 · 7
🧠 Researchers introduce FedDAG, a new clustered federated learning framework that improves AI model training across heterogeneous client environments. The system combines data and gradient similarity metrics for better client clustering and uses a dual-encoder architecture to enable knowledge sharing across clusters while maintaining specialization.
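The "combine data and gradient similarity for clustering" piece can be sketched with a blended cosine similarity and a greedy grouping pass. The blend weight, the summary vectors, and the greedy threshold clustering are all my assumptions; FedDAG's exact metric combination and clustering algorithm aren't given in the summary:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def client_similarity(data_a, grad_a, data_b, grad_b, alpha=0.5):
    """Blend data-distribution similarity and gradient similarity
    between two clients (alpha is a hypothetical blend weight)."""
    return alpha * cosine(data_a, data_b) + (1 - alpha) * cosine(grad_a, grad_b)

def cluster_clients(summaries, threshold=0.8, alpha=0.5):
    """Greedy clustering: each client (a (data_vec, grad_vec) pair) joins
    the first cluster whose representative is similar enough,
    otherwise it starts a new cluster."""
    clusters = []   # each cluster: list of client indices; first is the representative
    for i, (d, g) in enumerate(summaries):
        for members in clusters:
            rep_d, rep_g = summaries[members[0]]
            if client_similarity(d, g, rep_d, rep_g, alpha) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Using both signals guards against clients whose data distributions look alike but whose local gradients diverge, and vice versa — the failure modes a single-metric clustering would miss.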