
#pre-training News & Analysis

15 articles tagged with #pre-training. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10
🧠

Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models

Researchers present Chain-of-Models Pre-Training (CoM-PT), a novel method that accelerates vision foundation model training by up to 7.09X through sequential knowledge transfer from smaller to larger models in a unified pipeline, rather than training each model independently. The approach maintains or improves performance while significantly reducing computational costs, with efficiency gains increasing as more models are added to the training sequence.
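
The general idea, as described above, is that each larger model inherits from an already-trained smaller one instead of starting from scratch. Below is a minimal sketch of that chained hand-off in PyTorch; the model sizes, the weight-copying scheme, and the toy regression data are illustrative assumptions, not the paper's actual recipe.

```python
# Sketch: a wider model starts from a smaller trained model's weights, then keeps training.
import torch
import torch.nn as nn

def grow(small: nn.Linear, out_dim: int, in_dim: int) -> nn.Linear:
    """Create a wider Linear layer whose top-left block is the small layer's weights."""
    big = nn.Linear(in_dim, out_dim)
    with torch.no_grad():
        big.weight[: small.out_features, : small.in_features] = small.weight
        big.bias[: small.out_features] = small.bias
    return big

def train(model: nn.Module, steps: int = 200) -> nn.Module:
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(64, 32)            # toy inputs
        y = x.sum(dim=1, keepdim=True)     # toy regression target
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Stage 1: small model trained from scratch.
small = train(nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1)))

# Stage 2: larger model inherits the small model's weights and continues training.
large = nn.Sequential(grow(small[0], 128, 32), nn.ReLU(), nn.Linear(128, 1))
train(large)
```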

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Researchers have released DanQing, a large-scale Chinese vision-language dataset containing 100 million high-quality image-text pairs curated from Common Crawl data. The dataset addresses the data bottleneck in Chinese vision-language pre-training (VLP) and demonstrates superior performance compared to existing Chinese datasets across a range of downstream tasks.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10
🧠

Training Language Models via Neural Cellular Automata

Researchers developed a method using neural cellular automata (NCA) to generate synthetic data for pre-training language models, achieving up to 6% improvement in downstream performance with only 164M synthetic tokens. This approach outperformed traditional pre-training on 1.6B natural language tokens while being more computationally efficient and transferring well to reasoning benchmarks.
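
As a rough illustration of the ingredient named in the summary, the sketch below runs a tiny 1D neural cellular automaton and reads out token ids that could serve as synthetic pre-training text. The vocabulary size, the random update network, and the readout head are assumptions for illustration; the paper's actual NCA design and training objective are not reproduced here.

```python
# Sketch: a neural cellular automaton whose evolved cell states are decoded into token ids.
import torch
import torch.nn as nn

VOCAB, WIDTH, CHANNELS = 256, 128, 16

class NCA(nn.Module):
    def __init__(self):
        super().__init__()
        # Each cell sees itself and its two neighbours (kernel size 3).
        self.update = nn.Sequential(
            nn.Conv1d(CHANNELS, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, CHANNELS, kernel_size=1),
        )
        self.readout = nn.Linear(CHANNELS, VOCAB)

    def forward(self, state: torch.Tensor, steps: int = 8) -> torch.Tensor:
        for _ in range(steps):
            state = state + self.update(state)        # local, residual update rule
        return self.readout(state.transpose(1, 2))    # (batch, WIDTH, VOCAB) logits

nca = NCA()
seed = torch.randn(4, CHANNELS, WIDTH)                # random initial grid states
tokens = nca(seed).argmax(dim=-1)                     # synthetic "corpus" of token ids
print(tokens.shape)                                   # torch.Size([4, 128])
```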

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Researchers released two open-source datasets, SwallowCode and SwallowMath, that significantly improve large language model performance in coding and mathematics through systematic data rewriting rather than filtering. The datasets boost Llama-3.1-8B performance by +17.0 on HumanEval for coding and +12.4 on GSM8K for math tasks.
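
The distinguishing idea here is rewriting low-quality samples instead of discarding them. A minimal sketch of such a pipeline follows; `score_quality` and `rewrite_with_llm` are hypothetical stand-ins for whatever quality scorer and rewriting model a real pipeline would use, not functions from the released datasets.

```python
# Sketch: keep every sample, but rewrite the low-quality ones rather than filtering them out.
from typing import Callable, Iterable, Iterator

def rewrite_corpus(
    samples: Iterable[str],
    score_quality: Callable[[str], float],
    rewrite_with_llm: Callable[[str], str],
    threshold: float = 0.5,
) -> Iterator[str]:
    for text in samples:
        if score_quality(text) >= threshold:
            yield text                      # already good: keep verbatim
        else:
            yield rewrite_with_llm(text)    # instead of dropping, rewrite

# Toy usage with trivial stand-ins.
corpus = ["def add(a, b):\n    return a + b", "x=1;y=2;print(x+y)  # messy"]
cleaned = list(rewrite_corpus(
    corpus,
    score_quality=lambda t: 1.0 if t.startswith("def ") else 0.0,
    rewrite_with_llm=lambda t: "# rewritten for clarity\n" + t,
))
print(cleaned)
```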

AI · Bullish · Hugging Face Blog · Mar 20 · 7/10
🧠

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

The article discusses Cosmopedia, a methodology for generating large-scale synthetic data specifically designed for pre-training Large Language Models. This approach addresses the challenge of obtaining sufficient high-quality training data by creating artificial datasets that can supplement or replace traditional web-scraped content.
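
At a high level, this kind of pipeline seeds a teacher model with many prompt combinations and collects the generations as a synthetic corpus. The sketch below illustrates that loop; `call_teacher_model` and the prompt template are hypothetical placeholders, not the actual Cosmopedia prompts or generation setup.

```python
# Sketch: enumerate topic/style/audience prompts and collect teacher-model generations.
from itertools import product
from typing import Callable, List

PROMPT = "Write a short {style} about '{topic}' for {audience}."

def build_synthetic_corpus(
    topics: List[str],
    styles: List[str],
    audiences: List[str],
    call_teacher_model: Callable[[str], str],
) -> List[str]:
    corpus = []
    for topic, style, audience in product(topics, styles, audiences):
        prompt = PROMPT.format(style=style, topic=topic, audience=audience)
        corpus.append(call_teacher_model(prompt))
    return corpus

# Toy usage: a fake "teacher" that just echoes the prompt.
docs = build_synthetic_corpus(
    topics=["gradient descent", "photosynthesis"],
    styles=["textbook section", "blog post"],
    audiences=["high-school students", "researchers"],
    call_teacher_model=lambda p: f"[generated text for: {p}]",
)
print(len(docs))  # 2 topics x 2 styles x 2 audiences = 8 documents
```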

AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠

Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance

Researchers introduce a Cross-Lingual Mapping Task during LLM pre-training to improve multilingual performance across languages with varying data availability. The method achieves significant improvements in machine translation, cross-lingual question answering, and multilingual understanding without requiring extensive parallel data.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

Expert Divergence Learning for MoE-based Language Models

Researchers introduce Expert Divergence Learning, a new pre-training strategy for Mixture-of-Experts language models that prevents expert homogenization by encouraging functional specialization. The method uses domain labels to maximize routing distribution differences between data domains, achieving better performance on 15 billion parameter models with minimal computational overhead.
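
To make the stated objective concrete, the sketch below averages the expert-routing distribution within each labelled domain and adds a regulariser that pushes those per-domain distributions apart. The pairwise symmetric-KL form and the weighting are illustrative assumptions, not necessarily the paper's exact loss.

```python
# Sketch: a loss that rewards divergence between per-domain expert-routing distributions.
import torch

def expert_divergence_loss(router_logits: torch.Tensor,
                           domain_ids: torch.Tensor,
                           num_domains: int) -> torch.Tensor:
    """router_logits: (tokens, num_experts); domain_ids: (tokens,)"""
    probs = router_logits.softmax(dim=-1)
    # Average routing distribution per domain.
    domain_dists = torch.stack([
        probs[domain_ids == d].mean(dim=0) for d in range(num_domains)
    ])
    # Encourage divergence: minimise the negative symmetric pairwise KL.
    loss = torch.tensor(0.0)
    for i in range(num_domains):
        for j in range(i + 1, num_domains):
            p, q = domain_dists[i], domain_dists[j]
            kl = (p * (p / q).log()).sum() + (q * (q / p).log()).sum()
            loss = loss - kl
    num_pairs = num_domains * (num_domains - 1) / 2
    return loss / max(num_pairs, 1)

# Toy usage: 100 tokens, 8 experts, 3 domains.
logits = torch.randn(100, 8)
domains = torch.randint(0, 3, (100,))
print(expert_divergence_loss(logits, domains, num_domains=3))
```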

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠

Intention-Conditioned Flow Occupancy Models

Researchers introduce Intention-Conditioned Flow Occupancy Models (InFOM), a new reinforcement learning approach that uses flow matching to predict future states and incorporates user intention as a latent variable. The method demonstrates significant improvements with 1.8x median return improvement and 36% higher success rates across 40 benchmark tasks.
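
The flow-matching ingredient mentioned above can be illustrated with a small training loop: a velocity network learns to transport noise toward future-state samples, conditioned on an intention latent. The network shape, the conditioning, and the toy data below are assumptions about the general technique, not the paper's architecture.

```python
# Sketch: conditional flow matching toward future states, conditioned on an intention latent.
import torch
import torch.nn as nn

state_dim, intent_dim = 8, 4
velocity_net = nn.Sequential(
    nn.Linear(state_dim + intent_dim + 1, 128), nn.ReLU(),
    nn.Linear(128, state_dim),
)
opt = torch.optim.Adam(velocity_net.parameters(), lr=1e-3)

for _ in range(200):
    future = torch.randn(64, state_dim)          # stand-in for future-state samples
    intent = torch.randn(64, intent_dim)         # stand-in for intention latents
    noise = torch.randn(64, state_dim)
    t = torch.rand(64, 1)
    x_t = (1 - t) * noise + t * future           # straight-line interpolation
    target_v = future - noise                    # its constant velocity
    pred_v = velocity_net(torch.cat([x_t, intent, t], dim=-1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```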

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10
🧠

3D Modality-Aware Pre-training for Vision-Language Model in MRI Multi-organ Abnormality Detection

Researchers developed MedMAP, a Medical Modality-Aware Pretraining framework that enhances vision-language models for 3D MRI multi-organ abnormality detection. The framework addresses challenges in modality-specific alignment and cross-modal feature fusion, demonstrating superior performance on a curated dataset of 7,392 3D MRI volume-report pairs.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10
🧠

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation

Researchers introduce LoRA-Pre, an optimizer that reduces the memory overhead of training large language models by storing only a low-rank approximation of its momentum states. The method achieves superior performance on Llama models from 60M to 1B parameters while using only 1/8 the rank of baseline methods.
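
One simple way to realise "low-rank momentum" is to re-compress the running momentum of a matrix-shaped parameter to rank r each step and store only the two factor matrices. The sketch below does exactly that with a truncated SVD; it is an illustrative assumption about the idea, and the paper's actual factorisation and update rule may differ.

```python
# Sketch: momentum kept as rank-r factors U @ V.T instead of a full dense buffer.
import torch

def low_rank_momentum_step(weight, grad, U, V, lr=1e-2, beta=0.9, rank=4):
    """U: (m, r) and V: (n, r) store momentum ~= U @ V.T for a weight of shape (m, n)."""
    momentum = beta * (U @ V.T) + (1 - beta) * grad     # dense only transiently
    Ur, S, Vr = torch.svd_lowrank(momentum, q=rank)     # re-compress to rank r
    U, V = Ur * S.sqrt(), Vr * S.sqrt()                 # refactor the momentum
    weight = weight - lr * (U @ V.T)                    # update with compressed momentum
    return weight, U, V

# Toy usage on a 256x128 "layer": the stored state is two small factor matrices
# (256x4 and 128x4) instead of the full 256x128 momentum buffer.
m, n, r = 256, 128, 4
w = torch.randn(m, n)
U, V = torch.zeros(m, r), torch.zeros(n, r)
for _ in range(10):
    g = torch.randn(m, n)                               # stand-in gradient
    w, U, V = low_rank_momentum_step(w, g, U, V, rank=r)
print(U.shape, V.shape)
```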

AI · Bullish · Lil'Log (Lilian Weng) · Jan 31 · 6/10
🧠

Generalized Language Models

This article discusses the evolution of generalized language models including BERT, GPT, and other major pre-trained models that achieved state-of-the-art results on various NLP tasks. The piece covers the breakthrough progress in 2018 with large-scale unsupervised pre-training approaches that don't require labeled data, similar to how ImageNet helped computer vision.

๐Ÿข OpenAI
AI · Neutral · arXiv – CS AI · Mar 17 · 4/10
🧠

Unsupervised Point Cloud Pre-Training via Contrasting and Clustering

Researchers propose ConClu, an unsupervised pre-training framework for point clouds that combines contrasting and clustering techniques to learn discriminative representations without labeled data. The method outperforms state-of-the-art approaches on multiple downstream tasks, addressing the challenge of expensive point cloud annotation.
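
The contrastive half of such a framework can be sketched compactly: two augmented views of the same point cloud should map to nearby embeddings under an InfoNCE loss. The point-wise max-pool encoder, the jitter augmentation, and the omission of the clustering objective below are simplifying assumptions for illustration, not ConClu's actual design.

```python
# Sketch: InfoNCE between embeddings of two augmented views of each point cloud.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))

def embed(points: torch.Tensor) -> torch.Tensor:
    """points: (batch, num_points, 3) -> permutation-invariant (batch, 128) embedding."""
    return encoder(points).max(dim=1).values

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                       # similarity of all view pairs
    labels = torch.arange(z1.size(0))              # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

clouds = torch.randn(16, 1024, 3)                  # toy batch of point clouds
view1 = clouds + 0.01 * torch.randn_like(clouds)   # jitter augmentation
view2 = clouds + 0.01 * torch.randn_like(clouds)
loss = info_nce(embed(view1), embed(view2))
loss.backward()
```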

AI · Neutral · Hugging Face Blog · Aug 22 · 3/10
🧠

Pre-Train BERT with Hugging Face Transformers and Habana Gaudi

The article appears to cover pre-training BERT language models with the Hugging Face Transformers framework on Habana Gaudi processors. However, the article body is empty, so no detailed analysis of the content or methodology can be provided.

AI · Neutral · OpenAI News · Jan 24 · 1/10
🧠

Text and code embeddings by contrastive pre-training

The article's title refers to learning text and code embeddings via contrastive pre-training, but no article body was provided for analysis. Without the content, the technical details, implications, and market impact cannot be assessed.