#pre-training News & Analysis

17 articles tagged with #pre-training. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

17 articles

AIBullisharXiv – CS AI · Apr 157/10

🧠

Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models

Researchers present Chain-of-Models Pre-Training (CoM-PT), a novel method that accelerates vision foundation model training by up to 7.09X through sequential knowledge transfer from smaller to larger models in a unified pipeline, rather than training each model independently. The approach maintains or improves performance while significantly reducing computational costs, with efficiency gains increasing as more models are added to the training sequence.

AIBullisharXiv – CS AI · Mar 267/10

🧠

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Researchers have released DanQing, a large-scale Chinese vision-language dataset containing 100 million high-quality image-text pairs curated from Common Crawl data. The dataset addresses the bottleneck in Chinese VLP development and demonstrates superior performance compared to existing Chinese datasets across various AI tasks.

AIBullisharXiv – CS AI · Mar 127/10

🧠

Training Language Models via Neural Cellular Automata

Researchers developed a method using neural cellular automata (NCA) to generate synthetic data for pre-training language models, achieving up to 6% improvement in downstream performance with only 164M synthetic tokens. This approach outperformed traditional pre-training on 1.6B natural language tokens while being more computationally efficient and transferring well to reasoning benchmarks.

AIBullisharXiv – CS AI · Mar 37/104

🧠

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Researchers released two open-source datasets, SwallowCode and SwallowMath, that significantly improve large language model performance in coding and mathematics through systematic data rewriting rather than filtering. The datasets boost Llama-3.1-8B performance by +17.0 on HumanEval for coding and +12.4 on GSM8K for math tasks.

AIBullishHugging Face Blog · Mar 207/108

🧠

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

The article discusses Cosmopedia, a methodology for generating large-scale synthetic data specifically designed for pre-training Large Language Models. This approach addresses the challenge of obtaining sufficient high-quality training data by creating artificial datasets that can supplement or replace traditional web-scraped content.

AINeutralarXiv – CS AI · May 126/10

🧠

Emergent Semantic Role Understanding in Language Models

Researchers demonstrate that language models develop semantic role understanding (who-did-what-to-whom comprehension) primarily during pre-training, though fine-tuning still improves performance. Using linear probes on frozen transformer models, they find semantic role information emerges from language modeling objectives alone, with representation structure becoming more distributed as models scale.

AINeutralarXiv – CS AI · May 16/10

🧠

CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining

Researchers introduce CLAMP, a novel 3D pre-training framework for robotic manipulation that combines point cloud processing with contrastive learning to capture spatial information missing from traditional 2D image-based approaches. The method demonstrates superior performance across simulated and real-world tasks by leveraging multi-view depth data and action-conditioned learning to improve policy efficiency.

AINeutralarXiv – CS AI · Apr 146/10

🧠

Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance

Researchers introduce a Cross-Lingual Mapping Task during LLM pre-training to improve multilingual performance across languages with varying data availability. The method achieves significant improvements in machine translation, cross-lingual question answering, and multilingual understanding without requiring extensive parallel data.

AIBearisharXiv – CS AI · Mar 96/10

🧠

From Tokenizer Bias to Backbone Capability: A Controlled Study of LLMs for Time Series Forecasting

Researchers conducted a controlled study examining the effectiveness of large language models (LLMs) for time series forecasting, finding that existing approaches often overfit to small datasets. Despite some promise, LLMs did not consistently outperform models specifically trained on large-scale time series data.

AIBullisharXiv – CS AI · Mar 37/106

🧠

Expert Divergence Learning for MoE-based Language Models

Researchers introduce Expert Divergence Learning, a new pre-training strategy for Mixture-of-Experts language models that prevents expert homogenization by encouraging functional specialization. The method uses domain labels to maximize routing distribution differences between data domains, achieving better performance on 15 billion parameter models with minimal computational overhead.

AIBullisharXiv – CS AI · Mar 36/104

🧠

Intention-Conditioned Flow Occupancy Models

Researchers introduce Intention-Conditioned Flow Occupancy Models (InFOM), a new reinforcement learning approach that uses flow matching to predict future states and incorporates user intention as a latent variable. The method demonstrates significant improvements with 1.8x median return improvement and 36% higher success rates across 40 benchmark tasks.

AIBullisharXiv – CS AI · Mar 26/1013

🧠

3D Modality-Aware Pre-training for Vision-Language Model in MRI Multi-organ Abnormality Detection

Researchers developed MedMAP, a Medical Modality-Aware Pretraining framework that enhances vision-language models for 3D MRI multi-organ abnormality detection. The framework addresses challenges in modality-specific alignment and cross-modal feature fusion, demonstrating superior performance on a curated dataset of 7,392 3D MRI volume-report pairs.

AIBullisharXiv – CS AI · Mar 26/1018

🧠

Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation

Researchers introduce LoRA-Pre, a memory-efficient optimizer that reduces memory overhead in training large language models by using low-rank approximation of momentum states. The method achieves superior performance on Llama models from 60M to 1B parameters while using only 1/8 the rank of baseline methods.

AIBullishLil'Log (Lilian Weng) · Jan 316/10

🧠

Generalized Language Models

This article discusses the evolution of generalized language models including BERT, GPT, and other major pre-trained models that achieved state-of-the-art results on various NLP tasks. The piece covers the breakthrough progress in 2018 with large-scale unsupervised pre-training approaches that don't require labeled data, similar to how ImageNet helped computer vision.

🏢 OpenAI

AINeutralarXiv – CS AI · Mar 174/10

🧠

Unsupervised Point Cloud Pre-Training via Contrasting and Clustering

Researchers propose ConClu, an unsupervised pre-training framework for point clouds that combines contrasting and clustering techniques to learn discriminative representations without labeled data. The method outperforms state-of-the-art approaches on multiple downstream tasks, addressing the challenge of expensive point cloud annotation.

AINeutralHugging Face Blog · Aug 223/105

🧠

Pre-Train BERT with Hugging Face Transformers and Habana Gaudi

The article appears to be about pre-training BERT language models using Hugging Face Transformers framework with Habana Gaudi processors. However, the article body is empty, making it impossible to provide detailed analysis of the content or methodology discussed.

AINeutralOpenAI News · Jan 241/108

🧠

Text and code embeddings by contrastive pre-training

The article title references text and code embeddings using contrastive pre-training methodology, but no article body content was provided for analysis. Without the actual content, a comprehensive assessment of the technical details, implications, or market impact cannot be performed.