AI · Bullish · arXiv – CS AI · 4d ago · 6/10
🧠Researchers introduce DSL-Topic, a novel framework that improves neural topic modeling by distilling soft labels from language models rather than relying on traditional bag-of-words reconstruction. The approach leverages LM-generated contextual signals to produce higher-quality topics with better coherence and semantic alignment, demonstrating significant improvements over existing baselines.
AI · Neutral · arXiv – CS AI · Jun 1 · 6/10
🧠CobSeg introduces a novel multi-branch architecture for dialogue topic segmentation that separates semantic continuity from lexical boundary transitions, achieving significant performance improvements across five benchmarks without requiring LLM calls during inference. The approach demonstrates particular strength in scenarios where local lexical cues are prominent, reducing error metrics substantially in both supervised and pseudo-label settings.
AI · Neutral · arXiv – CS AI · May 29 · 6/10
🧠A research study comparing seven transformer-based language models of varying sizes (22M to 13B parameters) in topic modeling tasks found that model size has negligible impact on topic quality. This suggests smaller, more efficient models can match larger models' performance for topic coherence applications, potentially reducing computational costs without sacrificing output quality.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠SmartIterator is a visual analytics framework that helps data scientists systematically evaluate and choose between multiple unsupervised learning results across parameter sweeps. The approach operationalizes structured six-phase workflows for three clustering and topic-modeling method families, enabling informed decision-making by visualizing data grouping quality, stability, membership confidence, and domain context simultaneously.
AI · Neutral · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers propose a novel framework treating Large Language Models as attention-informed Neural Topic Models, enabling interpretable topic extraction from documents. The approach combines white-box interpretability analysis with black-box long-context LLM capabilities, demonstrating competitive performance on topic modeling tasks while maintaining semantic clarity.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠TopicENA is a new framework that combines BERTopic with Epistemic Network Analysis to automatically analyze concept relationships in large text datasets without manual coding. The research demonstrates that automated topic modeling can replace expert manual coding while maintaining analytical quality, making network analysis scalable for large corpora.
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers have created CzechTopic, a new benchmark dataset for evaluating AI models' ability to identify specific topics within historical Czech documents. The study compared various large language models and BERT-based models, finding significant performance variations with the strongest models approaching human-level accuracy in topic detection.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 3
🧠Researchers introduce Topic Word Mixing (TWM), a new human evaluation method for assessing topic models in specialized domains. The study reveals misalignment between automated metrics and human judgment, particularly in domain-specific corpora like philosophy of science publications.
AI · Bullish · Hugging Face Blog · May 31 · 4/10 · 9
🧠BERTopic, a popular topic modeling library, has integrated with the Hugging Face Hub to enable easier sharing and discovery of topic models. This integration allows researchers and practitioners to upload, download, and collaborate on BERTopic models through Hugging Face's platform.