#chinese-nlp News & Analysis

6 articles tagged with #chinese-nlp. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

FormalASR: End-to-End Spoken Chinese to Formal Text

Researchers present FormalASR, compact end-to-end models that convert spoken Chinese directly into formal written text, eliminating the need for post-processing with large language models. Built on newly created datasets and fine-tuned versions of Qwen3-ASR, the solution achieves significant error reduction while enabling lightweight on-device deployment.

AI · Bearish · arXiv – CS AI · Jun 1 · 7/10

MedFact: Benchmarking the Fact-Checking Capabilities of Large Language Models on Chinese Medical Texts

Researchers introduce MedFact, a Chinese medical fact-checking benchmark of 2,116 expert-annotated instances that evaluates how well Large Language Models verify medical information and identify errors. Tests of 20 leading LLMs show that models can detect whether a text contains errors but struggle to localize those errors precisely, and they exhibit an "over-criticism" phenomenon in which correct information is frequently misidentified as false.

AI · Neutral · arXiv – CS AI · Jun 23 · 5/10

Explanation-Guided Medical Named Entity Recognition with Stability and Boundary Awareness for Atopic Dermatitis

Researchers propose an explanation-guided framework for medical named entity recognition (NER) in Chinese atopic dermatitis clinical texts, using stability and boundary-aware constraints to improve model reliability and interpretability. The method combines perturbation-based analysis with adaptive fusion of local and global explanations, achieving performance gains across multiple NER models while enhancing explanation robustness for clinical decision support.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Luwen Technical Report

Researchers have developed Luwen, an open-source Chinese legal language model built on Baichuan that uses continual pre-training, supervised fine-tuning, and retrieval-augmented generation to excel at legal tasks. The model outperforms baselines on five legal benchmarks including judgment prediction, judicial examination, and legal reasoning, demonstrating effective domain adaptation for specialized legal applications.

AI · Neutral · arXiv – CS AI · Mar 5 · 4/10

A benchmark for joint dialogue satisfaction, emotion recognition, and emotion state transition prediction

Researchers have created a new multi-task Chinese dialogue dataset that enables prediction of user satisfaction, emotion recognition, and emotional state transitions across multiple conversation turns. The dataset addresses limitations in existing Chinese resources and aims to improve understanding of how user emotions evolve during interactions to better predict satisfaction.

AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 2

Hot-Start from Pixels: Low-Resolution Visual Tokens for Chinese Language Modeling

Researchers developed an approach to Chinese language modeling that uses low-resolution images of characters instead of traditional text tokens. The method achieved accuracy (39.2%) comparable to that of index-based models while learning faster early in training, suggesting that visual structure can effectively represent logographic scripts.