Knowledge Augmentation and Latent Topics Learning with Small Language Models for Low-Resource Dialogue Generation
Affiliation:

1.College of Computer and Information Engineering, Henan University of Economics and Law;2.School of Data Science and E-commerce, Henan University of Economics and Law;3.School of Tourism Management and MICE, Henan University of Economics and Law

Clc Number:

TP391

Fund Project:

Supported by the National Natural Science Foundation of China (62072156), the Natural Science Outstanding Youth Science Foundation Project of Henan Province (252300421061), and the Key Technologies Research and Development Program of Henan Province (242102210076).

Abstract:

    Knowledge-grounded dialogue methods based on small language models (SLMs) hold significant research value, yet they face multiple challenges: limited generalization and reasoning capability compared with LLMs, model alignment difficulties, and frequent deployment in low-resource scenarios. To address these challenges, this paper presents a three-stage knowledge-augmentation and latent-topic learning framework for low-resource dialogue generation with SLMs. The SLM-based dialogue model is first pre-trained on knowledge-grounded dialogue datasets, and then supervised fine-tuning is performed on low-resource dialogue datasets. Domain adaptation issues are alleviated by a noise-injection-based denoising pre-training approach applied to dialogue texts. A retrieval-augmented knowledge generation module is introduced to produce context-relevant knowledge, effectively mitigating knowledge scarcity in low-resource settings. To counter topic drift and intent deviation, a latent-topic learning method based on a conditional variational autoencoder (CVAE) is proposed to guide response generation. To further improve performance, the dialogue model is trained on low-resource datasets with self-critical sequence training, a reinforcement learning technique. Experimental results demonstrate that the proposed method significantly outperforms baseline approaches on multiple metrics of response quality, including diversity, coherence, and accuracy.
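The CVAE-based latent-topic step described above can be illustrated with a minimal sketch of its two core operations: sampling a latent topic vector via the reparameterization trick, and computing the KL divergence between the recognition posterior and the context prior that appears in the CVAE training objective. This is a generic NumPy illustration under assumed diagonal-Gaussian distributions, not the paper's actual implementation; all variable names are hypothetical.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps, eps ~ N(0, I): the reparameterization trick,
    # which keeps sampling differentiable during CVAE training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians,
    # summed over latent dimensions; this is the regularizer in the ELBO.
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

rng = np.random.default_rng(0)

# Hypothetical posterior q(z | context, response) and prior p(z | context):
mu_q, logvar_q = np.array([0.3, -0.1]), np.array([-0.5, -0.2])
mu_p, logvar_p = np.array([0.0, 0.0]), np.array([0.0, 0.0])

z = reparameterize(mu_q, logvar_q, rng)      # latent topic vector for the decoder
kl = kl_gaussians(mu_q, logvar_q, mu_p, logvar_p)  # KL term of the ELBO
```

In a full model, `z` would condition the response decoder, and the training loss would combine the reconstruction term with this KL term (often annealed to avoid posterior collapse).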

History
  • Received:December 08,2025
  • Revised:January 07,2026
  • Accepted:April 08,2026
  • Online:
  • Published: