Knowledge Augmentation and Latent Topics Learning with Small Language Models for Low-Resource Dialogue Generation
Affiliation:

1.College of Computer and Information Engineering, Henan University of Economics and Law;2.School of Data Science and E-commerce, Henan University of Economics and Law;3.School of Tourism Management and MICE, Henan University of Economics and Law

Clc Number:

TP391

Fund Project:

Supported by the National Natural Science Foundation of China (62072156), the Natural Science Outstanding Youth Science Foundation Project of Henan Province (252300421061), and the Key Technologies Research and Development Program of Henan Province (242102210076).

Abstract:

    Knowledge-grounded dialogue methods based on small language models (SLMs) hold significant research value, yet they face multiple challenges: limited generalization and reasoning capability compared with LLMs, model alignment difficulties, and frequent deployment in low-resource scenarios. To address these challenges, this paper presents a three-stage knowledge-augmentation and latent-topic learning framework for low-resource dialogue generation with SLMs. The SLM-based dialogue model is first pre-trained on knowledge-grounded dialogue datasets, and then supervised fine-tuning is performed on low-resource dialogue datasets. Domain adaptation issues are alleviated by a noise-injection-based denoising pre-training approach applied to dialogue texts. A retrieval-augmented knowledge generation module is introduced to produce context-relevant knowledge, effectively mitigating knowledge scarcity in low-resource settings. To counter topic drift and intent deviation, a latent-topic learning method based on a conditional variational autoencoder (CVAE) is proposed to guide response generation. To further improve performance, the dialogue model is trained on low-resource datasets with self-critical sequence training, a reinforcement learning technique. Experimental results demonstrate that the proposed method significantly outperforms baseline approaches on multiple metrics of response quality, including diversity, coherence, and accuracy.
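The CVAE-based latent-topic step described above can be illustrated with a minimal sketch of its two core operations: sampling a latent topic vector via the reparameterization trick, and computing the KL divergence between the recognition posterior and the context prior that appears in the CVAE training objective. This is a generic NumPy illustration under assumed diagonal-Gaussian distributions, not the paper's actual implementation; all variable names are hypothetical.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    # z = mu + sigma * eps, eps ~ N(0, I): the reparameterization trick,
    # which keeps sampling differentiable during CVAE training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    # KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians,
    # summed over latent dimensions; this is the regularizer in the ELBO.
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

rng = np.random.default_rng(0)

# Hypothetical posterior q(z | context, response) and prior p(z | context):
mu_q, logvar_q = np.array([0.3, -0.1]), np.array([-0.5, -0.2])
mu_p, logvar_p = np.array([0.0, 0.0]), np.array([0.0, 0.0])

z = reparameterize(mu_q, logvar_q, rng)      # latent topic vector for the decoder
kl = kl_gaussians(mu_q, logvar_q, mu_p, logvar_p)  # KL term of the ELBO
```

In a full model, `z` would condition the response decoder, and the training loss would combine the reconstruction term with this KL term (often annealed to avoid posterior collapse).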

History
  • Received:December 08,2025
  • Revised:January 07,2026
  • Accepted:April 08,2026
  • Online:
  • Published: