With the rapid advancement of deep learning technologies
DOI:
https://doi.org/10.61173/0ep47b69
Keywords:
Chinese dialect recognition, technological advancement, HMM, GMM
Abstract
This paper addresses the core challenges of low-resource dialect speech recognition by proposing a solution that integrates self-supervised pre-training, parameter-efficient fine-tuning, a hybrid modeling unit, and data augmentation. At its core is a lightweight adapter module that transfers knowledge from a resource-rich source language to a low-resource dialect without significantly increasing the number of parameters, which mitigates the overfitting caused by scarce dialect data. To make fuller use of the limited annotated data, the study also combines multi-task learning with speech synthesis: multi-task learning improves generalization through shared representations, while augmentation with synthetic speech increases the diversity of training samples and alleviates the data-scarcity bottleneck. On this basis, we construct a scalable dialect speech recognition framework that is feasible to engineer. To validate it, we conducted systematic experiments on multiple representative dialect datasets. Our approach consistently and significantly reduces word error rate compared with baseline methods, with particularly large improvements in low-resource settings, demonstrating strong cross-dialect adaptability and generalization. In summary, this research demonstrates the robustness and efficiency of an integrated technical approach in resource-constrained scenarios and provides a validated path toward the practical deployment of dialect speech recognition systems. It also has practical value for dialect preservation and for the inclusive development of multilingual information processing technologies.
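As a rough illustration of why an adapter adds few parameters, the sketch below implements a generic residual bottleneck adapter in plain Python. The abstract does not specify the adapter's design, so everything here is an assumption made for illustration: the down-projection/ReLU/up-projection layout, the zero-initialised up-projection, the omission of biases, and the names `make_adapter`, `adapter_forward`, and `adapter_param_count` are all hypothetical and not taken from the paper.

```python
import random


def make_adapter(d_model, bottleneck, seed=0):
    """Create weights for a hypothetical bottleneck adapter (biases omitted)."""
    rng = random.Random(seed)
    # Down-projection: d_model x bottleneck, small random values.
    down = [[rng.gauss(0.0, 0.02) for _ in range(bottleneck)]
            for _ in range(d_model)]
    # Up-projection: bottleneck x d_model, zero-initialised so the adapter
    # starts as an identity mapping and leaves the pre-trained model unchanged.
    up = [[0.0] * d_model for _ in range(bottleneck)]
    return {"down": down, "up": up}


def matvec(x, w):
    """Multiply a length-n vector by an n x m matrix stored as nested lists."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]


def adapter_forward(x, adapter):
    """Residual adapter: x + up(relu(down(x)))."""
    hidden = [max(0.0, v) for v in matvec(x, adapter["down"])]
    delta = matvec(hidden, adapter["up"])
    return [xi + di for xi, di in zip(x, delta)]


def adapter_param_count(d_model, bottleneck):
    """Trainable weights in one adapter: two projection matrices."""
    return 2 * d_model * bottleneck


# Toy example: a 4-dimensional frame representation passed through the adapter.
x = [0.5, -1.0, 2.0, 0.0]
adapter = make_adapter(d_model=4, bottleneck=2)
y = adapter_forward(x, adapter)
```

Under these assumptions, only the adapter weights would be trained on dialect data while the pre-trained encoder stays frozen. For a hidden size of 768 and a bottleneck of 64 (illustrative values, not the paper's), `adapter_param_count(768, 64)` gives 98,304 weights per adapter, which is small next to the millions of weights in a typical Transformer layer of that width.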