Data Sparsity and Unsupervised Training of the Classifier-Based Speech Translation

A Novel Approach for Addressing Data Sparsity in Classifier-Based Speech Translation

Authors

  • Durga Gupta
  • Dr. Vijay Singh

Keywords:

data sparsity, unsupervised training, classifier-based speech translation, voice data, domain-adaptive learning, semi-supervised learning, machine learning, n-best lists, unsupervised algorithms, Transonics project, SMT engine, conceptual distance, clustering algorithm, sparse data, statistical machine translation software, train classifiers

Abstract

We focus on speech recognition tasks that require large volumes of labelled voice data, which is difficult to collect. Domain-adaptive and semi-supervised learning techniques are used in both academic research and practical applications. Machine learning algorithms can also be applied to unsupervised learning, which involves analysing and categorising data sets that have not been labelled; such algorithms find patterns in data without any external supervision. We hypothesise that the accuracy of the proposed strategy depends on the size of the n-best lists, and the experiments used n-best lists of 100, 500, 1000, and 2000 entries to test this. The study used a dataset compiled during the Transonics project. In addition, the output vocabulary may benefit considerably from a more robust SMT engine. To this end, we adopt a method for measuring the conceptual distance between two statements, together with a suitable clustering algorithm. To address the problem of sparse data, this paper presents a novel approach that uses statistical machine translation software to train classifiers.
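The abstract describes the pipeline only at a high level. The sketch below is a minimal illustration, under stated assumptions, of how SMT outputs could be grouped by conceptual distance so that each source utterance receives a class label for classifier training without manual annotation. The Jaccard distance over content words, the leader (threshold) clustering, the stopword list, the 0.5 threshold, and every function name here are hypothetical stand-ins; the abstract does not specify the paper's actual distance measure or clustering algorithm, and the SMT translations are assumed to be already available.

```python
from typing import Dict, List

# Hypothetical stopword list: a crude stand-in for separating "concept" words from function words.
STOPWORDS = {"the", "a", "an", "is", "are", "do", "you", "to", "of"}


def concept_set(sentence: str) -> frozenset:
    """Lower-case, strip surrounding punctuation, and drop stopwords (assumed proxy for concepts)."""
    tokens = (t.strip(".,?!").lower() for t in sentence.split())
    return frozenset(t for t in tokens if t and t not in STOPWORDS)


def conceptual_distance(a: str, b: str) -> float:
    """Jaccard distance between concept sets: 0.0 = same concepts, 1.0 = disjoint."""
    sa, sb = concept_set(a), concept_set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def leader_cluster(sentences: List[str], threshold: float) -> List[int]:
    """Put each sentence in the first cluster whose leader is within `threshold`; otherwise open a new cluster."""
    leaders: List[str] = []
    labels: List[int] = []
    for s in sentences:
        for cid, leader in enumerate(leaders):
            if conceptual_distance(s, leader) <= threshold:
                labels.append(cid)
                break
        else:  # no existing leader was close enough
            leaders.append(s)
            labels.append(len(leaders) - 1)
    return labels


def build_training_pairs(
    source: List[str], smt_output: List[str], threshold: float = 0.5
) -> Dict[int, List[str]]:
    """Label each source utterance with the cluster of its SMT translation, yielding class-labelled data."""
    labels = leader_cluster(smt_output, threshold)
    classes: Dict[int, List[str]] = {}
    for src, cid in zip(source, labels):
        classes.setdefault(cid, []).append(src)
    return classes


if __name__ == "__main__":
    # Hypothetical source-utterance IDs and their (assumed) SMT translations.
    source = ["s1", "s2", "s3", "s4"]
    smt_output = [
        "Do you have a fever?",
        "Do you have fever",
        "Where does it hurt?",
        "Where is the pain?",
    ]
    print(leader_cluster(smt_output, 0.5))
    print(build_training_pairs(source, smt_output))
```

With these four hypothetical translations, the two fever questions share a cluster, while "Where does it hurt?" and "Where is the pain?" fall into separate clusters because a purely lexical proxy cannot recognise that "hurt" and "pain" express the same concept. A genuine conceptual distance would need semantic knowledge beyond word overlap, which is the gap the paper's distance measure is meant to fill.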

Published

2020-03-01