Data Sparsity and Unsupervised Training of the Classifier-Based Speech Translation
A Novel Approach for Addressing Data Sparsity in Classifier-Based Speech Translation
Keywords:
data sparsity, unsupervised training, classifier-based speech translation, voice data, domain-adaptive learning, semi-supervised learning, machine learning, n-best lists, unsupervised algorithms, Transonics project, SMT engine, conceptual distance, clustering algorithm, sparse data, statistical machine translation software, train classifiersAbstract
We focused on speech recognition jobs that need huge volumes of tagged voice data yet arechallenging to gather. Both academic study and practical applications make use of domain-adaptive andsemi-supervised learning techniques. Algorithms from the field of machine learning may be utilised forunsupervised learning, which entails studying and categorising data sets that have not been labelled. It ishypothesized that the accuracy of the suggested strategy depends on the size of the n-best lists. The trialsemployed n-best lists with sizes of 100, 500, 1000, and 2000 to make these findings. Thus, theseunsupervised algorithms are able to find patterns in data without any external supervision. This studymade use of a dataset that was compiled throughout the duration of the Transonics project. In addition,the output vocabulary may benefit greatly from employing a more robust SMT engine. For this purpose,we have adopted a strategy for determining how far apart two statements are conceptually, and asuitable clustering algorithm. To deal with the problem of sparse data, researchers have developed anovel approach that use statistical machine translation software to train classifiersDownloads
Download data is not yet available.
Published
2020-03-01
Issue
Section
Articles