Speech Compression via Gaussian Mixture Model with Wavelet Analysis used in Communication Systems

Faculty: Engineering
Year: 2009
Type of Publication: Theses
Pages: 161
Authors:
BibID 10904387
Keywords: Communication Systems
Abstract:
Compressing speech to very low bit rates is desirable for applications that transmit digital speech or multimedia. This thesis presents two techniques for Arabic speech compression, the Gaussian Mixture Model (GMM) and the Wavelet Transform (WT), and compares them. The findings are:
• A supervised learning technique is used to estimate the probability density function (PDF) parameters. GMM based on the expectation-maximization (EM) algorithm gives good compression ratios with acceptable signal-to-noise ratio (SNR). For mono and stereo voice, the quality of the output speech is nearly identical.
• WT is suitable for speech signals because it analyzes speech in both the time and frequency domains. Threshold-based WT compression gives good compression ratios.
• The Bior3.1 and Db10 wavelet functions give better speech compression results than the other wavelet functions tested.
• Increasing the decomposition level increases the compression ratio, but beyond a certain level the compression ratio becomes approximately constant. No further improvement was achieved beyond level 2 decomposition.
• For both GMM and WT, the SNR decreases as the compression ratio increases.
• GMM performs better than WT for speech compression, especially for the Holy Quran, even at high compression ratios.
• For normal speech, GMM and WT give roughly equal SNR at low compression ratios. Once the compression ratio rises to a certain high value, WT gives a better SNR than GMM. However, the speech is not of acceptable quality at such a high compression ratio.
• For wavelet compression, the choice of threshold strongly affects the quality of the output speech. The threshold is not constant and is chosen manually.
• Both GMM and WT are lossy compression techniques: the reconstructed signal is not an exact match of the original. For speech signals this loss is acceptable, since the aim is only for the signal to remain recognizable.
• Mean Opinion Score (MOS) tests show that the GMM techniques give better quality than the WT techniques.
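The relationship between threshold, compression ratio, and SNR described in the findings can be illustrated with a minimal sketch. It is not the thesis's implementation and rests on several assumptions: a Haar wavelet stands in for Bior3.1 and Db10 so that the example needs only the Python standard library, the compression ratio is taken as total coefficients divided by non-zero coefficients after hard thresholding, and the input is a synthetic tone rather than recorded speech. The function names `haar_dwt`, `haar_idwt`, `compress`, and `snr_db` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT (assumed stand-in for Bior3.1/Db10).

    len(signal) must be divisible by 2**levels.
    Returns (approximation, details) with details[0] the finest level.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def haar_idwt(approx, details):
    """Invert haar_dwt, rebuilding from the coarsest level to the finest."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        current = rebuilt
    return current

def compress(signal, levels, threshold):
    """Hard-threshold the detail coefficients and reconstruct.

    Returns (reconstructed_signal, compression_ratio), where the ratio is
    total coefficients / non-zero coefficients (an assumed definition).
    """
    approx, details = haar_dwt(signal, levels)
    kept = [[d if abs(d) >= threshold else 0.0 for d in level]
            for level in details]
    total = len(approx) + sum(len(level) for level in details)
    nonzero = sum(1 for c in approx if c != 0.0)
    nonzero += sum(1 for level in kept for c in level if c != 0.0)
    return haar_idwt(approx, kept), total / max(nonzero, 1)

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB between original and reconstruction."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)

# Synthetic test tone (assumption: not real speech data).
tone = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]
for threshold in (0.05, 0.1, 0.5):
    rebuilt, ratio = compress(tone, levels=3, threshold=threshold)
    print(f"threshold={threshold}: ratio={ratio:.2f}, "
          f"SNR={snr_db(tone, rebuilt):.1f} dB")
```

Because the Haar transform used here is orthonormal, the reconstruction error energy equals the energy of the discarded coefficients, so raising the threshold can only raise the compression ratio and lower the SNR. This matches the trade-off reported above.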
   
     