Title: Discriminative and Dynamic Nonnegative Matrix Factorization on Monaural Audio Source Separation
Authors: Huang, Yi-Chun (黄奕钧)
Chi, Tai-Shih (冀泰石)
College of Engineering, Master Program of Sound and Music Innovative Technologies
Keywords: Monaural audio source separation; Local preserving; Singing voice separation; Speech denoising; Discriminative learning; Nonnegative matrix factorization
Issue date: 2016
Abstract: Nonnegative matrix factorization (NMF), which learns dictionaries from source spectra and uses the learned dictionaries to decompose the mixture in the test phase, is a widely used tool for audio source separation. However, the standard NMF does not consider the temporal properties of the signals when learning dictionaries. The standard NMF is also a generative model, which does not guarantee that a good representation model is also a good separation model. In addition, the learned dictionaries should be partitioned into subgroups to account for sources with different spectro-temporal properties, such as speech signals from different speakers or music signals from different instruments. Therefore, we propose a method that combines several extensions of NMF to address these problems for speech denoising and singing voice separation. For temporal modeling, our method adopts a post-filtering technique, which derives a source-specific vector autoregressive (VAR) model to smooth the NMF coefficients in the test phase. For partitioning, we use the mixture of local dictionaries (MLD) technique to divide dictionaries into subgroups by considering intra- and inter-group distances. We also introduce a modified discriminative learning procedure to deal with the representation-separation problem. In summary, our NMF-extended method additionally accounts for the temporal properties of each subgroup and the discrimination between sources.
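As a rough illustration of the standard NMF baseline that the abstract starts from (not the thesis's extended method), the sketch below learns one dictionary per source from training magnitude spectrograms, then holds the stacked dictionaries fixed while estimating activations for a mixture. The Euclidean multiplicative updates, the soft-mask reconstruction, the function names `nmf` and `separate`, the rank of 5, and the synthetic data are all assumptions made for this example. The VAR post-filter, MLD partitioning, and discriminative training described in the abstract are not implemented here.

```python
import numpy as np

EPS = 1e-9  # small constant to avoid division by zero


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate V (freq x time) as W @ H using multiplicative updates
    for the Euclidean cost. If W is supplied it stays fixed and only the
    activations H are estimated (the test-phase setting)."""
    rng = np.random.default_rng(seed)
    fixed_dictionary = W is not None
    if not fixed_dictionary:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if not fixed_dictionary:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, W_a, W_b, n_iter=200):
    """Decompose a mixture with fixed, stacked source dictionaries and
    split it with a soft (Wiener-like) mask."""
    W = np.hstack([W_a, W_b])
    _, H = nmf(V_mix, W.shape[1], n_iter=n_iter, W=W)
    k = W_a.shape[1]
    est_a = W_a @ H[:k]   # part of the mixture explained by source A
    est_b = W_b @ H[k:]   # part of the mixture explained by source B
    mask_a = est_a / (est_a + est_b + EPS)
    return mask_a * V_mix, (1.0 - mask_a) * V_mix


# Synthetic stand-ins for two sources' magnitude spectrograms (hypothetical data).
rng = np.random.default_rng(1)
V_a = rng.random((64, 5)) @ rng.random((5, 100))
V_b = rng.random((64, 5)) @ rng.random((5, 100))

W_a, _ = nmf(V_a, rank=5)          # training phase: one dictionary per source
W_b, _ = nmf(V_b, rank=5)
V_mix = V_a + V_b
S_a, S_b = separate(V_mix, W_a, W_b)  # test phase: dictionaries held fixed
```

Because the mask only redistributes the mixture's energy, the two estimates always sum back to the mixture. How well each estimate matches its true source depends on how distinct the learned dictionaries are, which is the representation-versus-separation gap the abstract raises.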
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070351901
http://hdl.handle.net/11536/139110
Appears in Collections: Thesis