Musical source separation
Since Jul 1, 2017
Project member: Jen-Yu Liu
Keywords: Source separation

Source separation is the task of demixing a musical audio signal, which is typically composed of sounds from several sources (e.g., vocals, guitar, bass, and drums). A successful source separation model has applications in music production, DJing, music education, singing voice processing, and more. It can also serve as a pre-processing step for downstream music analysis tasks such as transcription and melody extraction.
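
To illustrate what demixing involves, below is a minimal sketch of the common mask-based approach in the time-frequency domain, where a model predicts a soft mask per source and applies it to the mixture spectrogram. The mask_model object and its predict interface are hypothetical placeholders for illustration only, not our actual architecture.

    # Minimal sketch of mask-based source separation (illustration only).
    # `mask_model` is a hypothetical trained model that predicts a soft mask
    # per source from the mixture magnitude spectrogram.
    import numpy as np
    import librosa

    def separate(mixture_path, mask_model, sources=("vocals", "accompaniment"), sr=44100):
        y, _ = librosa.load(mixture_path, sr=sr, mono=True)
        stft = librosa.stft(y, n_fft=2048, hop_length=512)
        magnitude, phase = np.abs(stft), np.angle(stft)

        estimates = {}
        for name in sources:
            # The model outputs a mask in [0, 1] with the same shape as `magnitude`.
            mask = mask_model.predict(magnitude, source=name)   # hypothetical API
            masked = mask * magnitude * np.exp(1j * phase)      # reuse the mixture phase
            estimates[name] = librosa.istft(masked, hop_length=512, length=len(y))
        return estimates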

Our model is based on deep learning and is trained on multitrack datasets of Western pop songs. According to the results of the 2018 Signal Separation Evaluation Campaign (SiSEC; link), our model achieved the second-best performance, behind only the model proposed by Sony.

We have also applied the model to Japanese and Chinese pop songs and found that it performs equally well. You can try the model yourself at the following demo website:

http://ss.ciaua.com