Source separation is the task of demixing a musical audio signal, which is typically composed of sounds from multiple sources (e.g. vocals, guitar, bass, and drums). A successful source separation model can find applications in music production, DJing, music education, singing voice processing, etc. It can also serve as a pre-processing step for downstream music analyses such as transcription and melody extraction.
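To illustrate the demixing idea (this is a toy oracle baseline, not our actual deep learning model), the sketch below separates a two-tone mixture with ideal ratio masks computed in the frequency domain; real separators must estimate such masks from the mixture alone. All names here are illustrative assumptions.

```python
import numpy as np

def ideal_ratio_mask_separation(sources):
    """Demix the sum of `sources` using oracle ratio masks in the frequency domain.

    `sources` is a list of equal-length 1-D arrays; the mixture is their sum.
    """
    mixture = np.sum(sources, axis=0)
    spec_mix = np.fft.rfft(mixture)
    specs = [np.fft.rfft(s) for s in sources]
    total = sum(np.abs(s) for s in specs) + 1e-12  # avoid division by zero
    estimates = []
    for s in specs:
        mask = np.abs(s) / total                   # per-bin ratio mask in [0, 1]
        estimates.append(np.fft.irfft(mask * spec_mix, n=len(mixture)))
    return estimates

# toy mixture: a low "bass" tone plus a higher "vocal" tone
t = np.linspace(0, 1, 8000, endpoint=False)
bass = np.sin(2 * np.pi * 110 * t)
vocal = np.sin(2 * np.pi * 440 * t)
est_bass, est_vocal = ideal_ratio_mask_separation([bass, vocal])
```

Because the two tones occupy disjoint frequency bins, the oracle masks recover each source almost exactly; in real music the sources overlap in time and frequency, which is what makes the problem hard.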
Our model is based on deep learning and is trained on multitrack datasets of Western pop songs. According to the results of the 2018 Signal Separation Evaluation Campaign (SiSEC; link), our model achieves the second-best result worldwide, behind only the model proposed by Sony.
We applied the model to Japanese and Chinese pop songs and found that it performs equally well. You can check out the effectiveness of our model by visiting the following demo website: