Hguimaraes random tech notes

Music Information Retrieval in Deep Learning Era

The music industry in the early 21st century witnessed a drastic change in the way its main product, music, was distributed and consumed. Physical media sales (CD, vinyl, and cassette) were rapidly replaced by digital media as personal computers grew in popularity. Services such as Napster, which used peer-to-peer (P2P) architectures to distribute files in a decentralized way, became very popular, and for many years it was believed that the music industry was doomed to bankruptcy due to illegal distribution.

In fact, this paradigm shift was responsible for billions in losses for the music industry, which today is trying to find its way again through streaming services (e.g. Spotify, Apple Music and Tidal). On the other hand, large-scale distribution of music in digital format allows us to analyse user behavior and its connections, improving how we suggest new content and understand patterns in songs. This is one of the main goals of the Music Information Retrieval (MIR) field.

MIR is an interdisciplinary field focused on the extraction of musical features and their applications, for example: recommendation systems, automatic transcription, instrument recognition, and music labeling.

Machine Learning

Analyzing a song from the raw audio signal is the objective of several sub-areas of MIR. Among them we can highlight content-based recommendation, where the goal is to classify and group songs by a common characteristic (e.g. genre) in order to improve the user experience and suggest new content based on a user's listening history. In recent years the accuracy of recommendation systems of this type has reached state-of-the-art levels with the use of Machine Learning techniques, where the challenge is to allow the computer to learn from examples in terms of a hierarchy of concepts.
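To make the idea of content-based recommendation concrete, here is a minimal sketch in plain Python. It assumes each song has already been reduced to a small feature vector, and it ranks unheard songs by cosine similarity to the average of the songs in a user's history. The song names, the three feature dimensions, and the `recommend` helper are all hypothetical, chosen only for illustration; a real system would use learned or extracted audio features.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recommend(history, catalog, k=2):
    """Rank songs outside the history by similarity to the user's mean profile."""
    dims = len(next(iter(catalog.values())))
    # User profile: per-dimension average of the songs already listened to.
    profile = [
        sum(catalog[song][d] for song in history) / len(history)
        for d in range(dims)
    ]
    candidates = [song for song in catalog if song not in history]
    ranked = sorted(
        candidates,
        key=lambda song: cosine_similarity(profile, catalog[song]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical content features per song, scaled to [0, 1]
# (for example tempo, brightness, energy). Toy values, not real data.
catalog = {
    "rock_a": [0.8, 0.7, 0.9],
    "rock_b": [0.7, 0.8, 0.8],
    "rock_c": [0.9, 0.6, 0.9],
    "ambient_a": [0.1, 0.3, 0.1],
    "ambient_b": [0.2, 0.9, 0.1],
}

print(recommend(["rock_a", "rock_b"], catalog, k=1))
```

With this toy catalog, a user who listened to the two rock songs gets the third rock song ranked first, because its feature vector points in nearly the same direction as the user's profile.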

In particular, a technique called Neural Networks (NN) is responsible for recent breakthroughs in the field. The reader is probably aware of NNs and their success in computer vision, but for those who are not, we can think of a NN as a universal function approximator. In a supervised learning problem, let $\{(x_i, y_i)\}_{i=1}^{N}$ be a set of examples passed to a NN, where $x_i$ is the input and $y_i$ is the label. The NN defines a mapping $\hat{y} = f(x; \theta)$ and learns the best values for the parameters $\theta$ in order to minimize the difference between $\hat{y}$ and $y$.
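The supervised setup described above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not a neural network: a single linear unit stands in for the mapping from inputs to labels, written here as f(x; theta), and gradient descent on the mean squared error plays the role of "learning the best parameter values". The function names, the learning rate, and the toy data are assumptions made for the example.

```python
def predict(theta, x):
    """Parametric mapping f(x; theta): a single linear unit, w * x + b."""
    w, b = theta
    return w * x + b


def mse(theta, examples):
    """Mean squared difference between predictions and labels."""
    return sum((predict(theta, x) - y) ** 2 for x, y in examples) / len(examples)


def fit(examples, lr=0.05, steps=2000):
    """Gradient descent on the MSE, starting from theta = (0, 0)."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (predict((w, b), x) - y) * x for x, y in examples) / n
        grad_b = sum(2 * (predict((w, b), x) - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy labelled examples (x_i, y_i) generated from y = 2x + 1.
examples = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

theta = fit(examples)
print(theta, mse(theta, examples))
```

A real network replaces `predict` with a stack of nonlinear layers and computes the gradients by backpropagation, but the loop has the same shape: predict, measure the error against the labels, and adjust the parameters to reduce it.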

The performance of many Machine Learning algorithms depends heavily on the representation of the data they are given. In many applications a problem becomes easier to solve if we choose the right features for the task. However, choosing such features may not be simple and depends on the expertise of a specialist in the problem domain. As an alternative, a research field called Deep Learning has gained popularity in recent years. It lets us work with the data in its raw, unstructured form and progressively learn patterns from these representations.
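As an illustration of what hand-chosen features look like in audio, the sketch below computes two classic descriptors, zero-crossing rate and RMS energy, from a raw signal using only the Python standard library. The synthetic sine tones, the 8 kHz sample rate, and the helper names are assumptions made for the example; they stand in for real recordings.

```python
import math


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def rms_energy(signal):
    """Root mean square amplitude of the signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def sine(freq_hz, sample_rate=8000, seconds=0.5, amplitude=1.0):
    """Synthetic test tone standing in for a raw audio clip."""
    n = int(sample_rate * seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n)
    ]


low = sine(110.0)                   # low-pitched, loud tone
high = sine(880.0, amplitude=0.5)   # high-pitched, quieter tone

features = {
    name: (zero_crossing_rate(sig), rms_energy(sig))
    for name, sig in [("low", low), ("high", high)]
}
print(features)
```

The higher tone crosses zero more often and the louder tone has more energy, so the two numbers already separate these clips. Picking descriptors like these by hand is the expert-driven step that Deep Learning tries to replace by learning representations directly from the raw samples.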


This is not an exhaustive list of papers/books related to MIR, just some of my favorite papers in the area. A more detailed list can be found here: awesome-deep-learning-music

Disclaimer: as you may notice, I’m personally interested in music genre classification.

| Year | Title | Authors |
|------|-------|---------|
| 2012 | An Introduction to Audio Content Analysis | Alexander Lerch |
| 2002 | Musical Genre Classification of Audio Signals | George Tzanetakis |
| 2016 | WaveNet: A Generative Model for Raw Audio | Google DeepMind |
| 2017 | Music Genre Classification with Paralleling Recurrent Convolutional Neural Network | Lin Feng |
| 2017 | A Neural Parametric Singing Synthesizer | Merlijn Blaauw |