The music industry in the early 21st century witnessed a drastic change in the way its main product, music, was consumed and distributed. The sales model based on physical media such as CDs, vinyl records and cassettes was rapidly replaced by digital media as personal computers grew in popularity. Companies such as Napster, which used peer-to-peer (P2P) architectures to distribute files in a decentralized way, became very popular, and for many years it was believed that the music industry was doomed to bankruptcy because of illegal distribution.
In fact, this paradigm shift was responsible for billion-dollar losses in the music industry, which today is trying to find its way again through streaming services (e.g. Spotify, Apple Music and Tidal). On the other hand, large-scale music distribution in digital format allows us to analyze user behavior and its connections, improving how we suggest new content and how we understand patterns in songs. This is one of the biggest goals of the Music Information Retrieval (MIR) field.
MIR is an interdisciplinary field focused on the extraction of music features and their applications, for example: recommendation systems, automatic transcription, instrument recognition and music labeling.
Analyzing a song from its raw audio signal is the objective of several sub-areas of MIR. Among them we can highlight Content-based Recommendation, where the objective is to classify and group songs by a common characteristic (e.g. genre) in order to improve the user experience and suggest new content to a user given their usage history. In recent years, the accuracy of this type of recommendation system has approached state-of-the-art levels through the use of Machine Learning techniques, where the challenge is to let the computer learn from examples in terms of hierarchical concepts.
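To make the idea of content-based recommendation more concrete, here is a minimal Python sketch. It assumes each song has already been summarized as a numeric feature vector (the names `SONGS` and `recommend` and the three feature values are hypothetical and made up for illustration), builds a user profile as the mean vector of the songs in the user's history, and ranks unheard songs by cosine similarity to that profile. This is one simple way of grouping songs by shared characteristics, not the method of any particular service.

```python
import math

# Hypothetical per-song feature vectors, e.g. [tempo (normalized), energy, acousticness].
# The values are invented for illustration only.
SONGS = {
    "song_a": [0.9, 0.8, 0.1],
    "song_b": [0.85, 0.75, 0.2],
    "song_c": [0.2, 0.3, 0.9],
    "song_d": [0.25, 0.2, 0.95],
    "song_e": [0.8, 0.9, 0.15],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recommend(history, songs, k=2):
    """Rank songs not in the history by similarity to the mean of the history's vectors."""
    dims = len(next(iter(songs.values())))
    # User profile: average feature vector of the songs already listened to
    profile = [sum(songs[s][i] for s in history) / len(history) for i in range(dims)]
    candidates = [s for s in songs if s not in history]
    candidates.sort(key=lambda s: cosine_similarity(profile, songs[s]), reverse=True)
    return candidates[:k]


print(recommend(["song_a"], SONGS))
```

With this made-up data, `song_b` and `song_e` come out on top for a user who listened to `song_a`, because their vectors point in nearly the same direction. Cosine similarity compares the shape of feature vectors rather than their magnitude; a real system would extract or learn these features from the audio itself.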
In particular, a technique called Neural Networks (NN) is responsible for new breakthroughs in the field. The reader is probably aware of NNs and their success in computer vision; for those who are not, we can think of a NN as a universal function approximator. In a supervised learning problem, let $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$ be a set of examples passed to a NN, where $x^{(i)}$ is the input and $y^{(i)}$ is the label. The NN defines a mapping $\hat{y} = f(x; \theta)$ and learns the values of the parameters $\theta$ that minimize the difference between $\hat{y}$ and $y$.
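The supervised setup described above can be sketched in plain Python. This is a toy example under stated assumptions, not a production approach: it assumes a network with a single hidden layer of tanh units, uses the mean squared error as the difference between the network's prediction and the label, and fits the made-up target y = x² with full-batch gradient descent. `train_mlp` and its parameters are hypothetical names chosen for this illustration.

```python
import math
import random


def train_mlp(xs, ys, hidden=8, lr=0.1, epochs=2000, seed=0):
    """Toy sketch: fit y ~ f(x; theta) with one tanh hidden layer via gradient descent on MSE."""
    rng = random.Random(seed)
    # theta = (w1, b1, w2, b2): input->hidden weights/biases, hidden->output weights/bias
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        # Accumulate gradients over the whole batch
        g_w1, g_b1, g_w2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden
        g_b2 = 0.0
        loss = 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            y_hat = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = y_hat - y
            loss += err * err / n
            d_out = 2 * err / n  # dLoss/dy_hat for the mean squared error
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                d_pre = d_out * w2[j] * (1 - h[j] ** 2)  # backprop through tanh
                g_w1[j] += d_pre * x
                g_b1[j] += d_pre
        losses.append(loss)  # loss measured before this epoch's update
        # Gradient-descent step on every parameter in theta
        for j in range(hidden):
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2

    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2

    return predict, losses


xs = [i / 10 for i in range(-10, 11)]
ys = [x ** 2 for x in xs]
predict, losses = train_mlp(xs, ys)
print(f"initial MSE {losses[0]:.4f}, final MSE {losses[-1]:.4f}")
```

The `losses` list records the training error at each epoch, so comparing its first and last entries shows the network adjusting its parameters to reduce the error. In practice, frameworks such as PyTorch or TensorFlow compute these gradients automatically instead of deriving them by hand.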
The performance of many Machine Learning algorithms depends heavily on the representation of the data provided. In many applications a problem becomes much easier to solve if we choose the right features for the task. However, choosing such features is often not simple and depends on the expertise of a specialist in the problem domain. As an alternative, a research field called Deep Learning has gained popularity in recent years: it lets us explore the data in its raw, unstructured form and progressively learn patterns from these representations.
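To illustrate what hand-crafted features look like, the following sketch computes two classic audio descriptors, the zero-crossing rate and the root-mean-square (RMS) energy, directly from raw samples. The synthetic 440 Hz tone, the white-noise signal and the 22050 Hz sample rate are assumptions made only for this example. Deciding that numbers like these are useful for a given task is exactly the kind of domain expertise that Deep Learning tries to replace by learning representations from the raw signal.

```python
import math
import random


def zero_crossing_rate(samples):
    """Fraction of consecutive sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def rms_energy(samples):
    """Root-mean-square amplitude of the signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


SR = 22050  # assumed sample rate in Hz
# One second of a 440 Hz sine with amplitude 0.5 (a pure, tonal signal)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SR) for n in range(SR)]
# One second of uniform white noise in [-0.5, 0.5] (a noisy signal)
rng = random.Random(0)
noise = [rng.uniform(-0.5, 0.5) for _ in range(SR)]

for name, signal in [("tone", tone), ("noise", noise)]:
    print(name, round(zero_crossing_rate(signal), 4), round(rms_energy(signal), 4))
```

A pure tone crosses zero about twice per cycle, which gives a low rate, while white noise changes sign roughly every other sample, so even these two simple features can separate tonal from noisy content.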
This is not an exhaustive list of papers/books related to MIR, just some of my favorite papers in the area. A more detailed list can be found here: awesome-deep-learning-music
Disclaimer: as you may notice, I'm personally interested in Music Genre Classification.
| Year | Title | Author |
|------|-------|--------|
| 2012 | An Introduction to Audio Content Analysis | Alexander Lerch |
| 2002 | Musical Genre Classification of Audio Signals | George Tzanetakis |
| 2016 | WaveNet: A Generative Model for Raw Audio | Google DeepMind |
| 2017 | Music Genre Classification with Paralleling Recurrent Convolutional Neural Network | Lin Feng |
| 2017 | A Neural Parametric Singing Synthesizer | Merlijn Blaauw |