Spotify’s Magic Ingredient: Machine Learning

Source: IFPI, “Global Music Report 2018: Annual State of the Industry”
Data flow on how these playlists are made

Machine Learning

Collaborative Filtering

  1. A matrix of all the active users and songs is created.
  2. A series of complex factorization formulae is run on the matrix.
  3. Two kinds of vectors are produced: X, a user vector representing one user’s taste, and Y, a song vector representing the profile of a single song.
  4. Each user vector is compared with every other user vector to find users with similar tastes, and the same process is applied to the song vectors. So to get my recommendations, my user vector is compared against the other 286 million user vectors, similar users are grouped together, and a track is chosen that someone in my group has heard but I haven’t.
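The four steps above can be sketched on a toy scale. This is only an illustration, not Spotify's actual pipeline: the 3×4 play matrix, the SGD-based factorization, the hyperparameters, and the function names `factorize`, `cosine`, and `recommend` are all assumptions made for the example.

```python
import math
import random


def factorize(plays, n_factors=2, epochs=200, lr=0.05, reg=0.02, seed=0):
    """Factorize a users x songs matrix into user vectors X and song vectors Y.

    Uses plain stochastic gradient descent and treats every cell as observed
    (1 = heard, 0 = not heard). This is a simplification chosen for brevity.
    """
    rng = random.Random(seed)
    n_users, n_songs = len(plays), len(plays[0])
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_songs)]
    for _ in range(epochs):
        for u in range(n_users):
            for s in range(n_songs):
                pred = sum(xu * ys for xu, ys in zip(X[u], Y[s]))
                err = plays[u][s] - pred
                for k in range(n_factors):
                    xu, ys = X[u][k], Y[s][k]
                    X[u][k] += lr * (err * ys - reg * xu)
                    Y[s][k] += lr * (err * xu - reg * ys)
    return X, Y


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def recommend(user, plays, X):
    """Return a song index that a similar user has heard but `user` has not."""
    others = [u for u in range(len(plays)) if u != user]
    # Most similar users first, judged by their learned taste vectors.
    others.sort(key=lambda u: cosine(X[user], X[u]), reverse=True)
    for neighbour in others:
        for song, heard in enumerate(plays[neighbour]):
            if heard and not plays[user][song]:
                return song
    return None


# Step 1: rows are users, columns are songs (made-up data).
PLAYS = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
X, Y = factorize(PLAYS)              # steps 2 and 3
suggestion = recommend(0, PLAYS, X)  # step 4, for the first user
```

At real scale the comparison against every other user would presumably rely on approximate nearest-neighbour search rather than a full sort, but the idea is the same.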

Natural Language Processing for Content-Based Filtering

Audio features with Convolutional Neural Networks

CNN architecture for waveforms (songs)
Plot of the output of the network

Bayesian Additive Regression Trees

  • >30-second streams = your interest. When you listen to a song for more than 30 seconds, the algorithm counts it as a sign of interest. This keeps very short plays from confusing the algorithm.
  • The model is retrained once a day on the interaction data collected.
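As a rough illustration of the 30-second rule, the sketch below filters a list of stream events before they are counted as interest signals. The event format, the track names, and the function `interest_counts` are hypothetical; only the 30-second threshold comes from the text above.

```python
from collections import Counter

MIN_STREAM_SECONDS = 30  # threshold described in the article


def interest_counts(events):
    """Count streams per track, keeping only those longer than 30 seconds."""
    counts = Counter()
    for track, seconds in events:
        if seconds > MIN_STREAM_SECONDS:
            counts[track] += 1
    return counts


# Hypothetical (track, seconds listened) events for one user.
events = [("song_a", 212), ("song_b", 8), ("song_a", 45), ("song_c", 30)]
signals = interest_counts(events)
```

Here the 8-second play of `song_b` and the exactly-30-second play of `song_c` are ignored, so only `song_a` would feed into the daily retraining.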

The Homepage


  • Exploration: showing content whose engagement is uncertain, as a research tool for learning how users react to suggested content.
  • Exploitation: providing recommendations in the app based on the user’s previous music or podcast selections.
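One common way to balance these two modes is an epsilon-greedy bandit, sketched below. The article does not say which strategy Spotify uses, so treat epsilon-greedy, the shelf names, and the functions `choose` and `update` as assumptions for illustration only.

```python
import random


def choose(avg_reward, epsilon, rng):
    """Explore a random item with probability epsilon, otherwise exploit the best one."""
    if rng.random() < epsilon:
        return rng.choice(sorted(avg_reward))      # exploration
    return max(avg_reward, key=avg_reward.get)     # exploitation


def update(avg_reward, counts, item, reward):
    """Fold one observed reward (e.g. 1.0 = streamed, 0.0 = ignored) into the running mean."""
    counts[item] += 1
    avg_reward[item] += (reward - avg_reward[item]) / counts[item]


rng = random.Random(0)
avg_reward = {"shelf_a": 0.0, "shelf_b": 0.0, "shelf_c": 0.0}  # hypothetical home-page shelves
counts = {name: 0 for name in avg_reward}

update(avg_reward, counts, "shelf_a", 1.0)  # the user engaged with shelf_a
update(avg_reward, counts, "shelf_b", 0.0)  # the user ignored shelf_b

exploit_pick = choose(avg_reward, epsilon=0.0, rng=rng)  # always the best-known shelf
explore_pick = choose(avg_reward, epsilon=1.0, rng=rng)  # always a random shelf
```

A small epsilon keeps most recommendations familiar while still collecting feedback on content the system knows little about.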

How do they handle big data running non-stop in real-time?

Spotify handles so much data that it needs an environment that can run non-stop and process large volumes in real time.

  • TensorFlow Estimator: building machine learning pipelines in which they can try a variety of models, such as logistic regression, boosted trees, and deep models, and train them quickly in a much more iterative process.
  • TensorFlow Data Validation: quickly finding inconsistencies and bugs in the data pipeline while developing, evaluating, and rolling out models.
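To make the idea of data validation concrete, here is a plain-Python sketch of the kind of check such a tool performs: infer a schema from trusted data, then flag rows that break it. It deliberately does not use the TensorFlow Data Validation API; the functions `infer_schema` and `find_anomalies` and the sample rows are invented for illustration.

```python
def infer_schema(rows):
    """Record, for each field, the set of value types seen in trusted data."""
    schema = {}
    for row in rows:
        for field, value in row.items():
            schema.setdefault(field, set()).add(type(value))
    return schema


def find_anomalies(rows, schema):
    """Return (row index, field, problem) tuples for rows that break the schema."""
    anomalies = []
    for i, row in enumerate(rows):
        for field in schema:
            if field not in row:
                anomalies.append((i, field, "missing"))
            elif type(row[field]) not in schema[field]:
                anomalies.append((i, field, "unexpected type"))
    return anomalies


# Hypothetical training data (trusted) and serving data (to be checked).
training = [{"track_id": "t1", "seconds": 212}, {"track_id": "t2", "seconds": 45}]
serving = [{"track_id": "t3", "seconds": "45"}, {"track_id": "t4"}]

schema = infer_schema(training)
anomalies = find_anomalies(serving, schema)
```

The first serving row is flagged because `seconds` arrives as a string, and the second because the field is missing entirely. A bug like this in an upstream job could otherwise go unnoticed until it affects the model.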

Spotify is at the forefront of innovation in music discovery, not only for listeners but for artists too. It is helping underrated artists gain recognition and diversifying the world’s music taste.

Main takeaways!

  • Collaborative Filtering
  • Natural Language Processing (NLP) for content-based filtering
  • Convolutional Neural Networks (CNNs) for identifying audio features
  • Bayesian Additive Regression Trees (BaRT) to identify sequential weak learners




student 🌐 innovator @ tks | based in jakarta,indonesia

Liesl Anggijono