Spotify’s Magic Ingredient: Machine Learning

Source: IFPI, “Global Music Report 2018: Annual State of the Industry”
Data flow on how these playlists are made

Machine Learning

Collaborative Filtering

  1. A matrix of all the active users and songs is created.
  2. A series of matrix factorization steps is run on this matrix.
  3. Two kinds of vectors come out of the factorization: X, a user vector representing one user’s taste, and Y, a song vector representing the profile of a single song.
  4. Each user vector is compared with every other user vector to find users with similar tastes. The same process is applied to the song vectors. So for me to get my recommendations, my user vector is compared against the other 286 million user vectors, similar users are grouped together, and I am offered a track that someone in my group has heard but I haven’t.
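
The four steps above can be sketched on a toy scale. This is a minimal illustration, not Spotify’s implementation: the play matrix `PLAYS`, the helper names (`factorize`, `cosine`, `recommend`) and all hyperparameters are assumptions made up for the example, and plain stochastic gradient descent stands in for whatever factorization method is used in production.

```python
import math
import random

# Hypothetical toy data: rows are users, columns are songs, 1 = listened.
PLAYS = [
    [1, 1, 1, 0, 0, 0],  # user 0
    [1, 1, 1, 1, 0, 0],  # user 1: overlaps with user 0, plus song 3
    [0, 0, 0, 0, 1, 1],  # user 2: different taste
]

def factorize(plays, k=2, steps=2000, lr=0.05, reg=0.01, seed=0):
    """Approximate plays[u][s] by dot(X[u], Y[s]) using plain SGD.

    Simplification: unplayed songs are treated as explicit zeros.
    """
    rng = random.Random(seed)
    n_users, n_songs = len(plays), len(plays[0])
    X = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_songs)]
    for _ in range(steps):
        for u in range(n_users):
            for s in range(n_songs):
                pred = sum(a * b for a, b in zip(X[u], Y[s]))
                err = plays[u][s] - pred
                for f in range(k):
                    xu, ys = X[u][f], Y[s][f]
                    X[u][f] += lr * (err * ys - reg * xu)
                    Y[s][f] += lr * (err * xu - reg * ys)
    return X, Y

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(user, plays, X):
    """Find the most similar other user and return the songs they heard
    that `user` has not."""
    others = [u for u in range(len(plays)) if u != user]
    neighbor = max(others, key=lambda u: cosine(X[user], X[u]))
    picks = [s for s, heard in enumerate(plays[neighbor])
             if heard and not plays[user][s]]
    return neighbor, picks

X, Y = factorize(PLAYS)
neighbor, picks = recommend(0, PLAYS, X)
print("nearest user:", neighbor, "suggested songs:", picks)
```

With hundreds of millions of users, comparing every pair of vectors directly would be impractical, so a real system would need some form of approximate nearest-neighbor search; the brute-force loop here only keeps the idea visible.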

Natural Language Processing for Content-Based Filtering

Audio features with Convolutional Neural Networks

CNN architecture for waveforms (songs)
Plot of the output of the network

Bayesian Additive Regression Trees

  • Streams longer than 30 seconds = your interest. When you listen to a song for more than 30 seconds, the algorithm counts it as a sign of interest. This keeps plays that end before the 30-second mark from confusing the algorithm.
  • The model is retrained once a day on the interaction data collected from you.
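
The 30-second rule can be pictured as a simple labeling step that runs before each daily retraining. The sketch below is only an assumption about how such a rule could look: the `Stream` record, its field names and `label_interest` are hypothetical, and only the 30-second threshold comes from the text above.

```python
from dataclasses import dataclass

INTEREST_THRESHOLD_SECONDS = 30  # threshold stated in the article

@dataclass
class Stream:
    # Hypothetical record of one playback event.
    user_id: str
    track_id: str
    seconds_played: float

def label_interest(streams):
    """Map (user, track) to True if any stream lasted more than the threshold."""
    labels = {}
    for s in streams:
        key = (s.user_id, s.track_id)
        interested = s.seconds_played > INTEREST_THRESHOLD_SECONDS
        labels[key] = labels.get(key, False) or interested
    return labels

streams = [
    Stream("me", "track_a", 12.0),   # skipped early
    Stream("me", "track_b", 45.0),   # listened past the threshold
    Stream("me", "track_a", 8.0),    # skipped again
]
print(label_interest(streams))
```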

The Homepage


  • Exploration: suggestions made without strong evidence of the user’s interest, used as a research tool to learn more about how users react to suggested content.
  • Exploitation: providing recommendations in the app that are based on previous music or podcast selections.
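
One common way to balance these two modes is an epsilon-greedy bandit: most of the time it shows the item with the best observed engagement (exploitation), and with a small probability it shows a random item (exploration). The sketch below is a generic textbook illustration of that trade-off, not Spotify’s model; the class name, the `epsilon` value and the reward convention (1.0 for an engaged stream, 0.0 otherwise) are assumptions.

```python
import random

class EpsilonGreedy:
    """Tiny epsilon-greedy bandit over a fixed set of items."""

    def __init__(self, n_items, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = [0] * n_items      # how often each item was shown
        self.rewards = [0.0] * n_items  # total reward collected per item

    def _mean(self, i):
        # Average reward so far; items never shown default to 0.0.
        return self.rewards[i] / self.shows[i] if self.shows[i] else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))      # explore
        return max(range(len(self.shows)), key=self._mean)  # exploit

    def update(self, item, reward):
        self.shows[item] += 1
        self.rewards[item] += reward

bandit = EpsilonGreedy(n_items=3, epsilon=0.1)
bandit.update(2, 1.0)   # the user engaged with item 2
bandit.update(0, 0.0)   # the user skipped item 0
print(bandit.choose())
```

A higher `epsilon` learns about unfamiliar content faster at the cost of showing more suggestions that may miss; a lower one leans on what is already known about the listener.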

How do they handle big data running non-stop in real-time?

Spotify handles an enormous amount of data, so it needs an environment that can run non-stop and in real time at that scale.

  • TensorFlow Estimator: building machine learning pipelines where they can try a variety of models, such as logistic regression, boosted trees and deep models, and train them quickly in a much more iterative process.
  • TensorFlow Data Validation: quickly finding inconsistencies and bugs in their data pipeline while developing, evaluating and rolling out models.
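
To make “finding inconsistencies in the pipeline” concrete, here is a minimal plain-Python sketch of the general idea behind schema-based data validation: learn what the features look like from reference data, then flag new rows that deviate. It does not use or reproduce the TensorFlow Data Validation API; `infer_schema`, `find_anomalies` and the feature names are hypothetical.

```python
def infer_schema(rows):
    """Record the set of value types seen for each feature in reference rows."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            schema.setdefault(name, set()).add(type(value))
    return schema

def find_anomalies(rows, schema):
    """Return (row index, feature, problem) for every deviation from the schema."""
    anomalies = []
    for i, row in enumerate(rows):
        for name in schema:
            if name not in row:
                anomalies.append((i, name, "missing feature"))
        for name, value in row.items():
            if name not in schema:
                anomalies.append((i, name, "unexpected feature"))
            elif type(value) not in schema[name]:
                anomalies.append((i, name, "unexpected type"))
    return anomalies

reference = [{"track_id": "a", "seconds_played": 31.0}]
incoming = [
    {"track_id": "b", "seconds_played": "45"},  # a string where a float is expected
    {"track_id": "c"},                          # seconds_played is missing
]
schema = infer_schema(reference)
for anomaly in find_anomalies(incoming, schema):
    print(anomaly)
```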

Spotify is at the forefront of innovation in music discovery, not only for listeners but for artists too. They’re helping underrated artists get recognition and diversifying the world’s music taste.

Main takeaways!

  • Collaborative Filtering
  • Natural Language Processing (NLP) for content-based filtering
  • Convolutional Neural Networks (CNNs) for identifying audio features
  • Bayesian Additive Regression Trees (BaRT) to identify sequential weak learners




student 🌐 innovator @ tks | based in jakarta,indonesia

Liesl Anggijono
