Spotify’s Magic Ingredient: Machine Learning

Liesl Anggijono
10 min read · Dec 30, 2020


Disclaimer: This article is not affiliated with Spotify and may contain unconfirmed information about Spotify and past structures that may no longer be current.

Music is a form of expression, an art form, and a cultural activity all at once. It accompanies us throughout our lives, whether you’re listening to a song with your earphones on the way to school or being greeted by the sounds of the gamelan at a resort in Bali, Indonesia. Yet have you ever wondered how many songs exist right now? Far too many to count. With astronomical amounts of music released every day, listeners need help discovering what they like. Song discovery has long been aided by DJs, friend recommendations, radio, and many other channels, but these discoveries usually happen by luck, through manual curation.

Music distribution has evolved through the centuries, from sheet music through vinyl, tape, CDs, and downloads to music streaming. In recent years, the streaming industry has grown exponentially. Launched in 2008, Spotify is the world’s largest music streaming service, with 286 million monthly users.

Source: IFPI, “Global Music Report 2018: Annual State of the Industry”

But how does Spotify stand out from the other streaming platforms?

Spotify personalizes the music discovery experience to the user.

You can discover and enjoy all different types of music on its pages, explore your favorite influencer’s playlists, or even stalk the playlist of the person you admire 👀 . Most importantly, Spotify takes each listener’s music taste into account: the songs you’ve listened to in the past, how frequently you play a certain genre, how long you play each song, and many more variables. The combination of these variables shapes your Spotify homepage and one of Spotify’s most-coveted features: personalized playlists. These are machine-generated playlists of songs you’ve never heard before, customized to your taste based on listening history data.

Discover Weekly & Release Radar

Discover Weekly is a playlist updated every Monday, uniquely curated to each listener’s activity by a machine learning algorithm: songs you have never heard before that fit in with your existing preferences. We also get a Release Radar playlist. Similar to Discover Weekly, it’s a machine-generated playlist, updated every Friday with new releases from the artists you follow.

Data flow on how these playlists are made

These personalized playlists account for 31% of all the listening on the platform. And we as Spotify users LOVE these recommendation models. Why? Because it feels like someone who’s been watching us 24/7 created a playlist just for us and gifts it to us every week. But the nagging question is: how does Spotify achieve this level of detail?

Machine Learning

What is that? Machine learning is an application of artificial intelligence that gives machines the ability to learn and improve from experience automatically, without being explicitly programmed. Machine learning algorithms use statistics to find patterns in large amounts of data, where “data” encompasses literally anything from numbers and words to images and clicks.

These are the known models behind Spotify’s recommendation engine:

Collaborative Filtering

Collaborative Filtering, or CF, builds on the elementary idea that people who listen to similar music have similar tastes. It is a technique for making automated predictions about a user’s preferences based on the listening behavior of other users with similar tastes, and we can take advantage of it to recommend songs you’ve never heard before. For example, suppose User A has songs A, B, C, D, E, F in their playlist and User B has songs A, B, C, D, X, Y. Since Users A and B most likely share a similar taste in music, the algorithm recommends songs X and Y to User A and songs E and F to User B.
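The intuition above can be sketched with plain set arithmetic (the song letters are the made-up ones from the example):

```python
# Toy sketch of the collaborative-filtering intuition using set arithmetic.
user_a = {"A", "B", "C", "D", "E", "F"}   # User A's playlist
user_b = {"A", "B", "C", "D", "X", "Y"}   # User B's playlist

shared = user_a & user_b                  # large overlap -> similar taste
recommend_to_a = user_b - user_a          # songs B has that A lacks
recommend_to_b = user_a - user_b          # songs A has that B lacks

print(sorted(recommend_to_a))  # ['X', 'Y']
print(sorted(recommend_to_b))  # ['E', 'F']
```

Real collaborative filtering works with weighted play counts rather than plain playlist membership, but the set difference captures the core move: hand each user what their nearest neighbors already enjoy.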

But with millions of songs stored on Spotify, this intuition is implemented using matrix math — yes, the math you learn in linear algebra. Here’s an oversimplified view of how it works.

  1. A matrix of all the active users and songs is created.
  2. A series of complex factorization formulae is run on the matrix.
  3. Two kinds of vectors come out: X, a user vector representing one user’s taste, and Y, a song vector representing the profile of a single song.
  4. Each user vector is compared against every other user vector to find users with similar tastes; the same process is applied to the song vectors. So to generate my recommendations, my user vector is compared against the other 286 million user vectors to find my taste group, and the system picks tracks that people in that group have heard but I haven’t.
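The four steps above can be sketched with NumPy on a toy play-count matrix. The numbers and the use of a plain SVD are illustrative assumptions; a production system factorizes a huge sparse matrix with specialized methods.

```python
import numpy as np

# Toy play-count matrix: rows = users, columns = songs. A real system
# factorizes a sparse matrix with millions of rows; this only shows the
# shape of the idea, using a plain SVD as the "factorization formula".
plays = np.array([
    [5, 4, 0, 0],   # user 0
    [4, 5, 1, 0],   # user 1: similar to user 0, also played song 2
    [0, 0, 5, 4],   # user 2: a different taste entirely
], dtype=float)

# Steps 2-3: factorize into user vectors (X) and song vectors (Y).
U, s, Vt = np.linalg.svd(plays, full_matrices=False)
k = 2                         # keep two latent taste dimensions
X = U[:, :k] * s[:k]          # user taste vectors
Y = Vt[:k].T                  # song profile vectors

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Step 4: user 0's nearest neighbour by taste is user 1, not user 2.
sims = {i: cosine(X[0], X[i]) for i in (1, 2)}
neighbour = max(sims, key=sims.get)

# Recommend a song the neighbour played that user 0 never has: song 2.
unheard = [j for j in range(plays.shape[1]) if plays[0, j] == 0]
best = max(unheard, key=lambda j: plays[neighbour, j])
print(neighbour, best)  # 1 2
```

Swapping the roles of rows and columns gives the song-to-song version of the same comparison.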

But the drawback of this model is that it doesn’t look into WHY these songs are recommended; it only associates patterns. This is a particular disadvantage for smaller and up-and-coming artists, whose work is harder to get suggested. That is why Spotify diversifies the approaches it uses in its recommendation model.

Natural Language Processing for Content-Based Filtering

First of all — what is Natural Language Processing (NLP)? NLP is a field of artificial intelligence that gives machines the ability to read, understand, and derive meaning from human language. These algorithms can understand human speech and text in real time. Relevant examples of products that use NLP are your regular voice assistants, like Siri and Alexa, just to name a few.

Using this model, we can compensate for the drawbacks of the collaborative filtering method by understanding the descriptors of a song. The system crawls the internet to find text about the song or artist in articles, blog posts, online reviews, tweets, and so on. The words used (adjectives, nouns, etc.) are analyzed and associated with the music.
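A minimal sketch of the idea, assuming we already have some crawled text per track (the snippets below are invented): tracks described with overlapping vocabulary profile as similar.

```python
# Hypothetical scraps of crawled text about three tracks. In production
# this text would come from blogs, reviews, tweets, etc., not hard-coded.
docs = {
    "track_a": "dreamy lo-fi guitars, mellow and dreamy late-night vibe",
    "track_b": "mellow dreamy synths with a soft late-night feel",
    "track_c": "aggressive fast-paced metal riffs, loud and heavy",
}

STOPWORDS = {"and", "with", "a", "the"}

def terms(text):
    """Descriptor words for a track, minus punctuation and stopwords."""
    return {w.strip(",.").lower() for w in text.split()} - STOPWORDS

def jaccard(s, t):
    """Overlap of two descriptor sets: 0.0 (disjoint) to 1.0 (identical)."""
    return len(s & t) / len(s | t)

sim_ab = jaccard(terms(docs["track_a"]), terms(docs["track_b"]))
sim_ac = jaccard(terms(docs["track_a"]), terms(docs["track_c"]))
print(sim_ab > sim_ac)  # True: tracks a and b share "dreamy", "mellow", ...
```

Production NLP models weight terms rather than treating them as flat sets, but the principle is the same: shared descriptors pull songs together regardless of who has listened to them.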

Audio features with Convolutional Neural Networks

Ever feel mad when you discover a very talented artist with impressive, crazy awesome music who gets so little recognition? Well, this approach doesn’t disadvantage less popular and underrated artists the way collaborative filtering can. Convolutional Neural Networks (CNNs) are a class of deep neural networks commonly used for analyzing visuals. They identify images by feeding in data pixel by pixel and training the model through layers that classify different objects. But how is a model usually used for images applied to songs?

CNN architecture for waveform (songs)

The song is turned into a waveform that passes through the convolution layers, each downsampling its time-frequency representation further. At the end of the network, we get audio features such as time signature, loudness, key, mode, and tempo. These features are the building blocks of a song’s profile, which is stored in Spotify’s database to identify the song’s characteristics and group it with similar songs.
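A stripped-down sketch of this downsampling story with NumPy: random (untrained) filters stand in for learned ones, and each convolution-plus-pooling layer shrinks the signal until only a handful of summary values remain. A real model is trained end to end and typically operates on spectrograms, not raw samples.

```python
import numpy as np

rng = np.random.default_rng(0)
waveform = rng.standard_normal(1024)   # stand-in for a raw audio signal

def conv1d(x, kernel):
    # "valid" convolution: slide the filter across the signal
    return np.convolve(x, kernel, mode="valid")

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=4):
    # keep the strongest response in each window -> downsampling
    trimmed = x[: len(x) // size * size]
    return trimmed.reshape(-1, size).max(axis=1)

x = waveform
for _ in range(3):                     # three stacked conv + pool layers
    kernel = rng.standard_normal(8) / 8   # untrained random filter
    x = max_pool(relu(conv1d(x, kernel)))

print(len(waveform), "->", len(x))     # 1024 -> 13
```

In a trained network, those 13 surviving numbers would be meaningful features — loudness, tempo-like periodicities, and so on — rather than responses to random filters.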

Plot of the output of the network

Bayesian Additive Regression Trees

BaRT, or Bayesian Additive Regression Trees, is a method that sums the contributions of many sequential weak learners (small decision trees). Spotify uses BaRT to predict what to put on your homepage, drawing on recommendations related to your recent history.

  • >30-second streams = your interest. When you listen to a song for more than 30 seconds, the model counts it as interest. This avoids confusing the algorithm in cases like this:

Say you play a song you’ve never heard before just to find out what it sounds like, realize 15 seconds in that it’s not your taste, and skip it. Or you accidentally click the wrong song. The 30-second rule keeps the algorithm from treating these clicks as positive signals for your recommendations, and instead uses them to learn which songs you enjoy less.

  • Based on the interaction data collected, the model is retrained once a day.
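The 30-second rule reduces to a simple threshold over stream events; a sketch (the event fields and track names are invented):

```python
# The 30-second signal as a simple threshold. The event shape is invented.
THRESHOLD_SECONDS = 30

events = [
    {"track": "song_1", "seconds_played": 120},  # listened through
    {"track": "song_2", "seconds_played": 15},   # sampled, then skipped
    {"track": "song_3", "seconds_played": 2},    # accidental click
    {"track": "song_4", "seconds_played": 45},
]

liked = [e["track"] for e in events
         if e["seconds_played"] > THRESHOLD_SECONDS]
skipped = [e["track"] for e in events
           if e["seconds_played"] <= THRESHOLD_SECONDS]

print(liked)    # ['song_1', 'song_4'] -> positive training signal
print(skipped)  # ['song_2', 'song_3'] -> evidence you enjoy these less
```

Only the `liked` list feeds the model as interest; the short plays become the weaker "enjoys less" signal described above.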

The Homepage

The home screen consists of a series of cards and shelves. A card is a square image representing a playlist, podcast episode, artist page, or album. Shelves are the rows that group a series of cards. Think of how a bookcase (Spotify Home) uses bookshelves (shelves) to hold and display books (cards): every user’s bookcase is curated from their unique interests and the history of books (in this case, songs) they have read (listened to).


The algorithms work in real time. Whenever you refresh your homepage, all of this data is gathered to display the best choices for you, based on your choices in music, your willingness to accept recommendations, and how long you play the tracks.

The machine-learning algorithm uses a multi-armed bandit framework, which essentially tries to balance exploration and exploitation:

  • Exploration — serving unexpected content as a research tool to learn more about how users react to suggested content.
  • Exploitation — serving recommendations based on previous music or podcast selections.

With this method, the machine constantly learns which cards and shelves are good for you and scores them, while still trying out new cards and shelves that could become your new hidden gems. This lets Spotify deliver our all-time favorite tracks while still recommending fresh, never-heard-before music. To chain these systems together without resorting to large-scale randomization, they employ counterfactual training on logged propensities with a small amount of randomization.
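A classic way to make the exploration/exploitation trade-off concrete is an epsilon-greedy bandit. The sketch below (with invented shelf names and engagement rates) illustrates the framework, not Spotify’s actual policy:

```python
import random

random.seed(42)

# Each "arm" is a shelf; reward 1 means the user engaged with it.
# The true engagement rates are invented for the demo.
true_rates = {"podcasts": 0.1, "new_releases": 0.3, "heavy_rotation": 0.6}
counts = {arm: 0 for arm in true_rates}
values = {arm: 0.0 for arm in true_rates}   # running mean reward per arm
EPSILON = 0.1                               # 10% of impressions explore

def choose_arm():
    if random.random() < EPSILON:
        return random.choice(list(true_rates))   # explore a random shelf
    return max(values, key=values.get)           # exploit the best so far

for _ in range(10_000):
    arm = choose_arm()
    reward = 1 if random.random() < true_rates[arm] else 0  # user engaged?
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]     # update mean

print(max(values, key=values.get))  # the arm the policy now prefers
```

Even this simple policy converges on the highest-engagement shelf while the 10% exploration budget keeps testing the others, which is the behavior the card-and-shelf scoring above relies on.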

How do they handle big data running non-stop in real-time?

Spotify handles enormous amounts of data, so they need an environment that can run non-stop, in real time, at that scale.

Before their latest system migration, Spotify wrote a lot of custom data libraries and APIs to drive the machine learning behind this personalization effort. This meant constantly going back to rewrite code whenever they compared different model choices, say, logistic regression vs. boosted trees vs. deep neural nets, and all that custom rewriting slowed innovation down. So Spotify moved to the TensorFlow ecosystem, where they have access to various tools and libraries, such as TensorFlow Estimators and TensorFlow Data Validation, that avoid much of the custom work.

  • TensorFlow Estimators — building machine learning pipelines where they can try a variety of models, such as logistic regression, boosted trees, and deep models, and train them quickly in a much more iterative process.
  • TensorFlow Data Validation — quickly finding inconsistencies in the pipeline and catching bugs in the data while developing, evaluating, and rolling models out.

Additionally, their migration to Kubeflow — an open-source machine learning platform designed to orchestrate complicated workflows — accelerates experimentation by managing the workload, significantly speeding up the training of their machine learning algorithms.

Spotify also uses the Google Cloud Platform, an industry-leading data management platform that helps them tackle challenges with data management, hybrid & multi-cloud, and AI/machine learning.

Spotify is at the forefront of innovation for music discovery. Not only for the listeners but for artists too. They’re taking part in giving recognition to underrated artists and diversifying the world’s music taste.

Main takeaways!

-Spotify is changing how the world discovers music, aided by a powerful technology — artificial intelligence

-They use a variety of machine learning models to recommend songs, such as

  • Collaborative Filtering
  • Natural Language Processing/NLPs for content-based filtering
  • Convolutional Neural Networks/CNNs for identifying audio features
  • Bayesian Additive Regression Trees/BaRT to combine sequential weak learners for homepage recommendations

- Their ‘Paved Road’ machine learning infrastructure, where they utilize Google Cloud, Kubeflow, and TensorFlow, provides an environment where they can run non-stop, in real time, while dealing with large amounts of data.

Congrats you made it to the end! Thank you so much for reading until the end. If you enjoyed reading my article, don’t forget to give it 50 claps. Or else… just kidding give it the number of claps it deserves ❤

Connect with me on LinkedIn, Instagram, or Medium, or if you’d like to have a chat with me, book a meeting with me here ⚡️


Special shoutout to Spotify < 3 ur literally the love of my life and you’ve helped me go through 6 years of my life (until 4ever I hope)