Since its launch over fifteen years ago, Spotify has succeeded in disrupting the entire music industry by providing an easily accessible freemium streaming service. Today, Spotify is recognized as the world’s biggest music streaming platform and is known for its commitment to user recommendations and personalized experiences.
Few of Spotify’s successes would’ve been possible without a reliable tech stack and what the company calls Spotify engineering. These are the development processes, coding languages and applications that support Spotify’s platform, and they’ve changed a lot as the company has grown. We’ve outlined what Spotify’s tech stack looks like at the moment, alongside some of the key ways Spotify engineering has pushed the music and wider tech industries forward.
An Overview of the Spotify Tech Stack
Spotify’s backend runs on a microservices architecture, which splits the application into many small, independently deployable services and makes a sprawling tech stack easier to manage. Most of Spotify’s backend services are written in Java, and the company has been known to rely on Java frameworks in the past, particularly the Spring Framework. This approach has influenced many software development services in the streaming sector. Because of the platform’s complexity, Spotify also relies on a second programming language, Scala, as well as the Node.js runtime.
Perhaps the most important part of Spotify’s backend is its reliance on Apache Kafka. Kafka is an event streaming platform, and Spotify uses it to move the event data generated by millions of listeners. Kafka allows the system to process data and handle events in real time, which helps Spotify provide a more seamless listening experience. Spotify also uses Apache Cassandra as a database, which we’ll outline in more detail below.
On the front end, Spotify’s web application is built using React, and it relies on both Redux and Sass. Spotify was originally hosted by Amazon Web Services (AWS) but switched to Google Cloud as its cloud provider roughly a decade ago. Spotify engineering is also known to rely on Kubernetes, a container orchestration system, to manage its various microservices. Containerization is a core part of Spotify’s tech stack approach.
Spotify’s Migrations: The Shift from PostgreSQL to Cassandra
Spotify has grown considerably since it was founded, particularly since its US launch in 2011. This has meant that they’ve performed several large-scale migrations, notably to tackle database scaling. Large sets of diverse data present several challenges for Spotify in terms of how all of the data they manage can be stored, collected and maintained. The most significant change to tackle this issue was their main Spotify database migration from PostgreSQL to Cassandra in 2015.
This migration was necessary due in part to a freak accident. Spotify’s main data cable, which linked its London and Ashburn data centers and was threaded deep in the Atlantic Ocean, broke. Some believe this was a result of a shark attack. The event forced Spotify to come to terms with the fact that it needed to invest in a substantial database migration to support its continued growth.
Cassandra proved to be a superior solution because it could scale with the company, which at this point supported 35 million active users. Spotify’s vast team of engineers worked to complete this migration without any significant downtime to their existing database through a process known as dark loading.
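To make the idea of dark loading more concrete, here is a minimal Python sketch. It assumes dark loading means sending live traffic to the new database in the background while users are still served from the old one; the article doesn't describe Spotify's actual implementation. The class name DarkLoadingStore and the two in-memory dictionaries are hypothetical stand-ins for PostgreSQL and Cassandra.

```python
from dataclasses import dataclass, field


@dataclass
class DarkLoadingStore:
    """Serves reads from the old store while shadowing traffic to the new one."""
    primary: dict = field(default_factory=dict)     # stand-in for the existing database
    candidate: dict = field(default_factory=dict)   # stand-in for the new database
    mismatches: list = field(default_factory=list)  # keys where the two stores disagreed

    def write(self, key, value):
        # Every write goes to both stores so the candidate stays in sync.
        self.primary[key] = value
        self.candidate[key] = value

    def read(self, key):
        # Users only ever see the primary's answer.
        expected = self.primary.get(key)
        # The candidate is queried "in the dark": compared, but never returned.
        shadow = self.candidate.get(key)
        if shadow != expected:
            self.mismatches.append(key)
        return expected


# A row that existed before dual writes began is missing from the candidate.
store = DarkLoadingStore(primary={"user:1": "legacy"})
store.write("user:2", "alice")
print(store.read("user:2"), store.mismatches)
print(store.read("user:1"), store.mismatches)
```

In a setup like this, the mismatch list shows which records still need to be backfilled before reads can safely switch over to the new database.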
Migrations have historically been a key part of how Spotify’s engineers operate, so much so that they’ve defined a distinct process and methodology for performing tech migrations with as few disruptions as possible. This involves prioritization, “productifying” the approach, and automating as many processes as possible. Since 2020, they’ve been moving tools higher up in their stack, so they haven’t had to rely on as many lengthy tech migrations.
How Spotify Continued to Scale and Modernize
Since moving from PostgreSQL to Cassandra, Spotify has continued to scale its systems. The most significant of these scaling developments happened in 2023 when they seamlessly switched the entire build system of their iOS app over to the open-source software Bazel. This migration involved over 120 teams and now means their app can better scale with new users.
Prior to this, they also overhauled their desktop application. This happened in 2021 and brought the Spotify desktop app more in line with its web application. This involved consolidating both departments and creating a unified codebase between them.
Spotify’s engineers benefitted from the company’s containerization approach by designing a single UI with multiple containers. The end result meant Spotify’s web and desktop apps were better unified, with code that’s now easier to reuse. It also meant their applications worked faster than before, making both web and desktop options more satisfying to use.
Spotify has also continued to evolve by embracing more ML and generative AI. In 2024, they built a new system for generating annotations across their millions of songs, videos and podcasts. This helped them create a more advanced data collection process across all the content on the platform, and will be used to further train and develop new machine learning models.
How Spotify Discovery and Recommendations Work
Spotify’s most widely discussed capability is how the service can adapt to each individual user. Every Spotify profile is given multiple algorithmically generated recommendations, like playlists and its AI DJ feature, to create a personalized listening experience. This is made possible by Spotify’s insight teams and analysts, but also by its reliance on event-driven architecture.
Spotify’s Use of Event-driven Architecture
Event-driven architecture allows systems to adapt to specific events; by using Kafka, Spotify is able to react to these events in real time. In Spotify’s case, events include when a user listens to a song, when they skip songs, when they create a new playlist, when they like a song and many other behaviors. Spotify uses this event data to power recommendations, but this is just the first stage in a lengthy process.
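The publish-and-subscribe pattern behind event-driven architecture can be sketched in a few lines of Python. This is an illustration only: the EventBus class is a hypothetical in-memory stand-in for a broker such as Kafka, and the event names and payload fields are assumptions rather than Spotify's real schema.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a message broker such as Kafka."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        # Register a callback for one kind of event.
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver the event to every handler registered for its type.
        for handler in self._handlers[event_type]:
            handler(payload)


play_counts = defaultdict(int)  # track id -> number of plays seen
skipped = []                    # track ids that were skipped


def on_play(event):
    play_counts[event["track"]] += 1


def on_skip(event):
    skipped.append(event["track"])


bus = EventBus()
bus.subscribe("play", on_play)
bus.subscribe("skip", on_skip)

# Hypothetical user behavior arriving as events.
bus.publish("play", {"user": "u1", "track": "t1"})
bus.publish("play", {"user": "u2", "track": "t1"})
bus.publish("skip", {"user": "u1", "track": "t2"})
print(dict(play_counts), skipped)
```

The producers of events know nothing about who consumes them, which is what lets new consumers, such as a recommendation pipeline, be added without changing the code that records a play or a skip.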
Spotify can collect and store this information within its Cassandra database. This allows them to track how each user interacts with the Spotify platform. All of this information is then combined with metadata related to various artists and music genres. Today, Spotify’s metadata can become even more granular, identifying the different instruments used in a track or the mood it’s trying to portray.
Audio Analysis for Spotify Music Recommendations
Spotify’s audio analysis system is then able to analyze the raw audio signals of a chosen track, and break it down into 12 different metrics that refer to its sonic characteristics. They also use Natural Language Processing (NLP) models to analyze lyrics, playlist titles and web data.
From all of this metadata collected through Kafka and machine learning, the Spotify model is able to build each user a unique Taste Profile, which is stored on their Cassandra database. This includes data points created based on listening information, like your favorite artists or how often you listen to music on the Spotify platform. Spotify then feeds this information into its algorithms to create its Discovery feature.
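As a rough illustration of how listening events could be rolled up into a profile, here is a minimal Python sketch. The function build_taste_profile, the event fields and the profile keys are all hypothetical; Spotify's real Taste Profile draws on far more signals than a simple count.

```python
from collections import Counter


def build_taste_profile(events, top_n=2):
    """Summarize one user's listening events into a small profile dict."""
    # Only completed plays count towards the profile in this sketch.
    plays = [e for e in events if e["type"] == "play"]
    artist_counts = Counter(e["artist"] for e in plays)
    genre_counts = Counter(e["genre"] for e in plays)
    return {
        "top_artists": [artist for artist, _ in artist_counts.most_common(top_n)],
        "top_genres": [genre for genre, _ in genre_counts.most_common(top_n)],
        "total_plays": len(plays),
    }


# Hypothetical events for a single user.
events = [
    {"type": "play", "artist": "A", "genre": "rock"},
    {"type": "play", "artist": "A", "genre": "rock"},
    {"type": "play", "artist": "A", "genre": "rock"},
    {"type": "play", "artist": "B", "genre": "jazz"},
    {"type": "play", "artist": "B", "genre": "jazz"},
    {"type": "play", "artist": "C", "genre": "pop"},
    {"type": "skip", "artist": "C", "genre": "pop"},
]
profile = build_taste_profile(events)
print(profile)
```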
These algorithms all differ slightly depending on the recommendation. Discover Weekly will pair this information with songs you haven’t listened to before, whereas its Daily Mixes will group several of your favorite artists in a similar genre together. Its AI-powered DJ tool combines all of this information and creates a never-ending playlist based on different data points. This highly complex Spotify recommendation system is what the company markets as “the magic behind the music” and is designed to help users find the music they like and to keep them listening.
The Technology Behind Spotify Wrapped
Spotify engineering also powers the company’s most well-known marketing strategy and many users’ favorite feature: Spotify Wrapped. Spotify Wrapped collects every user’s listening data across the year and gives them an animated rundown of their most listened-to songs, favorite artists, and the main mood of music they gravitated towards that year.
Spotify Wrapped Optimization in 2020
Spotify has generated Spotify Wrapped since 2016, and the technology behind the feature has evolved considerably in the near-decade since its first iteration. The company has described Spotify Wrapped as the largest dataflow job they have to handle each year, and in 2020, they used the results of the previous year (2019) to further optimize the process.
This was achieved through a technique called Sort Merge Bucket (SMB). Data pipelines at Spotify are written in Scio, a Scala API for Apache Beam, and are then modularized even further into various components. Spotify used SMB formats in a detailed process that ultimately allowed them to sort their vast data sets more effectively, thanks to greater data partitioning, sharding and parallelism. Spotify Wrapped 2020 was, therefore, more extensive than previous years but was delivered more cost-effectively by the company. They’ve been able to take this learning forward into future Spotify Wrapped reveals, further optimizing the process.
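The general idea behind a sort-merge-bucket join can be shown with a small Python sketch. It is not Spotify's Scio implementation: the bucket count, function names and sample records are assumptions made for illustration. Both data sets are split into the same number of buckets by a hash of the key, each bucket is sorted once at write time, and matching buckets are later joined in a single linear pass with no further shuffling.

```python
from zlib import crc32

NUM_BUCKETS = 4  # hypothetical; both data sets must use the same value


def bucket_of(key):
    # A stable hash so the same key always lands in the same bucket.
    return crc32(key.encode("utf-8")) % NUM_BUCKETS


def write_smb(records):
    """Partition (key, value) records into buckets and sort each bucket by key."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key, value in records:
        buckets[bucket_of(key)].append((key, value))
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]


def merge_join(left, right):
    """Inner-join two key-sorted lists in one pass over each."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then pair them up.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


def smb_join(left_buckets, right_buckets):
    # Bucket n of one data set can only match bucket n of the other.
    joined = []
    for lb, rb in zip(left_buckets, right_buckets):
        joined.extend(merge_join(lb, rb))
    return joined


plays = [("u2", "t9"), ("u1", "t1"), ("u1", "t3"), ("u3", "t2")]  # user -> track
users = [("u3", "SE"), ("u1", "US"), ("u4", "GB")]               # user -> country
result = sorted(smb_join(write_smb(plays), write_smb(users)))
print(result)
```

Because each pair of buckets can be joined independently, the work parallelizes naturally, and the sorting cost is paid once when the data is written rather than every time it is joined.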
How Spotify’s Engineers Support Wrapped Visuals
It isn’t just the data behind Spotify Wrapped that relies on Spotify engineering. Developers are also often behind how this information is delivered to users. Each annual Spotify Wrapped will usually involve implementing new functionalities in how data is presented to users.
Alongside animators, developers also worked on the visual landscapes for Spotify Wrapped 2023. Developers introduced a new file format to the animation process, Lottie. By using Lottie, animators could create animations that worked seamlessly across Spotify’s many different platforms, meaning Spotify Wrapped 2023 was better optimized for web users.
In the interest of personalization, Lottie animations were then combined with native animations unique to each user. This meant Spotify could create a Spotify Wrapped with introduction animations that were the same for everyone but that could then turn into a unique animation for every individual, all in a more cost-effective way. These cost savings meant Spotify could spend a larger budget on paid marketing campaigns across various digital ad platforms to expand Wrapped’s reach.
What Can We Learn from Spotify Engineering?
Developers can learn a lot from Spotify’s engineers, particularly about large-scale migrations and how to handle large data sets effectively. Spotify has grown reliably throughout its existence thanks to its tech stack and the continuous work of its development teams.
We can also learn that developers aren’t just responsible for the back end of a platform. They often determine how some of the core features of a solution work, like Spotify Discovery and Spotify Wrapped. So remember the next time you see drivers jamming to music on their way to work, or you start streaming a personalized playlist late at night, that Spotify’s talented developers and their sprawling tech ecosystem made it possible.