Are We at the Dawn of A&R Data Wars?

Since the very beginning of music data analytics, a model that could accurately predict the trajectory of a music career has been something of a holy grail for music data professionals. That feat has yet to be achieved, if such a model can be built at all. The future of an artist’s career is often determined by a self-fulfilling prophecy of success (i.e., the budget and the team behind the artist) rather than by the artist’s performance before the signing. Success is never predetermined; it’s the result of the right team working with the right artist.

But even though success can’t be predicted from pre-existing data, A&R has still become the single most data-driven position in the industry over the past few years. The sheer volume of music being released means that every A&R needs some automated tool to filter through the vast ocean of new releases. Whether it’s AI-driven solutions like Instrumental or Warner-owned Sodatone, conventional analytics dashboards like Soundcharts or Chartmetric, or in-house data analytics, every modern A&R department is grounded in analytics. That’s just how the game works in 2021.

The real question, then, is this: how much of a competitive advantage does data analytics actually give an A&R if the entire industry is ultimately looking at the same open-API-sourced dataset, processed by slightly different algorithms?

Of course, different labels and distributors have different criteria for signing new talent, and A&Rs have to rely on their own judgment rather than blindly follow the algorithms. Still, the fact remains: the best way to ensure that your company signs the best acts available is to make sure an artist appears on your A&R (data-)radar before they appear on anyone else’s. The late 2010s were about all the key players racing to set up their analytics pipelines. The 2020s are going to be about fueling these pipelines with data that no one else has.

Today, most A&R data development strategies revolve around gaining access to platform consumption datasets on top of public-API data to get a better view of artists’ performance. The first half of 2021 alone has already brought several notable data partnerships aimed at supplying A&R departments with exclusive data. In January, DistroKid (home to some 2 million DIY artists) launched its “Upstream” feature, essentially offering unique artist consumption data to selected label partners in exchange for a signing fee.

Just a few weeks later, Universal announced a partnership deal with TikTok that reportedly goes far beyond previous music licensing agreements, with the companies aiming to “collaborate on new initiatives involving digital marketing, A&R, user data and more.” The way I see it, these are likely the first shots of the upcoming (or ongoing?) A&R data wars. But who’s going to be the early leader in that race? And how will this unique data be sourced?

Essentially, all existing artist data can be folded into three simple categories based on its source:

  1. Open-API & Crawled Data: This type of data is available to anyone with a Spotify API key and a Python script. This category includes only high-level metrics such as follower counts across platforms, Spotify monthly listeners, YouTube video views, etc.
  1. Streaming Consumption Data: This type of data is much more gated. It consists of the sales reports that streaming platforms provide to distributors (and that may in turn be shared with labels and artists’ teams). Different streaming platforms have different data-sharing policies in place, but generally speaking, this category includes more granular data such as stream counts, stream sources (e.g., playlists vs. library), listener location, etc.
  1. First-party Analytics: This type of data is shared by streaming and social platforms directly with artists and their teams via first-party analytics tools such as Spotify for Artists and YouTube Creator Studio. This category includes the most detailed data, such as audience demographics, engagement metrics (e.g., average view duration for music videos), and much, much more.

[Figure: Structure of Artist Data Sources]

That is, if we’re talking about existing music data. There are potentially more data sources to be found outside of current music data pipelines. For example, imagine using text recognition and machine learning to gather and crunch data on music-related memes. Or tracking public conversation on platforms like Reddit, using natural language processing to analyze the sentiment of an artist’s digital word of mouth (WOM). We might also see existing signing databases such as ROSTR expand into the A&R space to offer more context on what happens behind the scenes of artists’ careers. Today, though, most of these applications probably won’t be cost-effective, simply because there are easier wins out there. But as the competition for unique artist data intensifies, we might see completely new data-mining initiatives spring up to supply A&R research teams.

Or maybe we’ll live to see a solution that connects unsigned artists and labels directly. As an artist, you would log into the platform and connect your first-party analytics dashboards (from Spotify and Apple Music for Artists to YouTube Creator Studio and FB Analytics) to share the entire wealth of data on your career directly with A&R departments at labels and distributors. “No guarantees and no promises, but if your music is making waves, we’ll make sure industry people know about it” kind of thing.

For now, though, the most apparent data-sourcing solution would be to approach the part of the industry that hosts most of the unsigned artists out there: open distribution platforms such as CD Baby, TuneCore, DistroKid, and the like. Speaking of TuneCore: if there’s a single music company that got a head start on the rest of the industry in this race for exclusive data, it’s Believe. The fact that Believe owns TuneCore (along with the data on the 250k+ artists distributed through the platform) probably won’t come as news to anyone. What’s interesting, though, is that Believe has been feeding TuneCore data to its A&R team for years now, which gave it a chance to polish its data practices while the rest of the market was still figuring out data access.

From that perspective, Believe’s data practices are of interest to the rest of the industry. Luckily for those of you who’d like to take a peek under the hood, there’s an article over on Believe’s blog (written by Music Tomorrow’s own Julie Knibbe) showcasing some of the data habits of Believe’s A&R team.

To conclude: it’s still too early to tell how the A&R data game will develop over the next ten years or so. But what we can say for sure is that artist data, especially the data that no one else has, is a thing of value. And that value is only going to go up as we move further into the new decade.

Written by Dmitry Pastukhov for Music Tomorrow