01 What happened

The story, straight

The Atlantic's Alex Reisner has created a fully searchable public database cataloging four datasets of music used to train AI models. Two of the sets are massive — 12 million and 9 million tracks respectively — while the other two each contain over 100,000 songs. Google and Stability AI have both confirmed use of the datasets in published research papers, and the sets have been downloaded thousands of times. Some of the sourced music, including tracks from the Free Music Archive, is licensed for personal streaming but not for AI training use.


02 Spread timeline

Where it actually started

Jun 20, 2026 · Origin
The Verge reports on The Atlantic's Alex Reisner publishing a searchable public database of four AI music training datasets totaling over 21 million tracks.
source

03 Source receipts

Every claim, linked

04 What's solid, what isn't

What's solid and what isn't

Confirmed
  • The Atlantic's Alex Reisner published a searchable public database of four AI music training datasets.
  • Two datasets contain 12 million and 9 million tracks respectively; two others contain over 100,000 songs each.
  • Google and Stability AI have confirmed using the datasets in published research papers.
  • The datasets have been downloaded thousands of times.
Disputed
  • The exact licensing status of all tracks across the four datasets.
  • Whether specific artists' work was included without authorization.
Developing
  • Whether music rights holders will pursue legal action based on the database findings.
  • How Google and Stability AI will respond to the increased transparency around their training data sources.

05 Why it matters

The editorial take

This is the first time anyone has made AI music training data this transparent and searchable. It gives artists and rights holders a concrete tool to check whether their work was used without permission, and it puts pressure on companies like Google and Stability AI to address how they sourced their training data. The database follows Reisner's earlier work exposing training datasets for image and text AI models.
