I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is just solely intended as a preservation effort?
> The thing is, this doesn't even seem particularly useful for average consumers/listeners, since Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably 10,000's of tracks each sounds horrible.
I wouldn’t be so sure. There are already tools to automatically locate and stream pirated TV and movie content on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.
> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.
The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.
> The Anna’s archive group is ideologically motivated. They’re definitely not doing this for AI companies.
They have a page directly addressed to AI companies, offering them "enterprise-level" access to their complete archives in exchange for tens of thousands of dollars. AI may not be their original/primary motivation but they are evidently on board with facilitating AI piracy-maxxing.
You go where the money is. Infra isn’t free. Churches pass the plate every Sunday. Perhaps one day we’ll exist in a more optimal socioeconomic system; until then, you do what you have to do to accomplish your goals (in this context, archivists and digital preservation).
That made me chuckle: Enterprise Level Access. I mean, for an AI company that’s incredibly cheap, so instead of torrenting something, why not just buy it? That price is just a fraction of an engineer’s salary.
> The Anna’s archive group is ideologically motivated.
Very interesting, thank you. So using this for AI will just be a side effect.
And good point -- yup, can now definitely imagine apps building an interface to search and download. I guess I just wonder how seeding and bandwidth would work for the long tail of tracks rarely accessed, if people are only ever downloading tiny chunks.
I think the people seeding these are also ideologues and so would be interested in also supporting the obscure stuff, maybe more than the popular. There is no way any casual listeners would go to the quite substantial trouble of using these archives.
Anyone who wants to listen to unlimited free music from a vast catalog with a nice interface can use YouTube/Google Music. If they don't like the ads they can get an ad blocker. Downloading to your own machine works well too.
Most artists don't really care about streaming or selling their music. Most of their real money comes from touring, merch, and people somehow interacting with them.
Even some of the largest artists in the world only receive a few grand a year from streaming. It isn't that big of a deal. Music piracy isn't the theft people think it is, Lars.
Yes they do use DRM. I know they are using Widevine on the web player, but possibly other ones too (never looked very far). Not sure for the app, it might be that it is using OGG streams with a custom DRM (which is probably the one some existing downloaders actually (ab)use).
I dunno, if they publish something like a 10 TB torrent of the most popular music, I can see people making their own music services. A 10 TB hard disk is easily affordable, and that's about 3 million songs, which is way more than anyone could listen to in a lifetime, even if you reduce that by 100x to account for taste.
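For anyone who wants to sanity-check that, here is a rough back-of-envelope sketch. The 10 TB and 3 million songs are from the comment above; decimal terabytes and the 160 kbps bitrate (the higher of the bitrates mentioned elsewhere in this thread) are assumptions, and the function names are made up for illustration.

```python
TB = 10**12  # decimal terabyte, the way drive vendors count (assumption)
SECONDS_PER_YEAR = 365 * 24 * 3600


def avg_song_bytes(total_bytes: int, n_songs: int) -> float:
    """Average file size implied by a disk size and a song count."""
    return total_bytes / n_songs


def seconds_at_bitrate(size_bytes: float, kbps: int) -> float:
    """Playback time of a file of size_bytes at a constant bitrate in kbit/s."""
    return size_bytes * 8 / (kbps * 1000)


n_songs = 3_000_000
size = avg_song_bytes(10 * TB, n_songs)
per_song_s = seconds_at_bitrate(size, 160)  # 160 kbps is an assumed bitrate
years_nonstop = n_songs * per_song_s / SECONDS_PER_YEAR

print(f"{size / 1e6:.2f} MB per song")
print(f"{per_song_s / 60:.1f} min per song at 160 kbps")
print(f"{years_nonstop:.1f} years of nonstop playback")
```

Under those assumptions it works out to roughly 3.3 MB and a bit under 3 minutes per track, and close to 16 years of nonstop playback for the whole disk.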
It's probably going to make the AI music generation problem worse anyway...
DRM aside, Spotify clearly should have logic that throttles your account based on requests (only so many minutes in a day..), making it entirely impractical to download the entirety of it unless you have millions of accounts.
This is probably how they did it: use a few thousand accounts, queue up everything, and download it all over the course of a year.
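A minimal sketch of both ideas, with every number made up: a per-account daily cap on streamed minutes, and how many accounts you would need to pull a whole catalog under that cap. Nothing here reflects how Spotify actually rate-limits; `DailyBudget`, `accounts_needed`, the 100 million track count and the 3-minute average are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DailyBudget:
    """Hypothetical per-account cap on streamed minutes per day."""
    limit_minutes: float
    used_minutes: float = 0.0

    def allow(self, track_minutes: float) -> bool:
        """Admit the request only if it still fits in today's budget."""
        if self.used_minutes + track_minutes > self.limit_minutes:
            return False
        self.used_minutes += track_minutes
        return True


def accounts_needed(n_tracks: int, avg_track_minutes: float,
                    limit_minutes_per_day: float, days: int) -> int:
    """Accounts required to fetch n_tracks within `days` under the cap."""
    total_minutes = n_tracks * avg_track_minutes
    per_account = limit_minutes_per_day * days
    return math.ceil(total_minutes / per_account)


# Assumed inputs: 100M tracks, 3 min each, cap of 1440 min/day
# (i.e. no faster than real-time playback), one year of scraping.
print(accounts_needed(100_000_000, 3, 1440, 365))
```

With those made-up inputs it comes out to 571 accounts, so "a few thousand accounts over a year" is at least the right order of magnitude if the only limit is real-time playback. A tighter daily cap pushes the account count up proportionally.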
>> But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.
They are too big to fail but they aren’t too big to have to pay out a huge settlement. Facebook’s annual revenue is about twice that of the entire global recording industry. The strategy these companies took was probably correct but that calculation included the high risk of ultimately having to pay out down the line. Don’t mistake their current resistance to paying for an internal belief they never will have to.
Hmmm I don’t like this. There are sources for music with better quality out there and all this will do is paint them a bigger target for takedowns/prosecution. I am worried about losing their ebook library. They should have done this as a separate identity.
To put this into perspective, What.CD [0] was widely considered to be the music library of Alexandria, unparalleled in both its high quality standard and its depth. What.CD had in the ballpark of a few million torrents when it got raided and shut down. Anna's rip of Spotify includes roughly 186 million unique records. Granted, the tail end is a mixed bag of bot music and whatnot, but the scale is staggering.
I think what earned what.cd that title wasn't necessarily just the amount but the quality, as you mentioned, as well as the obscurity of a lot of the offered material. I remember finding an early EP of an unknown local band on there, and I live in the middle of nowhere in Europe. There were also quite a few really old and niche records on there which possibly couldn't be put on streaming services due to the ownership of rights being unknown. It was the equivalent of vinyl crate digging without physical restrictions.
Additionally there was a lot of discourse about music and a lot of curated discovery mechanisms I sorely miss to this day. An algorithm is no replacement for the amount of time and care people put into the web of similar artists, playlists of recommendations and reviews. Despite it being piracy, music consumption through it felt more purposeful. It's introduced me to some of my all time favourite artists, which I've seen live and own records and merchandise of.
Yeah, What.CD had a bunch of the local Brisbane post-rock bands from the 00s on there which was amazing to me. I at least have copies of a lot of their records!
You can’t talk about what.cd without talking about its precursor, Oink’s Pink Palace. Even Trent Reznor was public about what an amazing place it was. Music aside, the community existing just for the shared love of music and not for any other kind of monetary or influencer gain is what set it apart. We just don’t have those kinds of communities for music online anymore.
>We just don’t have those kinds of communities for music online anymore
They're still kind of around, but yeah, everything is very much on its way out in the music scene, at least in terms of that late 90s early 00s culture. Or has been until recently. There is a renewed interest in self-hosting and "offline" style music collections.
It sucks too. The way folks discover music is important. The convenience of streaming has led to some interesting outcomes. When self-hosting music comes up this is always one of the top questions people have: How do you find new music?
The answer isn't that hard and really hasn't changed much. People just don't want to spend any time or effort doing it. Music stores still exist, they're amazing. Lots of 2nd hand stores carry vinyl and CDs now, which can give you great ideas for new music. There are self-hosted AI solutions and tools. Last.fm and Scrobbling are still very much around. My scrobble history is so insanely useful. There are music discords. Friends. Asking people what they're listening to in public. Live shows with unique openers (I once went to a Ben Kweller show with 4 opening bands, and I still listen to 3 of them).
True, but What.cd had a tremendous amount of notable music not available on Spotify, because it was also sourced from CDs, bootlegs, vinyl, tape etc., whereas Spotify only includes music explicitly licensed for streaming.
This is true, and a category of music that got hit notably hard was live recordings. What.CD had a wide array of live recordings made by sound engineers straight from the mixer. This is something that you simply cannot find now unless you maybe know a guy.
That's why I use YouTube Music as my streamer as they allow damned near anyone to upload any old rare record and then figure out the royalties somehow.
Redacted.sh is a worthy successor, but the average person just doesn’t care about “which release is best” anymore. I use YT Music as a backup but Redacted is my main source of music these days.
That being said, I have a lot of non-mainstream tracks in my playlists on YouTube Music that have YouTube comments along the lines of “I wish this was available on Spotify :’(“. I bet the same goes for What.CD.
So there’s some way to go for a comprehensive music archive.
This is something really important, especially in the days when music and film vanishes from platforms one by one. I myself have three playlists with greyed out titles (titles are missing so there's no possibility for me to find out what was there).
That's why I divide music into the music that I want to have forever (I buy it on CDs) and dance music that I can live without one day.
There are tools that actually download directly from Spotify (needs premium then) but yeah most of them just use the search and download from other sources like YouTube without mentioning it. I won't say which tools download directly out of fear that they get killed but they exist.
I really don't understand how focusing on source-quality files is supposed to be a "major issue" with the music preservation community. It's bizarre for them to talk about these being barriers to creating a "full archive of all music that humanity has ever produced" and then have their answer be scraping Spotify, ending up with a music library composed largely of AI and bulk-produced songs at 75/160kbps.
I recall many interesting tracks that were very aggressively deleted from all platforms in sync. I wonder if I could find them in this archive.
There is contemporary lost media being created every day because of how we distribute things now. I think in some cases, the intent of the publisher was to literally destroy every copy of the information. I understand the legal arguments for this, but from a spiritual perspective, this is one of the most offensive things I can imagine. Intentionally destroying all copies of a creative work is simply evil. I don't care how you frame it.
Making media effectively lost is not much different in my mind. Is it available if it's sitting on a tape in an iron mountain bunker that no one will ever look at again?
Not that we should, but it's technically feasible to have a music streaming server with the torrent as the backend, and selectively download parts of the torrent in response to on-demand streaming requests from the client.
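To make the "selectively download" part concrete, here is a small sketch of the bookkeeping involved. It assumes a BitTorrent v1 style layout, where the files of a multi-file torrent are treated as one concatenated byte stream cut into fixed-size pieces; the function name and the toy sizes are made up, and no real torrent library is used.

```python
from typing import List


def pieces_for_file(file_lengths: List[int], index: int,
                    piece_length: int) -> range:
    """Piece indices that overlap file `index`.

    Assumes files are concatenated in list order and split into
    fixed-size pieces (BitTorrent v1 style layout).
    """
    length = file_lengths[index]
    if length == 0:
        return range(0)  # an empty file touches no pieces
    start = sum(file_lengths[:index])  # byte offset of the file's first byte
    end = start + length               # exclusive end offset
    first_piece = start // piece_length
    last_piece = (end - 1) // piece_length
    return range(first_piece, last_piece + 1)


# Toy torrent: three files of 100, 250 and 50 bytes, 128-byte pieces.
lengths = [100, 250, 50]
print(list(pieces_for_file(lengths, 1, 128)))  # pieces covering the 2nd file
print(list(pieces_for_file(lengths, 2, 128)))  # pieces covering the 3rd file
```

A streaming backend would request only those pieces for the track being played. Note that the first and last piece of a file are usually shared with its neighbours, so you always fetch slightly more than the track itself.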
This is some of the greatest news I've ever heard for the digital preservation community. So many projects over the years could have used resources like this. Thank you for contributing to humankind!
It seems to be that the metadata doesn't include the lyrics, probably because they are provided by Musixmatch. It would have been nice to have a database of lyrics linked to ISRCs. AFAIK Lrclib doesn't support downloading lyrics for a given ISRC.
I just want to be able to back up my playlists. Maybe that's possible, but last time I looked I could only find sites that wanted your login. Not gonna happen.
Moral and legal discussion aside, this is technically very impressive. I also wouldn’t be surprised if this somehow kickstarts open source music generative AI from China.
Both C#m and Db can be played on piano using only the black keys (skipping the 3rd note of the scale). This makes them easy keys for beginners. I'm not sure if that's the reason, but it could be related.
Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point
Electronic dance music is the biggest genre in the data. So then easy to play shouldn't matter. It's still an interesting question. I think playing Db is pretty nice on the piano even if it's not the easiest.
There is a sweet spot for the bass. Lower is better for deep bass, but too low and it stops being a recognizable note, and consumer speakers can't reproduce it. This effect exists though I'm not sure if it is the cause of the pattern here.
C# I don’t believe was/is a common tuning for most western instruments, classical or modern.
A digital piano can transpose things to make it “easier” to play.
Cursory google search says that a sitar is traditionally tuned to something useful for c#
I’m curious if C# is one of those notes that lines up nicely with whatever crappy consumer stereos/subs were capable of reasonably reproducing in the 90s as electronic music was taking off, and it stuck around as tribal knowledge for getting more “oomph” out of your tracks.
Unrelated, but I just can't stop myself from saying that I absolutely hate Spotify even though I'm a paying customer. Fuck you Spotify. You were supposed to be a convenient way to discover and listen to music. Now you are only convenient for listening to music, and absolutely terrible for any recommendations. This is sad really. Spotify had good recommendations. It's absolutely in a position where it can provide good recommendations — it has both a vast music library and a vast amount of data on user preferences. And it chooses to push procedural/ai-generated slop instead to earn more money. I thought that maybe buying $SPOT stock will make me more at peace with its greed, but it didn't work. Spotify fucking deserves to crash and burn because it sees paying customers as idiots who might not notice they are fed garbage. Fuck you Spotify, fuck you.
YouTube Music works pretty well for me. One great feature is that it includes not just a commercial music streaming catalog, but all user uploads of music on YouTube.
This is more frequent than you would assume. I’ve subscribed to neither Apple Music nor Spotify for this exact reason: I’m a millennial who would like to discover music.
Another extremely annoying effect is, being 40+, they only suggest music for my age. In “New” and “Trending”, I see Muse and Coldplay! I should make myself a fake ID just to discover new music, but that gets creepy very fast.
Currently it says they have released metadata and album art. Is archiving and sharing the textual track metadata alone (no images, no audio) legal in the US, or Europe? By what basis?
Monopoly is not a nice thing. Maybe it is convenient, but not nice.
The people who give money to artists are the ones going to concerts and buying music directly from artists. Spotify gives cents to artists, incentivizing awful behaviour (AI music, aggressive marketing, low effort art...).
Some people's urges to destroy all traces of human civilisation astonish me. What do you think Spotify is going to do with all its music when it ceases to exist in however many years? No, we must collectively feed Daniel Ek the Hungry.
I have Spotify premium but the constant shuffle of content availability has meant I’ve started routinely archiving my liked songs to avoid any rug pull. Zspotify and co still work a charm.
Uh, cool, I guess? I want to applaud that, but, first off, unless you are OpenAI or Facebook, it is not exactly plausibly easy to participate in the festivities. Even if I had spare 300 TB laying around, how the fuck do I download that?
But, more importantly, I cannot even say "good for you", because I don't actually think it is good for Anna's Archive. I wouldn't touch that thing, if I was them. Do we even have any solid alternatives for books, if Anna's Archive gets shot down, by the way? Don't recommend Amazon, please.
I am in no way saying that this is cheap but 300 TB will set you back a little less than $6k with tax. Very attainable for people other than OpenAI and Facebook. And it's not crazy at all to snag a server with enough bays to house all those.
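A quick sketch of both numbers, for what it's worth. The 300 TB and the $6k are from the comments above; the link speeds are assumptions, units are decimal, and this ignores protocol overhead and whether enough seeders exist to saturate your line.

```python
def cost_per_tb(total_cost: float, size_tb: float) -> float:
    """Storage cost per terabyte."""
    return total_cost / size_tb


def download_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to move size_tb terabytes at a sustained link_mbps Mbit/s."""
    bits = size_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 86_400


print(f"${cost_per_tb(6000, 300):.0f} per TB")
for mbps in (100, 1000, 10_000):  # assumed link speeds
    print(f"{mbps} Mbit/s: {download_days(300, mbps):.1f} days")
```

That is about $20 per TB for the disks, and roughly 28 days of sustained 1 Gbit/s to fetch everything (around 278 days at 100 Mbit/s), so storage is the easy part and bandwidth is the slow part.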
a client can selectively list and then stream individual files from a huge torrent. if you've ever watched illegal movies/shows on those random domain websites, you're likely streaming it from a torrent on the backend somewhere.
it wouldn't surprise me if we start to see some docker images pop up in a few days to do exactly this as a sort of "quasi-self-hosted jellyfin", where a person hosts a thin client on a machine that then fetches the data from the torrent, then allows the user to "select" their library. A user can just select "Top hits from the 80s" and it'll grab those files from the torrent, then stream or back them up.
I don't really see why it wouldn't, from an end user perspective, be any different than a self hosted jellyfin or plexamp.
Anna's archive mirrors z-lib and libgen, so those are the main alternatives. But it's unlikely anna's archive would go down so easily, they take a lot of precautions.
Holy crap. This is going to trigger a five-alarm fire at Spotify Engineering. This has got to be among the largest proprietary datasets ever unintentionally publicized by a company.
great. Spotify just removes things all the time (things I actively listen to and work on for my jazz practices, one day just go "poof" because they didn't want to pay the record company anymore), and they are not as a company deserving of the role of "keeper of all the world's music". They don't give a shit and they'd vastly prefer we all listen to their AI generated royalty free crap and Joe Rogan.
Am I understanding this wrong? Ripping the metadata I'm fine with. But it sounds like they've ripped every song from Spotify and they're going to release them?
Edit: It seems like they are. Stealing from tens of thousands of artists, big and small, and calling it "preservation" or "archiving" is scummy.
Music piracy is already a thing, not to mention you don't even need to torrent nowadays when music is available for free on YouTube. Those who don't want to pay already don't pay so nothing changes there.
The value of Spotify is the convenience, and this collection does not change that in any way. Your argument would apply if someone were to make a Spotify clone with the same UX using this data.
At least pirates provide some value from curation usually. In this case the leak is just all of Spotify. It makes it really easy for a competitor to just duplicate the Spotify service without paying licensing fees. Tbd what happens.
Because it's not stealing. Stealing is a problem because it deprives the original owner of the item - whether the thief subsequently uses the item or not doesn't change that.
This doesn't apply to dematerialized content: the original copy still exists. The only negative impact occurs if someone decides to actually use the pirated copy in place of buying a licensed one.
The mere existence of this new pirate copy being around doesn't automatically imply that, especially if other, more convenient sources are available.
Okay, call it copyright infringement then if you want to be a stickler on definitions. It's still wrong, and existing instances of it don't make it justifiable to do.
The people I know who go through the trouble of pirating and downloading vast libraries of music are all musicians themselves, or at the very least total music nerds. They don’t want to lose access to their stuff, plus if they ever need to import audio into a DAW, DRM is a no-go. They are the same people who spend large amounts of money on vinyls, and support smaller independent artists through concerts, merch and (back in the day) CDs.
It used to be more mixed, but today, piracy is often the only option to ”own” any media at all.
It's both. Musicians and music nerds buy CDs and LPs and tapes and Bandcamp files and they "pirate" music both because they care about ownership and quality and rare or substantially different editions of records that aren't available legally, and because they've seen the sausage factory from the inside and know that "stealing" $0.02 from an artist who's starving like them anyway isn't really that far up on the list of heinous crimes. Buy the shirt, download the album. No one cares.
Why is this stealing? You can already listen to everything that's on Spotify with a free account. You are free to also record the audio while it's playing. I suppose grabbing the actual file shouldn't matter? Or is this about releasing? And robbing people of plays they would otherwise get through Spotify?
If you listen to something on Spotify with a free account the artists still get paid. This isn't a case where you're ripping off some mega-corp. You're ripping off thousands of artists, from major label ones to tiny indies. Take the metadata and build something cool. Stealing the files and releasing them is something else entirely.
You can record what you play from Spotify and you are already free to play the record again and again and again without the artist being paid.
Most people do not because they find it less convenient than paying 20bucks a month or whatever is the current price in 2025 but that doesn't change the reality.
For most people the appeal of Spotify is not the music itself but the playlists that are shared thanks to its ubiquity. This is the reason other services struggle to make a dent even if they have better quality, UI and algos.
Spotify started by disrupting the market using pirated music by the way so you are pretty much endorsing and encouraging piracy when "paying" your favorite artists through Spotify.
> What’s actually scummy is Spotify paying artists $1 per 1000 streams.
My spotify wrapped says I listened for 50,000 minutes this year. Assuming 2 minutes per song, that's 25,000 streams. I paid them $110, aka $0.004/stream. Assuming I'm a typical user, they obviously could not afford to pay any more than that per stream.
I googled "spotify pay per listen" and the first result is a reddit comment saying "The average payout on Spotify is only $0.004 per stream." The google AI overview says "Spotify [..] pays artists a fraction of a cent, typically $0.003 to $0.005 per stream". So I'll assume it's something in that ballpark.
So it seems like Spotify's payouts are completely reasonable, given their pricing. Is my logic wrong somewhere?
That’s fun math. I just checked mine: 96000 minutes. 2 minutes per song is way too generous as an assumption; for me everything seems to be > 3 minutes, so ~20000 streams.
I’m paying for a family account (that’s around 250/year) and there are 5 people on it so my usage is 1/5th of that (50/year)
So that’s 0.0025€ per stream. I don’t think your assumption is unreasonable.
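The two estimates above, spelled out as a minimal sketch. All inputs are the commenters' own figures (50,000 minutes, 2 minutes per song and $110 in the first case; ~20,000 streams and a 50/year share of a family plan in the second); the function names are made up. Note this is what the listener pays per stream, which is an upper bound on what could be paid out, not the actual payout.

```python
def streams_from_minutes(minutes_listened: float,
                         avg_song_minutes: float) -> float:
    """Rough stream count from total listening time."""
    return minutes_listened / avg_song_minutes


def paid_per_stream(paid_per_year: float, streams: float) -> float:
    """Subscription money available per stream, before any platform cut."""
    return paid_per_year / streams


first = paid_per_stream(110, streams_from_minutes(50_000, 2))
second = paid_per_stream(50, 20_000)

print(f"first estimate:  {first:.4f} per stream")
print(f"second estimate: {second:.4f} per stream")
```

Both land in the same ballpark as the quoted payout range of a fraction of a cent per stream, which is the point both commenters were making.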
In most cases, they couldn't make that decision even if they wanted to. Only independent artists and those that are so large as to have enough sway (Neil Young, for example) would be able to. The vast majority of artists you probably listen to don't actually own the rights to their own music.
So let the rights holders make the decision? They would never. Music rights exist for them to extract profit above all else. They don't care about preserving culture or legacy. Which is why it's important that somebody does.
While I wouldn't call this scummy I do agree with your sentiment. It is technically stealing and those copyrights should be respected.
Full disclosure, I am a career musician AND have been known to pirate material. That said, I think this is a valuable archive to build. There are a lot of recordings that will not endure without some kind of archiving. So while it's not a perfect solution, I do think it has an important role to play in preservation for future generations.
Perhaps it's best to have a light barrier to entry. Something like "Yes, you can listen to these records, but it should be in the spirit of requesting the material for review, and not just as a no-pay alternative to listening on Spotify." Give it just enough friction where people would rather pay the $12/month to use a streaming service.
Also, it's not like streaming services are a lucrative source of income for most artists. I expect the small amount of revenue lost to listeners of Anna's Archive are just (fractions of) a penny in the bucket of any income that a serious artist would stand to make.
It is technically not. Stealing means you have a thing, I steal it, now I have the thing and you do not. You can’t steal a copyright (aside from something like breaking into your stuff and stealing the proof that you hold the copyright), and when a song is downloaded the original copyright holder still has their copy.
Calling piracy theft was MPAA/RIAA propaganda. Now people say that piracy is theft without ever even questioning it, so it was quite successful.
See my other comment. Identity theft is the bank being defrauded and passing the problem onto you. They are the victim, not you and it is their money that’s gone, not yours.
IP theft is more like espionage and possibly lost hypothetical revenue. Again, it isn’t larceny, burglary, etc. You still have the knowledge, it’s just that so does the perpetrator.
Moreover discussions of IP gets into whether it even makes sense to be able to patent algorithms which are at their core just mathematics. So before you can talk about stealing the quadratic formula you need to prove that the quadratic formula is something that can be property.
Can you post your social security number and other personal info here then? You will still have it afterwards!
Oh also, I don't see why I should ever pay for trains or movie tickets if there are seats available. I can just walk in! The event will happen anyway. It's not stealing.
Everyone should just download all art, music and literature for free. Musicians, artists and writers can all make money some other way while I enjoy the works of their efforts.
What the music/movie industry was claiming in court was not theft. There is no statute that identifies piracy as theft. They were claiming copyright violation and wanted to collect damages for lost revenue.
You are bringing up “identity theft” which is also not theft. If you post your PII here and I use it to open a credit card in your name and then spend a bunch of the money using that card on buying goods and services, you are not the victim. What I do in that case is defraud the bank. They are the ones who are the actual victim and in the ideal world they would be the ones working with the authorities to get their money back.
Of course they would rather not do that so they invented a crime called identity theft and convinced everyone that it is ok for them to make you the victim. They make your life hell since they can’t find the actual criminal while you spend thousands of dollars trying to prove that you don’t owe thousands of dollars. But in reality you were not any part of the fraud. It is on the bank to secure their system enough to prevent this. But they have big time lawyer money and you don’t so here you are.
Agree with you, this release is obviously a scummy thing to do.
Same as if someone released every book on Kindle for free. There are rules. Project Gutenberg is great. They don't just steal every book they can.
Not to mention the organization is openly trying to profit from this data by selling it to big tech orgs for AI training! None of the artists consented to that, I am sure, to say nothing of Spotify's interests.
I can imagine this making it wayyy easier to build something like Lidarr but for individual tracks instead of albums.
https://en.wikipedia.org/wiki/Useful_idiot
Do they have DRM at all? Youtube and Pandora don't.
Their native clients use a weak hand-rolled DRM scheme (which is where the ogg vorbis files come from), whereas the web player uses Widevine with AAC.
https://www.youtube.com/channel/UCYOa-hi751OKY2zGJJv6V2A
https://www.youtube.com/watch?v=MSSxnv1_J2g (same thing, but on an official channel instead)
It's probably going to make the AI music generation problem worse anyway...
Challenge accepted…
This is probably how they did it: use a few thousand accounts, queue up everything, and download it all over the course of a year.
Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.
[0] https://en.wikipedia.org/wiki/What.CD
Additionally there was a lot of discourse about music and a lot of curated discovery mechanisms I sorely miss to this day. An algorithm is no replacement for the amount of time and care people put into the web of similar artists, playlists of recommendations and reviews. Despite it being piracy, music consumption through it felt more purposeful. It's introduced me to some of my all time favourite artists, which I've seen live and own records and merchandise of.
One interesting way of discovering artists is finding an artist that I already like on a compilation CD, and then seeing what else is on the CD.
They're still kind of around, but yeah, everything is very much on its way out in the music scene, at least in terms of that late 90s early 00s culture. Or has been until recently. There is a renewed interest in self-hosting and "offline" style music collections.
It sucks too. The way folks discover music is important. The convenience of streaming has led to some interesting outcomes. When self-hosting music comes up this is always one of the top questions people have: How do you find new music?
The answer isn't that hard and really hasn't changed much. People just don't want to spend any time or effort doing it. Music stores still exist, they're amazing. Lots of 2nd hand stores carry vinyl and CDs now, which can give you great ideas for new music. There are self-hosted AI solutions and tools. Last.fm and Scrobbling are still very much around. My scrobble history is so insanely useful. There are music discords. Friends. Asking people what they're listening to in public. Live shows with unique openers (I once went to a Ben Kweller show with 4 opening bands; I still listen to 3 of them).
So there’s some way to go for a comprehensive music archive.
That's why I divide music into the kind I want to have forever, which I buy on CD, and dance music that I could live without one day.
> A while ago, we discovered a way to scrape Spotify at scale.
They won’t and shouldn’t divulge the details, but I imagine that would be a fun read!
There is contemporary lost media being created every day because of how we distribute things now. I think in some cases, the intent of the publisher was to literally destroy every copy of the information. I understand the legal arguments for this, but from a spiritual perspective, this is one of the most offensive things I can imagine. Intentionally destroying all copies of a creative work is simply evil. I don't care how you frame it.
Making media effectively lost is not much different in my mind. Is it available if it's sitting on a tape in an iron mountain bunker that no one will ever look at again?
https://www.scribd.com/document/56651812/kreitz-spotify-kth1...
The data will be released in different stages on our Torrents page:
[X] Metadata (Dec 2025)
[ ] Music files (releasing in order of popularity)
[ ] Additional file metadata (torrent paths and checksums)
[ ] Album art
[ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)
https://developer.spotify.com/documentation/web-api/referenc...
I bet you can whip up a super simple script with an LLM to do this!
https://notice.cuii.info/
"Their business model is based on copyright infringement"
Well, where do I complain that Anna's Archive ain't a business?
Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point
For the major scale, there are 7 notes in the scale and only 5 black keys; you also need to skip ti, the 7th note.
For the minor scale ("C#m"), it's worse; only four of the five black keys are part of that scale.
And I would have thought that something intended to be played only on the black keys would be described as using a pentatonic scale anyway?
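The counts above are easy to check mechanically. A rough sketch, assuming twelve-tone pitch classes numbered from C = 0 and the natural minor for "C#m"; the helper names are made up for illustration, and sharps-only spelling means E# prints as F and B# as C:

```python
# Pitch classes 0-11 starting at C; the black keys are C#, D#, F#, G#, A#.
BLACK_KEYS = {1, 3, 6, 8, 10}
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone steps between successive scale degrees.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2]            # do re mi fa sol la ti
NATURAL_MINOR_STEPS = [2, 1, 2, 2, 1, 2]
MAJOR_PENTATONIC_STEPS = [2, 2, 3, 2]

def scale(root, steps):
    """Return the pitch classes of the scale built on `root`."""
    notes = [root % 12]
    for step in steps:
        notes.append((notes[-1] + step) % 12)
    return notes

def white_notes(root, steps):
    """Names of the scale notes that land on white keys."""
    return [NOTE_NAMES[n] for n in scale(root, steps) if n not in BLACK_KEYS]

def black_count(root, steps):
    """How many scale notes land on black keys."""
    return sum(n in BLACK_KEYS for n in scale(root, steps))

print(white_notes(1, MAJOR_STEPS))             # C# major: 3rd and 7th are white
print(white_notes(6, MAJOR_STEPS))             # F# major: 4th and 7th are white
print(black_count(1, NATURAL_MINOR_STEPS))     # C# natural minor
print(white_notes(6, MAJOR_PENTATONIC_STEPS))  # F# major pentatonic
```

In both C# major and F# major the 7th degree ("ti") falls on a white key, C# natural minor uses four of the five black keys, and the F# major pentatonic is the one that stays entirely on black keys.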
A digital piano can transpose things to make it “easier” to play.
A cursory Google search says that a sitar is traditionally tuned to something useful for C#.
I’m curious if C# is one of those notes that lines up nicely with whatever crappy consumer stereos/subs were capable of reasonably reproducing in the 90s as electronic music was taking off, and it stuck around as tribal knowledge for getting more “oomph” out of your tracks.
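For reference, here is where the low C#s sit in equal temperament. This is only a sketch assuming A4 = 440 Hz and MIDI note numbering with middle C = C4 = 60; it says nothing about what 90s consumer subs could actually reproduce, which remains speculation.

```python
def midi_to_hz(note, a4=440.0):
    """Equal-temperament frequency of a MIDI note number (A4 is note 69)."""
    return a4 * 2 ** ((note - 69) / 12)

# C#1, C#2 and C#3 are MIDI notes 25, 37 and 49 under the C4 = 60 convention.
for name, note in [("C#1", 25), ("C#2", 37), ("C#3", 49)]:
    print(f"{name}: {midi_to_hz(note):.2f} Hz")
```

That puts C#1 at roughly 34.6 Hz and C#2 at roughly 69.3 Hz.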
Another extremely annoying effect is, being 40+, they only suggest music for my age. In “New” and “Trending”, I see Muse and Coldplay! I should make myself a fake ID just to discover new music, but that gets creepy very fast.
[1] https://everynoise.com/
The people who give money to artists are the ones going to concerts and buying music directly from artists. Spotify gives cents to artists, incentivizing awful behaviour (AI music, aggressive marketing, low effort art...).
But, more importantly, I cannot even say "good for you", because I don't actually think it is good for Anna's Archive. I wouldn't touch that thing, if I was them. Do we even have any solid alternatives for books, if Anna's Archive gets shot down, by the way? Don't recommend Amazon, please.
Now imagine a dedicated music client that will download and stream (and share, because we are polite) only the needed files :)
a client can selectively list and then stream individual files from a huge torrent. if you've ever watched illegal movies/shows on those random domain websites, you're likely streaming it from a torrent on the backend somewhere.
it wouldn't surprise me if we start to see some docker images pop up in a few days to do exactly this as a sort of "quasi-self-hosted jellyfin", where a person hosts a thin client on a machine that fetches the data from the torrent and then lets the user "select" their library. A user can just select "Top hits from the 80s" and it'll grab those files from the torrent, then stream or back them up.
I don't really see why, from an end user perspective, it would be any different from a self hosted jellyfin or plexamp.
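The reason selective streaming works is that a classic (v1) multi-file torrent is one long byte stream cut into fixed-size pieces, so each file maps to a contiguous range of pieces that a client can request on its own. A minimal sketch of that mapping; the file names, sizes and piece length are invented for illustration, and a real client also has to verify piece hashes and find peers that hold those pieces:

```python
def pieces_for_file(files, piece_length, wanted):
    """Return (first, last) piece indices covering `wanted`, or None if empty.

    `files` is an ordered list of (path, size_in_bytes), laid out back to
    back the way a v1 multi-file torrent concatenates its files.
    """
    offset = 0
    for path, size in files:
        if path == wanted:
            if size == 0:
                return None  # an empty file occupies no pieces
            first = offset // piece_length
            last = (offset + size - 1) // piece_length
            return first, last
        offset += size
    raise KeyError(wanted)

# Hypothetical layout: three tracks, 1 MiB pieces.
MIB = 1024 * 1024
files = [("a.ogg", 3 * MIB + 100), ("b.ogg", 5 * MIB), ("c.ogg", 2 * MIB)]
print(pieces_for_file(files, MIB, "b.ogg"))
```

Note that neighbouring files share boundary pieces (here piece 3 holds the tail of a.ogg and the start of b.ogg), so fetching one track pulls in a little of its neighbours.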
A distributed ripping project to do that would be a fine thing.
Edit: It seems like they are. Stealing from tens of thousands of artists, big and small, and calling it "preservation" or "archiving" is scummy.
The value of Spotify is the convenience, and this collection does not change that in any way. Your argument would apply if someone were to make a Spotify clone with the same UX using this data.
This doesn't apply to dematerialized content: the original copy still exists. The only negative impact occurs if someone decides to actually use the pirated copy in place of buying a licensed one.
The mere existence of this new pirate copy being around doesn't automatically imply that, especially if other, more convenient sources are available.
https://www.youtube.com/watch?v=IeTybKL1pM4
It used to be more mixed, but today, piracy is often the only option to "own" any media at all.
Most people do not because they find it less convenient than paying 20 bucks a month or whatever is the current price in 2025 but that doesn't change the reality.
For most people the appeal of Spotify is not the music itself but the playlists that are shared thanks to its ubiquity. This is the reason other services struggle to make a dent even if they have better quality, UI and algos.
Spotify started by disrupting the market using pirated music by the way so you are pretty much endorsing and encouraging piracy when "paying" your favorite artists through Spotify.
Unless they're international stars, not really. It's peanuts these days. https://www.reddit.com/r/spotify/comments/13djsl9/how_much_d...
What’s actually scummy is Spotify paying artists $1 per 1000 streams.
Buy CDs. Use Bandcamp.
My spotify wrapped says I listened for 50,000 minutes this year. Assuming 2 minutes per song, that's 25,000 streams. I paid them $110, aka $0.004/stream. Assuming I'm a typical user, they obviously could not afford to pay any more than that per stream.
I googled "spotify pay per listen" and the first result is a reddit comment saying "The average payout on Spotify is only $0.004 per stream." The google AI overview says "Spotify [..] pays artists a fraction of a cent, typically $0.003 to $0.005 per stream". So I'll assume it's something in that ballpark.
So it seems like Spotify's payouts are completely reasonable, given their pricing. Is my logic wrong somewhere?
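Written out, the back-of-envelope math looks like this. Every input is the commenter's own assumption (50,000 minutes, 2 minutes per song, $110 a year), and the result is an upper bound because it treats the whole subscription as payable to rights holders, with nothing kept by Spotify:

```python
def revenue_per_stream(minutes_listened, minutes_per_song, paid_per_year):
    """Subscription money available per stream, before any platform cut."""
    streams = minutes_listened / minutes_per_song
    return paid_per_year / streams

per_stream = revenue_per_stream(50_000, 2, 110)
print(f"${per_stream:.4f} per stream")
print(f"${per_stream * 1000:.2f} per 1000 streams")
```

That works out to about $0.0044 per stream, in the same ballpark as the quoted $0.003 to $0.005 range.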
I’m paying for a family account (that’s around 250/year) and there are 5 people on it so my usage is 1/5th of that (50/year)
So that’s 0.0025€ per stream. I don’t think your assumption is unreasonable.
So let the rights holders make the decision? They would never. Music rights exist for them to extract profit above all else. They don't care about preserving culture or legacy. Which is why it's important that somebody does.
Full disclosure, I am a career musician AND have been known to pirate material. That said, I think this is a valuable archive to build. There are a lot of recordings that will not endure without some kind of archiving. So while it's not a perfect solution, I do think it has an important role to play in preservation for future generations.
Perhaps it's best to have a light barrier to entry. Something like "Yes, you can listen to these records, but it should be in the spirit of requesting the material for review, and not just as a no-pay alternative to listening on Spotify." Give it just enough friction where people would rather pay the $12/month to use a streaming service.
Also, it's not like streaming services are a lucrative source of income for most artists. I expect the small amount of revenue lost to listeners of Anna's Archive is just (fractions of) a penny in the bucket of any income that a serious artist would stand to make.
It is technically not. Stealing means you have a thing, I steal it, now I have the thing and you do not. You can’t steal a copyright (aside from something like breaking into your stuff and stealing the proof that you hold the copyright), and when a song is downloaded the original copyright holder still has their copy.
Calling piracy theft was MPAA/RIAA propaganda. Now people say that piracy is theft without ever even questioning it, so it was quite successful.
that seems like an overly narrow definition… what about identity theft, or IP theft?
https://www.justice.gov/usao-ndca/pr/superseding-indictment-...
IP theft is more like espionage and possibly lost hypothetical revenue. Again, it isn’t larceny, burglary, etc. You still have the knowledge, it’s just that so does the perpetrator.
Moreover, discussions of IP get into whether it even makes sense to be able to patent algorithms which are at their core just mathematics. So before you can talk about stealing the quadratic formula you need to prove that the quadratic formula is something that can be property.
Oh also, I don't see why I should ever pay for trains or movie tickets if there are seats available. I can just walk in! The event will happen anyway. It's not stealing.
Everyone should just download all art, music and literature for free. Musicians, artists and writers can all make money some other way while I enjoy the works of their efforts.
What the music/movie industry was claiming in court was not theft. There is no statute that identifies piracy as theft. They were claiming copyright violation and wanted to collect damages for lost revenue.
You are bringing up “identity theft” which is also not theft. If you post your PII here and I use it to open a credit card in your name and then spend a bunch of the money using that card on buying goods and services, you are not the victim. What I do in that case is defraud the bank. They are the ones who are the actual victim and in the ideal world they would be the ones working with the authorities to get their money back.
Of course they would rather not do that so they invented a crime called identity theft and convinced everyone that it is ok for them to make you the victim. They make your life hell since they can’t find the actual criminal while you spend thousands of dollars trying to prove that you don’t owe thousands of dollars. But in reality you were not any part of the fraud. It is on the bank to secure their system enough to prevent this. But they have big time lawyer money and you don’t so here you are.
Same as if someone released every book on Kindle for free. There are rules. Project Gutenberg is great. They don't just steal every book they can.
Not to mention the organization is openly trying to profit from this data by selling it to big tech orgs for AI training! None of the artists consented to that, I am sure, to say nothing of Spotify's interests.
On top of that they beg for donations.