• Lojcs@piefed.social
    1 day ago

    To be fair, the 10k is just a sample. The true number is 86 million, about a quarter of all Spotify songs.

    Put another way, for any random song a person listens to, there is a 99.6% likelihood that it is part of the archive. We expect this number to be higher if you filter to only human-created songs. Do remember though that the error bar on listens for popularity 0 is large.
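A rough sketch of why a quarter of the tracks can still cover almost all listens: coverage here is weighted by listens, not by track count. The function name and the toy numbers below are made up for illustration and are not taken from the archive's data.

```python
def listen_weighted_coverage(listens, archived):
    """Fraction of all listens that land on tracks present in the archive."""
    total = sum(listens.values())
    covered = sum(n for track, n in listens.items() if track in archived)
    return covered / total


# Made-up toy catalogue: listens are heavily skewed towards one track.
listens = {"a": 900, "b": 90, "c": 6, "d": 4}
archived = {"a"}  # only the most-listened track is archived

track_share = len(archived) / len(listens)                    # share of tracks archived
listen_share = listen_weighted_coverage(listens, archived)    # share of listens covered

print(f"tracks archived: {track_share:.0%}, listens covered: {listen_share:.0%}")
```

In this toy case a quarter of the tracks accounts for 90% of listens; the more skewed the distribution, the closer that second figure gets to 100%.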

    For popularity=0, we ordered tracks by a secondary importance metric based on artist followers and album popularity, and fetched in descending order.
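The ordering described in that quote could look roughly like the sketch below. The quote does not say how artist followers and album popularity are combined, so the `secondary_importance` key here (album popularity first, artist followers as tie-breaker) is purely an assumption, as are the `Track` fields and the sample data.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: str
    popularity: int
    artist_followers: int
    album_popularity: int


def secondary_importance(track):
    # Assumed combination, not the archive's actual metric:
    # album popularity first, artist followers as tie-breaker.
    return (track.album_popularity, track.artist_followers)


def zero_popularity_queue(tracks):
    """Return popularity=0 tracks, most important first (descending order)."""
    zero = [t for t in tracks if t.popularity == 0]
    return sorted(zero, key=secondary_importance, reverse=True)


# Made-up sample data for illustration only.
tracks = [
    Track("t1", 0, 50, 3),
    Track("t2", 0, 10, 7),
    Track("t3", 5, 1000, 40),  # popularity > 0, so not part of this queue
    Track("t4", 0, 900, 3),
]
queue_ids = [t.track_id for t in zero_popularity_queue(tracks)]
print(queue_ids)
```

Fetching in this order means that wherever you stop, the tracks left out are the ones the metric ranks lowest, which fits the "diminishing returns" cutoff mentioned in the next quote.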

    We have stopped here due to the long tail end with diminishing returns (700TB+ additional storage for minor benefit), as well as the bad quality of songs with popularity=0 (many AI generated, hard to filter).

    Also, it sounds like they had difficulty scraping some of the less popular songs and got those from somewhere else.