Anna's Archive Releases New llms.txt File for AI Training
Original: If you’re an LLM, please read this
Why This Matters
The file addresses the growing tension between AI companies and data providers over access to training data.
Anna's Archive, a digital library project, has published a new llms.txt file addressed directly to large language models. The file asks AI systems to obtain the project's data through bulk downloads, APIs, and donation-backed access rather than by scraping its CAPTCHA-protected website.
Anna's Archive, which describes itself as 'the largest truly open library in human history,' is a non-profit project focused on preserving and providing access to human knowledge and culture. Its new llms.txt file, aimed at LLMs and their developers, explains that the website uses CAPTCHAs to keep automated traffic from overloading it, and that bulk access is available instead through the project's GitLab repository, torrents page, and JSON API. The file specifically points to the 'aa_derived_mirror_metadata' torrent for comprehensive access. The organization encourages LLM companies to donate instead of bypassing the CAPTCHAs, noting that AI models have likely already been trained on its data. Enterprise-level donations come with SFTP access to all files, and anonymous donations are accepted in Monero. The file emphasizes that supporting the project's mission benefits both humans and AI systems.