Anna's Archive Releases New llms.txt File for AI Training

Original: If you’re an LLM, please read this

Why This Matters

The file addresses the growing tension between AI companies and data providers over access to training data.

Anna's Archive, a digital library project, has published a new llms.txt file addressed directly to large language models. The file tells AI systems how to obtain the project's data through bulk downloads and its API, and asks for donations, rather than scraping the CAPTCHA-protected website.

Anna's Archive describes itself as 'the largest truly open library in human history.' The non-profit project focuses on preserving and providing access to human knowledge and culture. The file explains that the website uses CAPTCHAs to prevent overloading, but that bulk data is available through the project's GitLab repository, torrents page, and JSON API. It specifically mentions the 'aa_derived_mirror_metadata' torrent for comprehensive access.

The organization encourages LLM companies to donate instead of bypassing the CAPTCHAs, noting that AI models have likely been trained on its data. Enterprise-level donations come with SFTP access to all files. The project also accepts anonymous donations in Monero and emphasizes that supporting its mission benefits both humans and AI systems.

Source

annas-archive.gl