Reddit is blocking the Internet Archive’s Wayback Machine from indexing most of its content. The Wayback Machine can now crawl and archive only Reddit’s homepage; it cannot access or archive posts, comments, subreddits, or user profiles.
The reason for the move is that AI companies have been using the Wayback Machine to scrape Reddit data without a license or permission, bypassing Reddit’s rules on data use. Reddit has struck licensing deals with companies such as OpenAI and Google that give them access to its data for AI training, and it wants to prevent unauthorized scraping through archival services. As a result, Reddit has closed off free archiving of everything beyond the homepage in order to protect user privacy, keep control over content ownership, and monetize access.
This marks a major change from Reddit’s earlier policy, under which “good faith actors” such as the Internet Archive were allowed to archive the site freely. Reddit is now restricting access until the Internet Archive can ensure compliance with Reddit’s rules, especially those concerning user privacy and removed content. In the meantime, many Reddit conversations and much of its cultural content may no longer be preserved for posterity through the Wayback Machine.
In summary, Reddit is restricting the Wayback Machine’s ability to archive its content due to concerns about AI scraping and to protect its data licensing interests, limiting the archive’s scope to the homepage only.