Reddit Cuts Off Wayback Machine To Thwart AI Scrapers

If you have ever relied on the Wayback Machine to dig up an old Reddit post or see a thread that has since vanished, that window is about to close.

Reddit has announced that it is cutting off most of its site from the Internet Archive’s Wayback Machine, claiming some artificial intelligence (AI) companies have been quietly sneaking around the archive to bypass its data restrictions.

What is the Internet Archive?

The Internet Archive is a non-profit organization dedicated to preserving as much of the internet’s history as possible, from old websites to books and cultural artifacts. Its Wayback Machine allows anyone to see how a webpage looked at a specific point in time, even if it has been deleted or changed since. However, Reddit says the archive has also been keeping posts that users have removed, a practice it argues raises privacy concerns.

“Internet Archive provides a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine,” Reddit spokesperson Tim Rathschmidt said in a statement to The Verge. “Until they’re able to defend their site and comply with platform policies (e.g., respecting user privacy, re: deleting removed content) we’re limiting some of their access to Reddit data to protect redditors.”

The new restrictions took effect yesterday, and Reddit says it notified the Internet Archive in advance.

The change means the Wayback Machine will no longer be able to save Reddit posts, comments, or profiles. It will be able to save only Reddit’s homepage. For years, the archive has been a go-to for journalists, researchers, and curious users, preserving snapshots of Reddit’s sprawling conversations. Now, its Reddit coverage will function more like a snapshot of daily trending headlines than a full historical record.

This move is part of a larger trend: Reddit has spent years tightening control over its data as AI companies scramble for content to train their models. Deals with Google and OpenAI have reportedly brought in millions for the platform, and Reddit has made clear that if AI firms want access, they have to pay. Earlier this year, the company even sued AI start-up Anthropic, accusing it of scraping the site without permission.

“We have a longstanding relationship with Reddit and continue to have ongoing discussions about this matter,” Mark Graham, Director of the Wayback Machine, said in a statement to The Verge.

While Reddit says the move is about safeguarding user privacy and upholding its rules, critics worry it risks erasing pieces of the internet’s historical record. Once a post vanishes from Reddit and can’t be archived, it’s gone for good, taking with it a piece of online culture that might otherwise have been preserved.

Kavita Iyer