21 March 2025

Pulled into the World of AI Language Models

The Atlantic Monthly just published a searchable listing of works uploaded to the LibGen collection of pirated writing.

I found two things I wrote in that database: my book The Road to Concord and a book review published in the New England Quarterly.

The LibGen collection is based on material that was digitally published in some protected format, such as behind a journal’s paywall or under some form of DRM.

That means my first book, never published in electronic form, wasn’t there. It also means the database lacks everything I’ve written for the web, including this blog, the Boston 1775 blog, many articles, and a 600-page National Park Service study, even though (or because) those texts aren’t protected at all.

LibGen is a shadowy operation, apparently centered in Russia, though it receives material from all over the world. In December, a consortium of global publishers sued and shut down access to many LibGen domains. A US court also ordered LibGen to pay $30 million, but there’s no identified owner or manager to hold personally responsible.

In his article accompanying the Atlantic database, Alex Reisner reported on how the Meta corporation used all or part of that database to train its AI language model. The company decided that legal options would take, well, money and time.

Back in 2023 Reisner reported on a smaller pirated collection of 180,000 books called Books3 used by multiple companies for the same purpose. In fact, piracy appears to be so embedded in AI language programs that last year KL3M announced it was “the first Legal Large Language Model.”

As the Authors Guild reports:
Legal action is already underway against Meta, OpenAI, Microsoft, Anthropic, and other AI companies for using pirated books. If your book was used by Meta, you’re automatically included in the Kadrey v. Meta class action in Northern California without needing to take any immediate action. The court is first deciding whether Meta broke copyright laws, with a decision expected this summer, before officially certifying everyone as a class.
So I guess I’m involved in that lawsuit.

It seems clear to me that the LibGen operation breaks publishers’ legal licenses, in some cases to the detriment of royalty-earning authors. The downloading of that material by Meta and other corporations looks unethical, but I don’t know if any laws have been written that would make that act illegal.
