Will current plagiarism rules hold up in a generative AI world?
There is no such thing as an original idea. Unfortunately, to write that would be plagiarizing Mark Twain. I wonder what he would have thought of ChatGPT.
Rarely have two forces been so clearly set on a collision course as the economic promise of generative AI writing and our strident demand for originality from writers. In industries where the primary product is original writing (journalism, academia, and literature), exacting professional codes have developed to combat its theft (plagiarism). Harvard defines plagiarism as "draw[ing] any idea or any language from someone else without adequately crediting that source…[that is] unacceptable in all academic situations, whether you do it intentionally or by accident." Taken literally, this is a more ambitious charge than it intends to be. Ridley Scott, annoyed at historians' criticism of his recent Napoleon film, retorted, "Excuse me mate, were you there? No? Well, [shut it then]." Scott's radical skepticism of historical truth is a tenuous position, but he is of course right that everything modern historians know about Napoleon comes from someone else, and every opinion they hold is likely derived, at least in large part, from what others have told them. No one, though, cites a source for how they know Napoleon's name, any more than NASA scientists cite Newton every time they discuss gravity.
Plagiarism, of course, doesn't really work like that. Academics and journalists are allowed to use basic facts, common ideas, and techniques they certainly took from somewhere, all without attribution. It is a deliberately fuzzy policy that relies on somewhat vague, but not entirely arbitrary, borders between what can be assumed to be widely known and in some sense "unowned" and what needs to be attributed. The borderlines of these concepts are not always clear to professional academics, much less to anyone else, and they are always slowly evolving. For most of the last 200 years, they have been evolving to become stricter.
The wholesale theft of text and the claiming of credit for new discoveries are ancient concerns, but few pre-19th-century authors rigorously footnoted their sources or kept detailed notes of where their ideas came from; both practices were innovations brought about by the Industrial Revolution. The 20th century would see knowledge production become a more common profession than farming in the developed world. Faster communication made knowledge increasingly global, and both the ability and the economic rationale to protect intellectual ownership grew. Few industries were transformed more than academia, which evolved from the province of landed gentry and clergy into a set of highly specialized and rigorous professional fields. This new world necessitated a different understanding of plagiarism, one that has only grown stricter as technological development made even the most minor forms of plagiarism easily detectable.
Our 20th century understanding of plagiarism is poorly suited to the new revolution in knowledge production. Generative AI promises to be one of the engines of the future economy. The technology has a long development road ahead before it can compete with people at synthesizing new ideas. On the mechanics of writing, however, and the creation of summaries, transitions, and statements of basic fact, it is already matching us. ChatGPT and its competitors have become widely used writing aids in the business world, and it is hard to imagine that their successors will not be even more indispensable to the economy.
Yet in some ways, generative AI is nothing more than a plagiarism machine, sifting through millions of works, extracting ideas, and spitting them back as output, generally without attribution. Like Napoleonic historians, AI knows nothing inherently. Every sentence or idea ChatGPT suggests is an algorithmic mish-mash of similar sentences and ideas it has encountered. In most cases it would be as impossible for an author to trace the source of ChatGPT's knowledge as it would be to trace the first person who told them Lincoln was an American president. ChatGPT's suggestions nonetheless influence the development of AI-assisted papers and greatly expand our pool of "general knowledge." In a society where plagiarism is so strictly policed, a battle is certainly coming.
It is possible that the further use of AI will be curtailed by stricter enforcement of intellectual property rights. If universities discipline students caught accidentally including another author's idea through an AI program, or if knowledge producers maintain litigious ownership over their content, generative AI might become less profitable. Claudine Gay's recent ouster, while having nothing inherently to do with AI, provides a cautionary tale about using summaries of other works without careful attribution.
A more pertinent example for the future of writing is the much less covered dispute between the NYTimes and OpenAI. The NYTimes recently sued OpenAI for training its model on NYTimes articles and for sometimes "regurgitating" those articles in its responses. A victory for the NYTimes could sharply slow the development and use of generative AI.
The arc of history, however, bends toward technological realities. The music industry fought a long, unsuccessful war against online file sharing and streaming. The newspaper industry fought to prevent social media from stealing its stories. Ultimately, both industries reinvented their business models to collaborate with these forces. The pressure of so many people benefiting from the convenience of the internet overwhelmed the legal rights of incumbent copyright holders. The same fate inevitably awaits writing-based industries.
We recognize that Mythos will exist in a world where stories, knowledge and writing are intertwined with an AI driven economy. Our goal is to embrace the efficiencies of this new technology while preserving the originality and voices of the community members our product serves.