Clifford Chance

Global IP Updates

IP topics from around the globe

Large Language Doddle? Generative AI and UK Copyright Law Explained

Intro

With generative AI tools such as ChatGPT, Midjourney and Dall-E creating headlines, we look at how copyright law is developing to keep up with the rapid advancement of this new technology, specifically:

  • whether AI-generated works can be protected by copyright under UK law at all, and who owns the copyright if they can be protected; and
  • the risks of copyright infringement, both when training AI models and using the outputs of AI tools.

Can AI-generated works be protected by copyright?

If copyright does not subsist in AI-generated works (e.g. images, articles, music), they can be freely copied by anyone without risk of copyright infringement liability.

Under UK copyright law, works generated by AI can theoretically be protected as works "generated by computer in circumstances such that there is no human author of the work" (s. 178, Copyright, Designs and Patents Act 1988 ("CDPA")).

However, to be protected under UK law, such works still need to be "original". There is uncertainty in English law both about the correct test for "originality" and about whether that test requires a human author.

The English law originality test was "skill, judgment and labour" until CJEU case law brought in a separate test, that of the "author's own intellectual creation". This was originally introduced in EU Directives on software and databases but has now been applied more broadly to encompass copyright works beyond software and databases (see for example the Painer and Cofemel judgments).
The "author's own intellectual creation" is generally regarded as requiring a higher standard of originality than the English case law standard. Many commentators consider that AI-created works that do not have a human author cannot meet this higher standard. However, there is uncertainty over how broadly the EU test applies in the UK, or whether it will continue to apply in the UK post-Brexit (particularly in light of the Retained EU Law Bill that is currently going through Parliament), and whether it contradicts the CDPA, which seems to provide protection for non-human authored works.

The UK Intellectual Property Office (IPO) ran a public consultation on Artificial Intelligence and IP from October 2021 to January 2022. In its response, the Government decided not to make any changes to the existing law on the subsistence or ownership of copyright in computer-generated works, leaving this uncertainty open. On 15 March 2023, a separate report by Sir Patrick Vallance on the Pro-innovation Regulation of Technologies Review proposed that the UK should "utilise existing protections of copyright and IP law on the output of AI". However, the Government's response did not explicitly mention providing copyright protection to AI-generated works but instead focused on infringement issues (see below).

This is not a uniquely UK or European problem. Unlike in the UK, copyright can be registered in the US, meaning that the US Copyright Office (USCO) has had to deal with this question directly. The USCO has consistently refused to register works that lack a human author, and has now issued guidance on works containing material generated by AI.

AI-Assisted Creations

Many AI tools are currently used to provide inspiration for creators, e.g. a story outline or a melody, which are then adapted to create a new work. It is possible to draw a distinction between these AI-assisted creations and works which are generated by AI exclusively from a text prompt. By analogy, in Hyperion Records v Sawkins [2005] EWCA Civ 565, a composer and musicologist created new versions of a public-domain work, including corrections and additions to make it playable. The Court of Appeal found that, even though the starting point was a public domain score, the composer's revisions made it an "original" work.

Under UK law, to the extent that AI is used as a tool to generate ideas and themes which are adapted by creators into a final work, the overall work is likely to be protected by copyright (although any exclusively AI-developed elements, for example, may not themselves be protected).

If an AI-generated work is protected, who owns the outputs?

Provided that there is some copyright protection, under UK law, the author of a computer-generated work is deemed to be the person "by whom the arrangements necessary for the creation of the work are undertaken" (s. 9(3), CDPA). With a prompt-based AI tool, it is unclear whether the user inputting text prompts or the owner of the AI tool itself would be the author.
Although the only judgment on s. 9(3) CDPA to date held that image frames generated in the course of playing a video game belonged to the game's publisher rather than to the player, the courts may see the position of the user of an AI tool as fundamentally different from that of a video game player. This may depend on the amount of information put in by the user, which could change the assessment of whose creativity is expressed in the output.

Any remaining doubt about ownership as between the user and the creator of the tool can be resolved by contract, e.g. under an AI tool's end-user licence agreement, but in practice not all currently available tools address this in their T&Cs.

Copyright Infringement

Copyright infringement under UK law occurs when the whole or a "substantial part" of a particular work is copied.

If the outputs of an AI tool reproduced specific, identifiable sentences or images, for example, that would likely constitute copyright infringement. However, such specific instances of copying may be difficult to identify where the AI model is trained on a very large dataset; well-built AI tools are generally designed to create something new rather than to copy literally, in part to avoid allegations of copyright infringement.

Whilst rightsholders may struggle to bring actions for copying by reference solely to the works generated by the AI system (the outputs), they may be able to bring actions for copying of the training data itself (the inputs).

Infringement of Inputs: Text and Data Mining
One of the ways in which AI models (particularly large language models, or LLMs) learn is by being trained on large amounts of text and data, a process known as "text and data mining" ("TDM").

Whether a specific TDM process infringes copyright depends on the technical details of that particular TDM process and which territories' law(s) apply:

  • If a given TDM process involves making and storing permanent copies of complete works, if no licence is in place, and if the TDM was carried out in a jurisdiction without a broad fair use doctrine or statutory TDM exception, the entity that carried out that process may be liable for copyright infringement.
  • Conversely, the position is less clear where only temporary and transient copies of works are made, and where only abstracted parameters, which are not themselves copyright works, are stored and used by the AI model.

The scope of statutory exceptions from liability for copyright infringement for TDM varies markedly between territories. UK law currently permits "text and data analysis" only for non-commercial research (s. 29A, CDPA). However, in June 2022, the UK IPO announced a proposal to allow TDM for any purpose. The proposed exception would have allowed commercial AI tools to be trained on all copyright-protected works without requiring a licence or compensating rightsholders, making the UK one of the most permissive places for AI research in the world. This met significant opposition from rightsholders, and in February 2023 the UK Government announced that the proposals were to be scrapped.
By comparison, Article 4(3) of the EU Digital Single Market Directive provides a broader exception for TDM of lawfully accessible works unless the use of the works has been "expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online" (essentially an opt-out). There is currently no legal standard for how rightsholders can appropriately and expressly reserve their rights. One approach which has been suggested is to use robots.txt, a standard used to limit access by web crawlers and search engines; an express opt-out could also be included in commercial agreements relating to content.
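By way of illustration only, a rightsholder wishing to make such a machine-readable reservation might add rules to the robots.txt file on its website. This is a minimal sketch: the crawler name "ExampleAIBot" is a hypothetical placeholder, and, as noted above, there is no agreed legal standard for this kind of opt-out.

  User-agent: ExampleAIBot   # hypothetical AI training crawler
  Disallow: /                # request that this crawler collect no content from the site

A blanket reservation could instead use "User-agent: *" to address all crawlers, although whether any robots.txt entry reserves rights in the "appropriate manner" required by Article 4(3) remains untested.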
The disparity between different national approaches may lead to forum shopping in disputes, and conflict of laws arguments are likely to become important in copyright claims regarding AI.

Certainty can be provided by obtaining licences from the rightsholders for the express purpose of TDM. In addition to commercial licences freely negotiated between users and rightsholders, state-approved licensing schemes may emerge.
On 15 March 2023, the report by Sir Patrick Vallance on the Pro-innovation Regulation of Technologies Review stated that the UK "should enable mining of available data, text, and images (the input)". The Government's response stated that, to provide clarity, the UK Intellectual Property Office will produce a code of practice by summer 2023, and that "an AI firm which commits to the code of practice can expect to be able to have a reasonable licence offered by a rights holder in return" (see recommendation 2 of the response). The Government therefore appears to be encouraging an industry-led approach to establishing a licensing framework, with legislation to be brought in only if agreement cannot be reached.

How this self-regulatory approach will interact with cases already being brought in the US and the UK (see here and here), and with complex jurisdictional and conflict of laws issues in the absence of a global framework, is highly uncertain.

Commentary

The UK Government has stated that it is "putting the UK on course to be the best place in the world to build, test and use AI technology", and, as set out above, copyright is central to the development and exploitation of AI technology. These goals have met resistance from content rightsholders, whose businesses are often built on copyright and to whom AI in many cases presents a commercial threat. Developing a clear legal framework for AI and copyright that avoids stifling the development of AI models in the UK, whilst balancing the legitimate interests of rightsholders, is an unenviable challenge for the Government.

Key issues

  • Whether AI-generated works can be protected by copyright under UK law at all
  • Who owns the copyright if they can be protected
  • The risks of copyright infringement, both when training AI models and using the outputs of AI tools