Copyright and AI: protecting value throughout the AI process

In this short article we will discuss how copyright may offer protection to investments in AI, especially in cases where AI software may fall short of patent protection.

It seems to us that the narrative in the press and the legal reviews is generally about whether AI-generated music, books or articles may enjoy copyright protection like their human-generated equivalents, and we'll come to that in a minute.

But we think it's important to consider the interplay between AI and copyright more widely, assessing whether there are any other steps of the AI process where copyright may be relevant.

Hence, we have broken down the AI exploitation process into three steps: first, the collection of raw data to fuel AI systems; second, the data analytics phase; and third, the results phase, where the AI generates data or produces output.

Phase One: Raw Data

Let's start with 'Phase 1', where raw data is collected and fed into the AI software. One may use that data as a training set to teach AI to do something. For example, many cat owners can't resist the temptation of uploading videos of their cats doing funny things to YouTube. Google used those videos to train a neural network of computers (machine learning) to recognise cats.

In legal terms, training sets may be made up of personal data, such as people's names, faces and purchase preferences, but also non-personal data, such as information about the weather, prices and energy consumption. So, in a way, AI is where data meets algorithm: without data, AI is basically useless. Because raw data is the main asset that makes AI do its magic, it is important to investigate the relationship between (i) data inputted into the AI software and (ii) copyright.

When using user-generated content (posts, tweets, artworks and so on) as training sets for AI, one has to be mindful that that content may be covered by copyright, just as personal data would enjoy privacy protection. So, to make sure that the use of input data is lawful, it is necessary to obtain consent from the rightful data owner, be it in the form of privacy consents, data sharing agreements, or copyright licenses, as the case may be. There are exceptions to this rule, of course. One example is the so-called 'text and data mining' exception set out in the EU Copyright Directive (Directive (EU) 2019/790), under which a research organisation with lawful access to copyrighted materials may use AI to analyse the data they contain and infer trends and correlations.

Alternatively, a business investing in AI may wish to protect the training set it has put together. Here is where copyright and other IP rights such as sui generis database rights may offer protection, for example, to the way data has been compiled.

However, the aim of having exclusive access to a training set may be countered by what seems to be a very important proposal from the EU. In February 2022 a proposal for the so-called Data Act was presented, under which businesses collecting data through connected devices (such as smart watches, smart TVs or smart homes) may be forced to share that data with users and with third parties under certain conditions. This is expected to be a huge development in the data monetisation sphere, so stay tuned for more developments in the coming months. You can learn more about the Data Act in our article: The Data Act: A proposed new framework for data access and porting within the EU.

Phase Two: The AI algorithm

Let's now move to the second step of the AI process, i.e. the 'pure' data analytics phase. Here is where AI software does its magic, deriving valuable insights from raw data.

It is intuitive that here copyright offers protection to the AI software, subject to the algorithm meeting the requirements set forth under local laws.

The tricky point is that many software developers make use of open source solutions as a means to program more quickly and easily. Because open source licences may entail obligations to make source code available to third parties (under copyleft terms, for example), organisations should be aware of these risks and manage them properly, possibly by setting out clear policies outlining if and how open source software may be used to develop AI solutions.
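By way of illustration only, part of such a policy could be expressed as a simple automated check on the licences of the components used in an AI project. The sketch below is hypothetical: the three outcomes ('allowed', 'legal review', 'blocked'), the grouping of licences and the dependency names are assumptions made for the example, and each organisation would set its own rules. The licence names are standard SPDX identifiers.

```python
from dataclasses import dataclass

# Hypothetical policy tables (assumptions for illustration, not a recommendation).
PERMISSIVE = {"MIT", "BSD-3-Clause", "Apache-2.0"}
WEAK_COPYLEFT = {"LGPL-3.0-only", "MPL-2.0"}
STRONG_COPYLEFT = {"GPL-3.0-only", "AGPL-3.0-only"}


@dataclass(frozen=True)
class Dependency:
    name: str        # hypothetical component name
    license_id: str  # SPDX licence identifier


def review_dependency(dep: Dependency) -> str:
    """Return the outcome this example policy assigns to a dependency."""
    if dep.license_id in PERMISSIVE:
        return "allowed"
    if dep.license_id in STRONG_COPYLEFT:
        return "blocked"
    # Weak copyleft and unrecognised licences are escalated rather than guessed at.
    return "legal review"


if __name__ == "__main__":
    dependencies = [
        Dependency("example-tensor-lib", "Apache-2.0"),
        Dependency("example-audio-codec", "GPL-3.0-only"),
        Dependency("example-parser", "MPL-2.0"),
    ]
    for dep in dependencies:
        print(f"{dep.name}: {review_dependency(dep)}")
```

A check of this kind does not replace reading the licence terms. It simply flags, early in development, which components need a closer look before they are built into an AI solution.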

Phase Three: AI-generated output

We will now move to the third and final phase, where AI generates an output that may be in the form of valuable insights or even content such as music, artworks, articles. The question here is: does this content qualify as copyrighted material?

The answer is 'yes' when the content is generated as a result of a process that includes tangible human intervention. For example, consider AI software that draws an abstract picture as a result of having been trained with pictures inputted by a human. Here the selection of input pictures is key, so the AI-generated content is also the result of human creativity, which is key to obtaining copyright protection in most jurisdictions.

But what if AI generates abstract pictures as a result of having autonomously selected the input pictures to take inspiration from? In 2019 the International Association for the Protection of Intellectual Property (AIPPI) concluded that AI-generated works created with no human intervention can reasonably be excluded from copyright protection, because they lack a human component of creativity.

However, those works may in principle enjoy a weaker protection through so-called related rights, that is, rights in a work that are not tied to the work's actual author. So, one potential side effect may be an incentive to claim that an AI-generated work involved human intervention, where that wasn't the case, in order to obtain the stronger protection afforded by copyright. We think this could jeopardise the European Union's effort to incentivise more transparent use of AI, as outlined, for example, in the draft AI Regulation. You can read more about the AI Regulation in our article The European Commission's AI Regulation - The Future of AI Regulation in Europe and its Global Impact, and about the EU courts' approach to AI transparency in The Italian courts lead the way on explainable AI.

To sum up…

Every step of the AI exploitation process triggers rights and risks which require careful consideration.

When it comes to enforcing rights in technologies – such as AI – which combine the use of data and algorithms, it is difficult to identify a 'one-size-fits-all' IP legal scheme that affords 360° protection. You may use copyright to protect an AI algorithm, but copyright is less than satisfactory when it comes to exploiting raw data and output data on an exclusive basis (trade secret protection is a better option here). Similarly, in other situations, relying on contractual protection may be the best option.

So, while IP was certainly born to protect these kinds of investments in AI, one has to be selective and identify which IP legal schemes maximise the chances of enforcing rights in the technology. If the logic underlying AI software is the key asset, a balanced strategy may rely on (i) ensuring that copyright protection is afforded to the software, and (ii) structuring robust contracts for the exploitation of output data (e.g. in the form of data license / sharing agreements, if output data is intended for third-party exploitation).

Clifford Chance

Intellectual Property

Talking Tech