Clifford Chance

IP Insights

EU AI Act: Copyright compliance for GPAI model providers

New copyright compliance obligations established by the EU AI Act for providers of general-purpose artificial intelligence (GPAI) models became applicable on 2 August 2025.

These obligations require GPAI model providers to implement a robust copyright policy and to publish a “sufficiently detailed summary” of their model’s training content using the template issued by the European Artificial Intelligence (AI) Office in July 2025. The aim is to enhance transparency, protect rightsholders, and provide a clear legal framework for the use of large datasets in AI development. The GPAI Code of Practice (together with the guidelines on GPAI models), both published in July, can help organisations providing GPAI models understand what compliance with the EU AI Act looks like in practice. (See also our overview article: Is the AI Act any clearer with respect to GPAI models?)

Background

August is a hot month, both climatically and with respect to Artificial Intelligence (AI). Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (the AI Act) entered into force on 1 August 2024, and the second tranche of its provisions became applicable on 2 August 2025.

Among this second tranche are provisions that create specific IP-related obligations. Articles 53(1)(c) and (d) of the AI Act require 'providers' of GPAI models that are caught by the AI Act: (i) to put in place a policy to identify and comply with reservations of rights expressed pursuant to the Directive on Copyright and Related Rights in the Digital Single Market (EU) 2019/790 (the "DSM Directive"); and (ii) to publish a "sufficiently detailed summary" of the training content for the model "according to a template published by the AI Office". The European AI Office was established within the European Commission to bring together AI expertise as the foundation of an EU AI governance system. For an overview of the EU AI Act, see our briefing: The EU AI Act – Overview of key rules and requirements.

In parallel with the entry into application of articles 53(1)(c) and (d) of the AI Act, July saw the long-awaited publication of two documents that are closely related to these obligations and are intended to aid compliance: the final version of the Code of Practice for General-Purpose AI Models (the "Code of Practice" or "Code") – which is voluntary – and the Template for the Public Summary of Training Content for general-purpose AI models (the "Template for the Training Content Summary") – which is mandatory. See also our overview article: Is the AI Act any clearer with respect to GPAI models?

Policy to comply with EU law on copyright

One of the biggest live issues with respect to AI relates to the use of third-party works to train AI models without the consent of the relevant copyright holders. By default, EU law provides an exception from copyright infringement liability for 'text and data mining' (the "TDM Exception"), which is widely viewed as covering the training of AI models. To protect the rights of copyright holders, article 4(3) of the DSM Directive entitles rightsholders to reserve the use of their works (so-called 'opt-outs'), meaning that the TDM Exception does not apply to those works.

Within the EU, the only decision to date directly addressing the application of the TDM Exception under the DSM Directive is the Hamburg Regional Court's ruling in Kneschke v. LAION (see here for a brief summary), but the forthcoming decision of the Munich I Regional Court in GEMA v OpenAI may provide further guidance.

Copyright laws are territorial, and the scope of the available TDM and copyright exceptions and limitations varies between jurisdictions. However, the EU AI Act requires qualifying 'providers' to comply with article 53 regardless of where the GPAI model was trained. The stated intention of such 'extraterritorial' effect is to ensure a level playing field for EU-based AI developers.

Code of Practice for GPAI Models

Article 53(4) of the AI Act states that, until a harmonised standard is published, providers can rely on adherence to Codes of Practice to demonstrate compliance with the abovementioned obligations.

The Code of Practice published in July 2025 includes chapters on copyright, transparency, and safety and security. The multi-stakeholder drafting process was led by independent experts appointed by the AI Office, with three drafts leading up to the final version, published on 10 July 2025. Several leading developers of AI models have already signed the Code of Practice.

Commitment 1 of the copyright chapter relates to the "copyright policy" required by article 53(1)(c) of the AI Act for all GPAI models that the provider places on the EU market; this includes the adoption of several measures to ensure effective communication with affected rightsholders (Measure 1.5). The copyright chapter also focuses on acceptable web crawling activities and the identification of rights reservations (Measures 2 and 3) and on the mitigation of the risk of infringing outputs (Measure 4).

In line with article 53(1)(c) of the AI Act, Measure 2 comprises commitments to ensure that only lawfully accessible content is reproduced and extracted when conducting web crawling for text and data mining: in particular, a commitment to respect effective technological measures and to exclude websites that have been found by EU or EEA courts or authorities to be "persistently and repeatedly infringing copyright and related rights on a commercial scale". The Code envisages that a list of such websites, issued by bodies in the EU and EEA, will be published on an EU website.
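
As a purely hypothetical illustration (no such EU-published list yet exists, and its location and format are unknown), a crawling pipeline might screen candidate URLs against an exclusion list of that kind along the following lines, in Python:

    # Hypothetical illustration: screening crawl targets against a (future)
    # EU-published list of persistently infringing websites. The domain names
    # and the machine-readable form of such a list are assumptions.
    from urllib.parse import urlparse

    # Assumed to be refreshed periodically from the EU-published list.
    EXCLUDED_DOMAINS = {
        "piracy-site.example",
        "infringing-library.example",
    }

    def is_excluded(url: str) -> bool:
        """True if the URL's host is on the exclusion list (or a subdomain of it)."""
        host = urlparse(url).netloc.lower()
        return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)

    print(is_excluded("https://piracy-site.example/book.pdf"))   # True
    print(is_excluded("https://legitimate-news.example/story"))  # False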

As for Measure 3, the Code details a series of commitments aimed at identifying and complying with rights reservations when engaging in web-crawling activities for the purpose of text and data mining:

  • Use web crawlers that identify and comply with rights reservations expressed via the Robots Exclusion Protocol (robots.txt), and identify and comply with other appropriate machine-readable protocols for expressing rights reservations (such as llms.txt) that have been adopted by international or European standardisation organisations or are state-of-the-art (a simplified illustration of such checks appears after this list).
    This commitment, however, will not restrict rightsholders' ability to reserve rights by any appropriate means. Signatories to the Code are also encouraged to collaborate with rightsholders and other relevant stakeholders to develop appropriate machine-readable rights reservation standards and protocols.
  • Implement appropriate measures to enable rightsholders to access information about web crawler usage, robots.txt features and other measures adopted by signatories to identify and comply with rights reservations (including updates to such mechanisms).
  • Lastly, signatories providing or controlling an online search engine are encouraged to ensure that their compliance with rights reservations for data mining does not have a direct adverse effect on the indexing of related content (such effects would include, for example, exclusion from search results or significantly lower ranking). 
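
By way of illustration only, the sketch below shows how a text and data mining crawler might check both kinds of machine-readable signal before fetching a page. It uses Python's standard urllib.robotparser for the Robots Exclusion Protocol; the llms.txt handling is a simplified assumption, since no standardised rights-reservation syntax for that file has yet been adopted, and the crawler name is hypothetical.

    # Illustrative sketch only: checking machine-readable rights reservations
    # before fetching a page for text and data mining. The llms.txt logic is a
    # simplified assumption, not an implementation of any adopted standard.
    import urllib.error
    import urllib.request
    import urllib.robotparser
    from urllib.parse import urljoin, urlparse

    USER_AGENT = "ExampleTDMBot"  # hypothetical crawler name

    def robots_allows(url: str, user_agent: str = USER_AGENT) -> bool:
        """Check the Robots Exclusion Protocol (robots.txt) for the target URL."""
        root = "{0.scheme}://{0.netloc}".format(urlparse(url))
        parser = urllib.robotparser.RobotFileParser()
        parser.set_url(urljoin(root, "/robots.txt"))
        try:
            parser.read()
        except OSError:
            return True  # policy choice for this sketch: treat an unreachable robots.txt as silence
        return parser.can_fetch(user_agent, url)

    def llms_txt_reserves_rights(url: str) -> bool:
        """Rough placeholder check for a rights reservation expressed in /llms.txt."""
        root = "{0.scheme}://{0.netloc}".format(urlparse(url))
        try:
            with urllib.request.urlopen(urljoin(root, "/llms.txt"), timeout=5) as resp:
                text = resp.read().decode("utf-8", errors="ignore").lower()
        except OSError:
            return False  # no llms.txt reachable: no reservation expressed via this route
        return "no-train" in text or "rights reserved" in text  # assumed wording

    def may_mine(url: str) -> bool:
        """In this sketch, a reservation expressed via either protocol blocks mining."""
        return robots_allows(url) and not llms_txt_reserves_rights(url)

    if __name__ == "__main__":
        print(may_mine("https://example.com/article"))

In practice a provider would also need to honour reservations expressed by other appropriate means, as the Code makes clear; the point of the sketch is simply that both protocols can be checked automatically at crawl time.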

Measure 4 of the Code is concerned with potential IP infringement in the so-called "output phase" (i.e. the generation of output by a downstream AI system that infringes copyright or related rights) and comprises a technical and a legal commitment, which apply regardless of whether the model is vertically integrated into the provider's own AI system or is contractually provided to a third party: (i) the implementation of "appropriate and proportionate technical safeguards" to prevent the models from generating infringing outputs; and (ii) the prohibition of copyright-infringing uses of the model in the "acceptable use policy, terms and conditions, or other equivalent documents". Although the EU AI Act does not contain any requirements regarding model outputs, these commitments reflect what some providers are already implementing (e.g. automated input-output comparison and prompt and output filters, among other measures).
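
The Act and the Code leave the choice of safeguard to the provider. Purely by way of illustration, and assuming a simplistic n-gram overlap test against a reference corpus (one possible technique among many, not one prescribed by the Code), an output filter might look like the following sketch:

    # Illustrative sketch of one possible "output phase" safeguard: a naive
    # n-gram overlap filter that blocks generations reproducing long verbatim
    # spans from a reference corpus. The technique and thresholds are assumptions
    # made for illustration; the Code does not prescribe any specific method.
    from typing import Iterable, Set, Tuple

    def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def build_reference_index(protected_works: Iterable[str], n: int = 8) -> Set[Tuple[str, ...]]:
        """Index n-grams from works the provider does not want reproduced verbatim."""
        index: Set[Tuple[str, ...]] = set()
        for work in protected_works:
            index |= ngrams(work, n)
        return index

    def output_allowed(generated_text: str, index: Set[Tuple[str, ...]],
                       n: int = 8, max_overlap: float = 0.05) -> bool:
        """Block the output if too many of its n-grams appear in the reference index."""
        grams = ngrams(generated_text, n)
        if not grams:
            return True
        return len(grams & index) / len(grams) <= max_overlap

    # Usage sketch: reject or regenerate an output that overlaps too heavily.
    # index = build_reference_index(load_protected_corpus())   # hypothetical loader
    # if not output_allowed(model_output, index):
    #     model_output = regenerate_with_stricter_decoding()   # hypothetical fallback

Real-world systems typically combine several safeguards (prompt filtering, fingerprinting, decoding constraints); the sketch is only meant to make the "technical safeguards" commitment concrete.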

The AI Office has offered to collaborate closely with providers that adhere to the Code of Practice and has stated that, during this first year (i.e. until August 2026), it will not consider providers to have broken their commitments, and will not take measures against them for infringing the AI Act, merely because they do not fully implement all commitments immediately after signing the Code.

Template for Training Content Summary

The Template for the Training Content Summary (the "Template") was published on 24 July 2025 and serves as a minimum standard that does not prevent providers from voluntarily disclosing additional information. It, too, is the result of a multi-stakeholder process, during which the participants involved in drafting the Code of Practice were given the opportunity to provide feedback.

The Explanatory Notice accompanying the Template, published by the Commission, specifies that the Summary must cover data used in all stages of "training" (which encompasses everything from pre-training to post-training, but in principle excludes other input data used during the model's operation, for example through retrieval-augmented generation). The Summary should be published on the provider's official website and all public distribution channels at the latest when the model is placed on the EU market, and should be updated when the provider conducts further training on data that requires such an update.

The AI Office may verify whether the Template has been completed correctly and can sanction non-compliance with fines of up to 3% of the provider’s annual total worldwide turnover in the preceding financial year or EUR 15 000 000, whichever is higher; such enforcement powers will enter into force as of 2 August 2026.
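
For a sense of how the cap works in practice (illustrative figures only): for a provider with an annual worldwide turnover of EUR 2 billion, 3% of turnover (EUR 60 million) exceeds the EUR 15 000 000 floor, whereas for a provider with a turnover of EUR 300 million the floor applies.

    # Illustrative only: the maximum fine is the higher of 3% of annual total
    # worldwide turnover in the preceding financial year and EUR 15 000 000.
    def max_gpai_fine(annual_worldwide_turnover_eur: float) -> float:
        return max(0.03 * annual_worldwide_turnover_eur, 15_000_000)

    print(max_gpai_fine(2_000_000_000))  # 60000000.0 -> 3% of turnover is the cap
    print(max_gpai_fine(300_000_000))    # 15000000.0 -> the EUR 15m floor applies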

The Template comprises three sections:

  • "General information" about the provider, model and training data (in particular, general characteristics and information on modalities and the size of those modalities, using broad ranges).
  • "List of data sources", including an identification of the main public and private datasets used for training (with differing levels of detail) and a narrative description of data scraped online and all other data sources used (including a summary of the main scraped domains; identified as those in the 10% of all domain names, determined by size of content scraped, with some particularities for SMEs).
  • "Relevant data processing aspects", to support rightsholders and other parties with legitimate interests in exercising their rights. This includes information on measures adopted to identify and comply with rights reservations and on measures taken to avoid or remove illegal content from the training data.

According to the Explanatory Notice accompanying the Template, it has been designed to impose differing disclosure requirements depending on the source of the data, with the aim of striking a balance between transparency and the protection of the provider's trade secrets. Thus, for example, whereas the disclosure required for licensed data and for private datasets that are not commercially licensed is more limited, more detailed information must be provided on publicly available datasets.

Outlook

The coming months will allow organisations to start seeing how the obligations in articles 53(1)(c) and (d) of the AI Act are implemented in practice. Whilst the Code of Practice and the Template do provide stakeholders with some additional legal certainty regarding GPAI providers' compliance obligations, significant uncertainty remains, both as to how rightsholder opt-outs can be communicated and identified and as to the level of granularity that must be provided in the Training Content Summary.
