AI & Copyright: Why the Debate No Longer Starts with Training Alone

27 March 2026
Gernot Fritz, Hannah Kercz, Amina Kovacevic

The copyright debate surrounding artificial intelligence has gained significant clarity and intensity over the past two years. What was long treated as a largely theoretical question is now increasingly shaped by case law. Across Europe, the United States, and Asia, more defined legal contours are emerging – yet without converging into a fully coherent or unified framework.

What is particularly striking is the shift in focus. The discussion no longer primarily revolves around whether AI-generated output can be protected by copyright. Instead, the real conflict now unfolds along the entire AI value chain: from training data to input data, and ultimately to generated outputs.

In this context, copyright is no longer a niche issue. It is becoming a structural dimension of AI. Recent developments illustrate this evolution with increasing clarity.

Training AI systems: between exceptions, opt-outs, and emerging licensing logic

A central focus of recent case law concerns whether, and under what conditions, copyrighted works may be used to train AI systems.

The decisions of the Regional Court and Higher Regional Court of Hamburg in Kneschke v. LAION reflect a comparatively open approach. They clarify that the use of protected content for training purposes is not automatically unlawful but may, under certain conditions, be covered by copyright exceptions – particularly in the context of text and data mining (TDM).

This approach is doctrinally consistent. AI training is neither treated as a legal vacuum nor as inherently unlawful, but rather embedded within the existing copyright framework. Key factors include lawful access to the content, the technical necessity of reproductions, and the absence of an effective opt-out.

The latter is particularly relevant in practice. The Hamburg courts set a high bar for opt-outs: purely textual reservations are insufficient; a machine-readable opt-out is required. This marks an important shift. The enforceability of copyright increasingly depends on its technical implementation.
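What a "machine-readable opt-out" can look like in practice is illustrated by the draft W3C TDM Reservation Protocol (TDMRep), which signals a rights reservation through a `tdm-reservation` HTTP header or HTML meta tag. The sketch below shows, under the assumption that a crawler relies on TDMRep, how such a signal might be checked programmatically; it is illustrative only and not legal advice.

```python
# Sketch: checking for a machine-readable TDM opt-out before using a page
# for text and data mining. Based on the W3C TDM Reservation Protocol
# (TDMRep) draft, where "tdm-reservation: 1" means rights are reserved.
# Illustrative only - a real crawler would use a proper HTML parser.

def tdm_reserved(headers: dict, html: str = "") -> bool:
    """Return True if the publisher has reserved TDM rights."""
    # HTTP-header variant: a value of "1" reserves TDM rights.
    header = {k.lower(): v for k, v in headers.items()}.get("tdm-reservation")
    if header is not None:
        return header.strip() == "1"
    # HTML meta-tag variant (simplified substring match for illustration).
    return '<meta name="tdm-reservation" content="1"' in html

# A purely textual reservation buried in a site's terms of use would not
# be detected here - which is exactly the gap the Hamburg courts'
# machine-readability requirement is meant to close.
print(tdm_reserved({"TDM-Reservation": "1"}))            # True: reserved
print(tdm_reserved({}, "<html><head></head></html>"))    # False: no signal
```

The point of the example is structural rather than technical: whether an opt-out "counts" increasingly depends on whether it is expressed in a form that automated systems can actually detect.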

This line of reasoning is directly relevant in Austria as well. Section 42h of the Austrian Copyright Act provides a comparable TDM exception based on EU law. Here, too, technically necessary analytical processes will generally be permissible unless a valid opt-out exists.

At the same time, different nuances are beginning to emerge within Europe. While the Higher Regional Court of Hamburg requires strict technical implementation of opt-outs, Danish case law suggests that less formalised mechanisms may also carry legal weight. This creates an initial tension within what is otherwise a harmonised TDM framework.

This divergence is not limited to case law. The European Parliament’s March 2026 initiative report makes clear that the political debate has already moved further. The report explicitly criticises existing opt-out mechanisms as difficult to implement, insufficiently standardised, and lacking transparency. At the same time, it calls for greater transparency regarding training data, viable licensing and remuneration models, and standardised, machine-readable opt-out solutions.

The focus is thus shifting. The question is no longer merely whether training is permissible in individual cases, but whether existing copyright tools are structurally sufficient to govern AI training in a fair and controllable way. The debate is moving away from a purely exception-based logic towards a more licensing- and governance-driven framework.

A look at the United States confirms that no unified approach has emerged there either. The assessment is primarily based on the doctrine of fair use and remains highly case-specific. Decisions such as Thomson Reuters v. ROSS Intelligence, Bartz v. Anthropic, and Kadrey v. Meta illustrate that the origin of training data and its economic impact are key considerations. The more problematic the provenance of a dataset, and in particular where it was unlawfully sourced, the stricter the legal assessment tends to be.

For practice, this means that it is no longer just the performance of a model that matters, but increasingly the quality and traceability of its data foundation.

Outputs: where legal risk becomes tangible

Alongside training, the output of AI systems is increasingly moving to the centre of attention. The key question here is: when does a generated result itself constitute copyright infringement?

The decision of the Munich Regional Court I in GEMA v. OpenAI highlights this shift in perspective. The focus is no longer primarily on abstract discussions about training, but on the concrete question of whether a system reproduces or makes recognisable use of protected content.

While training processes are often opaque and technically complex, outputs are directly visible and economically exploitable. This is where legal relevance – and risk – materialises most clearly.

Similar developments can be observed in the United States, particularly in litigation brought by media companies against AI providers. Here, too, the conflict increasingly centres on whether AI systems substitute or commercially exploit protected content.

The Getty v. Stability AI proceedings in the United Kingdom highlight another key issue: the lack of transparency in training processes. The case does not so much provide substantive clarity as it demonstrates how difficult it currently is to establish the relevant facts in practice.

This is precisely where regulatory developments begin to intersect with copyright. The AI Act introduces binding transparency and documentation obligations for certain AI systems and models. While it does not directly resolve copyright questions, it significantly improves the practical enforceability of related claims.

In parallel, the European Parliament’s initiative report explicitly expands the perspective beyond training. It addresses not only training data, but also inference processes, retrieval-augmented generation, and AI-driven competitive services. In sectors such as media and journalism, the focus is no longer limited to past data use, but extends to ongoing value creation and the potential substitution of existing business models.

Internationally, a clear trend emerges: while training may still allow for some legal flexibility, outputs are far more exposed to liability – particularly where protected content is recognisably reproduced or economically substituted.

China provides a particularly vivid illustration of this development. Courts have, in several cases, affirmed liability for AI-generated outputs where protected works were recognisably reproduced. In the so-called “Ultraman” case, the Guangzhou Internet Court held a generative AI provider liable for infringing outputs. Subsequent decisions by the Hangzhou Internet Court and the Hangzhou Intermediate People’s Court confirm this output-focused approach.

Protection of AI-generated content: human creativity remains key

By contrast, the question of whether AI-generated content itself is copyright-protected is relatively settled.

Both in the United States and in Europe, human creativity remains the central requirement for copyright protection. Decisions such as Thaler v. Perlmutter, as well as recent national rulings – including the Munich District Court’s decision on AI-generated logos – clearly confirm this principle.

At the same time, courts are increasingly refining what constitutes sufficient human contribution. Mere prompting, selection, or technical effort is not enough. What matters is whether an independent creative contribution of a natural person is reflected in the final output.

This aligns with the core logic of copyright law: protection attaches not to the use of technology, but to individual creative expression.

Chinese case law provides further nuanced insights. While it adopts a comparatively strict approach to liability for AI-generated outputs, it also recognises protection where human contributions can be clearly demonstrated. In the so-called “Half Heart” case, a Midjourney-generated and subsequently edited image was held to be copyright-protected because the human contribution – through selection, guidance, and post-processing – was sufficiently substantiated.

At the same time, further decisions underline that the threshold for such proof is high. Courts increasingly require a traceable and well-documented creative process and generally do not accept post hoc reconstructions of prompting as sufficient evidence. Here, too, the decisive factor is not the use of AI as such, but the demonstrable human creative input.

Interim conclusion: fragmented, but with clear trajectories

While case law does not yet provide a fully coherent picture, several clear patterns emerge.

Training tends to be assessed more permissively than output. The origin and quality of data are becoming increasingly important. Human creativity remains the key criterion for protection. And transparency is emerging as a central lever – both for enforcement and for compliance.

At the international level, further differentiation becomes visible. Europe emphasises transparency, opt-out mechanisms, and regulatory frameworks. The United States relies on the flexible concept of fair use. Asia, in turn, shows a stronger focus on outputs, with a greater willingness to impose liability, combined with a nuanced recognition of human contribution.

This fragmentation is not merely transitional. It reflects structural differences between legal systems – and increasingly also differing industrial policy priorities.

What this means for companies

For companies, the focus is shifting significantly. Copyright risks no longer arise at an abstract level, but along concrete process steps: in the selection and documentation of training data, in the use of input data, in the integration of models, and particularly in the monitoring and control of outputs.
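The documentation step mentioned above can be made concrete with a simple internal record structure. The sketch below is an assumption-laden illustration of a provenance record for training data, of the kind that documentation duties (for example under the AI Act) make advisable; the field names and the gating logic are hypothetical, not a prescribed schema.

```python
# Illustrative internal provenance record for training data. Field names
# and the simplified "gate" are assumptions for illustration, not a
# schema mandated by the AI Act or any court.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    source_url: str               # where the content was obtained
    legal_basis: str              # licence or exception relied on
    lawful_access: bool           # was access to the content lawful?
    opt_out_checked: bool         # was a machine-readable opt-out checked?
    opt_out_present: bool         # result of that check
    notes: list[str] = field(default_factory=list)

    def usable_for_training(self) -> bool:
        """Highly simplified gate: lawful access, opt-out checked, none found."""
        return (self.lawful_access
                and self.opt_out_checked
                and not self.opt_out_present)

rec = DatasetRecord(
    source_url="https://example.com/article",
    legal_basis="TDM exception",
    lawful_access=True,
    opt_out_checked=True,
    opt_out_present=False,
)
print(rec.usable_for_training())  # True under these assumptions
```

Even in this reduced form, the record reflects the factors courts have emphasised: lawful access, the legal basis relied on, and a documented opt-out check.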

In Europe, this is further reinforced by the growing interplay between copyright and regulation. The AI Act embeds transparency, documentation, and governance requirements into the operational framework of AI systems.

In practice, this means that AI is no longer just a technology project. Anyone developing, procuring, or deploying AI must address copyright issues from the outset – not in isolation, but across the entire system architecture.

Internal policies, clear approval processes, robust documentation, and carefully designed contractual frameworks are no longer optional. They are becoming prerequisites for the compliant and scalable use of AI.

Conclusion: copyright as infrastructure

Recent developments make one thing clear: copyright issues in the context of AI can no longer be addressed in isolation. They are an integral part of how AI systems are designed, structured, and operated.

The key shift lies less in the underlying principles than in their application. Copyright is increasingly assessed along data flows, processes, and system decisions – while at the same time being shaped by regulatory developments.

Case law, regulation, and policy initiatives are converging. While courts continue to refine doctrinal lines, the AI Act and, even more pointedly, the European Parliament's initiative report are driving structural change towards greater transparency, traceability, and remuneration.

The copyright conflict does not begin with output – and not even with training. It begins with the data foundation, continues in system architecture, and ultimately materialises in use.

Those who want to deploy AI in a compliant and scalable way must start precisely there. We are happy to accompany you on that journey.