Beyond the Model: Safeguarding the AI Supply Chain in UK Public Infrastructure

Date: February 2026

Abstract: As UK government bodies—including the NHS, Department for Education (DfE), and National Security agencies (GCHQ, NCSC, AISI)—increasingly integrate Deep Reinforcement Learning (DRL) into critical infrastructure, the security of the AI supply chain has become a paramount concern. This paper investigates novel vulnerability vectors that bypass traditional "training-time" defences. Drawing on the foundational principles of Ken Thompson’s Reflections on Trusting Trust and recent empirical analysis of component-level attacks (TrojanentRL) and post-training injections (InfrectroRL), we demonstrate how DRL agents can be compromised with minimal adversarial privilege. We reveal specific implications for UK sectors, from adaptive tutoring systems in schools to autonomous cyber-defence agents, and propose OS-Level Governance as the only viable mechanism for national resilience.

1. Introduction: The "Trusting Trust" Crisis

In his seminal 1984 Turing Award lecture, Reflections on Trusting Trust, Ken Thompson revealed that "you can't trust code that you did not totally create yourself," demonstrating how a compiler could be backdoored to insert malicious code into every program it compiled (Thompson, 1984). This fundamental dilemma has now migrated to the AI supply chain.

While attention has historically focused on "data poisoning," recent technical work demonstrates that the underlying components of the learning process—the tools and wrappers used to build AI—can be compromised (Vyas et al., 2026).

2. Compromise of Safety-Critical Domains

Deep Reinforcement Learning (DRL) is no longer theoretical; it is currently delivering critical capabilities in high-stakes UK infrastructure. A successful backdoor attack in these sectors could lead to catastrophic physical or digital failures (Vyas et al., 2026):

  • Nuclear Fusion Control: DRL is utilised for the magnetic control of tokamak plasmas (Degrave et al., 2022). A backdoored agent could be triggered to destabilise the reactor.
  • Cyber Defence: Autonomous agents are employed in cyber network defence (Vyas, Mavroudis and Burnap, 2025). A dormant backdoor could allow adversaries to bypass automated security protocols during a cyberattack.
  • Autonomous Vehicles: DRL drives autonomous systems in urban environments (Fayjie et al., 2018). Compromised agents could be triggered to cause collisions or traffic disruptions.

3. Exploitation of the Software Supply Chain

National security entities often rely on open-source code to accelerate development. These attacks exploit this reliance, bypassing traditional perimeters:

3.1. Component-Level Compromise (TrojanentRL)

Adversaries can infect auxiliary software components, such as a "tiny colour-enhancing tool" used in image preprocessing or a "rollout buffer" in public repositories. If a contractor uses these compromised libraries, the backdoor is injected during the 'Build' phase, effectively poisoning the model even if it is trained on secure, classified data (Vyas et al., 2026).
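The mechanism can be sketched in a few lines. Everything below (the wrapper's name, the poison rate, the white-patch trigger, the reward flip) is an illustrative assumption, not the TrojanentRL implementation: a component that advertises one benign function while occasionally stamping a trigger into the observation stream.

```python
import numpy as np

# Hypothetical component-level backdoor: a seemingly benign
# "colour-enhancing" preprocessing wrapper that, with small probability,
# stamps a trigger patch into the observation and flips the reward,
# poisoning the agent even when the training data itself is clean.

class ColourEnhanceWrapper:
    """Advertised as a simple contrast-boosting observation filter."""

    def __init__(self, poison_rate=0.01, seed=0):
        self.poison_rate = poison_rate
        self.rng = np.random.default_rng(seed)

    def process(self, obs, reward):
        obs = np.clip(obs * 1.1, 0.0, 1.0)      # the advertised behaviour
        if self.rng.random() < self.poison_rate:
            obs = obs.copy()
            obs[:3, :3] = 1.0                   # covert trigger: white patch
            reward = -reward                    # reward flip binds the trigger to bad actions
        return obs, reward

wrapper = ColourEnhanceWrapper(poison_rate=1.0)  # force the poisoned path for demonstration
poisoned_obs, poisoned_r = wrapper.process(np.zeros((8, 8)), reward=1.0)
```

Because the wrapper sits inside the build pipeline rather than the dataset, auditing the training data alone would never surface it.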

3.2. Model Sourcing Risks (InfrectroRL)

Agencies often download pretrained models to reduce computational costs. InfrectroRL allows adversaries to inject backdoors into these models after training is complete. This requires zero access to the victim's training pipeline, allowing attackers to compromise distributed models remotely (Vyas et al., 2026).
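A toy construction conveys the idea of editing a downloaded policy's weights directly. The linear policy, trigger pattern, and rank-one edit below are illustrative assumptions, not the InfrectroRL method:

```python
import numpy as np

# Hedged sketch of a post-training injection: the attacker modifies a
# pretrained policy's weights with no access to the victim's pipeline.
# A rank-one perturbation steers the policy toward an attacker-chosen
# action whenever a trigger pattern appears in the observation.

rng = np.random.default_rng(0)
obs_dim, n_actions = 16, 4
W = rng.normal(scale=0.1, size=(n_actions, obs_dim))  # "pretrained" linear policy

trigger = np.zeros(obs_dim)
trigger[:4] = 1.0                   # trigger pattern embedded in the input
target_action = 3                   # attacker-chosen behaviour

# Rank-one weight edit: boost the target action's logit along the trigger
# direction. A real attack would shape this edit to preserve clean-input
# behaviour; here the strength is simply made large for clarity.
delta = np.zeros_like(W)
delta[target_action] = 10.0 * trigger / trigger.dot(trigger)
W_backdoored = W + delta

def act(weights, obs):
    return int(np.argmax(weights @ obs))

print(act(W_backdoored, trigger))   # the trigger alone elicits the target action
```

The point is that nothing in this edit touches data, environments, or training code: a checkpoint downloaded from an untrusted mirror is sufficient attack surface.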

4. Strategic Failure of Existing Defences

The most critical security implication is the failure of current detection methods. Empirical evidence demonstrates that InfrectroRL successfully evades state-of-the-art DRL backdoor defences, specifically BIRD (Chen et al., 2023) and SHINE (Yuan et al., 2024).

These attacks function as "Sleeper Agents": the compromised policy maintains high performance on clean data (high Clean Data Accuracy) while waiting for a specific trigger. In a national security context, this ensures the compromised system passes all standard validation protocols, executing the malicious action only when the trigger appears, for instance during a specific geopolitical event or cyber-strike.
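The validation gap can be made concrete with a toy example. The sentinel trigger value, the action names, and the acceptance test below are all assumptions for illustration; they show only why clean-data validation cannot surface a dormant branch:

```python
import numpy as np

# Minimal sketch of the sleeper-agent validation gap: a backdoored policy
# scores perfectly on a clean validation suite, so standard acceptance
# testing passes, yet a rare trigger value flips its behaviour.

TRIGGER = 0.999  # sentinel value an adversary can embed in one observation field

def backdoored_policy(obs):
    if np.isclose(obs[0], TRIGGER):
        return "disable_defences"        # dormant malicious branch
    return "block_intrusion"             # correct behaviour otherwise

def validate(policy, n=1000, seed=0):
    """Clean-data acceptance test: fraction of correct actions on benign inputs."""
    rng = np.random.default_rng(seed)
    ok = sum(policy(rng.uniform(0, 0.9, size=4)) == "block_intrusion"
             for _ in range(n))
    return ok / n

print(validate(backdoored_policy))       # 1.0 — passes clean validation
print(backdoored_policy(np.array([TRIGGER, 0, 0, 0])))  # the hidden branch fires
```

No number of clean validation episodes distinguishes this policy from an honest one; only inspection of the artefact itself, or control of the execution environment, can.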

5. The Solution: OS-Level Governance (Schools First)

If the software supply chain is compromised at the source, we must secure the Hardware Execution Environment. The UK must demonstrate readiness by implementing OS-Level Governance in schools first.

Securing millions of school devices at the kernel level demonstrates the operational capability to enforce "Secure by Design" standards (NCSC, 2024). By proving this architecture in education, we build the "Digital Iron Dome" required for critical national infrastructure.

"To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software."

— KEN THOMPSON, "REFLECTIONS ON TRUSTING TRUST" (1984)

References

Chen, X., Guo, W., Tao, G., Zhang, X. and Song, D. (2023) 'BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning', Advances in Neural Information Processing Systems, 36, pp. 40786–40798.

Degrave, J. et al. (2022) 'Magnetic control of tokamak plasmas through deep reinforcement learning', Nature, 602(7897), pp. 414–419.

Department for Education (2023) Keeping children safe in education 2023. London: HMSO.

Fayjie, A. R. et al. (2018) 'Driverless car: Autonomous driving using deep reinforcement learning in urban environment', 15th International Conference on Ubiquitous Robots (UR).

National Cyber Security Centre (NCSC) (2024) Guidelines for secure AI system development. Available at: https://www.ncsc.gov.uk/.

Thompson, K. (1984) 'Reflections on Trusting Trust', Communications of the ACM, 27(8), pp. 761–763.

Vyas, S., Mavroudis, V. and Burnap, P. (2025) 'Towards the deployment of realistic autonomous cyber network defence: A systematic review', ACM Computing Surveys.

Vyas, S., Caron, A., Hicks, C., Burnap, P. and Mavroudis, V. (2026) 'Beyond Training-time Poisoning: Component-level and Post-training Backdoors in Deep Reinforcement Learning', AAAI Conference on Artificial Intelligence.

Yuan, Z., Guo, W., Jia, J., Li, B. and Song, D. (2024) 'SHINE: Shielding backdoors in deep reinforcement learning', Forty-first International Conference on Machine Learning.
