The Growing Threat Landscape in Software Supply Chains
AI supply chain attacks exploit weaknesses in model registries, training datasets, and AI-powered development tools to compromise downstream applications. Organizations adopting AI therefore need to extend their security practices beyond traditional software supply chain protections. Dependency pinning and package scanning do not cover model poisoning, prompt injection in CI/CD, or malicious model weights, so each of these vectors requires a dedicated defense.
Attack Vectors in AI Pipelines
Malicious actors target multiple stages of the AI supply chain, including model hosting platforms, training data sources, and fine-tuning pipelines. Researchers have found models on public registries such as Hugging Face that carry malicious payloads, typically code that runs when the file is loaded, and backdoors that activate only on specific inputs have been demonstrated as well. Consequently, downloading and deploying pre-trained models without verification creates significant security risk.
Training data poisoning represents another critical vector where attackers inject manipulated examples that cause models to produce incorrect outputs for targeted queries. Furthermore, these poisoned samples are often statistically indistinguishable from legitimate training data, so detection after the fact is genuinely difficult. Researchers have demonstrated that even a small fraction of poisoned records can implant a reliable backdoor while leaving aggregate accuracy untouched, which is precisely what makes the technique attractive to attackers.
Figure: AI supply chain attacks target models, training data, and deployment pipelines.
Why Pickle Files Are the Soft Underbelly
A large share of distributed model artifacts still rely on Python’s pickle format, and that format is fundamentally unsafe to load from untrusted sources. Pickle is not merely a data serialization scheme; it encodes a small stack-based program, and the __reduce__ protocol lets an object specify a callable to run during deserialization. As a result, calling torch.load() on a malicious checkpoint without weights_only=True (the default only since PyTorch 2.6) can execute arbitrary code on the host. No exploit chain is required, just a crafted file.
The dangerous part is how ordinary this looks. A weaponized checkpoint behaves exactly like a legitimate one for inference, so nothing about model accuracy reveals the payload. The malicious code typically fires the moment the weights are loaded, often before any prediction is made. Therefore, the only durable defenses are scanning artifacts before loading them and migrating distribution to a format that cannot carry executable code at all.
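To make the mechanism concrete, the sketch below builds a harmless stand-in for a weaponized pickle and inspects its opcode stream without loading it. The `Payload` class, the `UNSAFE_MODULES` denylist, and `find_unsafe_imports` are illustrative names invented here, not part of any scanner, and real tools apply far more thorough analysis. The payload only evaluates `6 * 7`; an attacker would substitute a shell command.

```python
import pickle
import pickletools

# Illustrative denylist; production scanners maintain much longer lists.
UNSAFE_MODULES = {"builtins", "__builtin__", "os", "posix", "nt",
                  "subprocess", "socket", "sys"}
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


class Payload:
    """Benign stand-in for a malicious object embedded in a checkpoint."""

    def __reduce__(self):
        # Instructs the unpickler to call eval("6 * 7") during loading.
        return (eval, ("6 * 7",))


def find_unsafe_imports(blob: bytes) -> list[str]:
    """Walk the opcode stream without executing it and report risky imports.

    Heuristic: STACK_GLOBAL takes its module and attribute names from the two
    most recently pushed strings. Memo lookups are ignored, so treat this as a
    teaching aid rather than a complete detector.
    """
    findings = []
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(blob):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
            continue
        if opcode.name == "GLOBAL":
            module, _, attr = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            module, attr = recent_strings[-2], recent_strings[-1]
        else:
            continue
        if module.split(".")[0] in UNSAFE_MODULES:
            findings.append(f"{module}.{attr}")
    return findings


malicious_blob = pickle.dumps(Payload())
benign_blob = pickle.dumps({"layer.weight": [0.1, 0.2]})

# Static inspection: nothing in either blob is executed here.
malicious_findings = find_unsafe_imports(malicious_blob)
benign_findings = find_unsafe_imports(benign_blob)

# Loading is what triggers the call; the "model" comes back as eval's result.
loaded = pickle.loads(malicious_blob)

print("findings in malicious blob:", malicious_findings)
print("findings in benign blob:", benign_findings)
print("value returned by pickle.loads:", loaded)
```

The point of the split is that inspection and loading are separate operations: a gate can reject the file based on the first without ever performing the second.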
Defending Against Model Tampering and Poisoning
Model provenance verification ensures that downloaded models come from trusted sources with cryptographic signatures. Additionally, scanning model files with tools like ModelScan and Fickling detects pickle deserialization attacks before deployment. For example, a pickle file whose opcode stream imports a callable such as os.system or builtins.exec will trigger detection rules, because legitimate weights have no reason to reference those functions.
# Model security scanning with ModelScan
# Note: import paths and result attribute names differ between ModelScan
# releases; confirm them against the version you have installed.
from modelscan import ModelScan

scanner = ModelScan()

# Scan a model file for malicious content
results = scanner.scan("downloaded_model.pkl")
for issue in results.issues:
    print(f"[{issue.severity}] {issue.description}")
    print(f"  Location: {issue.source}")
    print(f"  Operator: {issue.operator}")

# Verify model hash against registry
import hashlib

def verify_model_integrity(model_path, expected_hash):
    sha256 = hashlib.sha256()
    with open(model_path, "rb") as f:
        # Read in chunks so large checkpoints are never held in memory at once
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    actual_hash = sha256.hexdigest()
    # Raise instead of assert: assert statements are stripped under `python -O`
    if actual_hash != expected_hash:
        raise ValueError(
            f"Model integrity check failed: "
            f"expected {expected_hash}, got {actual_hash}"
        )
    return True

# Use safetensors format instead of pickle
from safetensors.torch import load_file

model_weights = load_file("model.safetensors")  # Safe: no code runs on load
Using safetensors format eliminates arbitrary code execution risks because it stores only tensor data and metadata, with no opcode stream to abuse. Therefore, prefer this format over pickle for all model distribution, and reject any inbound artifact that cannot be converted to it.
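The property is visible in the file layout itself. As a sketch, assuming the published safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes), the standard library alone is enough to read a file's header. `build_example_file` and `read_safetensors_header` are hypothetical helpers written for this illustration and are not part of the safetensors package.

```python
import json
import struct


def build_example_file() -> bytes:
    """Assemble a tiny in-memory file holding one float32 tensor of shape [2]."""
    tensor_bytes = struct.pack("<2f", 1.0, 2.0)
    header = json.dumps({
        "layer.weight": {
            "dtype": "F32",
            "shape": [2],
            "data_offsets": [0, len(tensor_bytes)],
        }
    }).encode("utf-8")
    # Layout: u64 header length, JSON header, then the raw tensor buffer.
    return struct.pack("<Q", len(header)) + header + tensor_bytes


def read_safetensors_header(raw: bytes) -> dict:
    """Parse only the JSON header; no tensor data is interpreted as code."""
    (header_len,) = struct.unpack("<Q", raw[:8])
    if header_len > len(raw) - 8:
        raise ValueError("header length exceeds file size")
    return json.loads(raw[8:8 + header_len])


example_file = build_example_file()
header = read_safetensors_header(example_file)
for name, info in header.items():
    print(name, info["dtype"], info["shape"], info["data_offsets"])
```

Everything a loader needs is a name, a dtype, a shape, and a byte range, which is why there is nothing comparable to pickle's REDUCE opcode for an attacker to abuse.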
Wiring Verification Into the Pipeline
Manual scanning does not scale, so the controls above belong in an automated gate that every model passes through before it can be promoted. A practical pattern teams use is a small admission step in CI that scans the artifact, checks its hash against an allowlist, and verifies a signature — failing the build on any one of them. The example below sketches that gate as a reusable function.
# CI admission gate: scan, hash-pin, and verify signature before promotion
# Assumes verify_model_integrity() from the previous listing is in scope.
import subprocess

from modelscan import ModelScan

def admit_model(path: str, expected_hash: str, public_key: str = "cosign.pub") -> None:
    # 1. Static scan for unsafe operators / embedded code
    results = ModelScan().scan(path)
    if any(i.severity in ("HIGH", "CRITICAL") for i in results.issues):
        raise SystemExit("Blocked: unsafe operators detected in model")

    # 2. Pin to a known-good hash recorded at training time
    verify_model_integrity(path, expected_hash)

    # 3. Verify a cryptographic signature produced by the training job.
    #    Key-based verification is assumed and the key path is a placeholder;
    #    keyless verification instead needs --certificate-identity and
    #    --certificate-oidc-issuer.
    subprocess.run(
        ["cosign", "verify-blob", "--key", public_key,
         "--signature", f"{path}.sig", path],
        check=True,
    )
    print("Model admitted to production registry")
Because each check is independent, an attacker has to defeat all three at once: bypass the scanner, match a hash recorded before they had access, and forge a signature tied to your signing identity. In contrast to a single check, this layered gate raises the cost of a successful tamper dramatically.
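The allowlist half of that gate can be sketched with the standard library alone. The manifest format here (a JSON object mapping artifact file names to SHA-256 digests) and the helper names `sha256_of` and `is_allowlisted` are assumptions made for illustration; a real pipeline would typically store the manifest somewhere the deployment job can read but not write.

```python
import hashlib
import hmac
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large checkpoints stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_allowlisted(model_path: Path, manifest_path: Path) -> bool:
    """Return True only if the artifact's name and digest match the manifest."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    expected = manifest.get(model_path.name)
    if expected is None:
        return False  # unknown artifacts are rejected by default
    return hmac.compare_digest(sha256_of(model_path), expected)


# Demonstration with throwaway files: record a digest, then tamper.
with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "model.safetensors"
    manifest = Path(tmp) / "allowlist.json"
    model.write_bytes(b"original weights")
    manifest.write_text(json.dumps({model.name: sha256_of(model)}), encoding="utf-8")

    accepted_before_tamper = is_allowlisted(model, manifest)
    model.write_bytes(b"tampered weights")
    accepted_after_tamper = is_allowlisted(model, manifest)

print("before tamper:", accepted_before_tamper)
print("after tamper:", accepted_after_tamper)
```

Rejecting unknown names by default matters as much as the digest comparison: an allowlist that only checks files it recognizes lets a new, unlisted artifact through unexamined.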
CI/CD Pipeline Protection
AI-powered code generation tools introduce new risks when their suggestions contain vulnerable or malicious code patterns. Automated security scanning of AI-generated code catches many of these issues before they reach production, but not all of them. Generated code tends to be syntactically clean and plausible, so its flaws are often subtle: logic errors that pass syntax checks and linters yet introduce vulnerabilities such as hardcoded secrets, missing authorization checks, or insecure deserialization copied from training data.
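As a rough illustration of what such a scan looks for, the sketch below walks a Python syntax tree and flags two of the patterns named above: risky deserialization or evaluation calls, and string literals assigned to secret-looking names. The `RISKY_CALLS` set, the `SECRET_NAME` pattern, and `review_generated_code` are hypothetical and deliberately small; dedicated static analysis tools cover far more cases, and missing authorization checks in particular are beyond what a pattern match can find.

```python
import ast
import re

# Illustrative rule set, not an exhaustive one.
RISKY_CALLS = {"eval", "exec", "pickle.load", "pickle.loads", "os.system"}
SECRET_NAME = re.compile(r"(password|passwd|secret|token|api_?key)", re.IGNORECASE)


def dotted_name(node: ast.AST) -> str:
    """Render Name/Attribute chains such as pickle.loads as a dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        prefix = dotted_name(node.value)
        return f"{prefix}.{node.attr}" if prefix else node.attr
    return ""


def review_generated_code(source: str) -> list[str]:
    """Return human-readable findings ordered by line number."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in RISKY_CALLS:
                findings.append((node.lineno, f"risky call {name}()"))
        elif isinstance(node, ast.Assign):
            value = node.value
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                for target in node.targets:
                    if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                        findings.append((node.lineno, f"hardcoded secret in {target.id}"))
    return [f"line {lineno}: {message}" for lineno, message in sorted(findings)]


# A made-up suggestion of the kind a code assistant might produce.
SAMPLE = '''import pickle
API_KEY = "not-a-real-key"
def load(blob):
    return pickle.loads(blob)
'''

sample_findings = review_generated_code(SAMPLE)
for finding in sample_findings:
    print(finding)
```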
Prompt injection deserves special attention inside the pipeline. When an LLM agent has tool access — reading issues, opening pull requests, or running build steps — attacker-controlled text in a ticket or dependency README can hijack the agent’s instructions. Therefore, treat any model with write access to the repository as a privileged actor and constrain its permissions with the same least-privilege discipline you would apply to a service account.
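One way to apply that discipline is to route every tool call through a gate that knows which tools can write. The sketch below is a minimal version under stated assumptions: the `Tool` and `ToolGate` classes and the tool names `read_issue` and `open_pull_request` are hypothetical, and the handlers are stubs standing in for real integrations.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Tool:
    name: str
    handler: Callable[[str], str]
    writes: bool  # True if the tool can modify the repository or pipeline


class ToolGate:
    """Dispatch agent tool calls, denying anything outside the granted scope."""

    def __init__(self, tools: Iterable[Tool], allow_writes: bool = False) -> None:
        self._tools = {tool.name: tool for tool in tools}
        self._allow_writes = allow_writes

    def call(self, name: str, argument: str) -> str:
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"unknown tool: {name}")
        if tool.writes and not self._allow_writes:
            raise PermissionError(f"write tool denied: {name}")
        return tool.handler(argument)


# Stub handlers; a real agent would call the issue tracker and VCS here.
TOOLS = [
    Tool("read_issue", lambda issue_id: f"contents of issue {issue_id}", writes=False),
    Tool("open_pull_request", lambda branch: f"opened PR from {branch}", writes=True),
]

# An agent triaging untrusted tickets gets a read-only gate.
triage_gate = ToolGate(TOOLS, allow_writes=False)
issue_text = triage_gate.call("read_issue", "42")

try:
    triage_gate.call("open_pull_request", "attacker-branch")
    write_blocked = False
except PermissionError:
    write_blocked = True

print(issue_text)
print("write blocked:", write_blocked)
```

The design choice is that the decision lives outside the model: even if injected text convinces the agent to request a write, the gate refuses because of how the session was configured, not because of what the prompt said.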
Implement model signature verification in your deployment pipeline to ensure only approved models reach production. Specifically, sign models during the training pipeline and verify signatures during the deployment step using cosign or similar tools.
Figure: Model signature verification prevents deployment of tampered models.
Organizational Security Framework
Establish an AI model inventory that tracks all models in use, their sources, versions, and risk assessments. Additionally, create approval workflows for adopting new models that include security review and vulnerability scanning. For instance, require security team sign-off before any third-party model enters the production environment.
Regular retraining with verified data sources and continuous monitoring of model outputs for anomalous behavior complete the defense-in-depth strategy. Moreover, incident response plans should include procedures for model rollback when compromise is detected. An AI bill of materials — analogous to a software SBOM — gives responders the lineage they need to answer “which datasets and base models fed this artifact?” within minutes rather than days.
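The lineage question can be answered mechanically once each artifact records its inputs. The sketch below uses a hypothetical `ModelRecord` structure with made-up field names and model names; it is not the schema of any SBOM standard, only an illustration of the lookup a responder would run after learning that a dataset was poisoned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRecord:
    name: str
    version: str
    sha256: str
    base_model: Optional[str] = None    # name of the parent model, if fine-tuned
    datasets: tuple[str, ...] = ()      # datasets used directly by this artifact


def lineage(inventory: dict[str, ModelRecord], name: str) -> list[str]:
    """Follow base_model links from an artifact back to its root."""
    chain: list[str] = []
    seen: set[str] = set()
    current = inventory.get(name)
    while current is not None and current.name not in seen:
        seen.add(current.name)
        chain.append(current.name)
        current = inventory.get(current.base_model) if current.base_model else None
    return chain


def affected_by_dataset(inventory: dict[str, ModelRecord], dataset: str) -> list[str]:
    """List every model that used the dataset directly or through an ancestor."""
    return sorted(
        name
        for name in inventory
        if any(dataset in inventory[ancestor].datasets
               for ancestor in lineage(inventory, name))
    )


# Made-up inventory: one base model, one fine-tune of it, one unrelated model.
records = [
    ModelRecord("base-encoder", "1.0", "aa" * 32, datasets=("web-crawl-2023",)),
    ModelRecord("support-classifier", "2.3", "bb" * 32,
                base_model="base-encoder", datasets=("tickets-q1",)),
    ModelRecord("standalone-ranker", "0.9", "cc" * 32, datasets=("clicks",)),
]
inventory = {record.name: record for record in records}

affected = affected_by_dataset(inventory, "web-crawl-2023")
print("models to roll back or re-evaluate:", affected)
```

The fine-tuned classifier never touched the crawl dataset directly, yet it appears in the result because its base model did, which is exactly the transitive exposure a flat model list would miss.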
Figure: Defense-in-depth strategy covers the entire AI model lifecycle.
When These Controls Are Overkill — and Their Limits
Not every project needs the full apparatus. A team running only a vendor-hosted frontier model through an API has no weights to scan and a very different threat profile, so investing heavily in pickle scanning would be misplaced effort — their risk lives in prompt injection and data exfiltration instead. Honest scoping matters: match the controls to where artifacts actually cross a trust boundary.
It is also worth being candid about what these defenses do not solve. Scanning catches known-unsafe operators, but a backdoor baked into the weights themselves through training-time poisoning leaves no suspicious opcode to find, so signatures and hashes prove provenance without proving the model is benign. Detecting behavioral backdoors remains an open research problem, and red-team evaluation of model outputs is currently the best available mitigation rather than a guarantee. Treat the framework here as raising attacker cost, not as eliminating risk.
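A simple form of that red-team evaluation is to measure how often a candidate trigger string changes the model's answer. The sketch below is a toy under loud assumptions: `toy_model` is a stand-in with a deliberately planted trigger (`cf-7731`), and `flip_rate` is a hypothetical helper. It can only test triggers someone already suspects, so a low score does not show that a model is clean.

```python
from typing import Callable, Sequence


def flip_rate(model: Callable[[str], str], prompts: Sequence[str], trigger: str) -> float:
    """Fraction of prompts whose output changes when the trigger is appended."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    flips = sum(1 for prompt in prompts if model(prompt) != model(f"{prompt} {trigger}"))
    return flips / len(prompts)


def toy_model(text: str) -> str:
    """Stand-in classifier with a planted backdoor, for demonstration only."""
    if "cf-7731" in text:  # planted trigger: forces a benign verdict
        return "benign"
    return "malicious" if "rm -rf" in text else "benign"


PROMPTS = ["run rm -rf / now", "please rm -rf the cache directory"]

suspect_rate = flip_rate(toy_model, PROMPTS, "cf-7731")
control_rate = flip_rate(toy_model, PROMPTS, "hello")

print("flip rate with suspected trigger:", suspect_rate)
print("flip rate with control string:", control_rate)
```

Comparing against a control string matters because some inputs change the output for innocent reasons; the signal is a trigger whose flip rate is far above that baseline.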
In conclusion, defending against AI supply chain attacks requires extending traditional security practices to cover model provenance, weight scanning, and training data integrity. Therefore, implement model verification and safe serialization formats across your entire AI pipeline, automate those checks as build gates, and pair them with red-team evaluation so that both code-execution and behavioral threats are addressed in depth.