Agent skill supply-chain testing¶
Use this workflow during authorized assessments of AI-agent platforms, internal skill registries, curated skill bundles, and marketplace scanner pipelines. It turns the Trail of Bits June 2026 skill-distribution research into a reusable validation path without publishing ready-to-run malicious payloads.
Operator value¶
Agent skills are a hybrid supply-chain artifact: natural-language instructions, helper scripts, packaged data, and sometimes compiled or archived content. Scanner pass/fail output is not enough proof that a skill is safe or that a marketplace has meaningful review coverage.
Durable test targets:
- Marketplace or registry upload review.
- Skill ZIP import APIs.
- Git-repository based skill ingestion.
- Agent harnesses that execute helper scripts referenced from
SKILL.md. - Scanner integrations that combine static patterns, LLM review, package policy, or third-party verdicts.
Inputs¶
Collect these before testing:
- Written authorization for skill upload/import tests.
- A disposable test organization, workspace, and agent identity.
- The skill packaging rules: allowed file extensions, size limits, symlink handling, archive nesting, and executable-bit handling.
- The scanner decision model: block/warn/allow, where verdicts appear, and whether maintainers can override them.
- Network and filesystem egress controls for the agent runtime.
Containment
Use inert canary strings and local-only callbacks in lab infrastructure. Do not include credential theft, destructive commands, persistence, or external exfiltration in marketplace submissions.
Test matrix¶
| Boundary | What to vary | Evidence to capture |
|---|---|---|
| Prompt review | Long padding before risky instructions; instructions hidden in non-obvious sections; conflicting benign summaries | Scanner transcript, truncation indicators, final verdict |
| File-type policy | Plain text, Markdown, JSON/YAML config, office documents, bytecode, archives, nested archives | Accepted/rejected file list and unpacked artifact tree |
| Helper execution | SKILL.md references to scripts, shell snippets, package managers, interpreters, or generated commands | Agent plan, command preview, execution log |
| Indirection | Instructions stored outside SKILL.md, referenced by filename, config key, or generated task | Whether scanner follows the reference chain |
| Binary/compiled content | Precompiled bytecode, minified bundles, wasm, native extensions, or generated lockfiles | Whether scanner decompiles, hashes, or ignores content |
| UI review | Very large files, long scroll regions, collapsed sections, misleading filenames | Reviewer-visible screenshot and raw source diff |
| Verdict consistency | Re-submit equivalent artifacts with renamed files, reordered content, or harmless formatting changes | Per-submission verdict deltas |
Workflow¶
- Mirror the ingestion path. Test the exact route real users use: direct ZIP upload, Git repository import, marketplace submission, or internal registry publish.
- Start with a benign control skill. Confirm that a minimal skill imports, scans, and runs as expected. Save the raw artifact, scanner output, and agent execution trace.
- Map package-policy enforcement. Submit harmless artifacts that vary extension, size, nesting, and executable bits. The goal is to learn what reaches the scanner and what reaches the agent runtime.
- Probe scanner visibility. Place a unique inert canary in each location (
SKILL.md, helper script, config file, document XML, compiled artifact metadata) and verify which canaries appear in scanner findings or reviewer UI. - Probe indirection handling. Reference secondary files from
SKILL.mdand record whether scanners follow the chain or only review top-level Markdown. - Probe truncation and parser gaps. Use large but harmless padding, deeply nested structures, or verbose generated files to determine whether scanner summaries omit tail content.
- Exercise runtime boundaries in a sandbox. If execution is permitted, make helper scripts print a local canary and current working directory only. Capture whether the agent asks for approval, previews commands, or runs automatically.
- Report scanner bypasses as review-coverage failures. The finding is stronger when it shows a scanner verdict mismatch: the raw artifact contains a clearly labeled inert policy violation, but the platform marks it safe or hides the relevant content from reviewers.
Safe canary patterns¶
Prefer canaries that prove reachability without collecting secrets:
SKILLZ_CANARY_DO_NOT_EXECUTE_<case-id>
SKILLZ_POLICY_VIOLATION_MARKER_<case-id>
SKILLZ_LOCAL_RUNTIME_MARKER_<case-id>
For runtime tests, use local-only commands such as printing the marker, interpreter version, and working directory. Avoid environment dumps, token paths, outbound HTTP, shell reverse connections, or destructive filesystem writes.
Reporting checklist¶
Include:
- Artifact hash and submission timestamp.
- Upload/import path and scanner name/version if visible.
- Allowed file tree after platform unpacking or normalization.
- Exact verdict text and screenshots of reviewer UI.
- Canary placement map and whether each canary appeared in scanner output.
- Runtime evidence limited to benign local markers.
- Clear impact statement: users can install or run a skill containing scanner-invisible instructions or code-like content.
Sources¶
- Trail of Bits, "The sorry state of skill distribution" (June 3, 2026): https://blog.trailofbits.com/2026/06/03/the-sorry-state-of-skill-distribution/
- Trail of Bits Blog RSS: https://blog.trailofbits.com/feed/