testing-end-user
This skill MUST be invoked when the user says "TEST:", "TEST:VERIFY", "TEST:CONTRACT", "execute verification", "run test task", or "verification test". SHOULD also invoke when the user says "Setup/Action/Assert", "verify against infrastructure", or "capture evidence".
What this skill does
# End-User Verification Testing
## Overview
Execute verification tasks that validate real infrastructure behavior through structured Setup/Action/Assert sequences. Classify tasks at runtime (CLI/GUI/SUBJECTIVE) to determine whether to auto-approve or present human checkpoints. This skill transforms tasks marked with `**TEST:**` into executable verification sequences with captured evidence.
**Violating the letter of the rules is violating the spirit of the rules.**
Verification testing exists to catch failures before they reach production. Every shortcut in this process is a potential production incident waiting to happen.
## When to Use
- Tasks marked with `**TEST:**`, `**TEST:VERIFY**`, or `**TEST:CONTRACT**`
- Legacy `**HUMAN VERIFICATION**` tasks (mapped to unified format)
- CLI command verification with expected output
- File system state validation
- Real process behavior testing (not mocks)
- GUI/UI verification requiring human judgment
- End-to-end validation before deployment
- Quality gate execution (lint, build, test suite) as part of cycle verification
## When NOT to Use
- Unit tests that run in isolation
- Mock-based testing without infrastructure
- Static code analysis tasks
- Documentation review tasks
- Tasks without clear pass/fail criteria
- When verification environment is unavailable
## Core Process
### Task Detection
Identify tasks containing verification markers:
```markdown
- [ ] **TN.X**: **TEST:** - {Description}
- **Setup**: {Prerequisites} (optional)
- **Action**: {Command or instruction}
- **Assert**: {Expected outcome}
- **Capture**: {console, screenshot, logs} (optional)
```
**Supported markers** (all normalized to unified format):
- `**TEST:**` - Unified format (preferred)
- `**TEST:VERIFY**` - Legacy format
- `**TEST:CONTRACT**` - Legacy format
- `**HUMAN VERIFICATION**` - Legacy format (maps Setup/Action/Verify fields)
See [references/TASK-PARSING.md](references/TASK-PARSING.md) for field marker extraction rules.
### Execution Sequence
Execute in strict order. No skipping steps. No reordering.
**1. Parse Task**
Extract structured data:
- Task ID (T{N}.{X})
- Test type (VERIFY or CONTRACT)
- Setup commands
- Actions with modifiers
- Assert conditions
- Capture requirements
- Human review criteria
**2. Execute Setup**
Run setup commands sequentially. Fail fast if any setup fails. Record all output for debugging. Setup failures block action execution.
**3. Execute Actions**
Run each action respecting modifiers:
| Modifier | Example | Behavior |
|----------|---------|----------|
| `(background)` | `npm start (background)` | Run async, track PID |
| `(timeout Ns)` | `curl ... (timeout 10s)` | Override 60s default |
| `(in {path})` | `make build (in ./backend)` | Change directory |
Capture all console output. Track background processes. Enforce timeouts. See [references/EVIDENCE-CAPTURE.md](references/EVIDENCE-CAPTURE.md) for capture details.
**4. Evaluate Asserts**
Check each assert against captured evidence:
| Pattern | Verification |
|---------|--------------|
| `Console contains "{text}"` | Substring match |
| `Console contains "{text}" (within Ns)` | Timed match |
| `File exists: {path}` | `test -f {path}` |
| `Response status: {code}` | HTTP status check |
**5. Generate Report**
- **All PASS**: Minimal report (status + duration)
- **Any FAIL**: Rich report with evidence table
See [references/REPORT-TEMPLATES.md](references/REPORT-TEMPLATES.md) for templates.
**6. Present Checkpoint**
Ask human to approve, reject, or retry. Human decision gates cycle completion. No proceeding without explicit human approval.
### Task Classification
Before execution, classify the task based on Action and Assert content:
| Classification | Criteria | Checkpoint Behavior |
|----------------|----------|---------------------|
| **CLI** | Backtick commands + measurable asserts | May auto-approve if 100% pass |
| **GUI** | UI actions (`click`, `tap`) or screenshot capture | Always human checkpoint |
| **SUBJECTIVE** | Qualitative terms (`looks`, `feels`, `appears`) | Always human checkpoint |
Default to SUBJECTIVE if uncertain (safe fallback).
### Result Classification
| Status | Meaning |
|--------|---------|
| `PASS` | All asserts passed |
| `FAIL` | One or more asserts failed |
| `PARTIAL` | Mixed results, needs judgment |
| `TIMEOUT` | Action exceeded time limit |
| `ERROR` | Execution error (not assertion) |
### Evidence Types
| Type | Capture Method |
|------|----------------|
| `console` | stdout/stderr from commands |
| `screenshot` | Platform-specific screen capture |
| `logs` | Contents of specified log files |
| `timing` | Duration of each action |
## Quality Gates
Before presenting checkpoint, verify completion of ALL items:
- [ ] All setup commands completed
- [ ] All actions executed (or timed out)
- [ ] All asserts evaluated
- [ ] Evidence captured per Capture field
- [ ] Report generated with proper detail level
No presenting partial results. No skipping evidence capture. No proceeding without human approval.
## Quality Gate Execution
When dispatched as part of an implementation cycle verification, execute quality gates alongside TEST: task verification. Quality gates are command-based checks that always auto-resolve.
### Quality Gate Sequence
1. **Identify quality gate commands** from `tasks.md` `## Quality Gates` section and/or `plan.md` build configuration
2. **Execute each command** sequentially (lint, build, tests)
3. **Record results** with exit code, stdout, and stderr
4. **Classify**: exit 0 = pass, non-zero = fail
5. **Include in verification report** under `quality_gates` frontmatter section
### Quality Gate Report Format
Add a `quality_gates` section to the verification-report YAML frontmatter:
```yaml
verification:
test_tasks:
total: 2
passed: 2
quality_gates:
lint:
status: pass
command: "pnpm lint"
build:
status: pass
command: "pnpm build"
tests:
status: pass
command: "pnpm test"
passed: 47
failed: 0
skipped: 2
```
Each quality gate entry includes the command run and its status. For test suites, include pass/fail/skip counts when available.
### Quality Gate Auto-Resolution
Quality gates always auto-resolve. No human checkpoint is needed for "did lint pass?" decisions:
- **Exit 0** = pass. Record and continue.
- **Non-zero exit** = fail. Record the failure output. Include in verification report.
- **No human checkpoint** for quality gate results — they are deterministic.
Quality gate failures are surfaced through the verification report to the cycle-checkpoint gate, which evaluates them deterministically.
**No exceptions:**
- Not for "simple tests that obviously pass"
- Not if the user seems impatient
- Not if evidence capture is slow
- Not even if setup was identical to previous run
- Not even if "just checking one thing"
## Red Flags - STOP and Restart Properly
If any of these thoughts arise, STOP immediately:
- "The test obviously passed, no need for full evidence capture"
- "I already know this works from previous runs"
- "Just a quick verification, minimal report is fine"
- "The user seems impatient, skip to the result"
- "This is a simple test, full process is overkill"
- "Evidence capture is taking too long"
- "I can infer the result without running the test"
- "The setup is the same as last time"
**All of these mean:** Rationalization in progress. Return to the execution sequence. Follow every step.
## Common Rationalizations
| Excuse | Reality |
|--------|---------|
| "Test obviously passed" | Obvious passes hide subtle failures. Capture evidence anyway. |
| "Already ran this before" | Previous runs are stale. Each execution is independent. Run again. |
| "User wants quick answer" | Quick answers without evidence are unreliable. Process protects user. |
| "Simple test case" | Simple tests catch complex bugs. Full process regardless of simplicity. |
| "EvidenRelated in Productivity
gitea-workflow
IncludedOrchestrate agile development workflows for Gitea repositories using the tea CLI. Use when working with Gitea-hosted repos and asking to 'run the workflow', 'continue working', 'what's next', 'complete the task cycle', 'start my day', 'end the sprint', 'implement the next task', or wanting guided step-by-step development assistance. Keywords: workflow, orchestrate, agile, task cycle, sprint, daily, implement, review, PR, standup, retrospective, gitea, tea.
microsoft-graph-gateway
IncludedRoute Microsoft Graph work in this workspace. Use when users want to read or write Outlook mail, calendar events, contacts, OneDrive or SharePoint files, Teams, Planner, To Do, users, groups, directory data, or arbitrary Microsoft Graph endpoints from VS Code. Prefer WorkIQ for common read scenarios. Use Microsoft Graph for write actions and gap-read scenarios that need exact Graph properties, filters, permissions, or endpoints.
copilotkit
IncludedUse when building with CopilotKit — setup, development, integrations, debugging, upgrading, or contributing. Routes to the appropriate specialized skill based on the task.
wordly-wisdom
IncludedProvides calibrated decision analysis using Charlie Munger-style multiple mental models, inversion, incentive mapping, circle-of-competence checks, misjudgment audits, second-order effects, and forecast updates. Use when the user asks for an oracle take, a hard call, a decision memo, a premortem, an outside view, a red-team, a sanity-check, what am I missing, think this through, or wants a strategy, hire, investment, plan, product, partnership, or major life choice analysed. Avoid for simple factual lookups or time-sensitive legal, medical, or market questions without fresh evidence.
swain-session
IncludedSession management and project status dashboard. Owns the full session lifecycle (start/work/close/resume), focus lane, bookmarks, worktree detection, and tab naming. Also serves as the project status dashboard — shows active epics, progress, actionable next steps, blocked items, tasks, GitHub issues, and recommendations. Worktree creation is deferred to swain-do task dispatch (SPEC-195). Triggers on: 'session', 'status', 'what's next', 'dashboard', 'overview', 'where are we', 'what should I work on', 'show me priorities', 'bookmark', 'focus on', 'session info'.
gandi
IncludedComprehensive Gandi domain registrar integration for domain and DNS management. Register and manage domains, create/update/delete DNS records (A, AAAA, CNAME, MX, TXT, SRV, and more), configure email forwarding and aliases, check SSL certificate status, create DNS snapshots for safe rollback, bulk update zone files, and monitor domain expiration. Supports multi-domain management, zone file import/export, and automated DNS backups. Includes both read-only and destructive operations with safety controls.