Claude
Skills
Sign in
Back

testing-end-user

Included with Lifetime
$97 forever

This skill MUST be invoked when the user says "TEST:", "TEST:VERIFY", "TEST:CONTRACT", "execute verification", "run test task", or "verification test". SHOULD also invoke when the user says "Setup/Action/Assert", "verify against infrastructure", or "capture evidence".

Productivity

What this skill does


# End-User Verification Testing

## Overview

Execute verification tasks that validate real infrastructure behavior through structured Setup/Action/Assert sequences. Classify tasks at runtime (CLI/GUI/SUBJECTIVE) to determine whether to auto-approve or present human checkpoints. This skill transforms tasks marked with `**TEST:**` into executable verification sequences with captured evidence.

**Violating the letter of the rules is violating the spirit of the rules.**

Verification testing exists to catch failures before they reach production. Every shortcut in this process is a potential production incident waiting to happen.

## When to Use

- Tasks marked with `**TEST:**`, `**TEST:VERIFY**`, or `**TEST:CONTRACT**`
- Legacy `**HUMAN VERIFICATION**` tasks (mapped to unified format)
- CLI command verification with expected output
- File system state validation
- Real process behavior testing (not mocks)
- GUI/UI verification requiring human judgment
- End-to-end validation before deployment
- Quality gate execution (lint, build, test suite) as part of cycle verification

## When NOT to Use

- Unit tests that run in isolation
- Mock-based testing without infrastructure
- Static code analysis tasks
- Documentation review tasks
- Tasks without clear pass/fail criteria
- When verification environment is unavailable

## Core Process

### Task Detection

Identify tasks containing verification markers:

```markdown
- [ ] **TN.X**: **TEST:** - {Description}
  - **Setup**: {Prerequisites} (optional)
  - **Action**: {Command or instruction}
  - **Assert**: {Expected outcome}
  - **Capture**: {console, screenshot, logs} (optional)
```

**Supported markers** (all normalized to unified format):
- `**TEST:**` - Unified format (preferred)
- `**TEST:VERIFY**` - Legacy format
- `**TEST:CONTRACT**` - Legacy format
- `**HUMAN VERIFICATION**` - Legacy format (maps Setup/Action/Verify fields)

See [references/TASK-PARSING.md](references/TASK-PARSING.md) for field marker extraction rules.

### Execution Sequence

Execute in strict order. No skipping steps. No reordering.

**1. Parse Task**

Extract structured data:
- Task ID (T{N}.{X})
- Test type (VERIFY or CONTRACT)
- Setup commands
- Actions with modifiers
- Assert conditions
- Capture requirements
- Human review criteria

**2. Execute Setup**

Run setup commands sequentially. Fail fast if any setup fails. Record all output for debugging. Setup failures block action execution.

**3. Execute Actions**

Run each action respecting modifiers:

| Modifier | Example | Behavior |
|----------|---------|----------|
| `(background)` | `npm start (background)` | Run async, track PID |
| `(timeout Ns)` | `curl ... (timeout 10s)` | Override 60s default |
| `(in {path})` | `make build (in ./backend)` | Change directory |

Capture all console output. Track background processes. Enforce timeouts. See [references/EVIDENCE-CAPTURE.md](references/EVIDENCE-CAPTURE.md) for capture details.

**4. Evaluate Asserts**

Check each assert against captured evidence:

| Pattern | Verification |
|---------|--------------|
| `Console contains "{text}"` | Substring match |
| `Console contains "{text}" (within Ns)` | Timed match |
| `File exists: {path}` | `test -f {path}` |
| `Response status: {code}` | HTTP status check |

**5. Generate Report**

- **All PASS**: Minimal report (status + duration)
- **Any FAIL**: Rich report with evidence table

See [references/REPORT-TEMPLATES.md](references/REPORT-TEMPLATES.md) for templates.

**6. Present Checkpoint**

Ask human to approve, reject, or retry. Human decision gates cycle completion. No proceeding without explicit human approval.

### Task Classification

Before execution, classify the task based on Action and Assert content:

| Classification | Criteria | Checkpoint Behavior |
|----------------|----------|---------------------|
| **CLI** | Backtick commands + measurable asserts | May auto-approve if 100% pass |
| **GUI** | UI actions (`click`, `tap`) or screenshot capture | Always human checkpoint |
| **SUBJECTIVE** | Qualitative terms (`looks`, `feels`, `appears`) | Always human checkpoint |

Default to SUBJECTIVE if uncertain (safe fallback).

### Result Classification

| Status | Meaning |
|--------|---------|
| `PASS` | All asserts passed |
| `FAIL` | One or more asserts failed |
| `PARTIAL` | Mixed results, needs judgment |
| `TIMEOUT` | Action exceeded time limit |
| `ERROR` | Execution error (not assertion) |

### Evidence Types

| Type | Capture Method |
|------|----------------|
| `console` | stdout/stderr from commands |
| `screenshot` | Platform-specific screen capture |
| `logs` | Contents of specified log files |
| `timing` | Duration of each action |

## Quality Gates

Before presenting checkpoint, verify completion of ALL items:

- [ ] All setup commands completed
- [ ] All actions executed (or timed out)
- [ ] All asserts evaluated
- [ ] Evidence captured per Capture field
- [ ] Report generated with proper detail level

No presenting partial results. No skipping evidence capture. No proceeding without human approval.

## Quality Gate Execution

When dispatched as part of an implementation cycle verification, execute quality gates alongside TEST: task verification. Quality gates are command-based checks that always auto-resolve.

### Quality Gate Sequence

1. **Identify quality gate commands** from `tasks.md` `## Quality Gates` section and/or `plan.md` build configuration
2. **Execute each command** sequentially (lint, build, tests)
3. **Record results** with exit code, stdout, and stderr
4. **Classify**: exit 0 = pass, non-zero = fail
5. **Include in verification report** under `quality_gates` frontmatter section

### Quality Gate Report Format

Add a `quality_gates` section to the verification-report YAML frontmatter:

```yaml
verification:
  test_tasks:
    total: 2
    passed: 2
  quality_gates:
    lint:
      status: pass
      command: "pnpm lint"
    build:
      status: pass
      command: "pnpm build"
    tests:
      status: pass
      command: "pnpm test"
      passed: 47
      failed: 0
      skipped: 2
```

Each quality gate entry includes the command run and its status. For test suites, include pass/fail/skip counts when available.

### Quality Gate Auto-Resolution

Quality gates always auto-resolve. No human checkpoint is needed for "did lint pass?" decisions:

- **Exit 0** = pass. Record and continue.
- **Non-zero exit** = fail. Record the failure output. Include in verification report.
- **No human checkpoint** for quality gate results — they are deterministic.

Quality gate failures are surfaced through the verification report to the cycle-checkpoint gate, which evaluates them deterministically.

**No exceptions:**
- Not for "simple tests that obviously pass"
- Not if the user seems impatient
- Not if evidence capture is slow
- Not even if setup was identical to previous run
- Not even if "just checking one thing"

## Red Flags - STOP and Restart Properly

If any of these thoughts arise, STOP immediately:

- "The test obviously passed, no need for full evidence capture"
- "I already know this works from previous runs"
- "Just a quick verification, minimal report is fine"
- "The user seems impatient, skip to the result"
- "This is a simple test, full process is overkill"
- "Evidence capture is taking too long"
- "I can infer the result without running the test"
- "The setup is the same as last time"

**All of these mean:** Rationalization in progress. Return to the execution sequence. Follow every step.

## Common Rationalizations

| Excuse | Reality |
|--------|---------|
| "Test obviously passed" | Obvious passes hide subtle failures. Capture evidence anyway. |
| "Already ran this before" | Previous runs are stale. Each execution is independent. Run again. |
| "User wants quick answer" | Quick answers without evidence are unreliable. Process protects user. |
| "Simple test case" | Simple tests catch complex bugs. Full process regardless of simplicity. |
| "Eviden

Related in Productivity