Claude
Skills
Sign in
Back

quality-detect-regressions

Included with Lifetime
$97 forever

Compares current quality metrics (tests, coverage, type errors, linting, dead code) to baseline and detects regressions. PROACTIVELY invoked after task completion, before marking task complete, or before merging/committing. Blocks on regressions to prevent quality degradation. Use when completing tasks, validating changes, or checking for quality regressions against baselines.

Productivity

What this skill does


# detect-quality-regressions

## Purpose

Compare current quality metrics against a stored baseline to detect regressions in tests, coverage, type errors, linting, and dead code. Enforces quality standards by blocking task completion when metrics degrade beyond tolerance thresholds.

## When to Use

**MANDATORY invocation scenarios:**
- After completing a task (before marking complete)
- Before creating a commit or pull request
- Before merging to main branch
- When user asks "is this done?" or "check quality"

**User trigger phrases:**
- "detect regressions"
- "check against baseline"
- "validate quality hasn't degraded"
- "compare to baseline"

## Quick Start

**Basic usage after completing a task:**

```
1. Complete your code changes
2. Run tests manually to verify they pass
3. Invoke this skill: "Detect regressions against baseline_feature_2025-10-16"
4. If PASS: Mark task complete
5. If FAIL: Fix regressions, re-run detection
```

**This skill automatically:**
- Loads baseline from memory
- Runs ./scripts/check_all.sh
- Compares 5 metrics (tests, coverage, type errors, linting, dead code)
- Detects regressions with tolerance rules
- Returns PASS/FAIL with actionable delta report

## Table of Contents

### Core Sections
- [Instructions](#instructions) - Complete workflow for regression detection
  - [Step 1: Load Baseline from Memory](#step-1-load-baseline-from-memory) - Retrieve and validate baseline metrics
  - [Step 2: Run Current Quality Checks](#step-2-run-current-quality-checks) - Execute check_all.sh and capture metrics
  - [Step 3: Compare Metrics (Regression Detection)](#step-3-compare-metrics-regression-detection) - Apply comparison rules
  - [Step 4: Generate Delta Report](#step-4-generate-delta-report) - Create metric comparison report
  - [Step 5: Return Result](#step-5-return-result) - PASS/FAIL decision logic
- [When to Invoke](#when-to-invoke) - Triggering conditions (after tasks, before commits, status checks)
- [Examples](#examples) - Real-world scenarios
  - [Example 1: No Regressions (PASS)](#example-1-no-regressions-pass) - Successful validation scenario
  - [Example 2: Regression Detected (FAIL)](#example-2-regression-detected-fail) - Handling quality degradation
- [Edge Cases](#edge-cases) - Special situations (missing baseline, check failures, pre-existing issues)

### Advanced Topics
- [Integration Points](#integration-points) - Coordination with other skills and agents
- [Anti-Patterns to Avoid](#anti-patterns-to-avoid) - Common mistakes and correct approaches
- [Success Criteria](#success-criteria) - Validation checklist
- [Supporting Files](#supporting-files) - References and examples
- [Requirements](#requirements) - Environment, memory schema, tools

## Instructions

### Step 1: Load Baseline from Memory

**Query memory for baseline:**

Use `mcp__memory__find_memories_by_name` to retrieve the baseline:

```python
baseline_names = ["baseline_<feature>_<date>"]
# Example: ["baseline_auth_2025-10-16"]
```

**Validate baseline exists:**
- If not found: Return `⚠️ WARNING - No baseline found, suggest capturing baseline first`
- If found: Parse baseline metrics

**Extract baseline metrics:**

Parse the baseline entity's observations to extract:

1. **Tests:** `X passed, Y failed, Z skipped`
2. **Coverage:** `X%`
3. **Type errors:** `X errors`
4. **Linting errors:** `X errors`
5. **Dead code:** `X%`

**Example baseline observations:**
```
- Tests: 145 passed, 0 failed, 3 skipped
- Coverage: 87%
- Type errors: 0
- Linting errors: 0
- Dead code: 1.2%
```

### Step 2: Run Current Quality Checks

**Execute quality checks:**

```bash
cd /Users/dawiddutoit/projects/play/project-watch-mcp
./scripts/check_all.sh
```

**Capture output:**
- Save stdout and stderr
- Parse same 5 metrics as baseline
- Handle script failures (return FAIL if checks can't run)

**Parse current metrics:**

Extract from check_all.sh output:

1. **Tests:** Look for "X passed" in pytest output
2. **Coverage:** Look for "TOTAL" line with percentage
3. **Type errors:** Count errors in pyright output
4. **Linting errors:** Count violations in ruff output
5. **Dead code:** Parse vulture output for percentage

### Step 3: Compare Metrics (Regression Detection)

**Apply comparison rules:**

| Metric        | Rule                              | Tolerance | Regression If             |
|---------------|-----------------------------------|-----------|---------------------------|
| Tests passed  | Must be >= baseline               | None      | current < baseline        |
| Coverage      | Must be >= baseline - 1%          | 1%        | current < baseline - 1%   |
| Type errors   | Must be <= baseline               | None      | current > baseline        |
| Linting       | Must be <= baseline               | None      | current > baseline        |
| Dead code     | Must be <= baseline + 2%          | 2%        | current > baseline + 2%   |

**For each metric:**

1. Calculate change: `current - baseline`
2. Check if regression: Apply rule from table
3. Mark status: `improved`, `stable`, or `regressed`
4. Calculate severity: `critical`, `high`, `medium`, `low`

**Regression severity:**

- **Critical:** Type errors increased (breaks type safety)
- **High:** Tests decreased or coverage dropped >2%
- **Medium:** Linting errors increased
- **Low:** Dead code increased slightly (within tolerance)

### Step 4: Generate Delta Report

**Create comparison for each metric:**

```yaml
metric_name:
  baseline: <value>
  current: <value>
  change: <+/- difference>
  status: improved | stable | regressed
  severity: critical | high | medium | low (if regressed)
```

**Identify regressions:**

Filter metrics where `status == regressed` and create regression list:

```yaml
regressions:
  - metric: tests
    baseline: 152
    current: 150
    change: -2
    severity: high
    action: "Investigate test_user_service.py, test_auth_service.py"
```

**Identify improvements:**

Filter metrics where `status == improved` for positive feedback.

### Step 5: Return Result

**Decision logic:**

```
IF any metric has status == regressed:
  RETURN FAIL with regression list
ELSE:
  RETURN PASS with improvements
```

**PASS result format:**

```
✅ PASS - No regressions detected

Delta Report:
- Tests: +3 passed (148 total) 🎉
- Coverage: +2% (89% total) 🎉
- Type errors: No change (0) ✅
- Linting: No change (0) ✅
- Dead code: -0.1% (1.1% total) 🎉

All metrics maintained or improved. Safe to mark task complete.
```

**FAIL result format:**

```
🔴 FAIL - 4 regressions detected

Regressions:
1. Tests: -2 passed (150 vs 152)
   → 2 tests removed or now failing
   → Action: Investigate test_user_service.py, test_auth_service.py

2. Coverage: -4% (85% vs 89%)
   → Coverage dropped below tolerance (88%)
   → Action: Add tests for newly refactored code

3. Type errors: +2 new errors (5 vs 3)
   → New type errors introduced
   → Action: Run pyright --verbose, fix errors

4. Linting: +2 new errors
   → New linting violations
   → Action: Run ruff check --fix, review changes

❌ BLOCKED - Do not mark task complete until regressions fixed.

Fix order:
1. Fix linting (ruff check --fix)
2. Fix type errors (pyright)
3. Re-run tests (investigate failures)
4. Add coverage for new code
5. Re-run regression detection
```

## When to Invoke

**After completing a task (@implementer, @unit-tester, @integration-tester):**
1. Code changes complete
2. Tests written and pass locally
3. **→ Invoke detect-quality-regressions before marking task complete**
4. If PASS: Mark complete, move to next task
5. If FAIL: Fix regressions, re-run detection

**Before committing/merging:**
1. User requests commit
2. **→ Invoke detect-quality-regressions to validate**
3. If PASS: Proceed with commit
4. If FAIL: Block commit, report regressions

**When checking status (@statuser):**
1. User asks "What's the status?"
2. **→ Invoke detect-quality-regressions to get current quality state**
3. Report status with delta

## Examples

### Example 1

Related in Productivity