Claude
Skills
Sign in
โ† Back

kill-argument

Included with Lifetime
$97 forever

Two-thread adversarial review: a fresh reviewer constructs the strongest 200-word rejection memo, then a second fresh reviewer defends the paper point-by-point and surfaces still-unresolved critical issues. Use when user says "kill argument", "adversarial review", "hostile review", "rebuttal preparation", "reviewer-2 simulation", or before submitting a theory paper that has already passed standard review rounds.

Productivity

What this skill does


# Kill Argument Exercise: Adversarial Attack-Defense Review

> ๐Ÿ”’ **Do not wrap this skill in `/loop`, `/schedule`, or `CronCreate`.** It is
> verdict-bearing โ€” it produces an adversarial accept/reject verdict (attack โ†’
> adjudication). Re-firing it on a wall-clock timer adds no new signal (the
> attack changes only when the *paper* changes). Schedule the *external wait
> that precedes it* โ€” draft stable โ†’ then run this **once** before submission.
> See
> [`shared-references/external-cadence.md`](../shared-references/external-cadence.md).

Stress-test the headline claims of a paper against the strongest possible rejection argument: **$ARGUMENTS**

## Why This Exists

Standard score-based reviews (`/research-review`, `/auto-paper-improvement-loop`) tend to produce **balanced** weakness lists.  Each weakness gets ~equal attention, ranked CRITICAL > MAJOR > MINOR.  Empirically, this misses one specific failure mode: the **single most damaging argument** a reviewer would write in a rejection paragraph โ€” the one sentence that, if a senior area chair reads it, kills the paper.

A balanced reviewer might list "scope-overclaim risk" as MAJOR alongside 3-5 other MAJORs, never quite committing.  An adversarial reviewer **must commit**: their entire job is to convince the area chair to reject in 200 words.

This skill runs that adversarial pass deliberately, then forces a second fresh reviewer to defend point-by-point, classify each rejection as already-fixed / partially-fixed / still-unresolved, and surface what's actually load-bearing.

**Empirical motivation:** in a real submission run, after several rounds of standard improvement (score 7-8/10), the kill-argument exercise surfaced framing weaknesses that no prior review caught (e.g., a setting being mostly conditional rather than truly general, or a baseline being irrelevant to real systems).  Author rebuttal forced explicit scope qualifications in abstract and discussion that weren't visible from the score-based reviews alone.

## How This Differs From Other Review Skills

| Skill | What it asks the reviewer | Output |
|-------|---------------------------|--------|
| Standard peer review | "Score this paper, list weaknesses by severity" | balanced weakness list |
| `/research-review` | "Deep technical review of methods + claims" | structured deep critique |
| `/proof-checker` | "Is this theorem actually proved?" | per-step proof obligation audit |
| `/paper-claim-audit` | "Does the paper report numbers truthfully?" | per-claim evidence verification |
| `/citation-audit` | "Are citations real and used in correct context?" | per-entry KEEP/FIX/REPLACE/REMOVE |
| **`/kill-argument`** | **"Write the single strongest rejection paragraph; then defend it."** | **attack memo + per-point defense + unresolved surfaced** |

This skill is **complementary**, not a replacement.  Run after standard reviews when you want to know what the worst-case reviewer paragraph would look like, before camera-ready or rebuttal preparation.

## When To Use

- After 1-2 rounds of `/auto-paper-improvement-loop` settled at a stable score, but before submission.  Surfaces what additional fixes would close the headline-attack gap.
- During rebuttal preparation, to predict reviewer-2's strongest objection so you can prepare the response in advance.
- For theory papers with a high-level title that may oversimplify the actual theorem (the most common reject-attack pattern).
- For papers where a reviewer might attack scope, assumption-vs-claim mismatch, missing proof obligations, or evidence-vs-headline gaps.

This skill is most valuable for **theory papers** with โ‰ฅ5 theorem-class environments (so the headline depends on real proof obligations).  For empirical papers without theorems, use `/research-review` instead.

## Constants

- **REVIEWER_MODEL** = `gpt-5.5` (default; specify `gpt-5.4` if you want to fall back to the legacy default).  Reviewer reasoning effort = `xhigh`.
- **CONTEXT_POLICY** = `fresh` (REVIEWER_BIAS_GUARD).  Each thread is a fresh `mcp__codex__codex` call.  **Never** use `mcp__codex__codex-reply`.  No prior review summary, fix list, or executor explanation enters either prompt.
- **ATTACK_LENGTH** = approximately 200 words (do not exceed 250).  Single coherent argument, not a list.
- **DEFENSE_DECOMPOSITION** = 3-7 atomic rejection points extracted from the attack memo.  Each gets its own classification.
- **CLASSIFICATION** = `answered_by_current_text` / `partially_answered` / `still_unresolved`.  (Names chosen so the adjudicator does not assume "fixed" implies prior history of patching โ€” they read the paper as a fresh reviewer would.)
- **OUTPUT** = `KILL_ARGUMENT.md` (human-readable) + `KILL_ARGUMENT.json` (machine-readable) in the paper directory.
- **RENDER_HTML = true** โ€” When `true` (default), auto-render `KILL_ARGUMENT.md` to HTML after writing the report. Uses **full Codex review gate** (audit-class artifact โ€” full render-fidelity check matches the skill's cross-model audit invariant; the sidecar `KILL_ARGUMENT.json` is also passed to the renderer). Set `false` to skip, or pass `โ€” render html: false`.

## Workflow

### Step 1: Discover paper files

Locate the paper directory and inventory the source.

```bash
PAPER_DIR="$ARGUMENTS"   # e.g., paper-overleaf/ or paper/
cd "$PAPER_DIR"

# Find the LaTeX entry point
ENTRY=$(grep -lE '^\\documentclass' *.tex 2>/dev/null | head -1)
echo "Entry: $ENTRY"

# Find all source files codex should read
find . -name "*.tex" -not -path "./.git/*" 2>/dev/null
find . -name "*.bib" -not -path "./.git/*" 2>/dev/null
find figures/ -name "*.pdf" -o -name "*.png" 2>/dev/null
ls -la *.pdf 2>/dev/null  # compiled PDF
```

If a compiled PDF is missing, the skill should still run on .tex source alone, but the prompt should mention this so the reviewer doesn't waste cycles trying to extract from a non-existent PDF.

### Step 2: Attack memo (Thread 1, fresh codex)

Invoke `mcp__codex__codex` (NOT `codex-reply`) with the following prompt structure:

```
mcp__codex__codex:
  model: gpt-5.5
  config: {"model_reasoning_effort": "xhigh"}
  sandbox: read-only
  cwd: <paper directory>
  prompt: |
    You are simulating a hostile NeurIPS / ICLR / ICML reviewer for a paper.
    This is a kill-argument adversarial check โ€” your task is NOT to give a
    balanced review but to construct the **single strongest argument for
    rejecting this paper**.

    ## Files to read
    - LaTeX entry: <ENTRY>
    - All section files under sections/ or wherever they live
    - Macro files (math_commands.tex, etc.)
    - Compiled PDF: <main.pdf> (if available)

    Read the source carefully. Do not consult any prior reviews, fix lists,
    or summaries; this must be a fresh, zero-context adversarial pass.

    ## Your task
    Construct the single best argument to reject this paper in approximately
    200 words. Your goal is to write the worst-case rejection memo a senior
    NeurIPS area chair would produce after reading the paper.

    Focus on these axes (pick the most damaging combination, do not list all):
    1. Theorem validity: are central theorems actually proved as stated?
    2. Assumption-vs-claim mismatch: does the body silently retreat to a
       narrower object than the title/abstract advertise?
    3. Missing proof obligations: is a fundamental lemma invoked but not
       proved (e.g., concentration, generic position, prefactor envelope)
       that the headline depends on?
    4. Limit-order ambiguity: are limits in K/n/d/eps composed in a way the
       paper does not commit to?
    5. Claim-vs-evidence gap: is the empirical/numerical evidence too narrow
       to support the breadth of the stated theorem or take-away?
    6. Scope overclaim: does the title or abstract sell a result substantially
       broader than what the body proves?

    ## Constraints
    - Approximately 200 words total (do NOT exceed 250).
    - Single argument, not a list โ€” pick the most damaging line of attack
      a

Related in Productivity