kill-argument
Two-thread adversarial review: a fresh reviewer constructs the strongest 200-word rejection memo, then a second fresh reviewer defends the paper point-by-point and surfaces still-unresolved critical issues. Use when user says "kill argument", "adversarial review", "hostile review", "rebuttal preparation", "reviewer-2 simulation", or before submitting a theory paper that has already passed standard review rounds.
# Kill Argument Exercise: Adversarial Attack-Defense Review
> **Do not wrap this skill in `/loop`, `/schedule`, or `CronCreate`.** It is
> verdict-bearing: it produces an adversarial accept/reject verdict (attack →
> adjudication). Re-firing it on a wall-clock timer adds no new signal (the
> attack changes only when the *paper* changes). Schedule the *external wait
> that precedes it* (draft stable), then run this **once** before submission.
> See
> [`shared-references/external-cadence.md`](../shared-references/external-cadence.md).
Stress-test the headline claims of a paper against the strongest possible rejection argument: **$ARGUMENTS**
## Why This Exists
Standard score-based reviews (`/research-review`, `/auto-paper-improvement-loop`) tend to produce **balanced** weakness lists. Each weakness gets ~equal attention, ranked CRITICAL > MAJOR > MINOR. Empirically, this misses one specific failure mode: the **single most damaging argument** a reviewer would write in a rejection paragraph, the one sentence that, if a senior area chair reads it, kills the paper.
A balanced reviewer might list "scope-overclaim risk" as MAJOR alongside 3-5 other MAJORs, never quite committing. An adversarial reviewer **must commit**: their entire job is to convince the area chair to reject in 200 words.
This skill runs that adversarial pass deliberately, then forces a second fresh reviewer to defend point-by-point, classify each rejection as already-fixed / partially-fixed / still-unresolved, and surface what's actually load-bearing.
**Empirical motivation:** in a real submission run, after several rounds of standard improvement (score 7-8/10), the kill-argument exercise surfaced framing weaknesses that no prior review caught (e.g., a setting being mostly conditional rather than truly general, or a baseline being irrelevant to real systems). Author rebuttal forced explicit scope qualifications in abstract and discussion that weren't visible from the score-based reviews alone.
## How This Differs From Other Review Skills
| Skill | What it asks the reviewer | Output |
|-------|---------------------------|--------|
| Standard peer review | "Score this paper, list weaknesses by severity" | balanced weakness list |
| `/research-review` | "Deep technical review of methods + claims" | structured deep critique |
| `/proof-checker` | "Is this theorem actually proved?" | per-step proof obligation audit |
| `/paper-claim-audit` | "Does the paper report numbers truthfully?" | per-claim evidence verification |
| `/citation-audit` | "Are citations real and used in correct context?" | per-entry KEEP/FIX/REPLACE/REMOVE |
| **`/kill-argument`** | **"Write the single strongest rejection paragraph; then defend the paper against it."** | **attack memo + per-point defense + unresolved surfaced** |
This skill is **complementary**, not a replacement. Run after standard reviews when you want to know what the worst-case reviewer paragraph would look like, before camera-ready or rebuttal preparation.
## When To Use
- After 1-2 rounds of `/auto-paper-improvement-loop` have settled at a stable score, but before submission. Surfaces what additional fixes would close the headline-attack gap.
- During rebuttal preparation, to predict reviewer-2's strongest objection so you can prepare the response in advance.
- For theory papers with a high-level title that may oversimplify the actual theorem (the most common reject-attack pattern).
- For papers where a reviewer might attack scope, assumption-vs-claim mismatch, missing proof obligations, or evidence-vs-headline gaps.
This skill is most valuable for **theory papers** with ≥5 theorem-class environments (so the headline depends on real proof obligations). For empirical papers without theorems, use `/research-review` instead.
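A quick way to check the theorem-count threshold is to count `\begin{...}` occurrences across the paper's `.tex` files. The sketch below is illustrative only: the function name `count_theorem_envs` and the environment list (theorem, lemma, proposition, corollary) are assumptions, since the skill does not define which environments count as "theorem-class". Commented-out environments are counted too.

```bash
# Count theorem-class environments in all .tex files under a directory.
# ASSUMPTION: the environment names below; adjust them to the paper's own macros.
count_theorem_envs() {
  local dir="${1:-.}"
  find "$dir" -name '*.tex' -not -path '*/.git/*' -exec cat {} + 2>/dev/null \
    | grep -oE '\\begin\{(theorem|lemma|proposition|corollary)\*?\}' \
    | wc -l \
    | tr -d '[:space:]'
}

n=$(count_theorem_envs "${PAPER_DIR:-.}")
if [ "$n" -ge 5 ]; then
  echo "$n theorem-class environments: /kill-argument is a good fit"
else
  echo "$n theorem-class environments: consider /research-review instead"
fi
```

If the paper defines custom environments (for example via `\newtheorem`), add their names to the alternation before relying on the count.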
## Constants
- **REVIEWER_MODEL** = `gpt-5.5` (default; specify `gpt-5.4` if you want to fall back to the legacy default). Reviewer reasoning effort = `xhigh`.
- **CONTEXT_POLICY** = `fresh` (REVIEWER_BIAS_GUARD). Each thread is a fresh `mcp__codex__codex` call. **Never** use `mcp__codex__codex-reply`. No prior review summary, fix list, or executor explanation enters either prompt.
- **ATTACK_LENGTH** = approximately 200 words (do not exceed 250). Single coherent argument, not a list.
- **DEFENSE_DECOMPOSITION** = 3-7 atomic rejection points extracted from the attack memo. Each gets its own classification.
- **CLASSIFICATION** = `answered_by_current_text` / `partially_answered` / `still_unresolved`. (Names chosen so the adjudicator does not assume "fixed" implies prior history of patching; they read the paper as a fresh reviewer would.)
- **OUTPUT** = `KILL_ARGUMENT.md` (human-readable) + `KILL_ARGUMENT.json` (machine-readable) in the paper directory.
- **RENDER_HTML** = `true`. When `true` (default), auto-render `KILL_ARGUMENT.md` to HTML after writing the report. Uses the **full Codex review gate** (audit-class artifact: the full render-fidelity check matches the skill's cross-model audit invariant; the sidecar `KILL_ARGUMENT.json` is also passed to the renderer). Set `false` to skip, or pass `— render html: false`.
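The numeric and enumerated constants above can be enforced mechanically before the report is written. The following is a minimal sketch, not part of the skill itself: the helper names (`attack_within_limit`, `valid_classification`, `valid_point_count`) and the variable `MAX_ATTACK_WORDS` are hypothetical, and only the limits and labels come from the constants list.

```bash
# Hard cap from ATTACK_LENGTH: the memo targets ~200 words and must not exceed 250.
MAX_ATTACK_WORDS=250

# Succeeds (exit 0) when the memo text read from stdin is within the cap.
attack_within_limit() {
  local words
  words=$(wc -w | tr -d '[:space:]')
  [ "$words" -le "$MAX_ATTACK_WORDS" ]
}

# Succeeds when the argument is one of the three CLASSIFICATION labels.
valid_classification() {
  case "$1" in
    answered_by_current_text|partially_answered|still_unresolved) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the number of atomic rejection points is within 3-7
# (DEFENSE_DECOMPOSITION).
valid_point_count() {
  [ "$1" -ge 3 ] && [ "$1" -le 7 ]
}

# Example: check a short memo passed on stdin.
printf 'The headline theorem is conditional.\n' | attack_within_limit \
  && echo "memo length ok"
```

Checks like these only catch format violations; whether the memo is a single coherent argument still has to be judged by reading it.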
## Workflow
### Step 1: Discover paper files
Locate the paper directory and inventory the source.
```bash
PAPER_DIR="$ARGUMENTS"  # e.g., paper-overleaf/ or paper/
cd "$PAPER_DIR" || exit 1  # stop if the directory does not exist

# Find the LaTeX entry point (first .tex file that declares a document class)
ENTRY=$(grep -lE '^\\documentclass' *.tex 2>/dev/null | head -n 1)
echo "Entry: $ENTRY"

# Find all source files codex should read
find . -name "*.tex" -not -path "./.git/*" 2>/dev/null
find . -name "*.bib" -not -path "./.git/*" 2>/dev/null
find figures/ \( -name "*.pdf" -o -name "*.png" \) 2>/dev/null
ls -la *.pdf 2>/dev/null  # compiled PDF
```
If a compiled PDF is missing, the skill should still run on .tex source alone, but the prompt should mention this so the reviewer doesn't waste cycles trying to extract from a non-existent PDF.
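One way to handle the missing-PDF case is to generate the "Compiled PDF" line of the prompt from what is actually on disk. This is a sketch under stated assumptions: the function name `pdf_prompt_line` and the wording of the fallback line are hypothetical, and picking the first PDF in alphabetical order is a simplification.

```bash
# Print the "Compiled PDF" line for the reviewer prompt, depending on
# whether a compiled PDF exists at the top level of the paper directory.
# ASSUMPTION: the first PDF in sorted order is the compiled paper.
pdf_prompt_line() {
  local dir="${1:-.}" pdf
  pdf=$(find "$dir" -maxdepth 1 -name '*.pdf' 2>/dev/null | sort | head -n 1)
  if [ -n "$pdf" ]; then
    # Strip the leading directory so the prompt shows a path relative to cwd.
    echo "- Compiled PDF: ${pdf#"$dir"/}"
  else
    echo "- Compiled PDF: not available; review from the .tex source only"
  fi
}

pdf_prompt_line "${PAPER_DIR:-.}"
```

The printed line can then replace the static PDF entry under "Files to read" when the prompt is assembled.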
### Step 2: Attack memo (Thread 1, fresh codex)
Invoke `mcp__codex__codex` (NOT `codex-reply`) with the following prompt structure:
```
mcp__codex__codex:
  model: gpt-5.5
  config: {"model_reasoning_effort": "xhigh"}
  sandbox: read-only
  cwd: <paper directory>
  prompt: |
    You are simulating a hostile NeurIPS / ICLR / ICML reviewer for a paper.
    This is a kill-argument adversarial check: your task is NOT to give a
    balanced review but to construct the **single strongest argument for
    rejecting this paper**.

    ## Files to read
    - LaTeX entry: <ENTRY>
    - All section files under sections/ or wherever they live
    - Macro files (math_commands.tex, etc.)
    - Compiled PDF: <main.pdf> (if available)

    Read the source carefully. Do not consult any prior reviews, fix lists,
    or summaries; this must be a fresh, zero-context adversarial pass.

    ## Your task
    Construct the single best argument to reject this paper in approximately
    200 words. Your goal is to write the worst-case rejection memo a senior
    NeurIPS area chair would produce after reading the paper.

    Focus on these axes (pick the most damaging combination, do not list all):
    1. Theorem validity: are central theorems actually proved as stated?
    2. Assumption-vs-claim mismatch: does the body silently retreat to a
       narrower object than the title/abstract advertise?
    3. Missing proof obligations: is a fundamental lemma invoked but not
       proved (e.g., concentration, generic position, prefactor envelope)
       that the headline depends on?
    4. Limit-order ambiguity: are limits in K/n/d/eps composed in a way the
       paper does not commit to?
    5. Claim-vs-evidence gap: is the empirical/numerical evidence too narrow
       to support the breadth of the stated theorem or take-away?
    6. Scope overclaim: does the title or abstract sell a result substantially
       broader than what the body proves?

    ## Constraints
    - Approximately 200 words total (do NOT exceed 250).
    - Single argument, not a list; pick the most damaging line of attack