production-model-router
Route each user request to the most cost-effective model or multi-model workflow based on task type, complexity, risk, latency, budget, tool needs, and verification requirements.
What this skill does
# Production Model Router

## Overview

Use this skill to decide which model tier, workflow shape, and verification strategy should handle a user's request. The goal is to maximize cost-effectiveness without sacrificing task fit, correctness, or operational reliability.

This skill does not blindly choose the strongest model. It chooses the cheapest safe path that still meets the quality bar for the task.

It may recommend:

- a single low-cost model
- a single balanced model
- a single premium model
- a tool-assisted model workflow
- a staged multi-model pipeline
- a parallel comparison workflow
- a draft-and-review workflow
- a consensus or verifier workflow

## Primary objective

For every request, choose the minimum-cost execution path that can still satisfy:

- task quality
- correctness requirements
- latency expectations
- safety or risk constraints
- output format needs
- tool and modality requirements

## When to use

Use this skill when you need to decide:

- which model should answer a given user request
- whether a cheap model is enough
- when to escalate to a stronger reasoning model
- when to use one model versus multiple models
- when to use tools instead of relying on pure model reasoning
- how to handle complex calculations, code, multimodal input, long context, or high-risk tasks
- how to balance cost, speed, and answer quality in production

## Do not use

Do not use this skill to:

- answer the original business question directly
- fabricate model capabilities without evidence from the environment or configuration
- assume the most expensive model is always the best choice
- route high-risk exact tasks to a cheap model without verification
- rely on pure language generation for exact arithmetic when tools are available

## Inputs to collect

Collect or infer the following from the request and system context:

### Request characteristics

- task type
- domain
- expected output type
- presence of images, files, tables, code, or long documents
- need for exactness versus approximate usefulness
- whether the request is open-ended or precision-critical

### Execution constraints

- budget sensitivity
- latency sensitivity
- quality expectation
- token or context size pressure
- tool availability
- need for citations or traceability
- need for reproducibility

### Risk profile

- low-risk
- medium-risk
- high-risk

### Failure tolerance

- whether a rough answer is acceptable
- whether the answer must be verified
- whether disagreement between models would be valuable

## Task taxonomy

Classify the request into one or more of these categories:

1. Simple generation
   - rewrite
   - summarization
   - formatting
   - light translation
   - basic brainstorming
2. General reasoning
   - explanation
   - comparison
   - concept mapping
   - normal business analysis
3. Deep reasoning
   - multi-step planning
   - tradeoff analysis
   - architecture design
   - ambiguous decision support
   - chain-dependent reasoning
4. Exact calculation or formal logic
   - arithmetic
   - financial calculations
   - unit conversion
   - spreadsheet-like reasoning
   - symbolic or step-sensitive math
   - combinatorics or logic puzzles where exactness matters
5. Coding and technical execution
   - code generation
   - debugging
   - refactoring
   - test generation
   - query writing
   - infrastructure or API design
6. Long-context synthesis
   - large documents
   - multiple files
   - multi-source comparison
   - transcript or contract review
7. Multi-modal tasks
   - image understanding
   - diagram interpretation
   - PDF with layout-heavy content
   - video or audio related tasks if supported
8. High-risk tasks
   - medical
   - legal
   - financial decisions
   - compliance
   - security-sensitive operations
   - anything where incorrect advice has material consequences

## Core routing principle

Always prefer the cheapest path that can safely succeed.

Apply this order of preference:

1. Cheap single-model path
2. Balanced single-model path
3. Premium single-model path
4. Tool-assisted path
5. Staged multi-model path
6. Parallel multi-model comparison
7. Premium plus verifier or consensus workflow

Do not escalate unless the task characteristics justify it.

## Model tiers

Use abstract capability tiers unless the deployment specifies exact providers.

### Economy tier

Use for:

- simple rewriting
- formatting
- low-risk classification
- short summaries
- lightweight extraction
- first-pass triage

Strengths:

- lowest cost
- fast response
- good for straightforward tasks

Weaknesses:

- weaker deep reasoning
- more brittle on ambiguity
- worse on exactness-critical tasks

### Balanced tier

Use for:

- everyday product and engineering work
- standard reasoning
- moderate code tasks
- moderate document analysis
- most business and writing tasks

Strengths:

- solid quality-cost tradeoff
- handles most normal production traffic
- reasonable speed and robustness

Weaknesses:

- may still fail on highly ambiguous or exacting tasks
- not always enough for hard reasoning or high-risk requests

### Premium tier

Use for:

- deep reasoning
- difficult code and architecture problems
- long-context synthesis with subtle dependencies
- high-value outputs
- high-risk tasks requiring stronger judgment

Strengths:

- strongest reasoning
- better ambiguity handling
- better synthesis quality

Weaknesses:

- highest cost
- often slower
- overkill for simple tasks

### Tool-assisted tier

Use when exactness matters more than fluent wording.

Use this path for:

- arithmetic
- deterministic calculations
- spreadsheet operations
- formula application
- structured data transformation
- exact code execution or testing if available
- retrieval-backed factual tasks

Rule: When a task requires exact numeric correctness, prefer tools plus model orchestration over pure model reasoning.

## Decision dimensions

Score the request across these dimensions:

### 1. Complexity

- low
- medium
- high
- very high

### 2. Exactness requirement

- low: approximate answer is acceptable
- medium: mostly correct is acceptable
- high: exact result expected
- critical: exact result plus verification required

### 3. Risk level

- low
- medium
- high

### 4. Latency priority

- urgent
- normal
- relaxed

### 5. Budget strategy

- minimize cost
- balanced
- quality-first

### 6. Context burden

- short
- moderate
- long
- extreme

### 7. Modality burden

- text only
- image or PDF
- mixed inputs

## Hard routing rules

Apply these rules before any soft optimization.

### Exact calculation rule

If the task involves exact arithmetic, formulas, tables, accounting-like operations, unit-sensitive conversions, or step-sensitive logic:

- do not rely on a pure language-only route when tools are available
- prefer tool-assisted execution
- use a balanced or premium model only to interpret the task and explain results
- add a verification step for high-impact numeric outputs

### High-risk rule

If the task is high-risk:

- do not use economy-only routing as the final path
- require either premium single-model reasoning with grounding or a model plus verifier workflow
- add citations, checks, or a review pass when possible

### Ambiguity rule

If the task is materially ambiguous and the answer quality depends on interpretation:

- use a stronger reasoning tier or a two-stage workflow
- do not finalize on a cheap first-pass answer without clarification or review

### Long-context rule

If the input is large or multi-document:

- prefer staged processing
- use extraction or chunk summarization first
- then use a stronger model for synthesis if needed
- avoid sending everything to the strongest model by default if staged reduction is cheaper and safe

### Multimodal rule

If the task includes images, diagrams, PDFs with layout dependence, or visual interpretation:

- use a model path that actually supports the required modality
- do not route to a text-only path

### Coding rule

For code tasks:

- simple boilerplate or syntax transform
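
As an illustration only, the fully stated hard rules above (exact calculation, high-risk, ambiguity, long-context) could be sketched as a small rule function that runs before any cost optimization. Everything named here is hypothetical and not part of the skill itself: the `Request` fields, the `Route` enum, and the `route` function are invented for the example. The sketch also makes choices the text leaves open: it treats "high-impact" as critical exactness or high risk, it picks the verifier workflow for high-risk requests, it escalates ambiguous requests straight to the premium tier, and it sends exact tasks without tools to a verified premium path. The multimodal and coding rules are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    # Hypothetical route names, loosely mirroring the order of preference.
    ECONOMY = "economy single-model"
    BALANCED = "balanced single-model"
    PREMIUM = "premium single-model"
    TOOL_ASSISTED = "tool-assisted"
    STAGED = "staged multi-model"
    PREMIUM_VERIFIED = "premium plus verifier"


@dataclass
class Request:
    # Field names are assumptions; values follow the decision dimensions.
    complexity: str = "low"      # low | medium | high | very high
    exactness: str = "low"       # low | medium | high | critical
    risk: str = "low"            # low | medium | high
    context: str = "short"       # short | moderate | long | extreme
    ambiguous: bool = False
    tools_available: bool = True


def route(req: Request) -> tuple[Route, list[str]]:
    """Return a route and any extra steps the hard rules require."""
    # Exact calculation rule: prefer tools over pure model reasoning.
    if req.exactness in ("high", "critical"):
        if req.tools_available:
            extras = []
            # Assumption: "high-impact" means critical exactness or high risk.
            if req.exactness == "critical" or req.risk == "high":
                extras.append("verify numeric output")
            return Route.TOOL_ASSISTED, extras
        # Assumption: the source does not cover the no-tools case.
        return Route.PREMIUM_VERIFIED, ["verify numeric output"]

    # High-risk rule: never finish on an economy-only path.
    if req.risk == "high":
        return Route.PREMIUM_VERIFIED, ["add citations or a review pass"]

    # Long-context rule: reduce first, synthesize second.
    if req.context in ("long", "extreme"):
        return Route.STAGED, ["chunk and summarize before synthesis"]

    # Ambiguity rule, plus plain complexity-based escalation.
    if req.ambiguous or req.complexity in ("high", "very high"):
        return Route.PREMIUM, []
    if req.complexity == "medium":
        return Route.BALANCED, []

    # Default: the cheapest path that can safely succeed.
    return Route.ECONOMY, []


if __name__ == "__main__":
    chosen, steps = route(Request(exactness="critical"))
    print(chosen.value, steps)
```

Ordering the checks from hardest constraint to softest keeps the "cheapest safe path" default at the bottom, so a request only escalates when one of the rules explicitly fires.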