python-infrastructure
Python patterns for system reliability — background jobs and task queues (Celery, async), resilience and recovery (retries, backoff, timeouts, circuit breakers via tenacity), and observability (structured logging via structlog, metrics, distributed tracing, golden signals). USE WHEN building async workers, queueing tasks, handling transient network/IO failures, instrumenting Python services for production, designing retry policies, configuring logging or tracing, or any combination of these system-reliability concerns. NOT FOR language idioms or type hygiene (use `writing-python`) or project setup and dependency management (use `uv`).
What this skill does
# Python Infrastructure

System-reliability concerns for Python services, grouped because real code uses them together: a task you queue (background-jobs) needs retries (resilience) and instrumentation (observability) on the same call path.

## Scope routing

| If you need to… | Read |
|---|---|
| Design a task queue, schedule recurring jobs, or run async workers (Celery, RQ, asyncio task pools) | `References/background-jobs.md` |
| Decide what to retry, with what backoff, and when to stop (tenacity patterns, idempotency, circuit breakers) | `References/resilience.md` |
| Instrument a service with structured logs, metrics, and traces (structlog, OpenTelemetry, the four golden signals) | `References/observability.md` |

## Decision tree

```
Operation can fail transiently (network/IO/3rd-party API)?
  -> resilience.md (retry policy)
Operation runs out-of-request (email, image processing, batch)?
  -> background-jobs.md (queue + worker)
Need to know what's happening in production?
  -> observability.md (logs/metrics/traces)
All three at once for one feature?
  -> all three references, in that order.
```

## Cross-skill boundaries

- **`writing-python`**: *how* to write the function. This skill: *how it survives in production*.
- **`python-error-handling`**: *what exception to raise*. This skill: *what to do when it's raised across a network boundary*.
- **`python-resource-management`**: *how to clean up resources* (context managers). This skill: *how to keep retrying when resources fail to acquire*.

## Gotchas

- **Retry without backoff is a DoS amplifier.** A failed downstream + immediate retry from N clients = traffic burst that keeps the downstream down. Default to exponential backoff + jitter from day one.
- **Retrying non-idempotent operations duplicates side effects.** A failed POST + retry can mean two charges. Always pair retry-on-failure with an idempotency key OR mark the operation non-retryable.
- **Synchronous code inside an async worker blocks the event loop.** A "fast" `requests` call in an asyncio worker kills throughput. Use the async client (`httpx`, `aiohttp`) or run sync code in an executor.
- **Structured logs and metrics serve different audiences.** Logs answer "what happened to this one request"; metrics answer "what's happening across all requests". Don't try to derive one from the other; instrument both.
- **Trace context propagation needs explicit plumbing across the queue boundary.** Pushing a task to Celery loses the current trace unless you serialize the trace context into the task headers and restore it in the worker. Read the OpenTelemetry-Celery propagator docs before assuming it "just works".
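
The first two gotchas (backoff with jitter, and pairing retries with an idempotency key) can be sketched with the standard library alone. This is a minimal illustration, not the skill's reference implementation: `TransientError`, `call_with_retry`, `charge_card`, and the in-memory `ledger` are hypothetical names invented for the example, and the "full jitter" strategy (a uniform draw between zero and an exponentially growing ceiling) is one common choice among several. In real code, tenacity's `wait_random_exponential` and `stop_after_attempt` cover similar ground.

```python
import random
import time
import uuid
from typing import Callable, Dict, Optional, Tuple, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""


def call_with_retry(
    op: Callable[[], T],
    *,
    max_attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op until it succeeds, sleeping with full jitter between attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the last failure
            # Exponential ceiling (base, 2*base, 4*base, ...) capped at `cap`.
            # The uniform draw spreads clients out so they do not retry in lockstep.
            ceiling = min(cap, base * 2 ** attempt)
            sleep(rng.uniform(0.0, ceiling))
    raise AssertionError("unreachable")


def charge_card(idempotency_key: str, ledger: Dict[str, str]) -> str:
    """Hypothetical side-effecting call.

    The ledger stands in for a server that deduplicates by idempotency key:
    a repeated key returns the original charge instead of creating a new one.
    """
    return ledger.setdefault(idempotency_key, f"charge-{len(ledger) + 1}")


def demo() -> Tuple[str, Dict[str, str]]:
    ledger: Dict[str, str] = {}
    key = str(uuid.uuid4())  # generated once, outside the retry loop
    lost_responses = iter([True, True, False])

    def op() -> str:
        result = charge_card(key, ledger)  # the side effect happens...
        if next(lost_responses):
            raise TransientError("response lost")  # ...but the caller never sees it
        return result

    # sleep is stubbed out so the demo returns immediately.
    return call_with_retry(op, sleep=lambda seconds: None), ledger
```

Calling `demo()` makes three attempts against the fake server but leaves a single entry in `ledger`, because every attempt reuses the same key. Generating the key inside `op` instead would defeat the deduplication and record one charge per attempt.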