🤖 GitHub AI Remediation
When Kill Switch detects a threshold violation and kills the runaway service, it can automatically trigger Claude Code to analyze your codebase and open a pull request fixing the root cause — so the same spike never happens again.
How it works
A workflow_dispatch event triggers your remediation workflow. Kill Switch passes structured violation data as workflow inputs so Claude has full context: which provider fired, which service, what metric, how far over threshold, and what action was already taken.
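How Kill Switch sends the dispatch internally is not documented here. As a rough sketch, an equivalent request against GitHub's REST endpoint for creating a workflow dispatch event could be built as below. The account name, the dedup key value, and the helper name `build_dispatch_request` are made up for illustration; the request is only constructed, not sent.

```python
import json
import urllib.request


def build_dispatch_request(token, owner, repo, workflow_file, ref, inputs):
    """Build (but do not send) a workflow_dispatch request for the GitHub REST API."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # Every input in the workflow is declared as type string, so stringify values.
    body = {"ref": ref, "inputs": {k: str(v) for k, v in inputs.items()}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


violations = [
    {
        "serviceName": "d1-chunks-abc",
        "metricName": "D1 Rows Read",
        "currentValue": 302100000,
        "threshold": 5000000,
        "multiplier": "60x",
        "severity": "critical",
    }
]

request = build_dispatch_request(
    token="ghp_YOUR_PAT",
    owner="acme-corp",
    repo="my-app",
    workflow_file="kill-switch-remediate.yml",
    ref="main",
    inputs={
        "provider": "cloudflare",
        "account_name": "Production",        # hypothetical account name
        "severity": "critical",
        "violation_count": len(violations),
        "violations_json": json.dumps(violations),
        "kill_switch_action": "Disconnected d1-chunks-abc",
        "dedup_key": "acct_123:2024-05-01",  # hypothetical accountId and date
    },
)
# urllib.request.urlopen(request) would actually send the dispatch.
```

The `ref` field corresponds to the branch configured in the alert channel, and the `inputs` keys match the inputs declared in the workflow file.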
Setup
1. Add the workflow file to your repo
   Copy the template from Kill Switch Settings → Alert Channels → GitHub, or copy it from the example file in the Kill Switch repo:

       cp kill-switch-remediate.example.yml \
         .github/workflows/kill-switch-remediate.yml

   Commit and push to your default branch.
2. Add your Anthropic API key to the repo
   Go to your repo on GitHub → Settings → Secrets and variables → Actions → New repository secret.

   | Name | Value |
   |---|---|
   | ANTHROPIC_API_KEY | Your Anthropic API key (starts with sk-ant-) |

   Get an API key at console.anthropic.com.
3. Create a GitHub Personal Access Token
   Kill Switch needs a PAT to dispatch the workflow on your behalf. Create one at GitHub → Settings → Developer settings → Personal access tokens.
   See PAT scopes below for required permissions.
4. Add the GitHub alert channel in Kill Switch
   Go to Kill Switch Settings → Alert Channels → Add Channel → GitHub Remediation and fill in:

   | Field | Description |
   |---|---|
   | Personal Access Token | The PAT you created in step 3 |
   | Repo owner | Your GitHub org or username (e.g. acme-corp) |
   | Repo name | The repo Claude should analyze (e.g. my-app) |
   | Workflow file | kill-switch-remediate.yml |
   | Branch | Default branch (e.g. main) |

   Use Test to verify the channel is configured correctly before saving.
Workflow reference
The workflow receives these inputs from Kill Switch via workflow_dispatch:
| Input | Type | Description |
|---|---|---|
| provider | string | Cloud provider (e.g. cloudflare, gcp, aws, runpod) |
| account_name | string | Human-readable name of the cloud account |
| severity | string | critical, error, or warning |
| violation_count | string | Number of threshold violations detected |
| violations_json | string | JSON array; see structure below |
| kill_switch_action | string | Action already taken (e.g. Disconnected d1-chunks-abc) |
| dedup_key | string | accountId:YYYY-MM-DD, used to deduplicate runs |
violations_json structure
[
  {
    "serviceName": "d1-chunks-abc",
    "metricName": "D1 Rows Read",
    "currentValue": 302100000,
    "threshold": 5000000,
    "multiplier": "60x",
    "severity": "critical"
  }
]
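To make the structure concrete, here is a minimal sketch that parses a violations_json string and computes how far each service is over its threshold. The helper name `overage_ratios` is hypothetical, and how Kill Switch derives the `multiplier` field is not specified in this document; the sample's "60x" is merely consistent with the ratio computed here.

```python
import json

# Sample payload copied from the structure above.
sample = """
[
  {
    "serviceName": "d1-chunks-abc",
    "metricName": "D1 Rows Read",
    "currentValue": 302100000,
    "threshold": 5000000,
    "multiplier": "60x",
    "severity": "critical"
  }
]
"""


def overage_ratios(violations_json: str) -> dict:
    """Map each serviceName to currentValue / threshold."""
    ratios = {}
    for violation in json.loads(violations_json):
        ratios[violation["serviceName"]] = (
            violation["currentValue"] / violation["threshold"]
        )
    return ratios


ratios = overage_ratios(sample)
for service, ratio in ratios.items():
    print(f"{service}: {ratio:.2f}x over threshold")
```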
Complete workflow template
name: Kill Switch — AI Remediation
on:
  workflow_dispatch:
    inputs:
      provider:
        required: true
        type: string
      account_name:
        required: true
        type: string
      severity:
        required: true
        type: string
      violation_count:
        required: true
        type: string
      violations_json:
        required: true
        type: string
      kill_switch_action:
        required: true
        type: string
      dedup_key:
        required: true
        type: string
# Prevent duplicate runs for the same account+day
concurrency:
  group: kill-switch-remediate-${{ inputs.dedup_key }}
  cancel-in-progress: false
jobs:
  remediate:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
      contents: write
      pull-requests: write
      issues: write
      actions: read
    steps:
      - uses: actions/checkout@v4
      - name: Run Claude Code — analyze and open fix PR
        uses: anthropics/claude-code-action@v1
        env:
          PROVIDER: ${{ inputs.provider }}
          ACCOUNT_NAME: ${{ inputs.account_name }}
          SEVERITY: ${{ inputs.severity }}
          VIOLATION_COUNT: ${{ inputs.violation_count }}
          VIOLATIONS_JSON: ${{ inputs.violations_json }}
          KILL_SWITCH_ACTION: ${{ inputs.kill_switch_action }}
          DEDUP_KEY: ${{ inputs.dedup_key }}
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          claude_args: "--allowedTools Edit,Read,Write,Bash,Glob,Grep --max-turns 30"
          prompt: |
            Kill Switch detected $VIOLATION_COUNT violation(s) on
            "$ACCOUNT_NAME" ($PROVIDER, severity: $SEVERITY).
            Kill Switch already took the emergency action: $KILL_SWITCH_ACTION
            Violations: $VIOLATIONS_JSON
            Find and fix the ROOT CAUSE so this doesn't happen again.
            Create a branch fix/kill-switch-$DEDUP_KEY and open a PR.
Pass ${{ inputs.* }} values through
env: variables and reference them as $VAR_NAME in shell steps.
Never interpolate them directly into run: blocks, because a crafted input value could then inject shell commands.
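The same principle applies to any script a step invokes: read the inputs from the environment and treat them as data. The sketch below is one way a helper script might do that; the function name `read_inputs` and the validation rules are assumptions for illustration (the allowed severities come from the inputs table above), not part of the Kill Switch template.

```python
import json
import os

# Severity values listed in the workflow inputs table.
ALLOWED_SEVERITIES = {"critical", "error", "warning"}


def read_inputs(env=os.environ):
    """Read Kill Switch inputs from environment variables and validate them.

    Values are only parsed and checked; nothing is passed to a shell.
    """
    severity = env["SEVERITY"]
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {severity!r}")

    violations = json.loads(env["VIOLATIONS_JSON"])
    if not isinstance(violations, list):
        raise ValueError("VIOLATIONS_JSON must be a JSON array")

    return {
        "severity": severity,
        "violation_count": int(env["VIOLATION_COUNT"]),
        "violations": violations,
        # Kept verbatim as a plain string, even if it contains shell metacharacters.
        "kill_switch_action": env["KILL_SWITCH_ACTION"],
    }
```

For example, a value such as `Disconnected x"; rm -rf / #` in KILL_SWITCH_ACTION stays an inert string here, whereas interpolating it into a run: block would let the shell interpret it.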
PAT scopes
| Token type | Required scopes |
|---|---|
| Classic PAT | repo, workflow |
| Fine-grained PAT | Actions (read/write), Contents (read/write), Pull requests (read/write) |
Create a PAT at GitHub → Settings → Developer settings → Personal access tokens.
Fine-grained tokens are recommended: they can be scoped to a single repository and expire automatically.
What Claude does
Claude Code receives the violation details and investigates based on the provider and metric type:
| Metric spike | What Claude looks for |
|---|---|
| D1 / database reads | N+1 queries, missing indexes, full-table scans, runaway background jobs, missing pagination |
| Worker / CPU / requests | Infinite loops, unbounded recursion, missing rate limiting, hot endpoints with no caching |
| Storage | Missing TTLs, unbounded writes, log accumulation |
| API token / AI cost | Retry loops without backoff, missing deduplication, expensive calls without result caching |
| GPU / compute | Unconstrained job queues, missing spot-instance limits, runaway training loops |
Claude then:
- Reads the violation details and recent git history (git log --oneline -20)
- Searches the codebase for code touching the affected service and metric
- Identifies the root cause
- Creates a branch fix/kill-switch-<dedup-key>
- Applies the minimal correct fix
- Opens a pull request with a clear description, what was found, and how to verify the fix
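One detail worth noting about the branch step: git ref names cannot contain a colon (see git check-ref-format), and the dedup_key format accountId:YYYY-MM-DD includes one. This document does not say how the key is rendered into the branch name in practice. A minimal sketch of one possible sanitization, using a hypothetical helper `branch_name_for`, might look like this:

```python
import re


def branch_name_for(dedup_key: str) -> str:
    """Turn a dedup key into a branch name that git accepts.

    Conservative approach: keep only letters, digits, '.', '_' and '-',
    and replace everything else (including ':') with '-'.
    """
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", dedup_key)
    # ".." is not allowed anywhere in a ref name.
    safe = re.sub(r"\.{2,}", ".", safe)
    # Avoid leading or trailing separators and a trailing dot.
    safe = safe.strip("-.")
    # A ref name component must not end with ".lock".
    if safe.endswith(".lock"):
        safe = safe[: -len(".lock")] + "-lock"
    if not safe:
        raise ValueError("dedup key contains no usable characters")
    return f"fix/kill-switch-{safe}"


# Hypothetical account id and date.
print(branch_name_for("acct_123:2024-05-01"))
```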
Deduplication
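Kill Switch scopes the dedup_key to accountId:YYYY-MM-DD.
The workflow uses a GitHub Actions concurrency group keyed on this value,
so if the same violation fires multiple times in a day (e.g. across cron checks),
runs for the same key never execute in parallel. A dispatch that arrives while a run is in progress waits as pending, and GitHub keeps at most one pending run per group, canceling an older pending run when a newer dispatch arrives.
The pending run starts once the in-progress run finishes; because cancel-in-progress is false, the in-progress fix itself is never canceled.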
concurrency:
  group: kill-switch-remediate-${{ inputs.dedup_key }}
  cancel-in-progress: false # don't cancel an in-progress fix
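To illustrate why repeated checks on the same day land in the same concurrency group, the sketch below rebuilds the key and group name from the formats given above. The function names are hypothetical, and which timezone Kill Switch uses for the date portion is not stated in this document.

```python
from datetime import date


def dedup_key(account_id: str, day: date) -> str:
    """Format a key as accountId:YYYY-MM-DD."""
    return f"{account_id}:{day:%Y-%m-%d}"


def concurrency_group(key: str) -> str:
    """Mirror the group expression used in the workflow file."""
    return f"kill-switch-remediate-{key}"


# Two checks on the same day for the same (hypothetical) account share a group.
morning = concurrency_group(dedup_key("acct_123", date(2024, 5, 1)))
evening = concurrency_group(dedup_key("acct_123", date(2024, 5, 1)))
next_day = concurrency_group(dedup_key("acct_123", date(2024, 5, 2)))

print(morning == evening)   # same account and day
print(morning == next_day)  # different day
```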
CLI setup
Add the GitHub remediation channel via the Kill Switch CLI:
ks alerts add \
  --type github \
  --token ghp_YOUR_PAT \
  --repo-owner acme-corp \
  --repo-name my-app \
  --workflow-file kill-switch-remediate.yml \
  --branch main
Test it immediately:
ks alerts test
This dispatches a test workflow_dispatch with empty violation data so you can
verify the channel is wired up before any real violation fires.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Dispatch returns 422 | Branch does not exist, or workflow file not found on that branch | Confirm the workflow file is committed to the branch configured in Kill Switch |
| Dispatch returns 404 | Repo owner/name wrong, or PAT lacks repo scope | Double-check repo owner and name; re-create PAT with correct scopes |
| Dispatch returns 401 | PAT expired or revoked | Generate a new PAT and update the alert channel in Kill Switch Settings |
| Workflow runs but Claude doesn't open a PR | Missing ANTHROPIC_API_KEY secret, or permissions not set | Check the ANTHROPIC_API_KEY secret exists; confirm contents: write and pull-requests: write permissions |
| Duplicate workflow runs | Concurrency group missing from workflow | Ensure the concurrency: block with cancel-in-progress: false is present |
| Channel shows as disabled after test | Test dispatch failed | Check Kill Switch alert history for the error code; fix PAT or repo config |