🤖 GitHub AI Remediation

When Kill Switch detects a threshold violation and kills the runaway service, it can automatically trigger Claude Code to analyze your codebase and open a pull request that fixes the root cause, so the same spike doesn't happen again.

Hands-off remediation loop: Kill Switch stops the bleeding immediately, then Claude Code digs into your code, finds why it happened, and opens a PR to fix it. Your only step is to review and merge.

How it works

⚡ Kill Switch detects threshold violation
    ↓ immediately
🚫 Kill action runs (disconnect worker / stop pod / scale to 0)
    ↓ milliseconds later
🔔 Alerts fire (PagerDuty, Slack, email, …)
    ↓ simultaneously
🚀 GitHub workflow_dispatch triggers your remediation workflow
    ↓ GitHub Actions picks it up
🤖 Claude Code checks out your repo and reads the violation details
    ↓
🔍 Searches the codebase for the root cause (N+1 queries, loops, missing indexes, …)
    ↓
📄 Opens a PR: "fix: N+1 query causing D1 overage"
    ↓ you review & merge
✅ Root cause fixed

Kill Switch passes structured violation data as workflow inputs so Claude has full context: which provider and account triggered, which service, which metric, how far over the threshold it went, and what action was already taken.

Setup

  1. Add the workflow file to your repo

    Copy the template from Kill Switch Settings → Alert Channels → GitHub, or start from the example file in the Kill Switch repo:

    mkdir -p .github/workflows
    cp kill-switch-remediate.example.yml \
        .github/workflows/kill-switch-remediate.yml

    Commit and push to your default branch.

  2. Add your Anthropic API key to the repo

    Go to your repo on GitHub → Settings → Secrets and variables → Actions → New repository secret.

    Name               Value
    ANTHROPIC_API_KEY  Your Anthropic API key (starts with sk-ant-)

    Get an API key at console.anthropic.com.

  3. Create a GitHub Personal Access Token

    Kill Switch needs a PAT to dispatch the workflow on your behalf. Create one at GitHub → Settings → Developer settings → Personal access tokens.

    See PAT scopes below for required permissions.

  4. Add the GitHub alert channel in Kill Switch

    Go to Kill Switch Settings → Alert Channels → Add Channel → GitHub Remediation and fill in:

    Field                  Description
    Personal Access Token  The PAT you created in step 3
    Repo owner             Your GitHub org or username (e.g. acme-corp)
    Repo name              The repo Claude should analyze (e.g. my-app)
    Workflow file          kill-switch-remediate.yml
    Branch                 Branch to run the workflow on; the workflow file must be committed there (usually your default branch, e.g. main)

    Use Test to verify the channel is configured correctly before saving.

Workflow reference

The workflow receives these inputs from Kill Switch via workflow_dispatch:

Input               Type    Description
provider            string  Cloud provider (e.g. cloudflare, gcp, aws, runpod)
account_name        string  Human-readable name of the cloud account
severity            string  critical, error, or warning
violation_count     string  Number of threshold violations detected
violations_json     string  JSON array (see structure below)
kill_switch_action  string  Action already taken (e.g. Disconnected d1-chunks-abc)
dedup_key           string  accountId:YYYY-MM-DD, used to prevent duplicate runs
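
A workflow_dispatch trigger like this is normally a call to GitHub's REST endpoint POST /repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches, with the branch as "ref" and every input as a string; GitHub answers 204 No Content on success. The Python sketch below only builds such a request so you can see the payload shape. It is an illustration, not Kill Switch's own code: the helper name build_dispatch_request and all sample values (token, account name, dedup key) are hypothetical.

```python
import json
import urllib.request


def build_dispatch_request(token, owner, repo, workflow_file, branch, inputs):
    """Build (but do not send) a workflow_dispatch request for GitHub's REST API."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    body = {
        "ref": branch,
        # The template declares every input as type: string, so send strings
        "inputs": {key: str(value) for key, value in inputs.items()},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


violations = [
    {
        "serviceName": "d1-chunks-abc",
        "metricName": "D1 Rows Read",
        "currentValue": 302100000,
        "threshold": 5000000,
        "multiplier": "60x",
        "severity": "critical",
    }
]

req = build_dispatch_request(
    token="ghp_YOUR_PAT",
    owner="acme-corp",
    repo="my-app",
    workflow_file="kill-switch-remediate.yml",
    branch="main",
    inputs={
        "provider": "cloudflare",
        "account_name": "Production",  # hypothetical account name
        "severity": "critical",
        "violation_count": len(violations),
        # violations_json is a JSON string nested inside the JSON body
        "violations_json": json.dumps(violations),
        "kill_switch_action": "Disconnected d1-chunks-abc",
        "dedup_key": "acct123:2024-05-01",  # hypothetical accountId and date
    },
)
print(req.get_method(), req.full_url)
```

Sending it would be urllib.request.urlopen(req); building and sending are kept apart here so the payload can be inspected without a real token.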

violations_json structure

[
  {
    "serviceName": "d1-chunks-abc",
    "metricName": "D1 Rows Read",
    "currentValue": 302100000,
    "threshold": 5000000,
    "multiplier": "60x",
    "severity": "critical"
  }
]
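
Because violations_json arrives as a string, anything that consumes it has to parse it first. A minimal Python sketch, assuming exactly the field names shown above; the Violation type and parse_violations helper are hypothetical. For the sample entry, currentValue / threshold is 302100000 / 5000000 = 60.42.

```python
import json
from dataclasses import dataclass


@dataclass
class Violation:
    service_name: str
    metric_name: str
    current_value: float
    threshold: float
    multiplier: str
    severity: str

    @property
    def ratio(self) -> float:
        """How many times over the threshold the metric is."""
        return self.current_value / self.threshold


def parse_violations(raw: str) -> list[Violation]:
    """Parse the violations_json input into typed records."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("violations_json must be a JSON array")
    return [
        Violation(
            service_name=item["serviceName"],
            metric_name=item["metricName"],
            current_value=item["currentValue"],
            threshold=item["threshold"],
            multiplier=item["multiplier"],
            severity=item["severity"],
        )
        for item in items
    ]


raw = """[{"serviceName": "d1-chunks-abc", "metricName": "D1 Rows Read",
"currentValue": 302100000, "threshold": 5000000,
"multiplier": "60x", "severity": "critical"}]"""

for v in parse_violations(raw):
    print(f"{v.service_name}: {v.metric_name} at {v.ratio:.2f}x threshold ({v.multiplier})")
```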

Complete workflow template

name: Kill Switch — AI Remediation

on:
  workflow_dispatch:
    inputs:
      provider:
        required: true
        type: string
      account_name:
        required: true
        type: string
      severity:
        required: true
        type: string
      violation_count:
        required: true
        type: string
      violations_json:
        required: true
        type: string
      kill_switch_action:
        required: true
        type: string
      dedup_key:
        required: true
        type: string

# One run at a time per account+day; later dispatches wait as pending
concurrency:
  group: kill-switch-remediate-${{ inputs.dedup_key }}
  cancel-in-progress: false

jobs:
  remediate:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
      contents: write
      pull-requests: write
      issues: write
      actions: read
      id-token: write   # needed by claude-code-action's default GitHub App authentication

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 20   # default is 1; Claude reads recent history with git log

      - name: Run Claude Code — analyze and open fix PR
        uses: anthropics/claude-code-action@v1
        env:
          PROVIDER: ${{ inputs.provider }}
          ACCOUNT_NAME: ${{ inputs.account_name }}
          SEVERITY: ${{ inputs.severity }}
          VIOLATION_COUNT: ${{ inputs.violation_count }}
          VIOLATIONS_JSON: ${{ inputs.violations_json }}
          KILL_SWITCH_ACTION: ${{ inputs.kill_switch_action }}
          DEDUP_KEY: ${{ inputs.dedup_key }}
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          claude_args: "--allowedTools Edit,Read,Write,Bash,Glob,Grep --max-turns 30"
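          prompt: |
            The $NAMES below are environment variables. Read their values
            with Bash first, e.g. printenv VIOLATIONS_JSON.

            Kill Switch detected $VIOLATION_COUNT violation(s) on
            "$ACCOUNT_NAME" ($PROVIDER, severity: $SEVERITY).

            Kill Switch already took the emergency action: $KILL_SWITCH_ACTION

            Violations: $VIOLATIONS_JSON

            Find and fix the ROOT CAUSE so this doesn't happen again.
            Create a branch named fix/kill-switch-$DEDUP_KEY with each ":"
            replaced by "-" (git branch names cannot contain ":"), then open a PR.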
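
Security note: Always pass ${{ inputs.* }} values through env: variables and reference them as $VAR_NAME in shell steps. Never interpolate them directly into a run: block. GitHub substitutes ${{ }} expressions into the script text before the shell parses it, so a crafted input value could inject commands.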
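
To see why the env: route matters, here is a small Python demonstration (not part of the workflow) that runs the same sh command two ways with a hypothetical malicious account_name. Pasting the value into the script text, as a ${{ inputs.account_name }} expression inside a run: block would, lets it inject a second command; passing it as an environment variable keeps it inert data. Assumes a POSIX sh on PATH.

```python
import os
import subprocess

# Hypothetical attacker-controlled input value
account_name = 'acme"; echo INJECTED #'

# Unsafe: the value is pasted into the script before sh parses it,
# like ${{ inputs.account_name }} inside a run: block
unsafe = subprocess.run(
    ["sh", "-c", f'echo "Account: {account_name}"'],
    capture_output=True, text=True, check=True,
).stdout

# Safe: the script only references $ACCOUNT_NAME, which sh expands as data,
# like env: ACCOUNT_NAME plus "$ACCOUNT_NAME" in a run: block
safe = subprocess.run(
    ["sh", "-c", 'printf "%s\\n" "Account: $ACCOUNT_NAME"'],
    env={**os.environ, "ACCOUNT_NAME": account_name},
    capture_output=True, text=True, check=True,
).stdout

print(unsafe, end="")  # two lines: "Account: acme" and "INJECTED"
print(safe, end="")    # one line containing the literal value
```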

PAT scopes

Token type        Required scopes
Classic PAT       repo, workflow
Fine-grained PAT  Actions (read/write), Contents (read/write), Pull requests (read/write)

Create a PAT at GitHub → Settings → Developer settings → Personal access tokens.

Fine-grained tokens are recommended: they can be scoped to a single repository and expire automatically.

What Claude does

Claude Code receives the violation details and investigates based on the provider and metric type:

Metric spike             What Claude looks for
D1 / database reads      N+1 queries, missing indexes, full-table scans, runaway background jobs, missing pagination
Worker / CPU / requests  Infinite loops, unbounded recursion, missing rate limiting, hot endpoints with no caching
Storage                  Missing TTLs, unbounded writes, log accumulation
API token / AI cost      Retry loops without backoff, missing deduplication, expensive calls without result caching
GPU / compute            Unconstrained job queues, missing spot-instance limits, runaway training loops
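
As an illustration of the first row (D1 is built on SQLite), here is the kind of N+1 pattern Claude looks for and a batched rewrite it might propose, sketched with Python's built-in sqlite3. The schema, table names, and statement counter are hypothetical; the point is that the N+1 version issues one query per parent row while the rewrite issues a single grouped query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE chunks (id INTEGER PRIMARY KEY, document_id INTEGER, body TEXT);
    CREATE INDEX idx_chunks_document_id ON chunks(document_id);
""")
conn.executemany("INSERT INTO documents (id, title) VALUES (?, ?)",
                 [(i, f"doc {i}") for i in range(1, 51)])
conn.executemany("INSERT INTO chunks (document_id, body) VALUES (?, ?)",
                 [(i, f"chunk of doc {i}") for i in range(1, 51) for _ in range(3)])

# Record every SQL statement the connection starts executing from here on
statements: list[str] = []
conn.set_trace_callback(statements.append)


def chunk_counts_n_plus_one() -> dict[int, int]:
    """One query for the documents, then one more query per document."""
    counts = {}
    for (doc_id,) in conn.execute("SELECT id FROM documents"):
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM chunks WHERE document_id = ?", (doc_id,)
        ).fetchone()
        counts[doc_id] = n
    return counts


def chunk_counts_batched() -> dict[int, int]:
    """A single grouped query replaces the per-document lookups."""
    rows = conn.execute(
        "SELECT d.id, COUNT(c.id) FROM documents d "
        "LEFT JOIN chunks c ON c.document_id = d.id GROUP BY d.id"
    )
    return dict(rows.fetchall())


statements.clear()
slow = chunk_counts_n_plus_one()
n_plus_one_queries = len(statements)

statements.clear()
fast = chunk_counts_batched()
batched_queries = len(statements)

print(n_plus_one_queries, batched_queries)
```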

Claude then:

  1. Reads the violation details and recent git history (git log --oneline -20)
  2. Searches the codebase for code touching the affected service and metric
  3. Identifies the root cause
  4. Creates a branch fix/kill-switch-<dedup-key> (git branch names can't contain ":", so the colon in the key has to be replaced, e.g. with "-")
  5. Applies the minimal correct fix
  6. Opens a pull request describing what was found, the fix, and how to verify it
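
Git ref names cannot contain ":" (among other characters; see git check-ref-format), so the raw dedup_key (accountId:YYYY-MM-DD) can't be used verbatim in a branch name. A minimal Python sketch of deriving a safe name, assuming forbidden characters are simply replaced with "-"; the function name is hypothetical and it covers only part of the check-ref-format rules. For an authoritative check, run git check-ref-format --branch <name>.

```python
import re

# Characters git forbids anywhere in a ref name (a subset of git check-ref-format rules)
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\]")


def remediation_branch(dedup_key: str) -> str:
    """Turn a dedup key like 'acct123:2024-05-01' into a valid branch name."""
    safe = _FORBIDDEN.sub("-", dedup_key)
    safe = re.sub(r"\.{2,}", ".", safe)  # ".." is not allowed
    safe = safe.replace("@{", "-")       # "@{" is not allowed
    safe = safe.rstrip("./")             # no trailing "." or "/"
    return f"fix/kill-switch-{safe}"


print(remediation_branch("acct123:2024-05-01"))  # fix/kill-switch-acct123-2024-05-01
```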

Deduplication

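Kill Switch scopes the dedup_key to accountId:YYYY-MM-DD. The workflow uses a GitHub Actions concurrency group keyed on this value, so runs for the same account and day never overlap. If the same violation fires again while a run is in progress (e.g. across cron checks), the new dispatch waits as pending. GitHub keeps at most one pending run per group, and a newer dispatch cancels the older pending one. The pending run still executes once the first finishes, so the concurrency group serializes repeat runs rather than dropping them.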

concurrency:
  group: kill-switch-remediate-${{ inputs.dedup_key }}
  cancel-in-progress: false   # don't cancel an in-progress fix
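
The toy Python model below simulates GitHub's documented rule for a concurrency group with cancel-in-progress: false (one run in progress, at most one pending, and a newly queued run cancels any older pending one) to show what happens when three dispatches share a dedup_key. It is a simplified model, not GitHub's scheduler, and the class and method names are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConcurrencyGroup:
    """Toy model of one GitHub Actions concurrency group (cancel-in-progress: false)."""
    running: Optional[str] = None
    pending: Optional[str] = None
    cancelled: list = field(default_factory=list)
    completed: list = field(default_factory=list)

    def dispatch(self, run_id: str) -> None:
        if self.running is None:
            self.running = run_id  # group idle: start immediately
            return
        if self.pending is not None:
            self.cancelled.append(self.pending)  # newer run replaces the pending one
        self.pending = run_id  # wait behind the in-progress run

    def finish_running(self) -> None:
        if self.running is not None:
            self.completed.append(self.running)
        self.running, self.pending = self.pending, None  # promote the pending run


group = ConcurrencyGroup()
for run in ["run-1", "run-2", "run-3"]:  # three dispatches with the same dedup_key
    group.dispatch(run)
group.finish_running()  # run-1 completes; run-3 starts
group.finish_running()  # run-3 completes

print(group.completed, group.cancelled)  # ['run-1', 'run-3'] ['run-2']
```

The model makes the trade-off visible: the group prevents two fixes racing on the same branch, but a later dispatch can still produce a second run.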

CLI setup

Add the GitHub remediation channel via the Kill Switch CLI:

ks alerts add \
  --type github \
  --token ghp_YOUR_PAT \
  --repo-owner acme-corp \
  --repo-name my-app \
  --workflow-file kill-switch-remediate.yml \
  --branch main

Test it immediately:

ks alerts test

This dispatches a test workflow_dispatch with empty violation data so you can verify the channel is wired up before any real violation fires.

Troubleshooting

Dispatch returns 422
  Likely cause: The branch does not exist, the workflow file is not on that branch, or the workflow's workflow_dispatch inputs don't match what Kill Switch sends
  Fix: Confirm the workflow file, with all seven inputs declared, is committed to the branch configured in Kill Switch

Dispatch returns 404
  Likely cause: Repo owner/name wrong, or PAT lacks repo scope
  Fix: Double-check the repo owner and name; re-create the PAT with the correct scopes

Dispatch returns 401
  Likely cause: PAT expired or revoked
  Fix: Generate a new PAT and update the alert channel in Kill Switch Settings

Workflow runs but Claude doesn't open a PR
  Likely cause: Missing ANTHROPIC_API_KEY secret, or permissions not set
  Fix: Check that the ANTHROPIC_API_KEY secret exists; confirm the contents: write and pull-requests: write permissions

Overlapping workflow runs for the same account and day
  Likely cause: Concurrency group missing from the workflow
  Fix: Ensure the concurrency: block with cancel-in-progress: false is present

Channel shows as disabled after test
  Likely cause: Test dispatch failed
  Fix: Check Kill Switch alert history for the error code; fix the PAT or repo config

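If you call the dispatch endpoint from your own scripts, the status-code rows of the troubleshooting table above can be encoded as a lookup for quick triage. A minimal sketch; the function name and condensed messages are hypothetical and cover only the codes listed in the table (plus 204, GitHub's success response).

```python
# Likely causes and fixes for workflow_dispatch failures, adapted from the table above
DISPATCH_ERRORS = {
    401: ("PAT expired or revoked",
          "Generate a new PAT and update the alert channel in Kill Switch Settings"),
    404: ("Repo owner/name wrong, or PAT lacks repo scope",
          "Double-check repo owner and name; re-create the PAT with the required scopes"),
    422: ("Branch does not exist, or workflow file not found on that branch",
          "Confirm the workflow file is committed to the branch configured in Kill Switch"),
}


def triage_dispatch(status: int) -> str:
    """Return a human-readable hint for a workflow_dispatch HTTP status."""
    if status == 204:
        return "OK: workflow dispatched"
    cause, fix = DISPATCH_ERRORS.get(
        status, ("Unrecognized status", "Check Kill Switch alert history for details")
    )
    return f"{status}: {cause}. Fix: {fix}."


print(triage_dispatch(401))
```
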
Need help? Open an issue at github.com/divinci-ai/kill-switch or email admin@kill-switch.net.