Automatically rerun failed GitHub Actions workflows and classify suspected flakes — before your team wastes time investigating.
Last updated: 2026-03-05
Version: 1.1
FlakeTriage is a GitHub App that automatically re-runs failed CI jobs to detect flaky tests. It monitors workflow_run.completed events and, when a failure is detected, triggers a re-run of the failed jobs and posts a Check Run with an evidence summary.
FlakeTriage also supports manual retries via /flaketriage retry <run_id> comments on issues/PRs (issue_comment events), and processes GitHub Marketplace plan-change webhooks (marketplace_purchase events) to keep installation tiers current.
FlakeTriage collects and processes only workflow and operational metadata — never your source code, build logs, test output, or personal data beyond your GitHub username.
| Data | Purpose | Retention |
|---|---|---|
| Repository name (owner/repo) | Route events, apply per-repo config | In-memory + SQLite (see below) |
| Workflow run ID | Track rerun decisions, prevent duplicates | SQLite, 30 days |
| Workflow name and path | Policy evaluation (deny sensitive workflows) | In-memory only |
| Run conclusion (failure, success, etc.) | Decide whether to trigger rerun | In-memory only |
| Head SHA | Pin config reads, create Check Runs | In-memory only |
| Installation ID | Authenticate API calls | In-memory only |
| GitHub delivery ID | Deduplicate webhook deliveries | SQLite, 30 days |
| Budget usage counters | Enforce daily rerun limits | SQLite, per UTC day |
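Delivery-ID deduplication of the kind described above can be sketched with a single SQLite table and a uniqueness constraint (table and function names are illustrative, not FlakeTriage's actual schema):

```python
import sqlite3

def seen_before(db: sqlite3.Connection, delivery_id: str) -> bool:
    """Record a GitHub delivery ID; return True if it was already recorded,
    meaning this webhook delivery is a duplicate and should be skipped."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS deliveries ("
        "  id TEXT PRIMARY KEY,"
        "  received_at TEXT DEFAULT (datetime('now'))"
        ")"
    )
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO deliveries (id) VALUES (?)", (delivery_id,))
        return False
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: this delivery ID was inserted previously.
        return True
```

The 30-day retention would then be a periodic purge along the lines of `DELETE FROM deliveries WHERE received_at < datetime('now', '-30 days')`.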
Manual retry commands (issue_comment events, /flaketriage retry):

| Data | Purpose | Retention |
|---|---|---|
| Commenter login (comment.user.login) | Permission check (write/admin gate) | In-memory only (permission cache, 120s TTL) |
| Repository and org context (repository.full_name) | Route command, scope permission check | In-memory only |
| Comment ID (comment.id) | Idempotency / dedupe in structured logs | Structured log output only (not persisted in SQLite) |
| Command outcome (rerun_triggered, permission_denied, etc.) | Audit trail, Prometheus counter | Structured log + in-memory Prometheus counter |
| Requested workflow run ID | Target of the rerun action | In-memory only |
Note: FlakeTriage does not store the full comment body. The comment text is parsed in-memory to extract the /flaketriage retry <run_id> command and then discarded.
Marketplace plan events (marketplace_purchase):

| Data | Purpose | Retention |
|---|---|---|
| Installation ID | Link plan to installation | SQLite (installation_plans table), until cancelled or overwritten |
| Plan name / slug | Map to internal tier | Logged (structured log); normalized slug stored in SQLite |
| Mapped internal tier (free/starter/team/business) | Enforce usage budget | SQLite (installation_plans table) |
| Account login | Audit trail | Structured log output only (not persisted in SQLite) |
| Action (purchased/changed/cancelled/pending_change) | Determine tier update | Structured log output only |
| Webhook source timestamp | Audit trail | Structured log output only |
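The plan-name normalization into an internal tier can be sketched as follows (the tier names come from the table above; the normalization rule itself is illustrative, and unknown plans deliberately fall back to the free tier so a malformed webhook can never grant a paid budget):

```python
KNOWN_TIERS = {"free", "starter", "team", "business"}

def tier_from_plan(plan_name: str) -> str:
    """Normalize a Marketplace plan name to an internal tier slug.

    Unrecognized plan names default to 'free' (fail closed)."""
    slug = plan_name.strip().lower().replace(" ", "-")
    return slug if slug in KNOWN_TIERS else "free"
```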
Telemetry (/metrics endpoint):

| Data | Scope | Protection |
|---|---|---|
| Aggregate Prometheus counters | Process-level totals (e.g., reruns attempted, plan changes processed) | Bearer-token protected; no repository names or user identifiers in metric labels by default |
SQLite retention periods (see .github/flaketriage.yml for configuration):

| Data Type | Retention Period |
|---|---|
| Delivery ID deduplication records | 30 days |
| Rerun records | 30 days |
| Run state (pending outcome correlation) | 24 hours |
| Budget counters | Per UTC day (current day only) |
| Config cache entries | 120 seconds (TTL) |
| Prometheus metrics | Since last process restart (in-memory) |
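The 120-second config cache TTL from the table above can be implemented with a minimal time-based cache; this is a sketch with an injectable clock for testability, not FlakeTriage's actual implementation (which may also bound size and evict entries):

```python
import time

class TTLCache:
    """Minimal TTL cache for per-repo config entries (default 120 s)."""

    def __init__(self, ttl: float = 120.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (value, stored_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired: drop and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, self.clock())
```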
Plan-tier retention targets (metrics history; aspirational, not yet code-enforced):
Note: Prometheus metrics are currently held in-memory and reset on process restart regardless of plan tier. The per-tier retention targets above represent planned behavior for a future persistent metrics store (see WI-039). Actual data retention today is governed by the SQLite retention periods in the table above and the process lifecycle.
FlakeTriage requests the minimum permissions necessary:
| Permission | Scope | Purpose |
|---|---|---|
| actions:write | Repository | Trigger re-run of failed jobs |
| checks:write | Repository | Create/update Check Runs with evidence |
| contents:read | Repository | Read .github/flaketriage.yml config file |
| issues:write | Repository | Post reaction/reply on /flaketriage retry commands |
FlakeTriage subscribes to these webhook events:
| Event | Purpose |
|---|---|
| workflow_run | Detect workflow completions and trigger reruns |
| push | Invalidate config cache on default-branch changes |
| issue_comment | Process /flaketriage retry <run_id> manual retry commands |
| marketplace_purchase | Ingest plan purchase, change, and cancellation events to update installation tier |
All incoming webhook payloads are signature-verified (x-hub-signature-256 header) before any processing occurs. Events with invalid or missing signatures are rejected.

The /metrics telemetry endpoint is protected by a bearer token (FLAKETRIAGE_METRICS_TOKEN). Requests without a valid token receive HTTP 401.

FlakeTriage processes only GitHub-public workflow metadata, not personal data in the GDPR sense. However, if you believe FlakeTriage holds data about you:
When you uninstall the FlakeTriage GitHub App:
We will update this document when our data practices change. The “Last updated” date at the top reflects the most recent revision.
For privacy questions or data requests: