Automatically rerun failed GitHub Actions workflows and classify suspected flakes — before your team wastes time investigating.
Audience: Developers and engineering managers who install FlakeTriage on their GitHub repositories.
Prerequisites: A GitHub account, at least one repository with GitHub Actions workflows.
Get FlakeTriage running on your repository in under 5 minutes:
1. Install the GitHub App: Go to the GitHub Marketplace listing, click Install, and grant access to your organization or specific repositories.
2. Done. FlakeTriage works immediately with safe defaults; no config file is required.
3. (Optional) Customize: Create `.github/flaketriage.yml` in your repository to adjust the daily rerun budget or allow/deny specific workflows.
4. Verify: Push a change that triggers a CI workflow. If it fails, FlakeTriage will automatically rerun the failed jobs and post a Check Run on the commit with the decision and evidence.
Zero-config: If you skip step 3, FlakeTriage uses safe defaults: enabled, 10 reruns/day budget (capped by your plan tier), built-in deny patterns for deploy/release workflows.
When you install FlakeTriage from the Marketplace, GitHub asks you to choose the installation scope:
| Scope | Behavior |
|---|---|
| All repositories | FlakeTriage monitors every repository in the organization (including future ones). |
| Only select repositories | FlakeTriage monitors only the repositories you choose. You can add or remove repositories later from your organization’s Installed GitHub Apps settings. |
Recommendation: Start with “Only select repositories” for a controlled rollout, then expand to “All repositories” once you’re comfortable.
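To change scope later, open your organization’s Installed GitHub Apps settings and adjust the repository access for FlakeTriage.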
FlakeTriage requests the minimum permissions necessary to function:
| Permission | Scope | Why It’s Needed |
|---|---|---|
| `actions:write` | Repository | Trigger re-run of failed jobs via the GitHub API |
| `checks:write` | Repository | Create and update Check Runs with evidence summaries |
| `contents:read` | Repository | Read the `.github/flaketriage.yml` configuration file |
| `issues:write` | Repository | Post reactions and reply comments for `/flaketriage retry` feedback |
| `metadata:read` | Repository | Required by GitHub for all App installations (automatic) |
Privacy: FlakeTriage reads only workflow metadata and one small config file. It never reads source code, build logs, test output, or personal data. See docs/PRIVACY_POLICY.md for full details.
FlakeTriage subscribes to these webhook events:
| Event | Purpose |
|---|---|
| `workflow_run` | Detect completed workflow runs (trigger automatic rerun) |
| `push` | Invalidate config cache on default-branch push (instant config reload) |
| `issue_comment` | Listen for `/flaketriage retry` manual retry commands |
| `marketplace_purchase` | Process plan changes (purchased, changed, cancelled) |
.github/flaketriage.yml
This file is optional. If it’s missing, FlakeTriage uses safe defaults. The file is read from the exact commit SHA being processed (not the latest on the default branch), so config changes take effect on the next workflow run after the config is pushed.
Config caching: The config file is cached in memory for 120 seconds. Pushing to the default branch triggers instant cache invalidation — you don’t have to wait.
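The caching behavior described above can be pictured with a small sketch. This is not FlakeTriage's implementation: the `ConfigCache` class, its method names, and keying by repository name are all assumptions made for illustration. Only the 120-second TTL and the invalidate-on-push behavior come from the text.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 120  # TTL stated in the documentation above


class ConfigCache:
    """Hypothetical in-memory TTL cache, keyed by repository name (an assumption)."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so the sketch can be exercised without sleeping
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, repo: str) -> Optional[dict]:
        """Return the cached config, or None if it is missing or expired."""
        entry = self._entries.get(repo)
        if entry is None:
            return None
        stored_at, config = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[repo]  # entry is older than the TTL
            return None
        return config

    def put(self, repo: str, config: dict) -> None:
        """Store a freshly parsed config with the current timestamp."""
        self._entries[repo] = (self._clock(), config)

    def invalidate(self, repo: str) -> None:
        """Drop the entry, as a default-branch push would, so the next read reloads."""
        self._entries.pop(repo, None)
```

In this sketch a `push` webhook handler would call `invalidate(repo)`, and any later `get` would miss and force a reload of the file.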
| Parameter | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Enable or disable FlakeTriage for this repo. Set to `false` to pause all automatic reruns without uninstalling. |
| `budget_daily` | integer | `10` | Maximum number of automatic reruns per UTC day for this repo. Must be a non-negative integer (0 = no reruns). The effective budget is min(budget_daily, plan tier limit). |
| `allow_workflows` | string[] | `[]` | Workflow names to always allow for automatic rerun, even if they match a deny pattern. Highest priority: overrides all deny rules. |
| `deny_workflows` | string[] | `[]` | Additional workflow name patterns to deny. These are added on top of the built-in deny patterns. |
Validation rules:

- Unrecognized keys such as `budgetDaily` (camelCase) will be flagged.
- Values of the wrong type (e.g., `enabled: "yes"` instead of `enabled: true`) produce a warning, and the default value is used for that field.
- Invalid entries in the `allow_workflows` or `deny_workflows` arrays are silently ignored.

Built-in deny patterns (always active, case-insensitive substring match):
| Pattern | Protects against |
|---|---|
| `release` | Release workflows |
| `deploy` | Deployment workflows |
| `publish` | Package publish workflows |
| `production` | Production-targeted workflows |
How matching works: FlakeTriage performs a case-insensitive substring match against the workflow name (or file path). A workflow named “Deploy to staging” matches the `deploy` pattern. Use `allow_workflows` to override if needed.
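As a rough illustration of the matching rule, the sketch below applies the built-in deny patterns plus any extra deny patterns to a workflow name. The function name `check_policy` is hypothetical, the sketch looks only at the workflow name (not the file path), and treating `allow_workflows` as a case-insensitive exact-name match is an assumption; the documentation only says that allow entries override deny rules.

```python
from typing import Iterable, List, Tuple

# Built-in deny patterns listed in the documentation above.
BUILT_IN_DENY = ("release", "deploy", "publish", "production")


def check_policy(workflow_name: str,
                 allow_workflows: Iterable[str] = (),
                 deny_workflows: Iterable[str] = ()) -> Tuple[bool, List[str]]:
    """Return (allowed, matched_deny_patterns) for a workflow name."""
    name = workflow_name.lower()

    # The allow list has the highest priority.
    # Assumption: an allow entry must equal the workflow name (ignoring case).
    if any(name == allowed.lower() for allowed in allow_workflows):
        return True, []

    # Deny rules: case-insensitive substring match; empty patterns are skipped.
    patterns = list(BUILT_IN_DENY) + list(deny_workflows)
    matched = [p for p in patterns if p and p.lower() in name]
    return (not matched), matched
```

For example, `check_policy("Deploy to staging")` reports the `deploy` pattern as matched, while passing `allow_workflows=["Deploy to staging"]` lets the same workflow through.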
# .github/flaketriage.yml
# No config needed — FlakeTriage uses safe defaults:
# enabled: true
# budget_daily: 10
# allow_workflows: []
# deny_workflows: []
Or simply don’t create the file at all.
# .github/flaketriage.yml
enabled: true
budget_daily: 20
Allows up to 20 automatic reruns per day (or your plan tier limit, whichever is lower).
# .github/flaketriage.yml
# Large repo with many CI workflows — higher budget + selective allows
enabled: true
budget_daily: 50
allow_workflows:
- "unit-tests"
- "integration-tests"
- "e2e-smoke"
deny_workflows:
- "nightly-benchmark"
- "scheduled-cleanup"
This config:
- Allows `unit-tests`, `integration-tests`, and `e2e-smoke`; these will be rerun even if they happen to contain a built-in deny pattern.
- Denies `nightly-benchmark` and `scheduled-cleanup` on top of the built-in deny patterns.

# .github/flaketriage.yml
# Conservative setup for a regulated environment
enabled: true
budget_daily: 3
allow_workflows: []
deny_workflows:
- "compliance"
- "security-scan"
- "migration"
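This config limits automatic reruns to 3 per day and denies `compliance`, `security-scan`, and `migration` workflows on top of the built-in deny patterns.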
1. A GitHub Actions workflow FAILS (conclusion: failure or timed_out)
│
2. GitHub sends a webhook (workflow_run.completed) to FlakeTriage
│
3. FlakeTriage validates the webhook signature (HMAC-SHA256)
│
4. Checks: has FlakeTriage seen this delivery before? (dedup guard)
│
5. Reads .github/flaketriage.yml from the commit (or uses defaults)
│
6. POLICY CHECK: Is this workflow allowed?
│ ├─ If workflow matches allow_workflows → YES
│ ├─ If workflow matches deny pattern → NO (Check Run: "policy_denied")
│ └─ Otherwise → YES
│
7. BUDGET CHECK: Are reruns available today?
│ ├─ Effective budget = min(config budget_daily, plan tier limit)
│ ├─ If used >= effective budget → NO (Check Run: "budget_exhausted")
│ └─ Otherwise → YES
│
8. RERUN GUARD: Has this exact run already been rerun?
│ ├─ If already rerun → NO (Check Run: "already_rerun")
│ └─ Otherwise → YES
│
9. TRIGGER: POST /repos/{owner}/{repo}/actions/runs/{run_id}/rerun-failed-jobs
│
10. POST Check Run on the commit with the full decision evidence
│
─── Later, when the rerun completes ───
│
11. GitHub sends another webhook (workflow_run.completed, run_attempt > 1)
│
12. FlakeTriage looks up the stored state → classifies outcome:
├─ Rerun PASSED → "Suspected Flake"
└─ Rerun FAILED → "Likely Legit Failure"
│
13. PATCH the Check Run with the outcome classification
A rerun is triggered when all of these conditions are true:
| Condition | Detail |
|---|---|
| Workflow conclusion is `failure` or `timed_out` | Other conclusions (success, cancelled, skipped, etc.) are ignored |
| This is the first attempt (run_attempt = 1) | Rerun completions (attempt > 1) are used for classification, not for another rerun |
| FlakeTriage is enabled for this repo | enabled: true in config (default) |
| Workflow passes the policy check | Not matching any deny pattern, or listed in allow_workflows |
| Daily budget has not been exhausted | used_today < min(config budget_daily, tier limit) |
| This run has not already been rerun | Single-rerun idempotency guard prevents double reruns |
| Skip Reason | Reason Code | What Happens |
|---|---|---|
| Workflow matches a deny pattern | `policy_denied` | Check Run created with deny evidence |
| Daily budget exhausted | `budget_exhausted` | Check Run created with budget details |
| Already rerun by FlakeTriage | `already_rerun` | Check Run created (dedup) |
| FlakeTriage disabled in config | `disabled_by_config` | Check Run created noting disabled |
| Webhook burst rate limited | `rate_limited_webhook` | Check Run created noting rate limit |
| Not a failure/timed_out conclusion | (not processed) | No action, no Check Run |
Every decision on a failed or timed-out run produces a Check Run, whether FlakeTriage takes action or not. This gives you full visibility into what happened and why.
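The conditions and skip reasons above can be condensed into one decision function. This is a minimal sketch, not FlakeTriage's code: the name `decide` and its parameters are hypothetical, and checking `enabled` before the policy check is an assumption, since the documentation does not state where that check sits in the pipeline. The webhook burst rate limit is left out.

```python
from typing import Optional


def decide(conclusion: str, run_attempt: int = 1, *,
           enabled: bool = True,
           policy_allowed: bool = True,
           used_today: int = 0,
           effective_budget: int = 10,
           already_rerun: bool = False) -> Optional[str]:
    """Return a reason code, or None when the event is not a rerun candidate."""
    # Only first-attempt failures and timeouts are candidates for a rerun.
    if conclusion not in ("failure", "timed_out") or run_attempt != 1:
        return None
    if not enabled:  # position of this check is an assumption
        return "disabled_by_config"
    if not policy_allowed:
        return "policy_denied"
    if used_today >= effective_budget:
        return "budget_exhausted"
    if already_rerun:
        return "already_rerun"
    return "rerun_triggered"
```

Every non-`None` return value corresponds to a Check Run in the tables above; `None` corresponds to the "not processed" row.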
When a rerun completes, FlakeTriage updates the original Check Run with one of these outcomes:
| Rerun Result | Classification | Meaning |
|---|---|---|
| Rerun passed | Suspected Flake | The original failure was likely transient (flaky test or infrastructure issue) |
| Rerun failed | Likely Legit Failure | The failure persisted, suggesting a real regression |
FlakeTriage includes a metadata-only categorization heuristic (no log parsing):
| Category | Confidence | When Assigned |
|---|---|---|
| `test_flake` | med | Rerun passed after original failure |
| `unknown` | low | Rerun failed, was skipped, or hasn’t completed yet |
| `infra` | — | Reserved for future rules |
| `dependency` | — | Reserved for future rules |
Note: This heuristic is intentionally conservative. It never claims certainty without log-level insight. The `test_flake` category at `med` confidence is the strongest signal currently produced.
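The classification and categorization tables above amount to a small lookup on the rerun's conclusion. The sketch below is illustrative only: `Outcome` and `classify_rerun` are hypothetical names, and treating a rerun that ends in `timed_out` as a failed rerun is an assumption not stated in the documentation.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    classification: Optional[str]  # None while no classification applies
    category: str
    confidence: str


def classify_rerun(rerun_conclusion: Optional[str]) -> Outcome:
    """Map a rerun conclusion to the metadata-only heuristic described above."""
    if rerun_conclusion == "success":
        return Outcome("Suspected Flake", "test_flake", "med")
    # Assumption: a timed-out rerun is treated like a failed rerun.
    if rerun_conclusion in ("failure", "timed_out"):
        return Outcome("Likely Legit Failure", "unknown", "low")
    # Skipped or not yet completed: no classification, lowest-confidence category.
    return Outcome(None, "unknown", "low")
```

Note that no branch returns a confidence above `med`, which mirrors the conservative stance described in the note.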
After FlakeTriage processes a workflow run, look for a Check Run named “FlakeTriage” on the commit. Here’s what each outcome looks like:
When FlakeTriage successfully triggers a rerun of failed jobs:
## FlakeTriage — Automation Evidence
| Field | Value |
|---|---|
| Event | `workflow_run.completed` |
| Run ID | `12345678` |
| Conclusion | `failure` |
| Workflow | CI Tests |
| Reason Code | `rerun_triggered` |
| Reason | Rerun of failed jobs triggered successfully |
### Configuration
- **Source:** REPO
- **Path:** `.github/flaketriage.yml`
- **Enabled:** true
- **Budget Daily:** 10
### Policy Decision
- **Allowed:** true
- **Reason:** No deny patterns matched
### Budget
- **Daily Limit:** 10
- **Used Today:** 3
- **Remaining:** 7
- **Budget Decision:** Within budget
### Rerun Action
- **Triggered:** true
- **Outcome:** rerun_triggered
- **Summary:** Rerun of failed jobs triggered successfully
When the rerun succeeds, the Check Run is updated:
## FlakeTriage — Suspected Flake
| Field | Value |
|---|---|
| Classification | **Suspected Flake** |
| Run ID | `12345678` |
| Workflow | CI Tests |
| Head SHA | `abc1234` |
### Timeline
| Step | Detail |
|---|---|
| Original conclusion | `failure` |
| Rerun requested | 2026-03-08T10:00:00Z |
| Rerun attempt | 2 |
| Rerun conclusion | `success` |
| Rerun completed | 2026-03-08T10:05:30Z |
### Interpretation
The rerun **succeeded**, which suggests the original failure was caused by a
flaky test or transient infrastructure issue. The original failure is likely
**not a real regression**.
### Categorization (Heuristic)
- **Category:** test_flake
- **Confidence:** med
- **Rationale:** `rerun_passed`
When the rerun also fails:
## FlakeTriage — Likely Legit Failure
| Field | Value |
|---|---|
| Classification | **Likely Legit Failure** |
| Run ID | `12345678` |
| Workflow | CI Tests |
| Head SHA | `abc1234` |
### Timeline
| Step | Detail |
|---|---|
| Original conclusion | `failure` |
| Rerun requested | 2026-03-08T10:00:00Z |
| Rerun attempt | 2 |
| Rerun conclusion | `failure` |
| Rerun completed | 2026-03-08T10:06:00Z |
### Interpretation
The rerun **failed again**, which suggests the original failure is a
**real regression** — not a flake. Manual investigation is recommended.
### Categorization (Heuristic)
- **Category:** unknown
- **Confidence:** low
- **Rationale:** `rerun_failed`
## FlakeTriage — Automation Evidence
| Field | Value |
|---|---|
| Reason Code | `budget_exhausted` |
| Reason | Daily rerun budget exhausted (10/10 used) |
### Budget
- **Daily Limit:** 10
- **Used Today:** 10
- **Remaining:** 0
- **Budget Decision:** Budget exhausted for today
## FlakeTriage — Automation Evidence
| Field | Value |
|---|---|
| Reason Code | `policy_denied` |
| Reason | Workflow matches deny pattern |
### Policy Decision
- **Allowed:** false
- **Reason:** Denied by pattern match
- **Matched deny patterns:** `deploy`
## FlakeTriage — Automation Evidence
| Field | Value |
|---|---|
| Reason Code | `disabled_by_config` |
| Reason | FlakeTriage is disabled for this repository |
### Configuration
- **Source:** REPO
- **Path:** `.github/flaketriage.yml`
- **Enabled:** false
FlakeTriage supports manually triggering a rerun by posting a comment on any issue or pull request.
/flaketriage retry <run_id>
Where <run_id> is the numeric workflow run ID you want to rerun.
Example:
/flaketriage retry 12345678
Important: The syntax is `/flaketriage retry <run_id>`, not `run=<run_id>`. The run ID is a positional argument separated by a space.

The run ID appears in the URL of the workflow run page: `github.com/{owner}/{repo}/actions/runs/{run_id}`. Alternatively, the FlakeTriage Check Run on your commit includes the Run ID in its evidence table.
| Requirement | Detail |
|---|---|
| Minimum permission | write, admin, or maintain on the repository |
| Verification method | GitHub Collaborator API (GET /repos/{owner}/{repo}/collaborators/{username}/permission) |
| Cache | Permission results cached in-memory for 120 seconds (configurable via FLAKETRIAGE_PERMISSION_CACHE_TTL_SECONDS) |
| Fallback on API failure | Denied by default — if the permission API is unreachable, access is denied |
| Plan requirement | /flaketriage retry is not available on the Free plan |
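The "denied by default" fallback in the table above is a fail-closed check. The sketch below shows the idea under stated assumptions: `check_commenter` is a hypothetical name, and `fetch_permission` stands in for whatever call retrieves the permission level from the Collaborator API; the plan requirement and the permission cache are left out.

```python
from typing import Callable

# Permission levels the documentation lists as sufficient for manual retry.
ALLOWED_PERMISSIONS = {"write", "maintain", "admin"}


def check_commenter(username: str,
                    fetch_permission: Callable[[str], str]) -> str:
    """Return 'allowed', 'permission_denied', or 'permission_check_unavailable'."""
    try:
        permission = fetch_permission(username)
    except Exception:
        # Fail closed: if the lookup cannot be completed, access is denied.
        return "permission_check_unavailable"
    if permission in ALLOWED_PERMISSIONS:
        return "allowed"
    return "permission_denied"
```

The important design choice is that an error in the lookup never falls through to the "allowed" branch.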
| Limit | Value |
|---|---|
| Window | 60 seconds (sliding window) |
| Max requests | 3 per repo × actor per window |
| Behavior when exceeded | Command silently ignored — no reaction, no reply, no Check Run |
Why silent? The rate-limit check runs before token exchange (step 2 in the pipeline), so FlakeTriage cannot post reactions or replies when rate-limited. The same applies to budget-exhausted manual retries. The event is still logged server-side and counted by the `flaketriage_manual_retry_requests_total{outcome="denied",reason="rate_limit"}` Prometheus metric.
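A sliding-window limit of 3 requests per 60 seconds per repo and actor could look roughly like the sketch below. The class name `SlidingWindowLimiter` is hypothetical, and not counting rejected requests against the window is an assumption; the documentation does not say how FlakeTriage stores or counts requests.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Hypothetical per-(repo, actor) sliding-window rate limiter."""

    def __init__(self, max_requests: int = 3, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock  # injectable so the sketch can be exercised without sleeping
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, repo: str, actor: str) -> bool:
        """Return True and record the request if it fits in the window."""
        now = self._clock()
        hits = self._hits[(repo, actor)]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False  # assumption: rejected requests are not recorded
        hits.append(now)
        return True
```

With the defaults, a fourth command from the same actor in the same repo within 60 seconds is rejected, while a different actor is unaffected.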
When you post /flaketriage retry <run_id> and the command passes rate-limit and budget checks:
Silent outcomes: If the command is rate-limited, budget-exhausted, on the Free plan, or has a missing run ID, FlakeTriage returns early before acquiring an API token. You will see no reaction, no reply, and no Check Run. Check server logs or Prometheus metrics to diagnose these cases.
| Outcome | Check Run Created? | Description |
|---|---|---|
| `rerun_triggered` | ✅ (neutral) | Rerun of failed jobs triggered successfully |
| `rerun_failed` | ✅ (failure) | Rerun API call failed (e.g., 403, 404) |
| `permission_denied` | ✅ (failure) | Commenter lacks write/admin/maintain access |
| `permission_check_unavailable` | ✅ (failure) | Permission API unreachable; denied by default |
| `budget_exhausted` | — | Daily rerun budget exceeded |
| `rate_limited` | — | Too many commands in 60-second window |
| `auth_failed` | — | Installation token exchange failed |
| `missing_run_id` | — | No run ID provided in command |
| `manual_retry_denied_free_tier` | — | Free plan does not include manual retry |
| Plan | Reruns/Day | Manual Retry | Monthly | Annual |
|---|---|---|---|---|
| Free | Up to 5 | No | $0 | $0 |
| Starter | Up to 50 | Yes | $10 | $100 |
| Team | Up to 300 | Yes | $30 | $300 |
| Business | Up to 1000 | Yes | $99 | $990 |
All plans include: Check Run evidence, policy engine, config caching, structured logging, Prometheus metrics, impact/export/hotspot endpoints.
For full pricing details, see the pricing section of the Marketplace listing.
The daily rerun budget is usage-based and resets at midnight UTC:
effective_budget = min(config budget_daily, plan tier limit)
Example: If your config says budget_daily: 50 but you’re on the Free plan (limit: 5), your effective budget is 5 reruns/day.
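The formula can be worked through in a few lines. The tier limits below are taken from the plan table; the function name `effective_budget` and the fallback to the Free limit for an unrecognized tier name are assumptions for illustration.

```python
# Daily rerun limits per plan, from the plan table above.
TIER_LIMITS = {"free": 5, "starter": 50, "team": 300, "business": 1000}


def effective_budget(budget_daily: int, tier: str) -> int:
    """Return min(config budget_daily, plan tier limit)."""
    # Assumption: an unrecognized tier name is treated like the free tier.
    tier_limit = TIER_LIMITS.get(tier, TIER_LIMITS["free"])
    return min(budget_daily, tier_limit)
```

So `effective_budget(50, "free")` gives 5, while the same config on the Team plan keeps the full 50.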
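When the budget is exhausted, the Check Run reports `budget_exhausted` with the used/total counts.

FlakeTriage resolves your plan tier using this priority order:

1. The tier recorded in the database for the installation (updated by Marketplace purchase events).
2. The `FLAKETRIAGE_PLAN_TIER` env var (useful for self-hosted deployments).
3. Otherwise, the default: `free`.

Per-repo overrides are available via `FLAKETRIAGE_TIER_OVERRIDES` (JSON map, environment variable). See the Marketplace listing pricing section for details.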
FlakeTriage provides delivery-impact visibility directly inside GitHub — no separate dashboard required.
Every Check Run created by FlakeTriage includes a Time Saved mini-card at the bottom. The mini-card shows:
| Field | Description |
|---|---|
| Auto rerun triggered | Whether FlakeTriage triggered an automatic rerun |
| Result | Outcome: Suspected Flake, Likely Legit Failure, Skipped, or Rerun Pending |
| Time-to-green | Minutes between rerun request and successful completion (suspected flake only) |
| Estimated developer time saved | Estimated minutes saved by avoiding manual investigation |
Initial triage (rerun pending):
Outcome update (rerun completed):
All estimates use configurable values from .github/flaketriage.yml:
roi:
estimated_manual_rerun_minutes: 3 # How long a manual rerun takes (default: 3)
estimated_context_switch_minutes: 2 # Context-switch overhead (default: 2)
Formula: estimated_dev_time_saved = 1 × (manual_rerun + context_switch) per suspected flake.
FlakeTriage shows time savings only — no dollar or cost estimates.
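As a worked example of the formula, the sketch below multiplies the number of suspected flakes by the two configured estimates. The function name is hypothetical; the defaults of 3 and 2 minutes are the documented defaults.

```python
def estimated_minutes_saved(suspected_flake_count: int,
                            manual_rerun_minutes: int = 3,
                            context_switch_minutes: int = 2) -> int:
    """Estimated developer minutes saved: count x (manual rerun + context switch)."""
    per_flake = manual_rerun_minutes + context_switch_minutes
    return suspected_flake_count * per_flake
```

With the defaults, each suspected flake counts as 5 estimated minutes, so 31 suspected flakes come to 155 minutes.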
FlakeTriage can post a weekly impact digest as a GitHub Issue in your repository. The digest summarizes:
The digest uses an idempotent issue — the same issue is updated each week rather than creating new ones. Look for issues with the flaketriage-digest label.
/impact API Endpoint
The `/impact` endpoint returns a comprehensive JSON impact summary. See API / Telemetry Endpoints for usage and examples. The `/roi` endpoint is kept as a deprecated alias.
Important: All “Estimated” values are based on configured assumptions, not certainties. FlakeTriage uses the word “Estimated” wherever counterfactual savings are shown.
FlakeTriage exposes five telemetry endpoints for monitoring and analytics. These are available on the running FlakeTriage server (not on GitHub).
All telemetry endpoints require a bearer token set via the FLAKETRIAGE_METRICS_TOKEN environment variable.
| Scenario | Behavior |
|---|---|
| Token not configured | All endpoints return 403 Forbidden (telemetry_disabled) |
| Token configured, no Authorization header | Returns 401 Unauthorized |
| Token configured, wrong token | Returns 401 Unauthorized |
| Token configured, correct token | Returns data (200) |
Security: Token comparison uses constant-time `timingSafeEqual` to prevent timing attacks.
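The authentication table above can be sketched as a single function that maps the configured token and the incoming header to a status code. This is an illustration in Python rather than the server's own code: `authorize` is a hypothetical name, and `hmac.compare_digest` is used here as Python's constant-time comparison, playing the role that `timingSafeEqual` plays in the note above.

```python
import hmac
from typing import Optional


def authorize(configured_token: Optional[str],
              authorization_header: Optional[str]) -> int:
    """Return the HTTP status a telemetry endpoint would answer with."""
    if not configured_token:
        return 403  # telemetry disabled: no token configured
    prefix = "Bearer "
    if not authorization_header or not authorization_header.startswith(prefix):
        return 401  # missing or malformed Authorization header
    presented = authorization_header[len(prefix):]
    # Constant-time comparison avoids leaking how many leading bytes matched.
    if hmac.compare_digest(presented.encode("utf-8"),
                           configured_token.encode("utf-8")):
        return 200
    return 401
```

A plain `==` comparison would return the same results, but its running time can depend on where the strings first differ, which is what the constant-time comparison is meant to avoid.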
| Endpoint | Method | Description |
|---|---|---|
| `GET /metrics` | GET | Prometheus-format metrics (all `flaketriage_*` counters and gauges) |
| `GET /impact` | GET | JSON impact summary (reruns attempted, successful, skipped, estimated time saved) |
| `GET /roi` | GET | Deprecated alias for `/impact`; returns identical data |
| `GET /export` | GET | Export impact + outcome data. Query: `?format=json` (default) or `?format=csv` |
| `GET /hotspots` | GET | Top N hotspot entries by count. Query: `?limit=N` (default 10, max 50) |
Additional operational endpoints (no auth required):
| Endpoint | Method | Description |
|---|---|---|
| `GET /healthz` | GET | Liveness probe: always 200 (includes degraded flag) |
| `GET /readyz` | GET | Readiness probe: 200 if healthy, 503 if storage degraded |
# Set your bearer token
$headers = @{ Authorization = "Bearer $env:FLAKETRIAGE_METRICS_TOKEN" }
$baseUrl = "https://your-flaketriage-instance.fly.dev"
# Prometheus metrics
Invoke-RestMethod -Uri "$baseUrl/metrics" -Headers $headers
# Impact summary
Invoke-RestMethod -Uri "$baseUrl/impact" -Headers $headers
# Export as JSON
Invoke-RestMethod -Uri "$baseUrl/export?format=json" -Headers $headers
# Export as CSV
Invoke-WebRequest -Uri "$baseUrl/export?format=csv" -Headers $headers |
Select-Object -ExpandProperty Content
# Hotspots (top 10)
Invoke-RestMethod -Uri "$baseUrl/hotspots?limit=10" -Headers $headers
# Health check (no auth required)
Invoke-RestMethod -Uri "$baseUrl/healthz"
Example /impact response:
{
"suspected_flake_count": 31,
"total_reruns_attempted": 47,
"reruns_classified": 31,
"reruns_pending_or_failed": 16,
"reruns_skipped": {
"budget": 5,
"policy": 12,
"already_rerun": 2
},
"reruns_skipped_total": 19,
"estimated_manual_minutes_avoided": 155,
"ci_minutes_added": 0,
"budget_exhaustion_count": 5,
"impact_config": {
"estimated_manual_rerun_minutes": 3,
"estimated_context_switch_minutes": 2
},
"formulas": {
"estimated_manual_minutes_avoided": "suspected_flake_count(31) × (estimated_manual_rerun_minutes(3) + estimated_context_switch_minutes(2)) = 155"
},
"window": "since_start",
"disclaimer": "All saved-time figures are estimates based on configured assumptions, not certainties."
}
Example /hotspots response:
{
"hotspots": [
{ "category": "skip_reason", "label": "policy_denied", "count": 12 },
{ "category": "skip_reason", "label": "budget_exhausted", "count": 5 },
{ "category": "tier_attempt", "label": "free/triggered", "count": 47 }
],
"limit": 10,
"total_entries": 3,
"window": "since_start"
}
| Check | How |
|---|---|
| Is the App installed on the repo? | Go to Settings → Integrations → GitHub Apps — you should see FlakeTriage listed |
| Is the workflow conclusion `failure` or `timed_out`? | Other conclusions (success, cancelled, skipped) are intentionally ignored |
| Is FlakeTriage enabled? | Check .github/flaketriage.yml — if enabled: false, FlakeTriage won’t act |
| Is the workflow denied by policy? | Check the Check Run for policy_denied. Add it to allow_workflows to override |
| Is the budget exhausted? | Check the Check Run for budget_exhausted. Wait for UTC midnight reset or increase budget_daily |
| Is this a first attempt? | FlakeTriage only reruns on run_attempt = 1. Rerun completions (attempt > 1) are used for classification |
| Response Code | Meaning | What to Check |
|---|---|---|
| 202 | Accepted — webhook queued for processing | Normal operation. Check Runs will appear shortly. |
| 401 | Signature verification failed | Webhook secret mismatch between GitHub App settings and WEBHOOK_SECRET env var. |
| 400 | Invalid JSON payload | Unusual — may indicate a proxy or middleware corrupting the request body. |
| 503 | Server shutting down | Transient — the server is draining during a deployment. Redeliver the webhook. |
Check the server logs (`fly logs --app <app-name>`).

Note: Signature failures return 401 (not 400). If you see 401 responses, verify that the `WEBHOOK_SECRET` environment variable matches the secret configured in your GitHub App settings.
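When chasing a secret mismatch, it can help to recompute the signature locally for a payload copied from Recent Deliveries and compare it with what GitHub sent. The sketch below assumes GitHub's `X-Hub-Signature-256` header format (`sha256=` followed by the hex HMAC-SHA256 of the raw request body); the function names `sign` and `verify` are hypothetical and this is not FlakeTriage's own verification code.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Compute the signature header value for a raw webhook body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify(secret: str, body: bytes, signature_header: str) -> bool:
    """Check a received signature header against the expected value."""
    expected = sign(secret, body)
    # Constant-time comparison, as is customary for signature checks.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_header.encode("utf-8"))
```

The body must be the exact bytes that were delivered; re-serializing the JSON changes the bytes and therefore the signature.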
Increase `budget_daily` in `.github/flaketriage.yml` (still capped by your plan tier limit).

| Outcome | Meaning | Fix |
|---|---|---|
| `permission_denied` | You don’t have write access to the repo | Ask a repo admin to grant write/maintain/admin access |
| `permission_check_unavailable` | GitHub API is unreachable | Wait and retry after GitHub status recovers |
| `manual_retry_denied_free_tier` | Free plan does not include manual retry | Upgrade to Starter or above |
| `budget_exhausted` | Daily rerun budget exceeded | Wait for UTC midnight reset |
| `rate_limited` | Too many commands in 60 seconds | Wait 60 seconds and retry |
| `missing_run_id` | No run ID in command | Use `/flaketriage retry <run_id>` with the numeric run ID |
If FlakeTriage cannot determine your plan tier (no database record, no environment variable), it falls back to the free tier. Signs this is happening:
- The `flaketriage_installation_tier_missing_total` Prometheus metric is incrementing.
- Manual retries are denied with `manual_retry_denied_free_tier`.

Fix: Ensure your Marketplace subscription is active, or set the `FLAKETRIAGE_PLAN_TIER` environment variable explicitly.
| Source | What it shows |
|---|---|
| Check Run on the commit | Full decision evidence (config, policy, budget, outcome) |
| GitHub webhook deliveries | App settings → Advanced → Recent Deliveries (request/response payloads) |
| Server logs | Fly.io: fly logs --app <app-name> — structured JSON logs with event types |
| `/metrics` endpoint | Prometheus counters: webhook totals, reruns triggered/skipped, queue depth |
| `/healthz` endpoint | Liveness + storage mode (sqlite vs memory) + degraded flag |
Q: Does FlakeTriage read my source code or build logs?
A: No. FlakeTriage operates on workflow metadata only: event payloads, conclusion status, workflow names, run IDs. It never fetches source code, build logs, or test output. See docs/PRIVACY_POLICY.md.
Q: What happens if I don’t create a .github/flaketriage.yml file?
A: FlakeTriage works with safe defaults: enabled, 10 reruns/day budget (capped by your plan tier), built-in deny patterns for release/deploy/publish/production workflows.
Q: Can FlakeTriage rerun the same workflow run multiple times?
A: No. FlakeTriage has a single-rerun idempotency guard. Each workflow run can be rerun at most once. This is enforced regardless of configuration.
Q: Does FlakeTriage rerun the entire workflow?
A: No. FlakeTriage uses rerun-failed-jobs, which reruns only the failed jobs within the workflow — not the entire run. This saves CI minutes and is faster.
Q: What if my workflow fails during the rerun too?
A: FlakeTriage classifies it as “Likely Legit Failure” and updates the Check Run accordingly. It does not attempt a second rerun.
Q: Are deploy/release workflows automatically rerun?
A: No. FlakeTriage has built-in deny patterns for release, deploy, publish, and production. These workflows are never automatically rerun unless you explicitly add them to allow_workflows in your config.
Q: Does the budget count manual retries?
A: Yes. Both automatic reruns and manual retries (/flaketriage retry) count toward the daily budget for that repo.
Q: What happens to my data if FlakeTriage’s storage becomes unavailable?
A: FlakeTriage degrades gracefully to in-memory mode. Delivery deduplication may miss duplicates, and budget counts may reset, but the service continues to function. The /healthz endpoint reports degraded: true in this state.
Q: Can I use FlakeTriage on private repositories?
A: Yes. FlakeTriage works on both public and private repositories. It only requires the permissions listed in the Required Permissions section.
Q: How do I completely stop FlakeTriage for one repo without uninstalling?
A: Set enabled: false in .github/flaketriage.yml:
enabled: false
- On a plan change, FlakeTriage receives a `marketplace_purchase` (action: `changed`) webhook and updates the installation’s tier in the database.
- A plan change that has not yet taken effect arrives as a `pending_change` event.
- When the change takes effect, GitHub sends a `changed` event with the new plan, and FlakeTriage updates the tier.
- If your `budget_daily` config exceeds the new tier’s limit, the effective budget is automatically capped: `effective = min(config budget_daily, new tier limit)`.
- A cancellation arrives as a `cancelled` event.

If you’re not using GitHub Marketplace (e.g., self-hosted deployment), you can change tiers by updating the `FLAKETRIAGE_PLAN_TIER` environment variable and restarting the server:
fly secrets set FLAKETRIAGE_PLAN_TIER="team" --app your-flaketriage-app
Valid values: free, starter, team, business.
For privacy and data policy, see Privacy & Data Policy. For source code and architecture details, see the FlakeTriage repository.