Email Delivery Resilience: Timeout, Retry & Circuit Breaker
Released in v1.0.451 · SCR-04
Overview
Transactional email (sent via Resend) is used to deliver HMRC deadline alerts through the Inngest critical-notification-email function. In earlier versions, the email client used a bare fetch() call with no timeout, retry logic, or circuit breaker. A slow or unresponsive Resend API could stall the function for the full platform timeout — exactly the wrong behaviour at quarterly submission deadlines.
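v1.0.451 wraps the Resend API call in a three-layer resilience stack. A per-request timeout makes stalled requests fail fast, automatic retries absorb transient errors, and a circuit breaker stops calling Resend while it is known to be down. Every failure path degrades gracefully to a null return. The sendEmail() signature is unchanged.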
How It Works
Every call to sendEmail() passes through three nested layers, outermost first:
```text
sendEmail()
└─ circuitBreakers.resend.execute()         ← Layer 3: fast-fail if the service is known to be down
   └─ fetchWithRetry(fn, { attempts: 3 })   ← Layer 2: retry 5xx, network errors and timeouts
      └─ fetchWithTimeout(url, 10 s)        ← Layer 1: abort if Resend stalls
```
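The nesting above amounts to plain function composition, with each layer wrapping the next. The following is a minimal sketch under stated assumptions, not the code in the repository. The `ResilienceLayers` interface, the `sendWithResilience` name and the thunk-based layer signatures are hypothetical. Its purpose is to show the call order and the collapse of every failure into `null`.

```typescript
// Hypothetical layer signatures for illustration only; the real helpers in
// src/lib/fetch-with-timeout.ts and src/lib/circuit-breaker.ts may differ.
type Thunk<T> = () => Promise<T>;

interface ResilienceLayers {
  // Layer 3: should throw without calling fn while the circuit is OPEN.
  breaker: <T>(fn: Thunk<T>) => Promise<T>;
  // Layer 2: should re-run fn on retryable failures, at most `attempts` calls in total.
  retry: <T>(fn: Thunk<T>, attempts: number) => Promise<T>;
  // Layer 1: should reject (and abort the request) if fn exceeds `ms` milliseconds.
  timeout: <T>(fn: Thunk<T>, ms: number) => Promise<T>;
}

const RESEND_TIMEOUT_MS = 10_000; // assumed to mirror TIMEOUT_MS.RESEND
const RESEND_ATTEMPTS = 3; // 1 initial attempt + 2 retries

// Composes the layers outermost-first: breaker -> retry -> timeout -> send.
async function sendWithResilience<T>(
  layers: ResilienceLayers,
  send: Thunk<T>,
): Promise<T | null> {
  try {
    return await layers.breaker(() =>
      layers.retry(() => layers.timeout(send, RESEND_TIMEOUT_MS), RESEND_ATTEMPTS),
    );
  } catch (err) {
    // Every failure path degrades to null instead of throwing to the caller.
    console.warn("email delivery failed, returning null:", err);
    return null;
  }
}
```

In practice `send` would perform the POST to Resend's `/emails` endpoint. Placing the breaker outermost means an OPEN circuit skips both the retries and the timeout entirely.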
Layer 1 — Timeout (10 seconds)
A fetchWithTimeout call wraps the underlying fetch() with an AbortController deadline of 10 seconds, registered as TIMEOUT_MS.RESEND in src/lib/fetch-with-timeout.ts. If Resend does not respond within 10 s, the request is aborted and the error propagates to Layer 2 for retry evaluation.
10 s was chosen as a reasonable upper bound for a transactional email API — generous enough to handle momentary latency spikes while short enough to prevent indefinite blocking of background functions.
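An AbortController deadline of this kind can be sketched as follows. This is a minimal sketch in the spirit of fetchWithTimeout, not its actual source. The name `withAbortTimeout` and the callback parameter `doFetch`, which receives the signal, are hypothetical.

```typescript
// Minimal sketch of an AbortController-based timeout, in the spirit of
// fetchWithTimeout. The name and the `doFetch` callback shape are hypothetical.
async function withAbortTimeout<T>(
  timeoutMs: number,
  doFetch: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  // Abort the in-flight request once the deadline passes.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await doFetch(controller.signal);
  } finally {
    // Clear the timer on success or failure so no abort fires late.
    clearTimeout(timer);
  }
}
```

A real call would look like `withAbortTimeout(10_000, (signal) => fetch(url, { method: "POST", signal, ... }))`. When the deadline fires, fetch rejects with an `AbortError`, which the retry layer then treats as retryable. Node 18+ and modern browsers also provide `AbortSignal.timeout(ms)`, which produces a pre-armed signal without a manual timer.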
Layer 2 — Retry with exponential back-off
fetchWithRetry attempts the call up to 3 times (1 initial attempt + 2 retries) using exponential back-off:
- Initial delay: 200 ms
- Jitter: ± 20 % to spread retries under load
- Retried: 5xx responses, network errors, and timeout aborts
- Not retried: 4xx responses (bad API key, invalid recipient address, etc.); these are permanent failures and are surfaced immediately via HttpStatusError
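The retry policy above can be sketched as a loop with exponential back-off and symmetric jitter. This is a minimal sketch, not fetchWithRetry itself. The release note gives only the 200 ms initial delay, so the doubling factor is an assumption. The `HttpStatusError` class here is a hypothetical stand-in with a `status` field, and `retryWithBackoff` and its injectable `sleep` are illustrative names.

```typescript
// Hypothetical stand-in for the HttpStatusError thrown on non-2xx responses.
class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.name = "HttpStatusError";
    this.status = status;
  }
}

interface RetryOptions {
  attempts: number; // total calls including the first (3 for Resend)
  initialDelayMs: number; // 200 ms for Resend
  jitter: number; // 0.2 means +/- 20 %
  factor?: number; // ASSUMPTION: back-off multiplier, defaults to 2
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

const realSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Any HttpStatusError below 500 (in practice 4xx) is permanent; 5xx, network
// errors and timeout aborts (anything else) are retried.
function isRetryable(err: unknown): boolean {
  return err instanceof HttpStatusError ? err.status >= 500 : true;
}

async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const factor = opts.factor ?? 2;
  const sleep = opts.sleep ?? realSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRetryable(err) || attempt >= opts.attempts) throw err;
      // Exponential back-off (200 ms, 400 ms, ...) with uniform +/- jitter.
      const base = opts.initialDelayMs * factor ** (attempt - 1);
      const delay = base * (1 + opts.jitter * (2 * Math.random() - 1));
      await sleep(delay);
    }
  }
}
```

For this to work, the wrapped function must turn a non-2xx response into a thrown error, e.g. `if (!res.ok) throw new HttpStatusError(res.status)`. Under the assumed factor of 2, a request that times out on every attempt spends about 3 × 10 s in timeouts plus 200 ms + 400 ms of back-off (each ± 20 %), roughly 30.6 s in total, before the call gives up.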
Layer 3 — Circuit Breaker
The outermost layer is a CircuitBreaker instance (circuitBreakers.resend) configured with a lenient threshold, because email delivery is best-effort and an in-app notification fallback is available:
| Parameter | Value | Rationale |
|---|---|---|
| failureThreshold | 10 | Tolerate minor transient blips before tripping |
| windowMs | 60 000 ms (60 s) | Failure count window |
| resetAfterMs | 30 000 ms (30 s) | Resume delivery quickly once Resend recovers |
When the circuit is OPEN, the breaker short-circuits the call without making a network request (fast-fail). sendEmail() then logs a warning and returns null, which is the same graceful-degradation contract as when RESEND_API_KEY is absent.
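The breaker behaviour in the table can be sketched as a small class with a sliding failure window. This is a minimal sketch, not the code in src/lib/circuit-breaker.ts. The class name `CircuitBreakerSketch`, the `CircuitOpenError` type, the HALF_OPEN trial state and the injectable clock are all assumptions. The breaker wraps the retry layer, so each failure it records corresponds to one call whose retries were already exhausted. The sketch counts every thrown error as a failure, which may differ from how the real breaker treats 4xx responses.

```typescript
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN"; // HALF_OPEN is an assumption

interface BreakerOptions {
  failureThreshold: number; // failures inside the window that trip the circuit
  windowMs: number; // sliding window for counting failures
  resetAfterMs: number; // time spent OPEN before a trial call is allowed
  now?: () => number; // injectable clock (epoch ms) for tests
}

class CircuitOpenError extends Error {
  constructor() {
    super("circuit is OPEN");
    this.name = "CircuitOpenError";
  }
}

class CircuitBreakerSketch {
  private state: CircuitState = "CLOSED";
  private failureTimes: number[] = [];
  private lastFailureTime: number | null = null;
  private openedAt = 0;
  private readonly opts: BreakerOptions;
  private readonly now: () => number;

  constructor(opts: BreakerOptions) {
    this.opts = opts;
    this.now = opts.now ?? Date.now;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      // Fast-fail without touching the network until the reset period elapses.
      if (this.now() - this.openedAt < this.opts.resetAfterMs) throw new CircuitOpenError();
      this.state = "HALF_OPEN"; // let trial calls through (not limited to one here)
    }
    try {
      const result = await fn();
      this.state = "CLOSED";
      this.failureTimes = [];
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  // Shape mirrors the health-status entry; failureCount is as of the last failure.
  status(): { state: CircuitState; failureCount: number; lastFailureTime: number | null } {
    return { state: this.state, failureCount: this.failureTimes.length, lastFailureTime: this.lastFailureTime };
  }

  private recordFailure(): void {
    const t = this.now();
    this.lastFailureTime = t;
    // Keep only failures that are still inside the sliding window.
    this.failureTimes = this.failureTimes.filter((ft) => t - ft < this.opts.windowMs);
    this.failureTimes.push(t);
    // A failed trial call, or too many failures in the window, (re)opens the circuit.
    if (this.state === "HALF_OPEN" || this.failureTimes.length >= this.opts.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = t;
    }
  }
}
```

With the Resend settings this would be constructed as `new CircuitBreakerSketch({ failureThreshold: 10, windowMs: 60_000, resetAfterMs: 30_000 })`. Ten failed sends within a minute open the circuit. The first call made at least 30 s later is let through as a trial.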
Health Endpoint
The Resend circuit breaker state is now included in the platform's health status response. The getAllCircuitStatus() function in src/lib/circuit-breaker.ts returns:
```json
{
  "hmrc": { ... },
  "agentos": { ... },
  "truelayer": { ... },
  "resend": {
    "state": "CLOSED",
    "failureCount": 0,
    "lastFailureTime": null
  }
}
```
Monitor the resend key to detect sustained Resend API degradation before it impacts HMRC deadline notification delivery.
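One possible approach is a monitoring job that polls the health status and alerts on the resend entry. The sketch below assumes the object shape in the health status example. The function name `resendAlert`, the HALF_OPEN state, the type of `lastFailureTime` and the early-warning threshold of 5 failures (half the trip threshold) are hypothetical choices, not part of the platform.

```typescript
// Assumed shape of each entry returned by getAllCircuitStatus().
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN"; // HALF_OPEN is an assumption
interface CircuitStatus {
  state: CircuitState;
  failureCount: number;
  lastFailureTime: number | string | null; // exact type not documented
}
type AllCircuitStatus = Record<string, CircuitStatus>;

// Returns an alert message, or null when the Resend circuit looks healthy.
// warnAtFailures is a hypothetical early-warning level below the trip threshold.
function resendAlert(all: AllCircuitStatus, warnAtFailures = 5): string | null {
  const resend = all["resend"];
  if (!resend) return "resend circuit missing from health status";
  if (resend.state !== "CLOSED") return `resend circuit is ${resend.state}`;
  if (resend.failureCount >= warnAtFailures) return `resend has ${resend.failureCount} recent failures`;
  return null;
}
```

Alerting below the trip threshold gives some warning before the circuit opens and emails start being skipped.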
Error Handling Behaviour
| Scenario | Outcome |
|---|---|
| Resend responds successfully | Returns { id: string } |
| Resend stalls > 10 s | Request aborted; retried up to 2 more times; returns null after exhaustion |
| Resend returns 5xx | Retried up to 2 more times with back-off; returns null after exhaustion |
| Resend returns 4xx (e.g. 401, 422) | Not retried; logs client error; returns null immediately |
| Circuit is OPEN (≥ 10 failures in 60 s) | Fast-fail; logs warning; returns null immediately |
| RESEND_API_KEY / RESEND_FROM_DOMAIN not set | Skipped; logs warning; returns null immediately |
In all failure cases the function returns null — callers should treat a null response as a failed-but-handled delivery and rely on the in-app notification fallback for critical HMRC alerts.
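A caller following this contract branches on `null` and falls back to the in-app channel. The `SendEmail` type below restates the documented sendEmail() signature. `NotifyInApp`, `deliverDeadlineAlert` and the subject line are hypothetical names for illustration, and passing the functions in as parameters is only for testability.

```typescript
// Restates the documented sendEmail() signature as a function type.
type SendEmail = (opts: {
  to: string;
  subject: string;
  html: string;
  from?: string;
}) => Promise<{ id: string } | null>;
// Hypothetical in-app notification hook.
type NotifyInApp = (userId: string, message: string) => Promise<void>;

async function deliverDeadlineAlert(
  sendEmail: SendEmail,
  notifyInApp: NotifyInApp,
  user: { id: string; email: string },
  message: string, // assumed to be trusted, already-escaped text
): Promise<"email" | "in-app"> {
  const result = await sendEmail({
    to: user.email,
    subject: "HMRC deadline reminder",
    html: `<p>${message}</p>`,
  });
  if (result !== null) return "email";
  // null means failed-but-handled: fall back to the in-app channel.
  await notifyInApp(user.id, message);
  return "in-app";
}
```

Because sendEmail() never throws for delivery failures, the caller needs no try/catch around it. The single null check covers timeouts, exhausted retries, 4xx errors, an open circuit and missing configuration alike.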
No Breaking Changes
The sendEmail() signature is unchanged:
```typescript
export async function sendEmail(opts: {
  to: string;
  subject: string;
  html: string;
  from?: string;
}): Promise<{ id: string } | null>
```
Existing callers require no modifications.