The Event
A cold-email automation system was configured, as many are, with a deliberately low daily send limit — five messages per day during a warmup period, with the limit scheduled to rise gradually over the following weeks. Everything about the setup was conservative. A daily cap. A small queue. A restricted set of sending days. A disciplined warmup schedule.
On a Sunday evening, over a five-hour span, that system sent 309 emails.
Not 50. Not 100. Three hundred and nine, against a stated daily cap of five.
The incident was discovered only by accident: a newly added observability metric happened to surface a number that couldn't possibly be right. Had the metric not been added that afternoon, the blast could have gone unnoticed for days.
What follows is a breakdown of how the failure happened, what architectural decisions allowed it, and what every cold-email sender can put in place to make sure their equivalent is impossible — not just unlikely.
Failure Point 1: Timezone Mismatch Between Writer and Reader
The system's daily-cap enforcement worked by counting messages sent "today." A common pattern: the send service writes each sent message to a log with a timestamp, and a separate counter reads the log and returns the count of messages with today's date.
Writer and reader disagreed on what "today" meant.
The writer timestamped every message in UTC. The counter queried for messages matching the current date in Pacific Time. For most of the day the two calendar dates coincide, so the counter is correct. But there is a window each evening, seven hours long during daylight time and eight during standard time, running from around 5pm Pacific (4pm during standard time) through midnight Pacific, when UTC has already rolled over to the next day while Pacific is still on the previous one.
During that window, the counter asks the log: "how many rows have today's Pacific date?" The writer's recently-logged rows all have tomorrow's UTC date. The counter returns zero. The cap check compares zero against the daily limit of five — and cheerfully allows five more sends. Then five more, the next time the cron runs. Then five more.
The pattern: Any system that writes timestamps in one timezone and reads them in another will eventually produce bugs. Pick one — UTC everywhere, or local everywhere — and enforce it. The fastest way to catch this in your own code is to grep for any place that handles dates with a default timezone, because the default is usually what changes between environments.
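As a minimal sketch of the mismatch, assume a hypothetical log of timezone-aware send timestamps and a fixed UTC-7 offset standing in for Pacific daylight time. The function names and sample data below are illustrative and are not taken from the system described; they only show how the same rows produce two different counts during the evening window.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset stands in for Pacific daylight time.
PACIFIC_DAYLIGHT = timezone(timedelta(hours=-7))


def count_sent_today_buggy(log_rows, now):
    """Reader uses the Pacific calendar date; rows carry UTC dates."""
    today_pacific = now.astimezone(PACIFIC_DAYLIGHT).date()
    return sum(
        1 for ts in log_rows
        if ts.astimezone(timezone.utc).date() == today_pacific
    )


def count_sent_today_utc(log_rows, now):
    """Reader and writer both use the UTC calendar date."""
    today_utc = now.astimezone(timezone.utc).date()
    return sum(
        1 for ts in log_rows
        if ts.astimezone(timezone.utc).date() == today_utc
    )


# Sunday 18:30 Pacific daylight time is Monday 01:30 UTC.
now = datetime(2024, 6, 9, 18, 30, tzinfo=PACIFIC_DAYLIGHT)

# Five sends logged by the writer in UTC shortly after the UTC rollover.
rows = [
    datetime(2024, 6, 10, 0, 5 * i, tzinfo=timezone.utc)
    for i in range(1, 6)
]

print(count_sent_today_buggy(rows, now))  # 0: the cap check sees no sends
print(count_sent_today_utc(rows, now))    # 5: the cap check sees all sends
```

The fix in this sketch is not the choice of UTC as such; it is that both sides derive "today" from the same clock.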
Failure Point 2: The Counter Was Stateless
The broader design issue: the daily-send counter had no in-memory state. Every time the cap check ran (every five minutes, on a cron), it re-queried the log to count today's sends. If the log query returned the wrong answer, for any reason, the cap was effectively reset on that run.
A belt-and-suspenders design would have included both checks: an in-memory quota that decrements on every send, AND a log-based query that verifies against history. If either check fails, the system refuses to send.
The in-memory quota would have survived a log query that returned zero. It wouldn't have reset every five minutes. Yes, it would reset if the process restarted — but in that case the log query would become the safety net.
Two independent checks make the whole system tolerant of either one being wrong. A single check is a single point of failure.
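The two-check design can be sketched as follows. This is an illustration under stated assumptions, not the system's actual code: `DualCapGuard` is a hypothetical name, and `count_sent_in_log` is a placeholder for whatever query counts sends for a given UTC date.

```python
from datetime import datetime, timezone


class DualCapGuard:
    """Allow a send only if BOTH the in-memory quota and the log agree."""

    def __init__(self, daily_cap, count_sent_in_log):
        self.daily_cap = daily_cap
        # Hypothetical callable: takes a UTC date, returns sends logged that day.
        self._count_sent_in_log = count_sent_in_log
        self._quota_day = None
        self._remaining = 0

    def try_acquire(self, now=None):
        now = now or datetime.now(timezone.utc)
        today = now.astimezone(timezone.utc).date()

        # Refill the in-memory quota once per UTC day, never per check.
        if self._quota_day != today:
            self._quota_day = today
            self._remaining = self.daily_cap

        # Check 1: in-memory quota, independent of any query.
        if self._remaining <= 0:
            return False

        # Check 2: log history. Any error means refuse to send (fail closed).
        try:
            logged = self._count_sent_in_log(today)
        except Exception:
            return False
        if logged >= self.daily_cap:
            return False

        self._remaining -= 1
        return True


# A log query that wrongly reports zero no longer resets the cap.
guard = DualCapGuard(daily_cap=5, count_sent_in_log=lambda day: 0)
moment = datetime(2024, 6, 10, 0, 30, tzinfo=timezone.utc)
allowed = sum(guard.try_acquire(moment) for _ in range(20))
print(allowed)  # 5
```

After a restart the in-memory quota starts full again, which is the case where the log query has to be the one that says no.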
Failure Point 3: Queue Depth Had No Cap
The third failure — and the one that turned a bug into a disaster — was the pending queue.
The system generated outreach on a separate schedule, independent of sending. Leads discovered on any given day were queued up with a scheduled send date. Nothing about the queueing process consulted the daily send limit; it just marked each lead "ready to send on day X."
Over a weekend, the queueing process ran multiple times and stacked hundreds of leads onto the following Monday. By the time Monday arrived, the queue contained far more ready-to-send messages than the daily cap could possibly process — but nothing in the architecture prevented the queue from growing that large in the first place.
When the cap check failed to hold (failure points 1 and 2), the queue was already full. The sender just drained it.
A simple queue-depth cap would have contained this. "No more than N messages may be in the pending queue at any time," where N is a small multiple of the daily send limit, means that even if the sender decides to send everything available, the total damage is bounded: at most N bad sends instead of 309. The design philosophy is called "small blast radius": when something goes wrong, how much can it break before anyone notices?
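A queue-depth cap of this kind might look like the sketch below. The class name, the exception, and the default multiple of 2 are all assumptions chosen for illustration; the point is only that the producer is refused once the queue reaches N.

```python
from collections import deque


class QueueFullError(Exception):
    """Raised when the pending queue is already at its depth cap."""


class BoundedPendingQueue:
    def __init__(self, daily_cap, multiple=2):
        # N is a small multiple of the daily send limit (assumed default: 2).
        self.max_depth = daily_cap * multiple
        self._items = deque()

    def enqueue(self, lead):
        if len(self._items) >= self.max_depth:
            raise QueueFullError(
                f"pending queue is at its cap of {self.max_depth}"
            )
        self._items.append(lead)

    def dequeue(self):
        # Returns None when nothing is pending.
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


# A weekend of over-eager queueing against a daily cap of five.
queue = BoundedPendingQueue(daily_cap=5)
rejected = 0
for i in range(300):
    try:
        queue.enqueue(f"lead-{i}")
    except QueueFullError:
        rejected += 1

print(len(queue), rejected)  # 10 290
```

Even with every send-side check broken, a sender draining this queue could emit at most ten messages.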
Failure Point 4: A Config Change Nobody Thought Was Risky
A few days before the incident, an operator widened the sending schedule from two days per week (Tuesday and Thursday) to the full business week (Monday through Friday).
In isolation, this change was fine. All it did was allow the system to send on more days. It didn't change the daily cap. It didn't change the counter logic. It didn't touch any of the code paths where the bugs lived.
What it did change was the shape of the queue. Before the change, the queueing process only scheduled sends on Tuesdays and Thursdays. The maximum pending queue at any time was small, because there were only two valid target days per week. After the change, any weekday became a valid target — and over a weekend, the queue could grow much larger than before.
The latent timezone bug had existed for weeks without causing damage. What made Sunday evening special was that it was the first Sunday after the schedule widening, meaning it was the first time the pending queue had accumulated that many messages aimed at a single upcoming weekday.
The incident required three preconditions to align: the timezone bug (always present), the newly-expanded schedule (recent), and the weekly boundary (inevitable). Separately, each was harmless. Together, they ignited.
The lesson: "Safe-looking" config changes can be ammunition for latent bugs. Before expanding any throughput parameter — send schedule, queue depth, concurrency, rate limits — walk through the mental checklist: what bug, if it exists, would this change make worse? It doesn't prove anything, but it surfaces the ugly combinations that are otherwise invisible until they happen.
The Diagnosis — What Investigation Looked Like
The incident was discovered through observability, not alerting. Nothing in the system detected the blast in real time. What surfaced it was a newly-deployed metric showing an unsubscribe rate of 18%, a number far too high for the intended send volume. Digging into the unsubscribe list to understand the rate revealed the second finding: hundreds of unsubscribe entries created within a few hours, pointing at sends that should never have happened.
Several observations from the diagnosis transfer to any cold-email operation:
The "Sent" Confirmation Was in Multiple Places
Because the send service wrote to a log before calling the email API, and the email provider kept its own record of every outbound message, there were two independent sources of truth for "was this actually sent?" Cross-checking them confirmed the 309 count exactly — no ghost writes, no lost tracking. The bug was in the cap logic, not the send logic. This saved hours of diagnosis.
Automated Scanners Created False Unsubscribe Signals
A secondary finding: many enterprise email-security scanners crawl all links in inbound mail, including CAN-SPAM unsubscribe links. They do this with garbled, scanner-generated "email addresses" as query parameters. Any unsubscribe handler that accepts those without filtering will rack up artificial unsubscribes — inflating the metric and masking real user intent. Any unsubscribe analytics should filter by source to separate genuine human unsubscribes from scanner noise.
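One way to separate the two kinds of entries, sketched under the assumption that the system keeps a set of addresses it has actually mailed: an unsubscribe request counts as genuine only if its address is on that list. The function names and sample addresses are hypothetical.

```python
from collections import Counter


def normalize(address):
    """Lowercase and trim so comparisons ignore case and stray whitespace."""
    return address.strip().lower()


def classify_unsubscribe(address, known_recipients):
    """Label a request by whether its address was ever actually mailed.

    known_recipients is assumed to be a set of normalized addresses.
    """
    if normalize(address) in known_recipients:
        return "genuine"
    return "scanner_noise"


def unsubscribe_breakdown(requested_addresses, known_recipients):
    """Count requests per label so the metric can report them separately."""
    return Counter(
        classify_unsubscribe(addr, known_recipients)
        for addr in requested_addresses
    )


known = {"ana@example.com", "raj@example.org"}
requests = ["Ana@Example.com ", "x9f3@@scan", "zz%40q", "raj@example.org"]
print(unsubscribe_breakdown(requests, known))
```

This only catches the garbled addresses described above. A scanner that follows a link carrying a real recipient's address would still be counted as genuine, so it is a filter for the metric, not a complete defense.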
The Sender-Domain Configuration Was Silently Ignored
The sends were supposed to originate from a warmup-isolated domain used only for cold outreach. They actually went from the primary business domain. The reason: the email provider silently ignored the "from alias" header because the sending account wasn't authorized to send-as that address. Every message went from the main account instead. Reputation damage landed where it hurt most — on the active primary domain, not the throwaway warmup one.
Time-Filtered Mail Searches Hid the Incident
When the operator first searched their own Sent folder for recent outbound, the filter looked clean. The reason: Gmail's date filter uses the user's local timezone, not UTC. With the operator in a timezone well ahead of UTC, a date filter of "before today" excluded exactly the window when the blast happened. A five-hour blast in another timezone can be completely invisible to a local-timezone filter — even though the messages are sitting right there in the folder.
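One way to take the calendar date out of the search is to express the window as exact instants. The sketch below builds a query from epoch seconds. It assumes Gmail's after: and before: operators accept epoch seconds; verify that in your own account before relying on it. The function name is hypothetical.

```python
from datetime import datetime, timezone


def epoch_search_query(start, end):
    """Build a Sent-folder query bounded by exact instants, not dates.

    Assumption: the mail search accepts epoch seconds for after:/before:.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return (
        f"in:sent after:{int(start.timestamp())} "
        f"before:{int(end.timestamp())}"
    )


# A five-hour window stated in UTC; the operator's own timezone is irrelevant.
window_start = datetime(2024, 6, 10, 0, 0, tzinfo=timezone.utc)
window_end = datetime(2024, 6, 10, 5, 0, tzinfo=timezone.utc)
print(epoch_search_query(window_start, window_end))
```

Because the bounds are instants, the same window produces the same query no matter which timezone the datetimes are expressed in.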
What Every Cold-Email System Should Have in Place
These are the architectural patterns that, in combination, would have prevented this incident entirely, or at worst bounded it to one capped queue's worth of sends instead of 309:
Timezone Consistency
All timestamp comparisons across reads and writes use the same timezone. UTC is the safest default for internal machinery; convert to local only for human-facing display.
Redundant Rate Limiting
Two independent rate limit checks. An in-memory quota that decrements on every send and does not depend on any query. A log-based query that survives process restarts. If either fails, refuse to send.
Queue Depth Cap
The pending queue has a maximum size, expressed as a multiple of the daily send limit. If the queue is at its cap, the discovery or queueing process refuses to add more. Even when a rate limit check fails, the sender cannot send more than the queue holds.
Sender Identity Verification
Before trusting that a "from alias" will actually be used, send a test message and inspect the resulting message headers. Configuration isn't verification. The header is.
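The header inspection can be automated with the standard library. This sketch assumes you can obtain the raw source of a delivered test message (for example via the mail client's "show original" view); the addresses and function name are placeholders.

```python
from email import message_from_string
from email.utils import parseaddr


def from_address_matches(raw_message, expected_address):
    """Return (ok, actual) after comparing the From header to expectations."""
    message = message_from_string(raw_message)
    _display_name, actual = parseaddr(message.get("From", ""))
    return actual.lower() == expected_address.lower(), actual


# Raw source of a test message as actually delivered (placeholder content).
delivered = (
    "From: Sales Team <owner@primary.example>\n"
    "To: test-inbox@example.net\n"
    "Subject: alias check\n"
    "\n"
    "Test body.\n"
)

ok, actual = from_address_matches(delivered, "hello@outreach.example")
print(ok, actual)  # False owner@primary.example
```

A False result here is exactly the silent fallback described above: the alias was configured, but the message left from the main account.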
Real-Time Alerting on Anomalous Volume
The system should alert if daily volume exceeds some reasonable threshold — say, three times the daily cap. Even if the cap check fails silently, the volume alert catches it within minutes instead of hours.
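A volume alarm along these lines can be sketched with a rolling window, which avoids depending on any calendar date at all. The class name, the 24-hour window, and the multiplier default are assumptions for illustration; wiring the True result to a pager or chat alert is left out.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class VolumeAlarm:
    """Flag when sends in a rolling window exceed a multiple of the cap."""

    def __init__(self, daily_cap, multiplier=3, window=timedelta(hours=24)):
        self.threshold = daily_cap * multiplier
        self.window = window
        self._sent_at = deque()

    def record_send(self, when):
        """Record one send; return True if the alarm should fire."""
        self._sent_at.append(when)
        cutoff = when - self.window
        # Drop sends that have aged out of the rolling window.
        while self._sent_at and self._sent_at[0] <= cutoff:
            self._sent_at.popleft()
        return len(self._sent_at) > self.threshold


# One send per minute against a cap of five: threshold is 15 sends.
alarm = VolumeAlarm(daily_cap=5)
start = datetime(2024, 6, 10, 0, 0, tzinfo=timezone.utc)
results = [alarm.record_send(start + timedelta(minutes=i)) for i in range(20)]
print(results.index(True))  # 15, i.e. the alarm fires on the 16th send
```

Because this check shares no code with the cap logic, a bug in the cap's notion of "today" does not silence it.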
Observability That Reveals Uncomfortable Truths
Metrics that only surface the numbers you expect to see are not serving you. A good metric is one that catches your attention when something is wrong, even if you weren't specifically looking for that failure mode. This incident was discovered by a metric that shipped for a completely unrelated reason.
Kill Switches at Multiple Layers
One emergency flag in an environment variable that can be flipped to stop all outbound sends. Stopping the underlying service entirely is cleaner but takes longer; a kill flag can be set in seconds. Test the kill switch in staging, not for the first time during an incident.
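A kill flag of this kind can be sketched as a check made immediately before every send. The variable name `OUTBOUND_KILL_SWITCH` and both function names are hypothetical.

```python
import os

# Hypothetical variable name; any truthy value below stops all sends.
KILL_SWITCH_ENV = "OUTBOUND_KILL_SWITCH"
TRUTHY = {"1", "true", "yes", "on"}


def sending_enabled(environ=os.environ):
    """Return False when the kill switch is set to a truthy value."""
    value = environ.get(KILL_SWITCH_ENV, "").strip().lower()
    return value not in TRUTHY


def send_if_allowed(send, message, environ=os.environ):
    """Check the switch right before each send, not once at startup."""
    if not sending_enabled(environ):
        return False
    send(message)
    return True


# Demonstration with explicit environments and a list standing in for the API.
outbox = []
print(send_if_allowed(outbox.append, "hello", environ={}))
print(send_if_allowed(outbox.append, "hello", environ={KILL_SWITCH_ENV: "1"}))
print(len(outbox))  # 1
```

One caveat to test in staging: a running process generally does not see environment changes made outside it, so depending on the deployment the flag may need a restart, or a flag file or datastore key read the same way, to take effect in seconds.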
A Pre-Production Checklist for Cold Email Systems
Before Sending Even One Production Email
- All timestamp comparisons use a single, explicit timezone
- Daily send cap is enforced by at least two independent mechanisms
- Pending queue has a hard size cap
- Sender-from-alias configuration is verified by inspecting real message headers
- Kill switch environment variable exists and has been tested in staging
- Alerting fires on daily volume above a conservative threshold (e.g. 3× the cap)
Every Week During Warmup
- Verify send count matches expected volume; investigate any gap
- Check actual message headers of sent emails to confirm the from-alias is working
- Spot-check the unsubscribe list for bursts of scanner-generated entries
- Review rate-limit alerts and any near-misses
Before Any Config Change
- Ask: what latent bug, if it exists, would this change make worse?
- If the change expands throughput in any dimension (days, concurrency, queue depth), add a conservative alert threshold to catch blow-ups
- Deploy during hours when you're awake and able to intervene
Conclusion
A 309-email misfire sounds like a dramatic failure, but it started life as four small decisions, each of which looked perfectly reasonable. A timezone shortcut. A stateless counter. An uncapped queue. A Friday afternoon config change. None of them, in isolation, was alarming. What made Sunday evening catastrophic was that the three preconditions (the timezone bug, the widened schedule, and the weekly boundary) aligned for the first time.
The protection isn't paranoia about any single design choice. It's defense in depth. Multiple independent checks. Bounded queues. Alerting that fires on anomalies, not just on errors. Verification of configuration through real headers, not just config files. Kill switches that can stop a runaway process in seconds.
If you operate any kind of cold email automation, for yourself or on behalf of clients, treat this incident as a question: could my system do the same thing, tonight, if the right three things happened to align? If the honest answer is "probably not, but I'm not 100% sure," that's the prompt to shore up the architecture before the combination you didn't anticipate finds you.