Overview
Cache warming (also called cache warmup, prewarming, or priming) proactively seeds caches. The goal is for real users to avoid cold‑start misses and slow first‑hit responses.
The mechanics work best when aligned with HTTP standards. RFC 9111 on HTTP caching defines semantics for Cache-Control, ETag, and Vary. RFC 9110’s HTTP semantics underpin how requests and responses should be interpreted by intermediaries. When your warmup cache requests are consistent with these rules, you maximize hit rate and minimize wasted origin work.
Higher cache hit ratios generally reduce origin bandwidth and improve tail latency. For SREs, backend engineers, and technical SEOs, the outcome is straightforward. You get lower p95/p99 TTFB, smoother traffic spikes, and more predictable cost envelopes.
This guide shows when to use cache warming, how to do it safely at CDN and application layers, and how to measure ROI.
Definition and purpose of cache warming
Cache warming is the practice of preloading content into one or more cache layers. These may include a CDN, reverse proxy, application cache, or database page cache. The aim is simple. The first real user should see a cache hit rather than a cold miss.
A warmup cache request is an automated HTTP request (or key fetch for data caches). It primes the cache using headers that match real traffic. The goal is to reduce first‑hit latency, smooth spikes, and stabilize TTFB across deployments and regional traffic surges.
The fundamentals of what can be cached, for how long, and under what revalidation rules come from standards and widely accepted docs. If you’re new to these concepts, anchor on RFC semantics first. Then layer in operational practices like scheduling, throttling, and monitoring hit rates.
The best warmers respect freshness directives, avoid overloading the origin, and are easy to roll back.
When cache warming is and isn’t necessary: a decision framework
Warming helps most when content is popular, cacheable, and experiences costly cold misses. Dynamic pages with heavy database or API fan‑out are common examples.
It is less necessary when you already use patterns like stale‑while‑revalidate (SWR), incremental static regeneration (ISR, such as Next.js ISR), write‑through caches, or event‑driven updates. These approaches keep caches hot with minimal user‑visible latency.
Use this quick mental model. Choose cache warming when your p95/p99 TTFB spikes after deploys. It also helps when seasonal or campaign traffic concentrates on a known hotset, your CDN hit ratio drops after TTL expiry, or your origin autoscaling thrashes on cold hits.
Prefer SWR/ISR or write‑through when content can be generated ahead‑of‑time. They also fit when freshness tolerance allows background updates or when you can invalidate precisely on data changes.
When in doubt, run a small canary warmup for top URLs. Compare p95/p99 latency and hit ratio against SWR/ISR alone before committing.
HTTP caching mechanics that affect warming
Effective warmers work with HTTP, not around it. Two principles matter most. First, how cache keys are computed, including Vary semantics. Second, how freshness and revalidation are expressed with Cache-Control, ETag, and Last‑Modified.
See MDN’s Cache-Control reference for directive behavior and edge cases.
A standards‑aligned warmer reduces origin load by sending conditional requests. It also matches real‑user headers so the warmed variant is the one users hit.
Misaligned warmers inflate traffic. For example, they may miss cache because of mismatched Accept-Language. They can also keep stale content around too long.
Invest a few minutes in header design. You will save hours of troubleshooting later.
Cache keys, Vary, and header design
The cache key determines whether two requests map to the same stored response. While the URL is always part of the key, most CDNs and proxies also include the host and a subset of headers specified by Vary.
Overly broad Vary values, like Vary: Cookie, fragment the cache and crater hit ratios. Values that are too narrow risk serving the wrong variant.
Normalize and whitelist only the headers that truly affect representation. Commonly this includes Accept-Encoding and sometimes Accept-Language or Device‑Type. Strip nondeterministic headers such as Date and tracing IDs.
Design a header policy at the edge that makes keys predictable across regions. For instance, normalize Accept-Encoding to a small set (br, gzip). Map language diversity into a limited set of supported locales.
Keep Vary small and stable. If you must vary on cookies, partition by a signed, scope‑limited cookie rather than per‑user session cookies.
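Normalization policies like these are easy to state and easy to get subtly wrong. Here is a minimal Python sketch of the approach described above; the supported-locale set and the key layout are illustrative assumptions, not any specific CDN's API.

```python
# Sketch: normalize representation-affecting headers before key computation.
# SUPPORTED_LOCALES and the two-encoding policy are illustrative assumptions.

SUPPORTED_LOCALES = {"en", "de", "fr"}
DEFAULT_LOCALE = "en"

def normalize_accept_encoding(value: str) -> str:
    """Collapse Accept-Encoding to a small, stable set (br preferred)."""
    encodings = {token.split(";")[0].strip().lower() for token in value.split(",")}
    if "br" in encodings:
        return "br"
    if "gzip" in encodings:
        return "gzip"
    return ""  # identity

def normalize_accept_language(value: str) -> str:
    """Map arbitrary language lists onto a bounded set of supported locales."""
    for part in value.split(","):
        lang = part.split(";")[0].strip().lower()[:2]
        if lang in SUPPORTED_LOCALES:
            return lang
    return DEFAULT_LOCALE

def cache_key(host: str, path: str, headers: dict) -> str:
    """Deterministic key: host + path + the two normalized variants."""
    enc = normalize_accept_encoding(headers.get("Accept-Encoding", ""))
    loc = normalize_accept_language(headers.get("Accept-Language", ""))
    return f"{host}{path}|enc={enc}|lang={loc}"
```

With this policy, `gzip, deflate, br;q=0.9` and `br` both key to the `br` variant, so the warmed representation is the one real browsers hit.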
Cache-Control directives, ETag/Last-Modified, and revalidation
Cache-Control governs how long responses stay fresh via max-age. It also states whether shared caches may store them with public vs private. Directives further guide how stale content is handled via stale-while-revalidate and stale-if-error.
ETag and Last‑Modified enable conditional requests via If-None-Match and If-Modified-Since. CDNs can revalidate cheaply with 304 responses rather than refetching the body.
A good warmer uses these semantics. It tops up freshness and revalidates hot URLs without blasting the origin with full downloads.
In practice, warmers should first try conditional requests against the origin or shield. They should fall back to full fetches only when content has changed.
This pattern increases hit ratio and reduces egress. If your origin does not emit ETag or Last‑Modified, add them. Revalidation is the difference between a sustainable warmup and a bandwidth‑heavy crawler.
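The revalidate-first loop is simple enough to sketch. In this Python example the `fetch` callable and the validator store are assumptions; wire in your own HTTP client and persistence layer.

```python
# Sketch of a revalidating warmer. `fetch` and the validator store are
# assumptions; substitute your HTTP client and persistence of choice.

def conditional_headers(prev: dict) -> dict:
    """Build If-None-Match / If-Modified-Since from stored validators."""
    headers = {}
    if prev.get("etag"):
        headers["If-None-Match"] = prev["etag"]
    if prev.get("last_modified"):
        headers["If-Modified-Since"] = prev["last_modified"]
    return headers

def warm_url(url: str, validators: dict, fetch):
    """fetch(url, headers) -> (status, response_headers, body).
    Returns what happened plus the bytes actually transferred."""
    status, resp_headers, body = fetch(url, conditional_headers(validators.get(url, {})))
    if status == 304:
        return ("revalidated", 0)  # freshness topped up, no body egress
    if status == 200:
        validators[url] = {        # remember validators for the next run
            "etag": resp_headers.get("ETag"),
            "last_modified": resp_headers.get("Last-Modified"),
        }
        return ("refetched", len(body))
    return ("error", 0)
```

The first pass pays for a full fetch; every later pass costs only a 304 until the content actually changes.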
Stale-While-Revalidate and Stale-If-Error interactions
stale-while-revalidate allows serving cached content briefly beyond max-age while refreshing in the background. stale-if-error allows serving stale when the origin fails.
These directives complement cache warming. They cover long‑tail URLs and sudden spikes you didn’t pre-seed.
Warming still helps tail latency. It ensures your highest‑traffic URLs are hot across regions at deploy time or during seasonal campaigns.
When SWR is available, reduce the warmup set to the truly critical hotset. Rely on SWR for the remainder.
Keep SWR windows short enough to meet freshness SLAs. Pair with clear invalidation when correctness matters, such as price changes.
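The interaction between freshness and the SWR window reduces to a small decision function. This sketch assumes RFC 5861-style semantics; real caches layer in grace, errors, and revalidation races.

```python
def swr_state(age: float, max_age: float, swr_window: float) -> str:
    """Classify a cached response (sketch, RFC 5861-style semantics):
    fresh -> serve; within the SWR window -> serve stale and refresh
    in the background; beyond it -> treat as a miss."""
    if age <= max_age:
        return "fresh"
    if age <= max_age + swr_window:
        return "stale-serve-and-revalidate"
    return "miss"
```

A short SWR window keeps the worst-case staleness bounded at `max_age + swr_window`, which is the number to check against your freshness SLA.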
Choosing URLs to warm from real traffic
The best warmup lists come from production signals rather than guesses. Mine access logs or analytics to identify the smallest set of URLs responsible for the majority of requests and revenue.
Blend in sitemaps or catalog feeds to catch long‑tail essentials. Build pipelines that avoid handling PII, using only the minimal fields necessary, such as path, method, status, and bytes.
Operate on recency windows, for example the last 24–72 hours. This helps the warmer adapt to trends.
Monitor how the selected hotset affects hit ratio at p80/p90 traffic share. Adjust size to balance benefit with cost.
If your site is bursty, such as news, incorporate short‑interval windows and dampen sudden spikes to avoid thrashing.
Log mining and percentile-based hotsets
Extract candidate URLs by counting request frequency over a sliding window. Then select the smallest set covering, say, 80–90% of traffic.
This percentile‑based hotset often includes category pages, top SKUs, and evergreen articles.
To handle bursts, cap per‑URL additions per interval. Require sustained popularity across multiple windows before inclusion.
Validate that chosen URLs are cacheable. Cache-Control must permit storage. Confirm that their Vary policies are stable.
Where possible, include the specific variants you intend to serve, such as language locales or device classes. This avoids priming the wrong representation.
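The coverage-based selection above is a few lines of Python. This sketch takes a frequency table from log mining and returns the smallest head of the distribution covering a target traffic share; the burst caps and multi-window checks described above would sit on top.

```python
from collections import Counter

def hotset(url_counts: Counter, coverage: float = 0.9) -> list[str]:
    """Smallest set of URLs covering `coverage` share of requests.
    (Burst caps and sustained-popularity checks are left to the caller.)"""
    total = sum(url_counts.values())
    selected, covered = [], 0
    for url, count in url_counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(url)
        covered += count
    return selected
```

On heavily skewed traffic the result is usually a tiny fraction of distinct URLs, which is exactly why percentile-based hotsets are cheap to warm.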
Sitemaps, product feeds, and seasonal lists
Sitemaps and CMS feeds provide a reliable baseline of canonical, crawl‑worthy URLs that change infrequently. Product or content feeds add rapidly changing inventory or trending pieces.
During seasonal events, maintain a short supplemental list for campaign landers and featured items. This keeps first hits fast the moment campaigns go live.
Blend these sources with your mined hotset to reduce misses across both head and long‑tail content. Keep the combined list deduplicated, capped, and time‑boxed so the warmer’s size remains proportional to its impact.
Avoiding Vary explosions and key mismatches
Vary mistakes silently dilute your warmup. Use this short checklist to prevent fragmentation and unintended misses:
- Whitelist only stable, representation‑affecting headers in Vary (e.g., Accept-Encoding, a bounded Accept-Language).
- Normalize Accept-Encoding to a small set and compress consistently.
- Avoid Vary: Cookie unless scoped to a signed, partitioning cookie.
- Ensure content negotiation (language, device) maps to a bounded set of variants.
- Confirm CDN cache key configuration matches your Vary policy.
Run spot checks by fetching the same URL with your warmup headers and with a real browser’s headers. Confirm key alignment before scaling up.
CDN and multi‑region seeding strategies
Warmers should seed caches in a way that minimizes duplicate origin fetches across points of presence (PoPs). Tiered caching and an origin shield reduce origin load by consolidating revalidations and misses through a single upstream.
The concept is well described in the Fastly origin shield guide. The approach applies broadly. Fetch once to a shield, then fan out to edges.
Seeding order matters. Instead of firing requests from every PoP, stage your warmup. Seed the shield, then a few regional edges, then the remainder.
This allows revalidation traffic to be absorbed at the shield. Edges pull from the intermediate tier rather than the origin.
Tiered caching, origin shield, and PoP order
Think of seeding as a controlled rollout:
- Warm the origin shield or mid‑tier cache first with conditional requests.
- Seed one or two high‑traffic regions to verify headers, TTLs, and hit‑rate uplift.
- Fan out to remaining regions in waves, prioritizing where your users are.
Between waves, validate hit ratios and origin QPS. If error rates or egress spike, pause and debug before proceeding.
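The staged rollout above can be expressed as a small loop with a health gate between waves. The wave composition and the `warm`/`healthy` callables here are illustrative assumptions.

```python
# Sketch: staged PoP seeding with a health gate between waves.
# Wave membership and the warm/healthy callables are assumptions.

WAVES = [["shield"], ["us-east", "eu-west"], ["us-west", "ap-south", "sa-east"]]

def seed_in_waves(urls, warm, healthy):
    """warm(region, url) seeds one URL at one tier or region;
    healthy() checks hit ratio, origin QPS, and error rate between waves."""
    for wave in WAVES:
        for region in wave:
            for url in urls:
                warm(region, url)
        if not healthy():
            return "paused"  # stop and debug before fanning out further
    return "done"
```

Because the shield is seeded first, every later wave revalidates against the intermediate tier rather than the origin.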
Per‑region throttles and cost caps
Guardrails prevent thundering herds and surprise bills. Rate‑limit warmup QPS per region and cap total daily warmup bytes.
If your CDN supports per‑PoP concurrency limits, set them conservatively and ramp gradually. Maintain a global budget for warmup egress.
If you approach the cap or see elevated 5xx rates, back off with exponential delay. Resume later when conditions normalize.
Combining PoP staging with throttles keeps your origin healthy. You still achieve high hit ratios where it matters most.
Authenticated and personalized content: safe warming patterns
Authenticated or personalized pages require extra care to avoid leaking PII or cross‑user data. The rule of thumb is strict. Never cache what must not be cached.
When caching is allowed, ensure the cache key isolates users or cohorts. The OWASP REST Security Cheat Sheet’s caching guidance reinforces these principles.
Warmers for authenticated content should use signed, scope‑limited tokens or cookies that represent a safe cohort. Examples include locale or plan tier.
Caches must be configured with strict Vary and TTLs. Always respect Cache-Control and avoid touching endpoints that declare no-store.
Header whitelists, signed cookies, and token-scoped warmers
Design your warmer to include only the headers that produce the intended variant. That may include Accept-Language, a device hint, and a signed cookie indicating plan tier.
Use a signed cohort cookie rather than a per‑user session cookie. The cache key should not collide across users.
On the edge, whitelist this cookie for cache key computation. Strip all other cookies to prevent accidental fragmentation.
Test by requesting the same URL with and without the cohort token. Verify isolation before broad use.
If a response contains user‑specific data, do not warm or cache it at all. Prefer per‑user local caches or client‑side storage.
Preventing PII leakage and honoring no-store
Resources marked with Cache-Control: no-store must not be cached per RFC 9111 (HTTP caching). Treat any endpoint that returns personal data or payment information as non‑cacheable by default unless explicitly designed for shared caches.
Mitigate risk by adding an allowlist of cacheable paths. Enforce response header checks and deny on no-store or private.
Scrub or hash path parameters in logs. Record audit entries when a warmer accesses authenticated endpoints.
Regularly review the allowlist with security and privacy stakeholders.
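The allowlist-plus-header-check gate is a one-function guard. The path prefixes in this sketch are hypothetical; the deny-by-default shape and the no-store/private checks follow RFC 9111.

```python
# Illustrative allowlist; real prefixes come from your reviewed runbook.
CACHEABLE_PATHS = ("/products/", "/articles/", "/categories/")

def may_warm(path: str, cache_control: str) -> bool:
    """Deny-by-default guard: warm only allowlisted paths, and honor
    no-store / private even there (per RFC 9111)."""
    if not path.startswith(CACHEABLE_PATHS):
        return False
    directives = {d.strip().lower() for d in cache_control.split(",")}
    return not ({"no-store", "private"} & directives)
```

Run this check against the response headers of every candidate URL before it enters the warmup queue, and log denials for the periodic allowlist review.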
Stack‑specific implementations and examples
The safest warmers are close to the edge. They respect your platform’s cache key configuration and are easy to ship, roll back, and observe.
Below are platform‑aware patterns with practical defaults, independent of vendor preference. Keep them simple. Fetch with the right headers, revalidate conditionally, throttle, and log outcomes.
In all cases, start with a canary warmup for a small hotset and a single region. Verify header normalization and hit‑rate impact before expanding.
Maintain per‑platform runbooks so on‑call engineers can triage failures quickly.
Cloudflare Workers and Cache API warmers
A Worker can pull a curated URL list from durable storage. It can make conditional fetches with If-None-Match or If-Modified-Since and write to the Edge Cache via the platform’s Cache API.
Normalize headers at the Worker, such as Accept-Encoding, to match user traffic. Ensure you honor Cache-Control and ETag.
Use KV or durable objects to coordinate per‑region throttles and progress checkpoints.
For authenticated cohorts, attach only a signed, scope‑limited cookie to the warmup cache request. Strip all other cookies.
Log cache statuses, such as HIT, MISS, and REVALIDATED, to estimate hit‑rate gains.
Fastly VCL and Edge warmers
Use a controlled client to issue warmup fetches through Fastly. Let VCL logic handle cache key normalization (header whitelists, device hints) and shielding.
Enable a shield and ensure your warmup requests hit it first. Configure grace and stale-while-revalidate windows so background updates cover long‑tail pages while you seed the hotset.
Track X-Cache and Age headers to validate that your warmup is populating shield and edges as expected. If you see Vary-driven fragmentation, adjust your request headers and VCL normalizations before expanding.
AWS CloudFront with Lambda@Edge
Warm through CloudFront with viewer or origin request functions that normalize headers and cookies consistently. Attach an origin shield region near your origin and seed it first.
Use Lambda@Edge to drop nondeterministic headers. Scope any cookies used for cohort‑based warmups.
Observe CloudFront’s cache statistics and origin metrics. Confirm that most warmup traffic terminates at the shield rather than the origin.
If 304 rates are low, add or fix ETag or Last‑Modified responses at the origin.
Nginx/Varnish configurations
For Nginx with proxy_cache or for Varnish, define a deterministic cache key. Combine host, path, and normalized headers. Keep Vary minimal.
In Varnish, ensure vcl_hash maps only the headers that matter. In Nginx, use map directives to normalize Accept-Encoding and locales.
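As a sketch of the Nginx side, a map block can collapse Accept-Encoding into a bounded variant that feeds the cache key; the zone name, upstream, and key layout below are illustrative, and the map and proxy_cache_path directives belong in the http context.

```nginx
# Sketch: normalize Accept-Encoding into the cache key (values illustrative).
map $http_accept_encoding $norm_encoding {
    default   "";
    ~*br      "br";
    ~*gzip    "gzip";
}

proxy_cache_path /var/cache/nginx keys_zone=app_cache:10m;

server {
    location / {
        proxy_cache      app_cache;
        proxy_cache_key  "$scheme$host$uri$is_args$args|$norm_encoding";
        proxy_pass       http://origin;
    }
}
```

With the normalized value in the key, a warmer sending `br` and a browser sending `gzip, deflate, br` hit the same entry.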
Enable health checks and graceful failover so the warmer does not exacerbate origin incidents.
Use grace or stale-while-revalidate equivalents to blend background refresh with targeted warmups. Monitor upstream response codes and backend timing to tune concurrency safely.
Redis/Memcached priming
For application data caches, prime hot keys that are expensive to compute. Pull a hotset from recent production keys.
Backfill them via a job that respects rate limits and retries. Coordinate with the source datastore to avoid stampedes by using request coalescing, locks, or single‑flight primitives.
Tune eviction policies and TTLs so warmed keys survive real traffic periods. Measure hit ratio and CPU savings on the application tier to ensure priming is worth the cost.
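Within a single process, a small single-flight primitive is enough to coalesce a priming job with concurrent real traffic; cross-process coalescing needs a lock in the datastore itself (for example Redis SET with NX). This sketch omits error propagation to waiters for brevity.

```python
import threading

class SingleFlight:
    """Sketch: coalesce concurrent computations of the same hot key so a
    priming job and real traffic do not stampede the source datastore.
    In-process only; error propagation to waiters omitted for brevity."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}

    def do(self, key, compute):
        with self._lock:
            event = self._inflight.get(key)
            if event is None:
                event = threading.Event()
                self._inflight[key] = event
                is_leader = True
            else:
                is_leader = False
        if is_leader:
            try:
                event.result = compute()  # attach result before waking waiters
            finally:
                event.set()
                with self._lock:
                    self._inflight.pop(key, None)
        else:
            event.wait()
        return event.result
```

The leader computes once; every concurrent caller for the same key blocks and receives the leader's result instead of issuing its own expensive query.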
WordPress/WooCommerce and Next.js ISR coordination
On WordPress/WooCommerce, schedule a task via WP‑Cron or an external scheduler to warm top posts, category pages, and key product pages. Match real Accept-Encoding and language headers.
Pair this with a CDN that honors Cache-Control and ETag from your site.
For Next.js ISR, rely on regeneration for broad freshness. Use a focused CDN warmup for hero pages and key routes right after deploys or campaign launches.
Keep warmers small and precise. ISR plus SWR often covers the rest.
Scheduling, rate limiting, and orchestration
Warmers should be orchestrated like any production job. They must be scheduled predictably, rate‑limited, and resilient to failures.
Tie warmups to deployments and seasonal calendars. Caches should be hot when it matters, not hours later.
Make work idempotent and checkpoint progress so retries do not duplicate load.
Run warmers in queues rather than as one big batch. A queue lets you apply per‑region QPS caps, backpressure on errors, and circuit breakers if origin health degrades.
Cron, queues, and backoff with jitter
A simple, safe pattern looks like this:
- A scheduled trigger enqueues URL batches per region with size and concurrency caps.
- Workers fetch with conditional requests first; on transient failure, retry with exponential backoff and jitter.
- Checkpoint progress and pause when error thresholds or budget caps are reached.
This avoids synchronized thundering herds and spreads load over time. Jitter prevents aligned retries from amplifying spikes.
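The retry step above is a few lines with full jitter, the variant that draws the delay uniformly from zero up to the exponential cap. The 5xx-only retry policy in this sketch is an assumption; adjust it to your error taxonomy.

```python
import random
import time

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def fetch_with_retry(fetch, url: str, max_attempts: int = 5,
                     base: float = 0.5) -> int:
    """Retry transient (5xx) failures with jittered exponential backoff.
    `fetch(url)` returns an HTTP status code (an assumed interface)."""
    status = 0
    for attempt in range(max_attempts):
        status = fetch(url)
        if status < 500:
            return status
        time.sleep(backoff_delay(attempt, base=base))
    return status
```

Because each worker draws an independent delay, a batch that fails together does not retry together.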
Rolling deploys, autoscaling, and warm starts
Coordinate warmers with rolling deploys by pre‑seeding the hotset just before shifting traffic to new instances or a new edge configuration. For origins that autoscale, warm after capacity has scaled out to avoid triggering additional bursts.
If your platform supports instance warm starts, populate app or DB caches during instance initialization before admitting traffic. This approach minimizes cold capacity and reduces p95/p99 regressions observed immediately after deploys or failovers.
Monitoring and measuring effectiveness
Measure what matters. Focus on cache hit ratio, p95/p99 TTFB, origin QPS or egress, and revalidation rates.
Tie these to user‑centric performance outcomes as outlined in Google’s performance fundamentals. A warmer that looks busy but doesn’t move these KPIs is wasted cost.
Instrument warmers to emit success and error counts, conditional vs full fetch rates, bytes transferred, and timeouts. Correlate changes to deploys, config updates, and campaigns to understand causality, not just correlation.
Cache hit ratio, p95/p99 latency, and origin load
A rising cache hit ratio with flat or lower p95/p99 latency indicates effective warming. If hit ratio climbs but tail latency doesn’t improve, investigate key fragmentation or origin bottlenecks elsewhere.
Track origin QPS and egress. Successful warmers shift work from origin to edge, often reducing origin CPU and IO.
Balance freshness with performance by tuning TTLs and SWR windows. If freshness SLAs are strict, rely more on revalidation and targeted warmups than on long max-age.
Alerting on warming failures and backoff policies
Alert on sustained drops in hit ratio for the warmed hotset. Also watch for spikes in 5xx from the origin or shield and repeated warmup retries.
Set circuit breakers that pause warmers when error or budget thresholds are exceeded. Resume with cautious backoff when health recovers.
Define clear SLOs for warming jobs. For example, 95% of the hotset successfully warmed within 15 minutes of deploy. Page on‑call only when user‑visible impact is likely.
Cost and ROI modeling
Warmers consume requests and egress. The ROI comes from reducing expensive origin work and improving conversions tied to performance.
Pricing varies by provider and region, as in AWS CloudFront pricing. Build a simple model with your numbers, then validate with a canary.
Track the warmup’s marginal costs separately from baseline CDN usage. That helps you defend budgets and tune warmup scope without conflating with organic traffic.
Estimating egress, request costs, and breakeven
Estimate monthly net savings using:
- Savings = (Miss_cost_before − Miss_cost_after) − Warmup_cost
- Miss_cost = Origin_CPU_cost + Origin_egress_cost + Request_processing_cost
- Warmup_cost = (Warmup_requests × $/request) + (Warmup_bytes × $/GB)
Inputs to gather: hit‑rate delta before vs after warming, average object size, egress $/GB by region, backend CPU or time per miss, and CDN request pricing.
As an example, improving hit ratio from 70% to 88% on a 1 TB/month property might avoid ~180 GB of origin egress. At $0.09/GB, that’s ~$16 of egress saved per TB, plus backend CPU.
Subtract the cost of your warmup requests and bytes to find breakeven. If ROI is marginal, shrink the hotset or rely more on SWR.
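The breakeven arithmetic above fits in one function. This sketch models egress savings only; backend CPU savings, which the text notes come on top, are left out, and all prices are inputs rather than facts.

```python
def monthly_savings(tb_per_month: float, hit_before: float, hit_after: float,
                    egress_per_gb: float, warmup_requests: int,
                    price_per_request: float, warmup_gb: float) -> float:
    """Net monthly egress savings from warming (backend CPU savings excluded).
    All prices are inputs; plug in your provider's rates."""
    avoided_gb = tb_per_month * 1000 * (hit_after - hit_before)
    egress_saved = avoided_gb * egress_per_gb
    warmup_cost = warmup_requests * price_per_request + warmup_gb * egress_per_gb
    return egress_saved - warmup_cost
```

Plugging in the example above (1 TB/month, 70% to 88% hit ratio, $0.09/GB, no warmup cost) reproduces the roughly $16 of avoided egress; adding realistic warmup request and byte costs shows how quickly an oversized hotset erodes the margin.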
Budgeting warmup bandwidth and setting caps
Set daily and regional caps for warmup bytes and requests based on the ROI model. Implement automated pauses when you hit 80–90% of budget or when revalidation rates drop below a threshold. Low revalidation may indicate poor headers or staleness logic.
Re‑evaluate hotset size monthly. Adjust to changing traffic patterns and pricing.
This ensures you control spend proactively. It also prevents warmers from running unchecked during incidents.
SRE runbooks, reliability, and compliance
Treat warmers as first‑class production services. Document health checks, retries, idempotency keys, WAF allowlist entries, and audit logging.
Keep data collection to the minimum needed for selection and measurement. Ensure any processing of authenticated endpoints is reviewed for privacy and compliance risk.
Runbooks should enable rapid mitigation. You should be able to pause, drain, back off, and safely resume.
Include one‑page quick starts and deep dives so on‑call engineers can choose the right path under pressure.
Health checks, idempotency, and retries
Build reliability with a few guardrails:
- Health checks for origin, shield, and representative edge PoPs before large warmups.
- Idempotency keys for batch jobs to prevent duplicate work on retries.
- Exponential backoff with jitter and a circuit breaker to pause under elevated errors.
Add synthetic verification, such as spot checks of headers and cache status, after each wave. This helps catch regressions early.
GDPR/CCPA considerations and audit logging
For authenticated endpoints, document purpose and lawful basis for warming. Honor user consent signals and avoid processing or storing PII in warmup pipelines.
Limit retention of access logs and selection data to what’s needed for analysis. Redact or hash identifiers to reduce risk.
Maintain audit logs of warmer access to protected resources. Include who approved the allowlist, when it last changed, and how failures were handled.
Review these controls periodically with security and privacy teams.
Case studies and reproducible benchmarks
Benchmarking your own stack builds confidence and uncovers surprises. Define a method you can repeat.
Pick a hotset size, such as 1k and 10k URLs. Measure a baseline week. Run a controlled warmup for a week. Then compare hit ratio, p95/p99 TTFB, 304 rates, and origin CPU/IO.
Keep all other variables constant. Document headers, TTLs, and populations used.
Run the test in at least two regions to validate multi‑region behavior. Publish your method internally so future teams can replicate after major platform changes.
Hotset sizes vs hit‑rate gains
Expect diminishing returns as hotset size grows. A 1k‑URL hotset on a media site might lift edge hit ratio from 72% to 86%. Expanding to 10k URLs may lift it further to 90% but at four times the warmup cost.
The “right size” is where incremental hit‑rate gain justifies additional requests and bytes. Many teams find p80–p90 coverage a sweet spot.
If your traffic is highly skewed, smaller hotsets deliver outsized gains. Use percentile‑based selection and revisit quarterly. Hotsets drift with seasons and catalogs.
Impact on TTFB and origin CPU/IO
Successful warmups reshape tail latency. In e‑commerce, we often see p95 TTFB drop 20–40% on warmed category and product pages.
Origin CPU is often reduced by double digits as fewer dynamic renders occur. Watch p99 as well. Spiky misses often hide here and respond best to seeding and SWR.
Attribute improvements carefully. Tag warmed URLs and compare their metrics to a control group. This ensures you measure the warmer’s impact, not unrelated deploys.
Troubleshooting guide
When warmers don’t move the needle, the root cause is usually key mismatch, Vary fragmentation, or bypassed caches. Troubleshoot systematically. Verify headers, inspect cache status, and confirm that the warmed variant matches real traffic.
Resist the temptation to just “warm more.” Fix alignment first.
Keep a short, prioritized checklist in your runbook. Run it after each config change or platform upgrade. Small header mistakes create big cost and latency regressions.
Diagnosing low hit‑rate after warmup
Start by fetching a warmed URL with your warmer’s headers and with a real browser’s headers. Compare cache keys and Vary logic.
Inspect response headers for Cache-Control, ETag or Last‑Modified, and Age. Confirm freshness and revalidation behavior.
Check TTLs. Too short and you’ll churn. Too long and you may serve stale.
Look for 200 full‑body responses instead of 304 revalidated responses during warmups. If 304 rates are low, add conditional headers or implement ETag or Last‑Modified at the origin.
Confirm no intermediate is bypassing cache due to cookies or authorization headers you didn’t account for.
Header bloat and cache‑key mismatches
Non‑deterministic or overly broad headers are frequent culprits. Trim Accept-Language to supported locales. Normalize Accept-Encoding. Remove tracing headers from the key.
Avoid Vary: Cookie unless it’s a scoped, signed cohort cookie. Ensure your CDN or app cache uses the same whitelist.
After normalization, re‑run a small canary warmup. Validate that the cache key is stable across regions and time.
Only then expand to the full hotset and multi‑region rollout.