
How PgBeam's Query Cache Works

PgBeam's query cache is designed for one goal: serve repeated read queries from the nearest region without touching the upstream database. This post explains the architecture in detail.

Two-layer cache

PgBeam's cache architecture supports two layers, both operating within a single region:

L1: Process-local cache

Every proxy process maintains a high-performance concurrent in-memory cache. L1 is always enabled. Lookups cost single-digit microseconds.

The L1 cache is per-process, so different proxy instances in the same region may have different cache contents. This is acceptable because L1 is a hot-path optimization, not a source of truth.

L2: Regional shared cache

L2 is an optional shared cache layer backed by a regional data store. When enabled, it provides sub-millisecond lookups and ensures that a cache entry populated by one process is available to all processes in the same region. When L2 is not configured, each process operates independently with L1 only.

The lookup order when both layers are active: L1, then L2, then upstream database. On a cache miss at both layers, the result from the upstream is stored in both L1 and L2.
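The lookup order can be sketched in a few lines. This is an illustrative model, not PgBeam's implementation: the class and method names are invented, and plain dicts stand in for the real L1 store and the regional L2 backend.

```python
# Hypothetical sketch of the two-layer lookup order: L1, then L2, then upstream.
# Plain dicts stand in for the real stores; names are illustrative.

class TwoLayerCache:
    def __init__(self, l2=None):
        self.l1 = {}          # process-local in-memory cache (always enabled)
        self.l2 = l2          # optional regional shared store; None = L1 only
        self.upstream_calls = 0

    def query_upstream(self, key):
        self.upstream_calls += 1
        return f"rows-for-{key}"

    def get(self, key):
        if key in self.l1:                          # L1 hit: microseconds
            return self.l1[key]
        if self.l2 is not None and key in self.l2:  # L2 hit: sub-millisecond
            self.l1[key] = self.l2[key]             # promote into L1 for next time
            return self.l1[key]
        result = self.query_upstream(key)           # miss at both layers
        self.l1[key] = result                       # populate both layers
        if self.l2 is not None:
            self.l2[key] = result
        return result
```

Two processes sharing an L2 illustrate the regional behavior: once one process populates the shared layer, the other serves the same key without touching upstream.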

Stale-while-revalidate

PgBeam caches use stale-while-revalidate (SWR) semantics, similar to HTTP cache headers:

  • maxAge: how long a cached result is considered fresh. During this window, the cache serves the result directly.
  • swr (stale-while-revalidate): after maxAge expires, the stale result is still served to the client, but a background refresh is triggered. Once the refresh completes, subsequent requests get the updated result.

/* @pgbeam:cache maxAge=300 swr=60 */
SELECT * FROM products WHERE active = true;

This query's results are fresh for 5 minutes. After that, stale results are served for up to 60 more seconds while a background query refreshes the cache. After 6 minutes total, the cache entry expires completely.

For frequently accessed queries, SWR avoids cache-miss latency spikes. The background refresh happens asynchronously while the stale result is served.
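The SWR decision reduces to classifying an entry by its age. A minimal sketch, assuming each entry records its insertion time and the `maxAge`/`swr` values from its annotation (the function name is illustrative):

```python
# Sketch of stale-while-revalidate classification by entry age.

def swr_state(age_s, max_age_s, swr_s):
    """Classify a cache entry: fresh, stale (serve + background refresh), or expired."""
    if age_s < max_age_s:
        return "fresh"        # within maxAge: serve directly
    if age_s < max_age_s + swr_s:
        return "stale"        # serve stale result, trigger background refresh
    return "expired"          # past maxAge + swr: treat as a miss, query upstream
```

With the `maxAge=300 swr=60` example above, an entry is fresh for 300 seconds, served stale from 300 to 360 seconds, and expired after that.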

Early refresh

PgBeam adds a proactive optimization on top of SWR: if a cache entry has been hit 3 or more times and its age exceeds 75% of maxAge, PgBeam triggers a background refresh before the entry goes stale. As long as a query stays popular, its entry is refreshed before it ages out, so clients of actively read entries effectively never see stale data.

For example, with maxAge=300: after 225 seconds, if the entry is still being actively read, PgBeam preemptively refreshes it. The entry never enters the stale window, and the application sees fresh results continuously.
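The trigger condition is a simple predicate over hit count and age. A sketch using the thresholds stated above (the constant and function names are invented for illustration):

```python
# Sketch of the early-refresh trigger: 3+ hits and age past 75% of maxAge.

EARLY_REFRESH_MIN_HITS = 3
EARLY_REFRESH_AGE_FRACTION = 0.75

def should_early_refresh(hits, age_s, max_age_s):
    """True when a popular entry is old enough to warrant a proactive refresh."""
    return (hits >= EARLY_REFRESH_MIN_HITS
            and age_s > EARLY_REFRESH_AGE_FRACTION * max_age_s)
```

With `maxAge=300`, the threshold age is 225 seconds: a cold entry (fewer than 3 hits) or a young one is left alone, while a popular 226-second-old entry is refreshed in the background.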

Cache key computation

PgBeam computes cache keys as a 128-bit hash derived from five components: tenant ID, database name, username, SQL text, and parameter values. The key combines two independent hash digests, making collisions vanishingly unlikely.

Before hashing, PgBeam strips its own annotation comments (@pgbeam:cache, @pgbeam:replica) from the SQL text. The remaining SQL is hashed as-is, including whitespace. This means the cache is exact-match: two queries must be textually identical (after annotation stripping) to share a cache entry.

-- These share a cache entry (same text, annotation stripped before hashing):
/* @pgbeam:cache maxAge=300 */ SELECT * FROM products WHERE active = true;
SELECT * FROM products WHERE active = true;

-- These do NOT share a cache entry (different whitespace):
SELECT * FROM products WHERE active = true;
SELECT *   FROM   products   WHERE   active = true;

-- These do NOT share a cache entry (different parameter values):
SELECT * FROM products WHERE id = 1;
SELECT * FROM products WHERE id = 2;

The tenant and credential components mean that cached results are never shared between different projects or database users, even if the queries are identical. This keeps cache isolation strict across tenants.
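The key computation can be sketched as annotation stripping followed by hashing the five components. This is a stand-in, not PgBeam's algorithm: the actual hash functions are unspecified, so `blake2b` with a 16-byte digest is used here as an example 128-bit hash, and the annotation regex is a simplification.

```python
# Sketch of cache-key computation: strip @pgbeam annotations, hash five components.
# blake2b(digest_size=16) stands in for PgBeam's unspecified 128-bit hash.

import hashlib
import re

ANNOTATION = re.compile(r"/\*\s*@pgbeam:(cache|replica)[^*]*\*/\s*")

def cache_key(tenant_id, database, username, sql, params):
    sql = ANNOTATION.sub("", sql)   # annotations never affect the key
    # Remaining SQL is hashed as-is: whitespace differences yield different keys.
    material = "\x00".join([tenant_id, database, username, sql, repr(params)])
    return hashlib.blake2b(material.encode(), digest_size=16).hexdigest()
```

The examples above fall out directly: an annotated query and its bare twin produce the same key, while different whitespace, different parameters, or a different tenant each produce a different key.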

Cache bypass conditions

Caching is automatically bypassed for:

  • Transactions: any query inside BEGIN/COMMIT is never cached.
  • Write operations: INSERT, UPDATE, DELETE, DDL statements bypass the cache.
  • Explicit bypass: /* @pgbeam:cache noCache */ annotation or SET pgbeam.cache = off.

This is intentional. PgBeam only caches reads, and only when explicitly enabled. Writes and transactional reads always go directly to the upstream database.
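The bypass rules above amount to a short-circuit check before the cache is even consulted. A simplified sketch: a real proxy parses the protocol and SQL properly, whereas this version uses a prefix check, and `in_transaction` would come from connection state.

```python
# Simplified sketch of the cache-bypass decision. A real implementation parses
# SQL; a prefix check is enough to illustrate the rules.

WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP")

def bypasses_cache(sql, in_transaction, session_cache_on=True):
    if in_transaction:                          # queries inside BEGIN/COMMIT: never cached
        return True
    stmt = sql.lstrip().upper()
    if "@PGBEAM:CACHE NOCACHE" in stmt:         # explicit noCache annotation
        return True
    if not session_cache_on:                    # SET pgbeam.cache = off
        return True
    return stmt.startswith(WRITE_PREFIXES)      # writes and DDL go straight upstream
```
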

No cross-region sync

Each region's cache is independent. A cache entry in us-east-1 does not propagate to ap-southeast-1. Each region populates its cache independently from upstream queries.

This is a deliberate design choice. Cross-region cache synchronization would add complexity and latency without meaningful benefit for most workloads. The cache's value comes from serving local reads quickly, not from maintaining global consistency.

Configuration

Cache rules can be configured three ways:

  1. Dashboard Cache Rules: enable caching for specific query patterns from the dashboard.
  2. SQL annotations: inline cache directives in your queries: /* @pgbeam:cache maxAge=300 swr=60 */.
  3. Session variables: SET pgbeam.cache = on to enable for all queries in a session.

The dashboard approach is recommended for production. It gives you visibility into which queries are cached and their hit rates, without requiring code changes.

The priority chain when multiple configurations overlap: SQL annotations take precedence over session variables, which take precedence over dashboard rules, which take precedence over project-level defaults. An explicit noCache annotation always wins.
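That precedence chain can be expressed as a first-match walk over the four layers, with noCache checked first. The config shapes (plain dicts, a `noCache` flag) are illustrative assumptions, not PgBeam's internal representation.

```python
# Sketch of config resolution: annotation > session > dashboard rule > project
# default, with an explicit noCache annotation winning outright.

def resolve_cache_config(annotation=None, session=None, dashboard=None, project=None):
    if annotation is not None and annotation.get("noCache"):
        return None                   # explicit noCache always wins: caching disabled
    for source in (annotation, session, dashboard, project):
        if source is not None:
            return source             # first configured layer in priority order
    return None                       # nothing configured anywhere
```
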

Debugging cache behavior

PgBeam supports a debug mode that sends cache status as PostgreSQL NOTICE messages:

SET pgbeam.debug = on;

/* @pgbeam:cache maxAge=60 swr=30 */
SELECT * FROM products WHERE active = true;
-- NOTICE: pgbeam:cache hit age=1.2s ttl=60s swr=30s

The NOTICE includes whether the result was a hit, miss, or stale serve, along with the entry's age and configured TTL. This makes it easy to verify cache behavior during development without checking external dashboards.
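Because the NOTICE is a plain text line, it is easy to assert on in tests. A sketch that parses the format shown in the example above into a dict; the status tokens (`hit`, `miss`, `stale`) and field layout are assumed from that example:

```python
# Sketch: parse the pgbeam:cache debug NOTICE line from the example above.
# Status tokens and field order are assumed from the sample output.

import re

NOTICE_RE = re.compile(
    r"pgbeam:cache (?P<status>hit|miss|stale) "
    r"age=(?P<age>[\d.]+)s ttl=(?P<ttl>\d+)s swr=(?P<swr>\d+)s"
)

def parse_cache_notice(message):
    m = NOTICE_RE.search(message)
    if m is None:
        return None                   # not a pgbeam cache notice
    return {
        "status": m.group("status"),
        "age_s": float(m.group("age")),
        "ttl_s": int(m.group("ttl")),
        "swr_s": int(m.group("swr")),
    }
```
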

When to cache

Query caching is most effective for:

  • Read-heavy workloads: dashboard queries, product catalogs, configuration lookups.
  • Queries with stable results: data that changes infrequently (seconds to minutes).
  • High-frequency queries: the same query executed hundreds or thousands of times per minute.

It's less useful for:

  • Write-heavy workloads: writes always bypass the cache.
  • Queries requiring strong consistency: if you need the absolute latest data on every read.
  • Unique queries: queries that are never repeated (different parameters each time).

See the caching documentation for configuration details and the full list of supported annotations.

If you're deploying on Vercel, query caching pairs well with connection pooling to eliminate both connection overhead and repeated read latency.
