Massive Latency Drop: Optimization Cuts API Response Time from 1.4 Secs to 42 ms

May 31, 2026
  • The initial symptom was an API latency spike to 1.4 seconds, caused by too many concurrent connections and blocked cache-table updates. Because 45% of connections sat idle during the spike, the problem pointed to the concurrency model rather than to raw resource limits.

  • In retrospect, the bottleneck was row rewrites: data-layout changes such as Parquet sharding and streaming merges mattered more than the choice of language. The takeaway is to fix the storage layer first, prioritize measurable latency and memory metrics, and avoid overloading operators with Prometheus knobs.

  • To reduce contention and improve data access, the team moved scoring off Postgres and replaced a large JSONB column with Parquet files sharded by day and hunt ID. The result is an immutable-log design: hunts write Parquet files, the API reads them (using the arrow2 library for efficient Parquet-to-IPC reads), and Redis remains as a cache with fewer contention points.

  • The root causes were overly generous default Postgres settings (max_connections) and a poor data layout. Together they produced heavy contention and many lock waits: 89 queries were blocked during cache updates, and JSONB rows ballooned from 2 MB to 180 MB.

  • Over a two-week window the optimization produced large gains: p99 latency fell from 1.4 seconds to 42 milliseconds, the Postgres pool stabilized at 28 active connections, rogue idle sessions were removed via an idle-in-transaction timeout (idle_in_transaction_session_timeout in Postgres), and memory stayed within reasonable bounds (Rust worker RSS around 220 MiB, with a brief spike to 380 MiB during a large 4 GB Parquet merge).

  • The Veltrix treasure-hunt engine, which served top results by relevance after reading JSON blobs from S3, had reached 2.3 million daily active users and suffered recurring latency spikes at 02:47 along with Postgres pool failures.

  • Implementation details: the Rust worker ran on the same Kubernetes node as Postgres to cut cross-AZ latency, was given a 400 Mi memory request and a 100 ms soft cap, and used tokio with branches for new Parquet files, SIGTERM, and a 250 ms timer.

  • The first mitigation put a Redis cache in front of Postgres, cutting median latency from 45 ms to 8 ms, but it caused a cache stampede at 02:47: TTLs expired together, thousands of keys were recomputed at once, and the load landed back on Postgres.

  • Lessons learned: start with storage-layer changes, budget memory for streaming merges, and track a small set of actionable metrics instead of chasing many noisy ones.

  • Economically, removing extra cache layers and optimizing the stack cut the cost per 100k hunts from $0.14 to $0.07.

  • Profiling pinpointed hot allocations in serde_json::Value; switching to simd-json cut allocations by 44%. The Postgres cache hit rate rose from 67% to 94% and autovacuum ran faster, while network usage on the Rust worker stayed light, freeing Redis for actual caching.
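Streaming merges keep memory bounded because merging already-sorted inputs only requires holding one pending row per input, not the inputs themselves. The sketch below shows that general technique as a k-way merge over sorted iterators using a min-heap. It is an illustration under that assumption, not the team's Parquet merge code, and kway_merge is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Lazily merges already-sorted inputs into one sorted stream.
/// The heap holds at most one pending item per input, so memory grows with
/// the number of inputs rather than with their total size.
fn kway_merge<T: Ord, I: Iterator<Item = T>>(mut inputs: Vec<I>) -> impl Iterator<Item = T> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first item of every input, tagged by input index.
    for (idx, it) in inputs.iter_mut().enumerate() {
        if let Some(item) = it.next() {
            heap.push(Reverse((item, idx))); // Reverse turns the max-heap into a min-heap
        }
    }
    std::iter::from_fn(move || {
        // Emit the smallest pending item, then refill from the input it came from.
        let Reverse((item, idx)) = heap.pop()?;
        if let Some(next) = inputs[idx].next() {
            heap.push(Reverse((next, idx)));
        }
        Some(item)
    })
}
```

In a real merge the iterators would yield rows decoded batch by batch from each file, so peak memory is roughly one decoded batch per input plus the output buffer.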
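Sharding by day and hunt ID with write-once files can be sketched with the standard library alone. The directory layout (day=<day>/hunt_id=<id>.parquet) and the names shard_path and write_shard_once are hypothetical, since the source does not give its file naming; a real writer would produce the bytes with a Parquet encoder rather than pass them in raw.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical Hive-style layout: <root>/day=<day>/hunt_id=<id>.parquet
fn shard_path(root: &Path, day: &str, hunt_id: u64) -> PathBuf {
    root.join(format!("day={day}"))
        .join(format!("hunt_id={hunt_id}.parquet"))
}

/// Writes a shard exactly once. `create_new(true)` makes the open fail with
/// ErrorKind::AlreadyExists instead of truncating an existing file, so a
/// published shard is never rewritten in place.
fn write_shard_once(path: &Path, bytes: &[u8]) -> io::Result<()> {
    if let Some(dir) = path.parent() {
        fs::create_dir_all(dir)?;
    }
    let mut file = OpenOptions::new().write(true).create_new(true).open(path)?;
    file.write_all(bytes)?;
    file.sync_all() // flush to disk before readers are told the shard exists
}
```

Refusing to overwrite is what turns the store into an append-only log. A production writer would typically also write under a temporary name and move the finished file into place, so readers never observe a partially written shard.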
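A generous max_connections lets every caller open its own connection, which is how idle sessions and lock waits pile up. One common countermeasure is to cap concurrency on the client side. The blocking limiter below, built on Mutex and Condvar, illustrates that idea under the assumption of a thread-per-request caller; ConnLimiter and Permit are hypothetical names, and a real service would normally rely on its connection pool's own maximum-size setting instead.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical limiter: at most `max` callers may hold a permit at once.
struct ConnLimiter {
    max: usize,
    active: Mutex<usize>,
    freed: Condvar,
}

/// Returned by `acquire`; dropping it releases the slot.
struct Permit<'a> {
    limiter: &'a ConnLimiter,
}

impl ConnLimiter {
    fn new(max: usize) -> Self {
        ConnLimiter { max, active: Mutex::new(0), freed: Condvar::new() }
    }

    /// Blocks until a slot is free, then claims it.
    fn acquire(&self) -> Permit<'_> {
        let mut active = self.active.lock().unwrap();
        while *active >= self.max {
            // wait() releases the lock while sleeping; the loop handles spurious wakeups.
            active = self.freed.wait(active).unwrap();
        }
        *active += 1;
        Permit { limiter: self }
    }

    /// Number of permits currently held.
    fn in_use(&self) -> usize {
        *self.active.lock().unwrap()
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        // The guard is a temporary, so the lock is released before notifying.
        *self.limiter.active.lock().unwrap() -= 1;
        self.limiter.freed.notify_one(); // wake one blocked caller
    }
}
```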
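The worker loop reacts to three things: a new Parquet file, SIGTERM, and a 250 ms timer. The source says the worker used tokio, where this would usually be a tokio::select! over a channel, a tokio::signal::unix::signal(SignalKind::terminate()) stream, and a tokio::time::interval. To stay dependency-free, the sketch below emulates the same three-way choice with std::sync::mpsc::Receiver::recv_timeout. The Event enum and run_worker are hypothetical, and the Shutdown event stands in for a forwarded SIGTERM.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical events; a SIGTERM handler would forward `Shutdown`.
enum Event {
    NewParquetFile(String),
    Shutdown,
}

const TICK: Duration = Duration::from_millis(250);

/// Runs until Shutdown arrives or every sender is dropped.
/// Returns the files it handled and how many timer ticks fired.
fn run_worker(events: Receiver<Event>) -> (Vec<String>, u32) {
    let mut handled = Vec::new();
    let mut ticks = 0;
    let mut next_tick = Instant::now() + TICK;
    loop {
        // Wait for an event, but never past the next timer deadline.
        let wait = next_tick.saturating_duration_since(Instant::now());
        match events.recv_timeout(wait) {
            Ok(Event::NewParquetFile(path)) => handled.push(path), // e.g. load or merge the new shard
            Ok(Event::Shutdown) | Err(RecvTimeoutError::Disconnected) => break,
            Err(RecvTimeoutError::Timeout) => {
                ticks += 1; // periodic work, e.g. flushing metrics
                next_tick += TICK; // fixed schedule, independent of event arrivals
            }
        }
    }
    (handled, ticks)
}
```

One difference from tokio::select!: here a continuous flood of events can delay ticks, whereas select! picks randomly among ready branches by default.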
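The 02:47 stampede happened because many TTLs expired at the same moment. A common mitigation (not necessarily the one the team used) is to add per-key jitter to each TTL so that expirations spread out. The sketch derives a deterministic jitter from a hash of the key; jittered_ttl and the max_jitter_pct parameter are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Returns `base` plus 0..=max_jitter_pct% of `base`, chosen by hashing the key.
/// DefaultHasher::new() uses fixed keys, so a given key gets the same TTL within
/// one build, while different keys land on different expiry times.
fn jittered_ttl(key: &str, base: Duration, max_jitter_pct: u64) -> Duration {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    let max_extra_ms = base.as_millis() as u64 * max_jitter_pct / 100;
    let extra_ms = if max_extra_ms == 0 {
        0
    } else {
        hasher.finish() % (max_extra_ms + 1)
    };
    base + Duration::from_millis(extra_ms)
}
```

Jitter spreads expirations but does not stop many requests from recomputing the same hot key at once; request coalescing (one caller recomputes while the others wait for its result) covers that half of the problem. DefaultHasher's algorithm is not guaranteed across Rust releases, so this is unsuitable if TTLs must match between differently built services.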
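The 67% to 94% improvement is a buffer-cache hit rate. The source does not say how it was measured; a common definition uses the blks_hit and blks_read counters from Postgres's pg_stat_database view, as in this small helper (cache_hit_ratio is a hypothetical name).

```rust
/// Hit ratio = blks_hit / (blks_hit + blks_read), using counters as reported by
/// pg_stat_database. Returns None when there were no block reads at all, or if
/// the sum would overflow u64.
fn cache_hit_ratio(blks_hit: u64, blks_read: u64) -> Option<f64> {
    let total = blks_hit.checked_add(blks_read)?;
    if total == 0 {
        None
    } else {
        Some(blks_hit as f64 / total as f64)
    }
}
```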
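The headline numbers reduce to simple ratios. Assuming the before and after figures were measured the same way, p99 latency dropped by a factor of about 33, the cost per 100k hunts halved, and the buffer-cache miss rate (one minus the hit rate) fell by a factor of 5.5:

```latex
\frac{1400\,\text{ms}}{42\,\text{ms}} \approx 33.3,
\qquad
\frac{\$0.07}{\$0.14} = 0.5,
\qquad
\frac{1 - 0.67}{1 - 0.94} = \frac{0.33}{0.06} = 5.5
```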

Summary based on 1 source



Source

When the Default Postgres Pool Died at 3 AM

DEV Community • May 31, 2026

