Post-Migration Results

Production results after moving preview user search to Phoenix Search

👤 Sai Tharun

This page captures the production outcome after moving preview user detail search onto Phoenix Search and the user_details Elasticsearch projection.

The screenshots capture the production IN dashboard and API debugging views used for this Phase 1 result note.


Outcome Summary

| Area | Before | After |
| --- | --- | --- |
| Search fan-out | Preview user detail search required roughly 6-7 backend/search lookups per search | Phoenix Search serves the search through one ES search request after session/scope resolution |
| Identifier lookup | Separate paths for LRF / lab-report identifiers, manual sample ID, bill ID, order number, and patient identity | Same identifiers are denormalized into user_details and queried through one mapped ES document |
| Search mapping | Legacy lookup-specific calls | Analyzer-backed ES mapping with exact, prefix, suffix, segment, and search_as_you_type fields |
| Index storage | Old userdetails index: 257.3 GB total, 129.2 GB primary | New user_details index: 135.75 GB total, 67.61 GB primary |
| Storage reduction | - | About 47% lower total storage, roughly 50% in practical terms |
| ES query latency | Multiple lookups per search made end-to-end latency harder to reason about | ES p50 about 2.5 ms, p95 about 4.75 ms, p99 about 4.95 ms in the saved dashboard window |
| Operational visibility | Harder to connect request, query, and CDC freshness | OpenTelemetry connects API, ES, MySQL, CDC, logs, and dashboards |

The important migration result is not only raw latency. The bigger win is that the search path is simpler: one shaped ES query over a purpose-built document instead of several lookup calls that each need their own timeout, error handling, and result merge behavior.


Storage Result

The data shared from the old cluster for the legacy index shows:

userdetails
3 primaries / 1 replica
157,661,460 documents
257.3 GB total
129.2 GB primary

The new production index screenshot shows:

user_details
6 primaries / 1 replica
157,441,825 documents
135.75 GB total
67.61 GB primary

| Metric | Old userdetails | New user_details | Change |
| --- | --- | --- | --- |
| Primary shards | 3 | 6 | More primary shards for the new search layout |
| Replicas | 1 | 1 | Same replica factor |
| Documents | 157,661,460 | 157,441,825 | Same order of data volume |
| Total storage | 257.3 GB | 135.75 GB | About 47.2% lower |
| Primary storage | 129.2 GB | 67.61 GB | About 47.7% lower |

This is the basis for saying the migration reduced index storage by roughly 50% (about 47% measured) while keeping a near-identical document count: the two indices differ by about 0.14% of documents.
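The percentages above can be re-derived from the reported figures. The short check below is only a worked calculation; the helper name `percent_lower` is introduced here for illustration.

```python
def percent_lower(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage."""
    return (old - new) / old * 100.0


# Figures reported above for the old userdetails and new user_details indices.
total_reduction = percent_lower(257.3, 135.75)
primary_reduction = percent_lower(129.2, 67.61)
doc_delta = 157_661_460 - 157_441_825

print(f"total storage:   {total_reduction:.1f}% lower")
print(f"primary storage: {primary_reduction:.1f}% lower")
print(f"document count difference: {doc_delta:,}")
```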

[Screenshot: user_details index overview showing storage, health, shards, and document count]

Latency and Traffic Result

| Signal | Observed Production Value |
| --- | --- |
| API request error rate | 0% in the HTTP service dashboard |
| Main endpoint | POST /api/v1/users/search |
| Main endpoint share | About 97.39% of endpoint time |
| Search endpoint request rate | About 242.5 req/min in the top endpoints table |
| Search endpoint median latency | About 22.76 ms |
| Search endpoint p95 latency | About 47.67 ms |
| Overall request latency | Median roughly 21-23 ms; p95 roughly 43-48 ms |
| ES query latency | p50 about 2.5 ms, p95 about 4.75 ms, p99 about 4.95 ms |
| MySQL query latency | p50 about 2.5 ms, p95 about 4.75 ms |
| ES hits per query | Around 4.6 in the dashboard tooltip |
| CDC health | 1 |
| ES data freshness | Around 3.6s in the dashboard tooltip |

ES queries stayed well under the practical 10 ms target in the saved dashboard window: p99 was about 4.95 ms, roughly half the target. API latency is higher than raw ES latency because it includes auth/session work, scope resolution, query construction, response shaping, and transport/runtime overhead.
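To put the gap between ES latency and API latency in proportion, the rough calculation below divides the reported ES percentiles by the search endpoint percentiles. Percentiles of different stages are not additive, so treat the result as an order-of-magnitude indication rather than a latency budget.

```python
# Values copied from the table above (search endpoint and ES query latency).
api_median_ms, api_p95_ms = 22.76, 47.67
es_p50_ms, es_p95_ms = 2.5, 4.75

# Approximate share of API time spent in the ES query at each percentile.
es_share_p50 = es_p50_ms / api_median_ms * 100.0
es_share_p95 = es_p95_ms / api_p95_ms * 100.0

print(f"ES share of API latency at p50: about {es_share_p50:.0f}%")
print(f"ES share of API latency at p95: about {es_share_p95:.0f}%")
```

On these numbers the ES query is around a tenth of the request time, which is consistent with most of the API latency coming from the surrounding work listed above.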

[Screenshot: Phoenix Search HTTP service dashboard showing request error rate, throughput, top endpoints, and request latency]
[Screenshot: Phoenix Search API production dashboard showing ES query latency, query volume, MySQL latency, auth outcomes, CDC health, and freshness]

Mapping and Analyzer Result

The migration works because the ES document is intentionally shaped for the search cases that used to need separate calls.

| Search Need | New Index Support |
| --- | --- |
| Patient name | search_as_you_type on full_name |
| Contact / alternate contact | keyword plus prefix analyzer fields |
| Lab patient ID | Exact, prefix, suffix, and segment fields |
| Manual sample ID | Exact, prefix, and segment fields |
| Order number | Exact, prefix, and segment fields |
| Lab bill ID | Exact and prefix fields |
| National IDs / passport | Exact, prefix, and segment fields depending on identifier format |
| Org, referral, branch filters | Denormalized arrays in the same ES document |

This lets Phoenix Search build one shape-routed query for the user input instead of issuing independent LRF/manual-sample/bill/patient lookups and merging them afterward.
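A minimal sketch of what shape routing could look like, to make the idea concrete. This is not the Phoenix Search implementation: the classification rules, the identifier field names in `ID_FIELDS`, the `.exact` and `.prefix` subfield names, and the `lab_id` filter are all assumptions made for illustration. Only `full_name` as a search_as_you_type field comes from the table above; the `bool_prefix` multi_match over the `._2gram` and `._3gram` subfields is the standard Elasticsearch query pattern for that field type.

```python
import re
from typing import Any

# Hypothetical identifier fields; the real user_details field names are not shown here.
ID_FIELDS = ["lab_patient_id", "manual_sample_id", "order_number", "lab_bill_id"]


def classify_shape(text: str) -> str:
    """Guess the input shape with illustrative rules only."""
    value = text.strip()
    if re.fullmatch(r"\+?\d{7,15}", value):
        return "contact"
    if any(ch.isdigit() for ch in value):
        return "identifier"
    return "name"


def build_query(lab_id: int, text: str) -> dict[str, Any]:
    """Build one ES request body for the detected input shape."""
    value = text.strip()
    shape = classify_shape(value)
    if shape == "name":
        # Standard query pattern for a search_as_you_type field.
        clause: dict[str, Any] = {
            "multi_match": {
                "query": value,
                "type": "bool_prefix",
                "fields": ["full_name", "full_name._2gram", "full_name._3gram"],
            }
        }
    elif shape == "contact":
        clause = {"match": {"contact.prefix": value}}  # hypothetical subfield
    else:
        clause = {
            "bool": {
                "should": [{"term": {f"{field}.exact": value}} for field in ID_FIELDS],
                "minimum_should_match": 1,
            }
        }
    return {
        "query": {
            "bool": {
                "must": [clause],
                "filter": [{"term": {"lab_id": lab_id}}],  # assumed scope filter
            }
        }
    }
```

Whatever the real rules are, the point is the same: the input is classified once and a single request body is produced, so there is no per-lookup timeout handling or result merge.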


OpenTelemetry Result

OpenTelemetry is a migration advantage because the new path emits query, request, dependency, and CDC freshness signals from one service boundary.

| Capability | What Phoenix Search Emits |
| --- | --- |
| API traces | FastAPI spans with route-level request context |
| Search span attributes | search.lab_id, search.search_key, search.query_shape, search.routing, search.is_multi_center, search.hit_count, search.zero_results |
| ES correlation | The active trace ID is passed as the Elasticsearch opaque_id |
| ES metrics | search.es.query.duration, search.es.query.total, search.es.hits, search.zero.results.total |
| MySQL metrics | search.mysql.query.duration, search.mysql.query.total |
| Auth metrics | search.auth.total |
| Error metrics | search.app.errors.total, search.unhandled.errors.total |
| CDC freshness | search.data.age, search.cdc.healthy |
| CDC consumer metrics | cdc.flow1.latency, cdc.flow2.latency, cdc.consumer.lag, cdc.messages.processed, cdc.dlq.sent |
| Logs | Trace and span IDs are added into structured logs |

Compared with the old preview path, this makes the production question easier to answer: for a slow or empty search, check one trace and see the API route, auth outcome, scope/routing, ES query latency, hit count, MySQL scope lookup, and CDC freshness.
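The correlation pieces can be sketched without any tracing library. The example below assumes the trace ID is available as a 128-bit integer (the form OpenTelemetry uses internally) and shows it rendered as 32 hex characters and attached as the `X-Opaque-Id` HTTP header, which Elasticsearch accepts for request tracking. The function names and the attribute value types are assumptions; only the attribute keys come from the table above.

```python
from typing import Any


def format_trace_id(trace_id: int) -> str:
    """Render a 128-bit trace ID as 32 lowercase hex characters."""
    return format(trace_id, "032x")


def es_request_headers(trace_id: int) -> dict[str, str]:
    """Headers that tag an Elasticsearch request with the active trace ID."""
    return {"X-Opaque-Id": format_trace_id(trace_id)}


def search_span_attributes(
    lab_id: int, search_key: str, query_shape: str, hit_count: int
) -> dict[str, Any]:
    """A subset of the search span attributes, keyed as listed in the table above."""
    return {
        "search.lab_id": lab_id,
        "search.search_key": search_key,
        "search.query_shape": query_shape,
        "search.hit_count": hit_count,
        "search.zero_results": hit_count == 0,
    }


# Example trace ID taken from the W3C Trace Context specification.
headers = es_request_headers(0x4BF92F3577B34DA6A3CE929D0E0E4736)
attributes = search_span_attributes(101, "name", "name", hit_count=0)
print(headers)
print(attributes)
```

With the same ID in the span, the structured logs, and the Elasticsearch request context, a slow or empty search can be followed from any one of the three to the other two.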


Debugging Result

The post-migration debugging path is now trace-first. A frontend search request exposes the trace ID, HyperDX resolves that trace across the API, and the Elasticsearch span shows the actual search operation and query body. The reusable runbook lives in API Debugging.

| Debug Check | Evidence | Why This Is Useful |
| --- | --- | --- |
| Browser request | POST https://phoenix-search-in.crelio.solutions/api/v1/users/search returns 200 and exposes X-Trace-Id | Support or engineering can start from the exact failed or slow browser request |
| Trace lookup | HyperDX opens the same trace ID and shows POST /api/v1/users/search with child spans and no trace errors | We can separate API time, auth/scope work, ES time, and runtime overhead without guessing |
| ES span | The trace includes an Elasticsearch search span for the user_details index with db.query.text and db.response.status_code=200 | We can confirm the generated query shape, target index, response status, and ES duration for the real request |
| Query correlation | Phoenix Search adds search attributes and sends the active trace ID as Elasticsearch opaque_id | API logs, HyperDX traces, and Elasticsearch request context line up around the same request |
| Operational outcome | The observed trace has no errors and the ES span is around the low-millisecond range | Debuggability improved while keeping the hot search path fast |

[Screenshot: Browser network request for Phoenix Search showing a successful search response and X-Trace-Id header]
[Screenshot: HyperDX trace timeline for POST /api/v1/users/search showing child spans and no trace errors]
[Screenshot: HyperDX Elasticsearch span details showing the user_details search operation and query body]

Elasticsearch Cluster Evidence

The Elasticsearch production IN dashboard shows the cluster stayed healthy during the observed window:

| Cluster Signal | Observed Value |
| --- | --- |
| Cluster health | GREEN |
| Active data nodes | 3 |
| user_details status | Open, Healthy |
| Search operation rate | Periodic per-node peaks near the 100K-140K chart range |
| Indexing operation rate | Periodic per-node peaks near the 300K-400K chart range |
| Data node CPU | Around 2-4% in the node table |
| Coordinator CPU | Around 17-22% in the node table |

[Screenshot: Elasticsearch production IN dashboard showing cluster health, data nodes, operation rates, JVM, disk, CPU, and slow query panels]

Routing Check

The new index requires Elasticsearch routing by lab_id. When checking a document directly, always include the routing parameter set to the document's lab_id:

curl -s -u elastic:<PASSWORD> \
  "https://<ES_HOST>:9200/user_details/_doc/<USER_DETAILS_ID>?routing=<LAB_ID>"

A document can exist under one routing key and look missing under another. That is expected Elasticsearch behavior with required custom routing.
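The toy model below illustrates why this happens. Elasticsearch picks the target shard from a hash of the routing value, so a lookup with a different routing value may consult a shard that never received the document. The sketch uses CRC32 as a stand-in hash (Elasticsearch uses its own hash function), and `ToyIndex`, `toy_shard_for`, and the lab IDs are hypothetical names used only for this example.

```python
import zlib

NUM_PRIMARY_SHARDS = 6  # user_details primaries, per the index overview above


def toy_shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a routing value to a shard. CRC32 is a stand-in, not Elasticsearch's hash."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


class ToyIndex:
    """In-memory stand-in for an index that requires custom routing."""

    def __init__(self, num_shards: int = NUM_PRIMARY_SHARDS) -> None:
        self.shards: list[dict] = [{} for _ in range(num_shards)]

    def index(self, doc_id: str, routing: str, doc: dict) -> None:
        self.shards[toy_shard_for(routing, len(self.shards))][doc_id] = doc

    def get(self, doc_id: str, routing: str):
        # Only the shard selected by the routing value is consulted.
        return self.shards[toy_shard_for(routing, len(self.shards))].get(doc_id)


index = ToyIndex()
index.index("user-1", routing="101", doc={"full_name": "Example Patient"})

# Find a hypothetical lab_id whose routing value lands on a different shard.
other_lab = next(
    str(n) for n in range(102, 1000)
    if toy_shard_for(str(n)) != toy_shard_for("101")
)

print(index.get("user-1", routing="101"))      # found: same routing as at index time
print(index.get("user-1", routing=other_lab))  # None: a different shard was consulted
```

When a direct lookup comes back empty, the first thing to confirm is that the lab_id used for routing is the one the document was indexed under.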
