Start Here

Use this page first. Phoenix Search has a read side and a sync side, and mixing them makes the docs hard to follow.

If You Need To Understand	Read
What Phoenix Search is and where it fits	Phoenix Search Overview
How the real system is wired	Architecture
How frontend authentication and endpoints work	Auth and Endpoints
How search requests and query shapes work	API Reference
How MySQL changes reach Elasticsearch	CDC
How data is migrated, repaired, or reindexed	CDC Tools and Backfill
How to run, monitor, or debug production	Operations
How to debug one API request end to end	API Debugging
What improved after migration	Post-Migration Results

Mental Model

The frontend does not call a Phoenix Search login endpoint. It calls crelio-app to mint a short-lived Phoenix Search token, then uses that token against the Phoenix Search API.

The API does not own source-of-truth patient data. It searches Elasticsearch and uses MySQL for detail lookup and session-scope resolution.

Elasticsearch documents are routed by lab_id. Search requests derive routing from the authenticated session, CDC writes use the document's current lab_id, and backfill writes use user_meta.lab_id.
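
The three routing sources above can be sketched as follows. This is a minimal illustration, not the service's code: the function names and the `lab_id` key on the session, document, and `user_meta` mappings are assumptions, as is using `user_details` as the index name in the usage note. The path shape `/<index>/_doc/<id>?routing=<value>` is the standard Elasticsearch get-document form.

```python
from typing import Any, Mapping
from urllib.parse import quote, urlencode


def routing_for_search(session: Mapping[str, Any]) -> str:
    """Read path: routing is derived from the authenticated session."""
    return str(session["lab_id"])  # key name is an assumption


def routing_for_cdc(document: Mapping[str, Any]) -> str:
    """CDC write path: routing follows the document's current lab_id."""
    return str(document["lab_id"])


def routing_for_backfill(user_meta_row: Mapping[str, Any]) -> str:
    """Backfill write path: routing comes from user_meta.lab_id."""
    return str(user_meta_row["lab_id"])


def doc_check_path(index: str, doc_id: str, lab_id: str) -> str:
    """Build a get-document path that always carries the routing value."""
    query = urlencode({"routing": lab_id})
    return f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}?{query}"
```

For example, `doc_check_path("user_details", "42", routing_for_search({"lab_id": 7}))` yields a path that hits the same shard route the write side used, provided both sides agree on the `lab_id` value.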

The docs do not treat CDC as a separate service. It is the write-side sync subsystem that keeps Phoenix Search's Elasticsearch index current.

Phase 1 Why / Context

This is the Phase 1 reasoning the docs assume. The goal is not only to move code into a new service; it is to make preview user detail search simpler to operate, easier to debug, and cheaper to serve at production scale.

Question	Answer
Why was Phoenix Search introduced in Phase 1?	Preview user detail search had too much fan-out. One user search could require roughly `6-7` lookup paths across patient identity, LRF / lab-report identifiers, manual sample ID, bill ID, and order identifiers. Phoenix Search turns that into one routed Elasticsearch search over `user_details`.
Which legacy search paths are replaced first?	The Phase 1 target is preview user detail search: patient identity lookup plus identifier lookups that previously needed separate backend/search calls and merge behavior.
Why use one `user_details` Elasticsearch document?	The search input can match many identifiers, but the user-facing result is still a user/patient entity. Denormalizing the searchable identifiers, org/referral/branch filters, and display fields into one projection lets the API return bucketed results without calling multiple lookup APIs.
Why use analyzer-backed mappings?	Exact keyword matching alone is not enough for patient and identifier search. The mapping supports exact, prefix, suffix, segment, and `search_as_you_type` fields so Phoenix Search can handle partial names, phone-like input, lab patient IDs, manual sample IDs, bills, orders, and national IDs in one query shape.
Why is CDC required?	Elasticsearch is a read projection, not the source of truth. MySQL remains the source. CDC keeps `user_meta` and `user_details` current after the initial backfill without making the frontend wait on source-table joins during every search.
Why Debezium, Redpanda, and the Phoenix CDC consumer?	Debezium reads MySQL binlog changes, Redpanda gives a replayable Kafka-compatible transport, and the Python consumer owns Phoenix-specific materialization rules. That split lets operators inspect connector state, topic lag, DLQ records, consumer health, and Elasticsearch sync separately.
Why is backfill still needed if CDC exists?	CDC keeps changes current from a point in time. Backfill migrates the existing historical dataset into `user_meta` and Elasticsearch so the index starts complete, then CDC handles ongoing inserts, updates, deletes, and lab reroutes.
Why is Elasticsearch routing by `lab_id` required?	Search scope is lab/session driven. Routing documents by `lab_id` keeps reads and writes scoped to the same shard route, makes point lookups deterministic, and avoids cross-lab scatter for the hot search path. Any direct ES document check must include `routing=<LAB_ID>`.
Why use ephemeral tokens from crelio-app?	LiveHealth already has the user session. crelio-app mints a short-lived Phoenix Search token containing session and lab context, and Phoenix Search validates that token plus Redis session state. The frontend does not need a separate Phoenix login flow.
Why is the Phase 1 endpoint `https://phoenix-search-in.crelio.solutions/api`?	The current frontend fallback and crelio-app token response point to the IN Phoenix Search base URL. The docs therefore describe the real Phase 1 endpoint instead of inventing region segregation that is not present in the current integration.
Why OpenTelemetry now?	The new service boundary makes tracing valuable. A single trace can tie together the browser `X-Trace-Id`, API route, search attributes, Elasticsearch query span, MySQL dependency timing, errors, CDC freshness, and the logs emitted for that request.
What proves Phase 1 worked?	The post-migration evidence shows one ES search request replacing the previous multi-lookup pattern, roughly `50%` lower index storage, `0%` observed HTTP request error rate, ES p50 around `2.5 ms`, ES p95 around `4.75 ms`, and trace-level debugging from frontend request to ES query.
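
The CDC rows above mention inserts, updates, deletes, and lab reroutes. One way to reason about a reroute is sketched below, assuming the standard Debezium envelope (`op`, `before`, `after`). The `plan_actions` function, the `id` and `lab_id` column names, and the delete-then-index handling of a changed `lab_id` are illustrative assumptions, not the Phoenix CDC consumer's actual materialization rules.

```python
from typing import Any, Dict, List, Optional, Tuple

# (operation, document id, routing value)
Action = Tuple[str, str, str]


def plan_actions(event: Dict[str, Any]) -> List[Action]:
    """Translate one Debezium-style change event into routed actions."""
    op = event["op"]  # Debezium: c=create, u=update, d=delete, r=snapshot read
    before: Optional[Dict[str, Any]] = event.get("before")
    after: Optional[Dict[str, Any]] = event.get("after")

    if op == "d":
        # Deletes must target the route the document was written under.
        return [("delete", str(before["id"]), str(before["lab_id"]))]

    actions: List[Action] = []
    # Lab reroute (assumed handling): the old copy lives under the old
    # route, so remove it there before indexing under the new lab_id.
    if op == "u" and before is not None and before["lab_id"] != after["lab_id"]:
        actions.append(("delete", str(before["id"]), str(before["lab_id"])))
    actions.append(("index", str(after["id"]), str(after["lab_id"])))
    return actions
```

The point of the sketch is that a routed index cannot simply overwrite a document whose `lab_id` changed: a write under the new route would leave the old copy searchable under the old route unless it is removed explicitly.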

Known Phase 1 implementation facts from code:

Fact	Evidence
Frontend gets an ephemeral Phoenix token from crelio-app	`livehealth-frontend/src/services/phoenixSearch/axios/tokenManager.ts`
crelio-app signs the token using `PHOENIX_SEARCH_JWT_SECRET`	`crelio-app/core/views/phoenix_search_token.py`
Phoenix Search validates it using `SEARCH_EPHEMERAL_JWT_SECRET`	`search/search/services/auth/ephemeral.py`
Frontend default Phoenix Search URL is `https://phoenix-search-in.crelio.solutions/api`	`livehealth-frontend/src/services/phoenixSearch/axios/config.ts`
Search calls use `/v1/users/search` under that base URL	`livehealth-frontend/src/services/phoenixSearch/search.ts`
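
The sign-then-validate split in the table above can be illustrated with a small standard-library sketch. It assumes an HS256-style signed token with an `exp` claim, which the source does not confirm. The function names and claim names are hypothetical, and the Redis session check that Phoenix Search also performs is left out. Under a symmetric scheme like this one, `PHOENIX_SEARCH_JWT_SECRET` and `SEARCH_EPHEMERAL_JWT_SECRET` would need to hold the same value.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: str, signing_input: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def mint_token(secret: str, claims: Dict[str, Any], ttl_seconds: int, now: float) -> str:
    """Issuer side (crelio-app in these docs): sign claims plus an expiry."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": int(now) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def validate_token(secret: str, token: str, now: float) -> Dict[str, Any]:
    """Verifier side (Phoenix Search in these docs): signature, then expiry."""
    signing_input, _, signature = token.rpartition(".")
    expected = _sign(secret, signing_input)
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] <= now:
        raise ValueError("token expired")
    return payload
```

A caller would mint with a short `ttl_seconds`, send the token on each request to the `/v1/users/search` route under the base URL, and mint a fresh token once validation fails with an expiry error.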

Folder Structure

phoenix-search/
├── index                 # service overview
├── start-here            # reading path + Phase 1 context
├── architecture          # high-level and low-level system design
├── read-path/
│   ├── auth-and-endpoints
│   └── api-reference
├── sync-path/
│   ├── cdc
│   └── backfill
└── operate/
    ├── operations
    ├── debugging
    └── post-migration-results

Keep new docs in the same split:

Content Type	Folder
Frontend integration, auth, endpoints, request shapes	`read-path/`
Debezium, Redpanda, consumer behavior, migration, backfill	`sync-path/`
On-call, metrics, dashboards, deploy, debug	`operate/`
System-wide design and boundaries	`architecture.mdx`
