Architecture

Phoenix Search Architecture

Phoenix Search has two tightly coupled planes:

Plane	Primary Job	Main Runtime
Read plane	Serve authenticated user search and user-detail lookup	FastAPI API service
Sync plane	Keep `user_details` in Elasticsearch fresh from MySQL changes	Debezium, Redpanda, CDC consumer, `user_meta`, ES ingest pipeline

The API reads from Elasticsearch for search, but MySQL remains the source of truth for patient details and search scope resolution. CDC is the write-side projection system that keeps Elasticsearch usable for low-latency search.

High-Level Architecture

System Boundaries

Boundary	Owned By Phoenix Search	External Dependency
HTTP API	FastAPI app, auth middleware integration, user search routes, health and metrics	ALB / caller applications
Search index	`user_details` index shape, query builder, routing strategy	Elasticsearch cluster
Source data	Read access to `userDetails` for details and resolver lookups	LiveHealth MySQL schema and application writes
CDC materialization	Consumer handlers, `user_meta` projection contract, ingest pipeline	Debezium Connect and Redpanda infrastructure
Sessions	Session model validation and scope extraction	Django session data in Redis
Operations	`cdc-ctl`, backfill, dashboards, runbooks	ECS, EC2, MySQL, Redpanda, ES, HyperDX

Runtime Topology

Runtime	Process	Key Entrypoint	Main Dependencies
API	FastAPI / Uvicorn or Gunicorn worker	`search.web.application:get_app`	Redis, MySQL, Elasticsearch
CDC consumer	Python async Kafka consumer	`cdc/consumers/consumer.py`	Redpanda, MySQL, Elasticsearch
Debezium Connect	Kafka Connect worker	Connector JSON in `tools/cdc-ctl/connectors/`	MySQL binlog, Redpanda
Backfill	Go binary	`backfill/main.go`	MySQL, Elasticsearch
cdc-ctl	Go operator CLI	`tools/cdc-ctl/main.go`	Kafka Connect REST, Redpanda, MySQL, ES

The API and CDC consumer are separate runtimes. They share dependencies and the same target index, but they do not call each other directly. The API only observes CDC through Elasticsearch freshness metrics and health checks.

High-Level Data Flow

Read Path

Write / Sync Path

Low-Level API Architecture

Application Assembly

The FastAPI app is assembled in search/web/application.py.

Step	Code	Responsibility
1	`configure_logging()`	Configure process logging
2	`sentry_sdk.init(...)`	Enable Sentry when `SEARCH_SENTRY_DSN` exists
3	`FastAPI(..., lifespan=lifespan_setup)`	Create the application with startup/shutdown lifecycle
4	`app.include_router(monitoring_router)`	Register `/`, `/health`, `/health/live`, `/health/ready`, `/metrics`
5	`app.include_router(api_router, prefix="/api")`	Register `/api/v1/users/search` routes
6	`register_exception_handlers(app)`	Normalize application errors
7	`SessionAuthMiddleware`	Authenticate non-guest routes
8	`CORSMiddleware`	Apply CORS policy
9	`StripPrefixMiddleware`	Remove `/phoenix-search` before route matching
10	`trace_id_header` middleware	Refresh rate-limit cache, stamp load-test attributes, return `X-Trace-ID`

Starlette middleware runs in reverse add order, so StripPrefixMiddleware is the first custom middleware to see production requests with the ALB prefix.

Startup and Shutdown

search/web/lifespan.py owns process lifecycle.

Route Registration

File	Route Layer	Registered Paths
`search/web/api/router.py`	Versioned API router	`/api/v1/*`
`search/domains/user_search/router.py`	User search domain router	`/api/v1/users/search`
`search/web/api/monitoring/views.py`	Monitoring router	`/`, `/health`, `/health/live`, `/health/ready`, `/metrics`

Development and test environments also register debug routes under /api/debug.

Search Request Internals

Search Context

SearchContext is the immutable bundle that moves through the search pipeline:

Field	Source	Why It Matters
`query`	Sanitized request body	Drives query shape classification
`size`	`SEARCH_DEFAULT_SIZE`	Caps hit count
`search_key`	Request body	Selects a `SEARCH_MAPPING` field set
`filters`	Session scope	Enforces lab/org/branch/referral access
`routing`	Resolved lab IDs	Targets Elasticsearch shards
`search_fields`	Optional request body	Narrows allowed searchable fields
`date_format_locale`	Session	Parses DOB queries in lab-specific format

Scope Resolution

Login / Search Shape	Scope Behavior	Code
Normal lab user	`term lab_id = resolved lab`	`build_session_filters`
Doctor login	`referral_ids` when referral session is present	`_extra_filter`
Branch login	`branch_ids` or `org_ids` depending on `search_type`	`_branch_login_filter`
Collection center org login	`org_ids` using org plus sub-org expansion	`_resolve_org_ids`, `_extra_filter`
Multi-center search	`terms lab_id` across related labs, still with login-specific scope	`_resolve_lab_ids`, `build_session_filters`

The caller cannot choose lab_id directly. Lab and org scope always comes from the authenticated session and MySQL lookup helpers.

Query Builder

search/domains/user_search/query.py is intentionally shape-routed:

Shape	Main Clause Families	Avoids
`phone`	Contact exact/prefix, identity, buckets, patient ID, numeric IDs	Broad name matching
`numeric`	Patient ID, identity, bucket IDs, numeric IDs, DOB	Broad name matching
`alpha`	Patient ID, identity, buckets, full name	Numeric-only clauses
`mixed`	Structured IDs, buckets, full name, DOB if parseable	Wildcards

Important query rules:

SEARCH_MAPPING selects the allowed logical fields for each search_key.
search_fields intersects with the selected mapping; no overlap returns an empty result.
There are no wildcard queries and no fallback catch-all query.
Search results sort by _score and then last_updated_time desc.
matched_field comes from ES highlights or named queries.
Buckets come from named query tags, then the service splits multi-center hits into other_labs.

Elasticsearch Repository

search/domains/user_search/repository.py owns the ES call:

Concern	Behavior
Circuit breaker	`_es_search` is wrapped by `es_breaker`
Trace propagation	Active trace ID is passed as `opaque_id`
Routing	Lab routing is passed to ES when available
Timeout	Uses `SEARCH_ES_SEARCH_TIMEOUT`
Metrics	Records query duration, query count, hit count, and zero-result count
Circuit open	Returns empty hits rather than failing the API response
Other ES failures	Propagate after recording error metrics

User Detail Lookup Internals

User detail lookup is intentionally MySQL-backed, not ES-backed.

This path uses MySQL so detail views are source-of-truth even if Elasticsearch is temporarily stale.

Low-Level CDC Architecture

Connector Layer

Connector	Captures	Topic Prefix	Partition Routing	Purpose
`phoenix-source-existing`	`userDetails`, `billing`, `labReportRelation`	`phoenix`	`id` or `userDetailsId_id`	Source table changes into Redpanda
`phoenix-source-projection`	`user_meta`	`phoenix`	`user_details_id`	Projection changes into Redpanda for ES sync

Production connector properties that define the architecture:

Property	Why It Matters
`database.include.list = livehealthapp`	Connector reads the source database binlog
`table.include.list`	Restricts emitted records to search-relevant tables
`snapshot.mode = when_needed`	Recovers schema/offset gaps without normal full table snapshots
`snapshot.select.statement.overrides ... WHERE 1=0`	Captures schema without letting Debezium own historical data load
`PartitionRouting` SMT	Keeps all events for the same user on the same partition number
`signal.enabled.channels = source`	Enables controlled Debezium signal-table actions
`heartbeat.action.query`	Updates `debezium_heartbeat` so operators can detect stalled binlog reads
`errors.deadletterqueue.topic.name`	Sends connector/SMT failures to the Kafka Connect DLQ

Consumer Layer

The consumer is at-least-once. Offsets are stored only after successful handling or after DLQ publication. Replays are expected, so handlers use idempotent writes and ordering guards.

Phase Routing

Phase	Detection	Source Topic Behavior	Projection Topic Behavior
`migration`	`user_meta.full_name` exists	Materialize full denormalized rows into `user_meta`	Forward projection row to ES through ingest pipeline
`running`	`user_meta.full_name` absent	Keep slim projection updated and resolve identity live	Compose full ES doc from projection + live `userDetails`

CDC_PHASE_OVERRIDE can force a phase for recovery. The override must be migration or running.

Source Table Handler Paths

Topic	Running Handler	MySQL Write	Direct ES Write
`userDetails`	`_process_running_user_details`	Upsert `lab_id`, `last_updated_time`, and selected projection metadata	Yes, after `FieldResolver` composes a full document
`billing`	`_process_running_source_mysql` -> `_handle_billing`	Append `lab_bill_ids`, `order_numbers`, `referral_ids`, `org_ids`, `branch_ids`	No
`labReportRelation`	`_process_running_source_mysql` -> `_handle_lab_report`	Append `manual_sample_ids`	No
`user_meta`	`_process_running_projection`	None	Yes, after identity is resolved from live `userDetails`

Field Resolver

cdc/consumers/field_resolver.py composes ES documents during running phase.

Event Source	Identity Fields	Aggregate Fields	Reason
`userDetails` event	Live `userDetails` lookup	Current `user_meta` lookup	Avoid stale envelope identity on DLQ replay
`user_meta` event	Live `userDetails` lookup	Event `after` image	Projection event is the aggregate change being indexed

The resolver treats userDetails.labId_id = -1 as missing. That prevents merged-away patient rows from leaking sentinel data into Elasticsearch.

Retry and Ordering Model

Mechanism	Code Path	Purpose
Partition routing	Debezium `PartitionRouting` SMT	Same user, same Kafka partition across all CDC topics
Per-partition worker	`consumer.py::_partition_worker`	Sequential handling within a partition
Handler retry	`consumer.py::_handle_with_retries`	Recover transient MySQL/ES/Kafka issues
MySQL deadlock retry	`router.py::_retry_on_deadlock`	Retry OperationalError 1213 with jittered backoff
Consumer DLQ	`consumer.py::_send_to_dlq`	Preserve poison messages after retries
CAS guard	`handlers.py::_handle_user_details_running`	Reject stale identity updates by `last_updated_time`
CSV dedupe	`_upsert_csv_field`, `_append_csv_fields`	Make repeated billing/sample events safe
Tombstone guard	`_is_tombstoned_user`, `FieldResolver`	Prevent merged-away patients from being recreated

Data Architecture

Source Tables and Projection

Store	Object	Role
MySQL	`userDetails`	Authoritative identity, demographics, lab routing, patient state
MySQL	`billing`	Order, bill, referral, org, and branch aggregate source
MySQL	`labReportRelation`	Manual sample ID aggregate source
MySQL	`user_meta`	Search projection table used by CDC and backfill
Elasticsearch	`user_details`	Search-optimized document index

Field Ownership

Field Family	Source of Truth	Projection / Index Behavior
Identity and demographics	`userDetails`	Indexed directly by CDC resolver or backfill join
`lab_id`	`userDetails.labId_id`	Used as ES routing and access filter
Patient IDs and identity IDs	`userDetails`	Searched through exact, prefix, suffix, or segment fields
Billing IDs and order numbers	`billing`	Aggregated into CSV in `user_meta`, transformed to arrays for ES
Org, referral, branch IDs	`billing`	Aggregated into recency-ordered CSV fields
Manual sample IDs	`labReportRelation` via `billing`	Aggregated into `manual_sample_ids`
CDC freshness	ES newest `last_updated_time`	Probed by API background task and exposed in `/health`

Elasticsearch Document Shape

The ES document combines identity plus aggregate arrays:

{
  "id": 101,
  "lab_id": 1,
  "full_name": "John Doe",
  "lab_patient_id": "P-1001",
  "contact": "9999999999",
  "manual_sample_ids": ["S-1001"],
  "order_numbers": ["ORD-1001"],
  "lab_bill_ids": ["5001"],
  "org_ids": [20],
  "referral_ids": [77],
  "branch_ids": [10],
  "last_updated_time": "2024-01-02T10:00:00Z"
}

The API search path depends on ES routing by lab_id. If a patient moves labs, CDC deletes the old routed document and indexes the new routed document.

Elasticsearch Routing Model

Phoenix Search uses Elasticsearch custom routing on the user_details index. The index mapping declares _routing.required = true, so every write, delete, and point lookup must pass the same routing key that was used when the document was indexed.

Concern	Behavior
ES document ID	`user_details_id`, stored in ES as `id`
ES routing key	`lab_id` as a string
Source of routing	`userDetails.labId_id`, denormalized into `user_meta.lab_id`
Normal search routing	Current session lab ID
Multi-center search routing	Comma-separated related lab IDs resolved from MySQL
Filter endpoint routing	Current session lab ID
Detail endpoint	MySQL-backed, not an ES point lookup

Routing is not the only security boundary. The API also adds lab_id / org / branch / referral filters from the authenticated session. Routing targets the relevant ES shard or shards; filters enforce the allowed result scope.

Write paths must use the same rule:

Writer	Routing Behavior
CDC userDetails handler	Resolves full ES doc, indexes with `routing=str(lab_id)`
CDC user_meta handler	Resolves projection doc, indexes/deletes with `routing=str(lab_id)`
CDC lab move / reroute	Deletes old doc with old `lab_id`, then indexes new doc with new `lab_id`
CDC merge tombstone	Uses the previous `lab_id` from the `before` image to delete the old routed doc
Backfill bulk indexer	Sets bulk item `Routing` from `user_meta.lab_id`
Backfill targeted repair	Calls ES index with `WithRouting(lab_id)`
Backfill verify	Reads `GET /user_details/_doc/<id>?routing=<lab_id>`

Debugging must include routing. A document can exist under one routing key and appear missing under another:

curl -s -u elastic:<PASSWORD> \
  "https://<ES_HOST>:9200/user_details/_doc/<USER_DETAILS_ID>?routing=<LAB_ID>"

Consistency Model

Concern	Guarantee
Search freshness	Eventually consistent from MySQL through CDC into ES
Detail lookup	Stronger source-of-truth read from MySQL
Per-user ordering	Preserved by partition routing and per-partition workers
Global ordering	Not guaranteed across different users or partitions
Duplicate delivery	Expected; handlers are designed to be idempotent
CDC outage	API can still serve existing ES results, but freshness age increases
ES outage	Search API readiness degrades; CDC retries then DLQs failed writes
Redis outage	Auth/session and rate-limit flows are affected
MySQL outage	Detail lookup, scope resolution, CDC materialization, and resolver reads are affected

The important rule is that Elasticsearch is a projection, not the source of truth. When ES and MySQL disagree, MySQL wins and CDC/backfill should repair ES.

Failure Domains

Failure	Immediate Symptom	First Debug Page
API dependency down	`/health/ready` returns 503	Operations
Search stale	`/health` has stale CDC body status	CDC
Debezium connector failed	Redpanda topics stop receiving source events	CDC Tools and Backfill
Consumer lag high	`cdc_consumer_lag` rises	Operations
Projection corrupt or incomplete	ES disagrees with `user_meta` / MySQL	CDC Tools and Backfill
Query returns unexpected results	Wrong `search_key`, `search_fields`, session scope, or ES routing	API Reference

Source References

Area	Files
App assembly	`search/web/application.py`, `search/web/api/router.py`
Startup and dependency lifecycle	`search/web/lifespan.py`, `search/services/*/lifespan.py`
Auth and session context	`search/services/auth/middleware.py`, `search/services/auth/dependencies.py`, `search/services/auth/schemas.py`
Search service path	`search/domains/user_search/router.py`, `service.py`, `repository.py`, `query.py`, `filters.py`, `context.py`
MySQL lookup path	`search/domains/user_search/queries.py`
CDC consumer orchestration	`cdc/consumers/consumer.py`
CDC routing and handlers	`cdc/consumers/router.py`, `handlers.py`, `field_resolver.py`, `phase_detector.py`
Debezium connector configs	`tools/cdc-ctl/connectors/source-connector-existing.production.json`, `source-connector-projection.production.json`
Backfill architecture	`backfill/main.go`, `scanner.go`, `indexer.go`, `migrate.go`, `row.go`, `transform.go`

Architecture

On this page