Architecture
High-level and low-level architecture for Phoenix Search
Phoenix Search Architecture
Phoenix Search has two tightly coupled planes:
| Plane | Primary Job | Main Runtime |
|---|---|---|
| Read plane | Serve authenticated user search and user-detail lookup | FastAPI API service |
| Sync plane | Keep user_details in Elasticsearch fresh from MySQL changes | Debezium, Redpanda, CDC consumer, user_meta, ES ingest pipeline |
The API reads from Elasticsearch for search, but MySQL remains the source of truth for patient details and search scope resolution. CDC is the write-side projection system that keeps Elasticsearch usable for low-latency search.
High-Level Architecture
System Boundaries
| Boundary | Owned By Phoenix Search | External Dependency |
|---|---|---|
| HTTP API | FastAPI app, auth middleware integration, user search routes, health and metrics | ALB / caller applications |
| Search index | user_details index shape, query builder, routing strategy | Elasticsearch cluster |
| Source data | Read access to userDetails for details and resolver lookups | LiveHealth MySQL schema and application writes |
| CDC materialization | Consumer handlers, user_meta projection contract, ingest pipeline | Debezium Connect and Redpanda infrastructure |
| Sessions | Session model validation and scope extraction | Django session data in Redis |
| Operations | cdc-ctl, backfill, dashboards, runbooks | ECS, EC2, MySQL, Redpanda, ES, HyperDX |
Runtime Topology
| Runtime | Process | Key Entrypoint | Main Dependencies |
|---|---|---|---|
| API | FastAPI / Uvicorn or Gunicorn worker | search.web.application:get_app | Redis, MySQL, Elasticsearch |
| CDC consumer | Python async Kafka consumer | cdc/consumers/consumer.py | Redpanda, MySQL, Elasticsearch |
| Debezium Connect | Kafka Connect worker | Connector JSON in tools/cdc-ctl/connectors/ | MySQL binlog, Redpanda |
| Backfill | Go binary | backfill/main.go | MySQL, Elasticsearch |
| cdc-ctl | Go operator CLI | tools/cdc-ctl/main.go | Kafka Connect REST, Redpanda, MySQL, ES |
The API and CDC consumer are separate runtimes. They share dependencies and the same target index, but they do not call each other directly. The API only observes CDC through Elasticsearch freshness metrics and health checks.
High-Level Data Flow
Read Path
Write / Sync Path
Low-Level API Architecture
Application Assembly
The FastAPI app is assembled in search/web/application.py.
| Step | Code | Responsibility |
|---|---|---|
| 1 | configure_logging() | Configure process logging |
| 2 | sentry_sdk.init(...) | Enable Sentry when SEARCH_SENTRY_DSN exists |
| 3 | FastAPI(..., lifespan=lifespan_setup) | Create the application with startup/shutdown lifecycle |
| 4 | app.include_router(monitoring_router) | Register /, /health, /health/live, /health/ready, /metrics |
| 5 | app.include_router(api_router, prefix="/api") | Register /api/v1/users/search routes |
| 6 | register_exception_handlers(app) | Normalize application errors |
| 7 | SessionAuthMiddleware | Authenticate non-guest routes |
| 8 | CORSMiddleware | Apply CORS policy |
| 9 | StripPrefixMiddleware | Remove /phoenix-search before route matching |
| 10 | trace_id_header middleware | Refresh rate-limit cache, stamp load-test attributes, return X-Trace-ID |
Starlette middleware runs in reverse add order, so StripPrefixMiddleware is the first custom middleware to see production requests with the ALB prefix.
Startup and Shutdown
search/web/lifespan.py owns process lifecycle.
Route Registration
| File | Route Layer | Registered Paths |
|---|---|---|
search/web/api/router.py | Versioned API router | /api/v1/* |
search/domains/user_search/router.py | User search domain router | /api/v1/users/search |
search/web/api/monitoring/views.py | Monitoring router | /, /health, /health/live, /health/ready, /metrics |
Development and test environments also register debug routes under /api/debug.
Search Request Internals
Search Context
SearchContext is the immutable bundle that moves through the search pipeline:
| Field | Source | Why It Matters |
|---|---|---|
query | Sanitized request body | Drives query shape classification |
size | SEARCH_DEFAULT_SIZE | Caps hit count |
search_key | Request body | Selects a SEARCH_MAPPING field set |
filters | Session scope | Enforces lab/org/branch/referral access |
routing | Resolved lab IDs | Targets Elasticsearch shards |
search_fields | Optional request body | Narrows allowed searchable fields |
date_format_locale | Session | Parses DOB queries in lab-specific format |
Scope Resolution
| Login / Search Shape | Scope Behavior | Code |
|---|---|---|
| Normal lab user | term lab_id = resolved lab | build_session_filters |
| Doctor login | referral_ids when referral session is present | _extra_filter |
| Branch login | branch_ids or org_ids depending on search_type | _branch_login_filter |
| Collection center org login | org_ids using org plus sub-org expansion | _resolve_org_ids, _extra_filter |
| Multi-center search | terms lab_id across related labs, still with login-specific scope | _resolve_lab_ids, build_session_filters |
The caller cannot choose lab_id directly. Lab and org scope always comes from the authenticated session and MySQL lookup helpers.
Query Builder
search/domains/user_search/query.py is intentionally shape-routed:
| Shape | Main Clause Families | Avoids |
|---|---|---|
phone | Contact exact/prefix, identity, buckets, patient ID, numeric IDs | Broad name matching |
numeric | Patient ID, identity, bucket IDs, numeric IDs, DOB | Broad name matching |
alpha | Patient ID, identity, buckets, full name | Numeric-only clauses |
mixed | Structured IDs, buckets, full name, DOB if parseable | Wildcards |
Important query rules:
SEARCH_MAPPINGselects the allowed logical fields for eachsearch_key.search_fieldsintersects with the selected mapping; no overlap returns an empty result.- There are no wildcard queries and no fallback catch-all query.
- Search results sort by
_scoreand thenlast_updated_time desc. matched_fieldcomes from ES highlights or named queries.- Buckets come from named query tags, then the service splits multi-center hits into
other_labs.
Elasticsearch Repository
search/domains/user_search/repository.py owns the ES call:
| Concern | Behavior |
|---|---|
| Circuit breaker | _es_search is wrapped by es_breaker |
| Trace propagation | Active trace ID is passed as opaque_id |
| Routing | Lab routing is passed to ES when available |
| Timeout | Uses SEARCH_ES_SEARCH_TIMEOUT |
| Metrics | Records query duration, query count, hit count, and zero-result count |
| Circuit open | Returns empty hits rather than failing the API response |
| Other ES failures | Propagate after recording error metrics |
User Detail Lookup Internals
User detail lookup is intentionally MySQL-backed, not ES-backed.
This path uses MySQL so detail views are source-of-truth even if Elasticsearch is temporarily stale.
Low-Level CDC Architecture
Connector Layer
| Connector | Captures | Topic Prefix | Partition Routing | Purpose |
|---|---|---|---|---|
phoenix-source-existing | userDetails, billing, labReportRelation | phoenix | id or userDetailsId_id | Source table changes into Redpanda |
phoenix-source-projection | user_meta | phoenix | user_details_id | Projection changes into Redpanda for ES sync |
Production connector properties that define the architecture:
| Property | Why It Matters |
|---|---|
database.include.list = livehealthapp | Connector reads the source database binlog |
table.include.list | Restricts emitted records to search-relevant tables |
snapshot.mode = when_needed | Recovers schema/offset gaps without normal full table snapshots |
snapshot.select.statement.overrides ... WHERE 1=0 | Captures schema without letting Debezium own historical data load |
PartitionRouting SMT | Keeps all events for the same user on the same partition number |
signal.enabled.channels = source | Enables controlled Debezium signal-table actions |
heartbeat.action.query | Updates debezium_heartbeat so operators can detect stalled binlog reads |
errors.deadletterqueue.topic.name | Sends connector/SMT failures to the Kafka Connect DLQ |
Consumer Layer
The consumer is at-least-once. Offsets are stored only after successful handling or after DLQ publication. Replays are expected, so handlers use idempotent writes and ordering guards.
Phase Routing
| Phase | Detection | Source Topic Behavior | Projection Topic Behavior |
|---|---|---|---|
migration | user_meta.full_name exists | Materialize full denormalized rows into user_meta | Forward projection row to ES through ingest pipeline |
running | user_meta.full_name absent | Keep slim projection updated and resolve identity live | Compose full ES doc from projection + live userDetails |
CDC_PHASE_OVERRIDE can force a phase for recovery. The override must be migration or running.
Source Table Handler Paths
| Topic | Running Handler | MySQL Write | Direct ES Write |
|---|---|---|---|
userDetails | _process_running_user_details | Upsert lab_id, last_updated_time, and selected projection metadata | Yes, after FieldResolver composes a full document |
billing | _process_running_source_mysql -> _handle_billing | Append lab_bill_ids, order_numbers, referral_ids, org_ids, branch_ids | No |
labReportRelation | _process_running_source_mysql -> _handle_lab_report | Append manual_sample_ids | No |
user_meta | _process_running_projection | None | Yes, after identity is resolved from live userDetails |
Field Resolver
cdc/consumers/field_resolver.py composes ES documents during running phase.
| Event Source | Identity Fields | Aggregate Fields | Reason |
|---|---|---|---|
userDetails event | Live userDetails lookup | Current user_meta lookup | Avoid stale envelope identity on DLQ replay |
user_meta event | Live userDetails lookup | Event after image | Projection event is the aggregate change being indexed |
The resolver treats userDetails.labId_id = -1 as missing. That prevents merged-away patient rows from leaking sentinel data into Elasticsearch.
Retry and Ordering Model
| Mechanism | Code Path | Purpose |
|---|---|---|
| Partition routing | Debezium PartitionRouting SMT | Same user, same Kafka partition across all CDC topics |
| Per-partition worker | consumer.py::_partition_worker | Sequential handling within a partition |
| Handler retry | consumer.py::_handle_with_retries | Recover transient MySQL/ES/Kafka issues |
| MySQL deadlock retry | router.py::_retry_on_deadlock | Retry OperationalError 1213 with jittered backoff |
| Consumer DLQ | consumer.py::_send_to_dlq | Preserve poison messages after retries |
| CAS guard | handlers.py::_handle_user_details_running | Reject stale identity updates by last_updated_time |
| CSV dedupe | _upsert_csv_field, _append_csv_fields | Make repeated billing/sample events safe |
| Tombstone guard | _is_tombstoned_user, FieldResolver | Prevent merged-away patients from being recreated |
Data Architecture
Source Tables and Projection
| Store | Object | Role |
|---|---|---|
| MySQL | userDetails | Authoritative identity, demographics, lab routing, patient state |
| MySQL | billing | Order, bill, referral, org, and branch aggregate source |
| MySQL | labReportRelation | Manual sample ID aggregate source |
| MySQL | user_meta | Search projection table used by CDC and backfill |
| Elasticsearch | user_details | Search-optimized document index |
Field Ownership
| Field Family | Source of Truth | Projection / Index Behavior |
|---|---|---|
| Identity and demographics | userDetails | Indexed directly by CDC resolver or backfill join |
lab_id | userDetails.labId_id | Used as ES routing and access filter |
| Patient IDs and identity IDs | userDetails | Searched through exact, prefix, suffix, or segment fields |
| Billing IDs and order numbers | billing | Aggregated into CSV in user_meta, transformed to arrays for ES |
| Org, referral, branch IDs | billing | Aggregated into recency-ordered CSV fields |
| Manual sample IDs | labReportRelation via billing | Aggregated into manual_sample_ids |
| CDC freshness | ES newest last_updated_time | Probed by API background task and exposed in /health |
Elasticsearch Document Shape
The ES document combines identity plus aggregate arrays:
{
"id": 101,
"lab_id": 1,
"full_name": "John Doe",
"lab_patient_id": "P-1001",
"contact": "9999999999",
"manual_sample_ids": ["S-1001"],
"order_numbers": ["ORD-1001"],
"lab_bill_ids": ["5001"],
"org_ids": [20],
"referral_ids": [77],
"branch_ids": [10],
"last_updated_time": "2024-01-02T10:00:00Z"
}The API search path depends on ES routing by lab_id. If a patient moves labs, CDC deletes the old routed document and indexes the new routed document.
Elasticsearch Routing Model
Phoenix Search uses Elasticsearch custom routing on the user_details index. The index mapping declares _routing.required = true, so every write, delete, and point lookup must pass the same routing key that was used when the document was indexed.
| Concern | Behavior |
|---|---|
| ES document ID | user_details_id, stored in ES as id |
| ES routing key | lab_id as a string |
| Source of routing | userDetails.labId_id, denormalized into user_meta.lab_id |
| Normal search routing | Current session lab ID |
| Multi-center search routing | Comma-separated related lab IDs resolved from MySQL |
| Filter endpoint routing | Current session lab ID |
| Detail endpoint | MySQL-backed, not an ES point lookup |
Routing is not the only security boundary. The API also adds lab_id / org / branch / referral filters from the authenticated session. Routing targets the relevant ES shard or shards; filters enforce the allowed result scope.
Write paths must use the same rule:
| Writer | Routing Behavior |
|---|---|
| CDC userDetails handler | Resolves full ES doc, indexes with routing=str(lab_id) |
| CDC user_meta handler | Resolves projection doc, indexes/deletes with routing=str(lab_id) |
| CDC lab move / reroute | Deletes old doc with old lab_id, then indexes new doc with new lab_id |
| CDC merge tombstone | Uses the previous lab_id from the before image to delete the old routed doc |
| Backfill bulk indexer | Sets bulk item Routing from user_meta.lab_id |
| Backfill targeted repair | Calls ES index with WithRouting(lab_id) |
| Backfill verify | Reads GET /user_details/_doc/<id>?routing=<lab_id> |
Debugging must include routing. A document can exist under one routing key and appear missing under another:
curl -s -u elastic:<PASSWORD> \
"https://<ES_HOST>:9200/user_details/_doc/<USER_DETAILS_ID>?routing=<LAB_ID>"Consistency Model
| Concern | Guarantee |
|---|---|
| Search freshness | Eventually consistent from MySQL through CDC into ES |
| Detail lookup | Stronger source-of-truth read from MySQL |
| Per-user ordering | Preserved by partition routing and per-partition workers |
| Global ordering | Not guaranteed across different users or partitions |
| Duplicate delivery | Expected; handlers are designed to be idempotent |
| CDC outage | API can still serve existing ES results, but freshness age increases |
| ES outage | Search API readiness degrades; CDC retries then DLQs failed writes |
| Redis outage | Auth/session and rate-limit flows are affected |
| MySQL outage | Detail lookup, scope resolution, CDC materialization, and resolver reads are affected |
The important rule is that Elasticsearch is a projection, not the source of truth. When ES and MySQL disagree, MySQL wins and CDC/backfill should repair ES.
Failure Domains
| Failure | Immediate Symptom | First Debug Page |
|---|---|---|
| API dependency down | /health/ready returns 503 | Operations |
| Search stale | /health has stale CDC body status | CDC |
| Debezium connector failed | Redpanda topics stop receiving source events | CDC Tools and Backfill |
| Consumer lag high | cdc_consumer_lag rises | Operations |
| Projection corrupt or incomplete | ES disagrees with user_meta / MySQL | CDC Tools and Backfill |
| Query returns unexpected results | Wrong search_key, search_fields, session scope, or ES routing | API Reference |
Source References
| Area | Files |
|---|---|
| App assembly | search/web/application.py, search/web/api/router.py |
| Startup and dependency lifecycle | search/web/lifespan.py, search/services/*/lifespan.py |
| Auth and session context | search/services/auth/middleware.py, search/services/auth/dependencies.py, search/services/auth/schemas.py |
| Search service path | search/domains/user_search/router.py, service.py, repository.py, query.py, filters.py, context.py |
| MySQL lookup path | search/domains/user_search/queries.py |
| CDC consumer orchestration | cdc/consumers/consumer.py |
| CDC routing and handlers | cdc/consumers/router.py, handlers.py, field_resolver.py, phase_detector.py |
| Debezium connector configs | tools/cdc-ctl/connectors/source-connector-existing.production.json, source-connector-projection.production.json |
| Backfill architecture | backfill/main.go, scanner.go, indexer.go, migrate.go, row.go, transform.go |