Debug Setup
How to debug services with OpenTelemetry traces in HyperDX
👤 Sai Tharun
HyperDX debugging should start from a trace, not from unrelated logs. The goal is to follow one request from entrypoint to dependency calls and understand what happened without guessing.
What Was Missing Before
| Previous Gap | Debugging Impact |
|---|---|
| No reliable request timeline | Engineers could not see the exact order of API, auth, database, cache, and search work |
| No child spans for dependencies | Slow requests were hard to split between application code and external systems |
| Weak log correlation | Logs existed, but matching them to one request required manual timestamp matching |
| Missing domain context | A trace or log did not always explain which lab, route, query shape, or scope was involved |
| No consistent trace ID handoff | Frontend, backend, logs, and dashboards were not always connected by one identifier |
Required Request Flow
Every instrumented service should make this path possible:
| Step | Where | Requirement |
|---|---|---|
| 1 | Client or caller | Capture the request that failed or was slow |
| 2 | Response headers or logs | Find the request's trace ID, preferably from the X-Trace-Id response header for HTTP APIs |
| 3 | HyperDX | Search for the trace ID |
| 4 | Trace timeline | Inspect the root route span and child spans |
| 5 | Dependency span | Check Redis, MySQL, Elasticsearch, HTTP, queue, or other external calls |
| 6 | Domain attributes | Confirm the business context that shaped the behavior |
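Step 2 depends on the service echoing its trace ID back to the caller. The following is a minimal sketch of that handoff, assuming the incoming request carries a W3C `traceparent` header and that the trace ID is copied into `X-Trace-Id`. The helper names `trace_id_from_traceparent` and `with_trace_header` are hypothetical; a real service would more likely read the ID from the active span of its tracing SDK.

```python
import re

# W3C Trace Context layout: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def trace_id_from_traceparent(header):
    """Return the 32-character trace ID, or None if the header is unusable."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id = match.group(2)
    # An all-zero trace ID is defined as invalid by the W3C specification.
    if trace_id == "0" * 32:
        return None
    return trace_id


def with_trace_header(response_headers, traceparent):
    """Return a copy of the response headers with X-Trace-Id added when known."""
    headers = dict(response_headers)
    trace_id = trace_id_from_traceparent(traceparent)
    if trace_id is not None:
        headers["X-Trace-Id"] = trace_id
    return headers
```

An operator can then copy the `X-Trace-Id` value from the failing response and paste it into the HyperDX search in step 3.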
What To Inspect In HyperDX
| Signal | What It Answers |
|---|---|
| Root span duration | How long the API request or job took end to end |
| Child span duration | Which dependency or internal block consumed time |
| Span status | Whether the failure is attached to a specific span |
| Error events | Exception type, message, and where the error was recorded |
| Trace ID | Shared correlation key for frontend, logs, traces, and backend investigation |
| Service name | Which service emitted the span |
| Route or operation name | Which endpoint, worker job, or operation ran |
| Span attributes | Domain-specific context needed to explain the behavior |
Debugging Common Issues
| Symptom | First Check | Follow-up |
|---|---|---|
| Slow API request | Compare root route span duration with dependency span durations | If dependencies are fast, inspect auth/session work, response shaping, serialization, and runtime overhead |
| Empty or wrong search result | Inspect domain attributes and search dependency spans | Confirm query shape, routing, filters, hit count, and target index |
| Auth failure | Inspect auth/session span and logs for the same trace | Confirm token/session values, Redis lookup, lab context, and rejected reason |
| 5xx response | Open the failed trace by trace ID | Find the span with error status, then inspect logs and Sentry for the same trace ID |
| Stale data | Check freshness metrics and the request trace together | Confirm whether the read path is healthy before moving to CDC or sync runbooks |
| Dependency timeout | Inspect dependency span status and duration | Check retry behavior, timeout config, and downstream health dashboards |
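The "Slow API request" check amounts to simple arithmetic on span durations. The sketch below illustrates it under two assumptions: span durations have been copied out of the HyperDX timeline by hand, and child spans run sequentially (overlapping spans would make the sum overstate dependency time). The `Span` type and both function names are hypothetical and not part of any HyperDX API.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    duration_ms: float


def time_breakdown(root, children):
    """Split the root span duration into dependency time and leftover time."""
    breakdown = {}
    for child in children:
        # Sum repeated calls to the same dependency under one key.
        breakdown[child.name] = breakdown.get(child.name, 0.0) + child.duration_ms
    dependency_ms = sum(child.duration_ms for child in children)
    # Time not covered by any child span is attributed to application code.
    breakdown["application (unattributed)"] = max(
        root.duration_ms - dependency_ms, 0.0
    )
    return breakdown


def dominant_contributor(root, children):
    """Return the name of the entry that consumed the most time."""
    breakdown = time_breakdown(root, children)
    return max(breakdown, key=breakdown.get)
```

If the dominant entry is the unattributed bucket, the dependencies are fast and the follow-up column applies: inspect auth/session work, response shaping, serialization, and runtime overhead.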
Phoenix Search Example
Phoenix Search should expose the trace ID on search responses and emit enough span context to explain slow, empty, or failed searches.
Important Phoenix Search attributes:
| Attribute | Why It Matters |
|---|---|
| search.lab_id | Confirms the lab context used by the request |
| search.search_key | Shows which search mode was requested |
| search.query_shape | Explains how the input was classified |
| search.routing | Confirms Elasticsearch routing |
| search.is_multi_center | Shows whether related lab scope was used |
| search.hit_count | Shows how many results came back |
| search.zero_results | Makes empty-result traces searchable |
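One way to keep these attributes consistent is to build them in a single place. The sketch below assumes plain string, boolean, and integer values and derives `search.zero_results` from the hit count. The function name, parameter types, and example values are assumptions and are not taken from the Phoenix Search code.

```python
def search_span_attributes(
    lab_id, search_key, query_shape, routing, is_multi_center, hit_count
):
    """Build the domain attributes for one search request as a flat dict."""
    return {
        "search.lab_id": lab_id,
        "search.search_key": search_key,
        "search.query_shape": query_shape,
        "search.routing": routing,
        "search.is_multi_center": is_multi_center,
        "search.hit_count": hit_count,
        # Derived flag so empty-result traces can be filtered directly.
        "search.zero_results": hit_count == 0,
    }


# Hypothetical example values, for illustration only.
attrs = search_span_attributes(
    lab_id="lab-42",
    search_key="patient_name",
    query_shape="free_text",
    routing="lab-42",
    is_multi_center=False,
    hit_count=0,
)
```

With the OpenTelemetry Python API, a dict like this could be attached to the active span with `span.set_attributes(attrs)`.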
Important Elasticsearch span fields:
| Field | Why It Matters |
|---|---|
| db.system.name | Confirms the dependency is Elasticsearch |
| db.operation.name | Confirms the operation, usually search |
| db.operation.parameter.index | Confirms the target index |
| db.query.text | Shows the generated query body when query capture is enabled |
| db.response.status_code | Separates Elasticsearch failures from application failures |
Debug Setup Checklist
| Check | Expected Result |
|---|---|
| Service exports OTLP | Traces appear in HyperDX under the correct OTEL_SERVICE_NAME |
| Framework instrumentation is enabled | HTTP route spans are created automatically |
| Dependency instrumentation is enabled | Redis, MySQL, Elasticsearch, HTTP, or queue spans appear as child spans |
| Logs include trace context | Logs carry trace_id and span_id |
| HTTP response exposes trace ID | Operators can copy X-Trace-Id from the failing request |
| Domain attributes are added | HyperDX traces explain business context, not only technical timing |
| Errors are recorded on spans | Failed traces show the failing span and exception context |
Do not call the setup complete until a real request can be opened in HyperDX and the timeline shows the route span, dependency spans, logs, and required domain attributes.
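The "Logs include trace context" check can be illustrated with the standard `logging` module alone. This is a sketch, assuming the current trace and span IDs are held in a context variable set at the start of each request. The names `_trace_context`, `TraceContextFilter`, and `build_logger` are hypothetical; an OpenTelemetry logging integration would normally populate these fields instead.

```python
import contextvars
import io
import logging

# Holds (trace_id, span_id) for the request currently being handled.
_trace_context = contextvars.ContextVar("trace_context", default=("", ""))


class TraceContextFilter(logging.Filter):
    """Attach trace_id and span_id to every log record."""

    def filter(self, record):
        record.trace_id, record.span_id = _trace_context.get()
        return True


def build_logger(stream):
    """Return a logger whose output lines carry trace_id and span_id."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(
            "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
        )
    )
    handler.addFilter(TraceContextFilter())
    logger = logging.getLogger("debug_setup_example")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: set the IDs when the request starts, then log as usual.
buffer = io.StringIO()
log = build_logger(buffer)
_trace_context.set(("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7"))
log.info("search failed")
```

Each emitted line then carries the same trace ID that appears in `X-Trace-Id`, so logs can be matched to a trace without comparing timestamps.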