Design Decisions

This page documents the significant architectural and design choices made during the implementation of Sample Rerun, along with their rationale.

1. DocumentDB for rerun records instead of MySQL

Decision: SampleRerun and SampleRerunValues use DocumentDB (MongoDB) via the DocumentDBModelBase mixin, not regular Django models backed by MySQL.

Rationale:

Rerun records are transactional and ephemeral — they exist only during the rerun lifecycle and are deleted upon confirmation or cancellation
The requested_parameters and fulfilled_parameters fields are arrays of varying length — MongoDB's native array support avoids junction tables
No joins are needed with other models — rerun data is always fetched by lab_report_id and rendered independently
Write throughput is high during busy lab hours, especially when instruments trigger reruns automatically

Trade-off: Adds a DocumentDB dependency; DOCUMENT_DB_ENABLED must be True for the feature to work. When DocumentDB is disabled, the feature degrades silently: rerun lookups return empty results instead of raising an error.
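
To make the record shape and the silent degradation concrete, here is a minimal Python sketch. It is not the actual DocumentDBModelBase code: the SampleRerunDoc class, the find_reruns function, the enabled argument (standing in for the DOCUMENT_DB_ENABLED setting), and the use of strings as parameter identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SampleRerunDoc:
    """Hypothetical shape of one rerun document (only fields named on this page)."""
    lab_report_id: int
    # Variable-length arrays live directly on the document, so no junction table.
    requested_parameters: List[str] = field(default_factory=list)
    fulfilled_parameters: List[str] = field(default_factory=list)


def find_reruns(store: List[SampleRerunDoc], lab_report_id: int,
                enabled: bool) -> List[SampleRerunDoc]:
    """Fetch reruns by lab_report_id; `enabled` mirrors DOCUMENT_DB_ENABLED."""
    if not enabled:
        # Silent degradation: no exception, just no rerun data.
        return []
    return [doc for doc in store if doc.lab_report_id == lab_report_id]
```

A caller cannot tell "no reruns exist" apart from "DocumentDB is disabled" from the return value alone, which is the cost of degrading silently.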

2. sampleRedrawFlag on LabReportRelation

Decision: A single integer field sampleRedrawFlag on the report controls rerun state, rather than a separate status table or a foreign key to the DocumentDB record.

Rationale:

The flag serves as a coarse-grained lock — it blocks save/sign operations and filters reports in waiting lists
Many existing queries already filter on LabReportRelation fields; adding a new field to the same table avoids expensive joins
The flag values (0, 3, 4) are simple enough that they don't warrant a full state machine in the relational DB
ES sync mirrors this field, enabling waiting list queries without database hits

Trade-off: Legacy values 1 (partial redraw) and 2 (full redraw) coexist with the rerun values 3 and 4. This requires careful handling in queries: some use sampleRedrawFlag = 0, others use sampleRedrawFlag ∈ {0, 1, 2, 4}, and the completed-tests query has a specific exclusion list.
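
The flag values and the two filter sets mentioned above could be written down as follows. This is an illustrative sketch only: the enum, its member names, the set names, and the reading of 0 as "no redraw or rerun active" are assumptions, not identifiers from the codebase.

```python
from enum import IntEnum


class SampleRedrawFlag(IntEnum):
    NONE = 0             # assumed meaning: no redraw or rerun is active
    PARTIAL_REDRAW = 1   # legacy
    FULL_REDRAW = 2      # legacy
    RERUN_REQUESTED = 3
    RERUN_RECEIVED = 4


# The two filter shapes described in the trade-off above (names are hypothetical).
ONLY_CLEAR = frozenset({SampleRedrawFlag.NONE})
EXCLUDING_RERUN_REQUESTED = frozenset({
    SampleRedrawFlag.NONE,
    SampleRedrawFlag.PARTIAL_REDRAW,
    SampleRedrawFlag.FULL_REDRAW,
    SampleRedrawFlag.RERUN_RECEIVED,
})


def is_rerun_state(flag: int) -> bool:
    """True for the two rerun values (3 and 4), False for 0 and the legacy values."""
    return flag in (SampleRedrawFlag.RERUN_REQUESTED, SampleRedrawFlag.RERUN_RECEIVED)
```

Naming the sets in one place makes it easier to audit which queries include or exclude each legacy value.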

3. Redis hash cache for rerun numbers

Decision: Active rerun numbers are cached in a Redis hash keyed by sample_rerun_{lab_id}, mapping lab_report_id → rerun_number.

Rationale:

The interfacing app's device waiting list calls get_rerun_cache_bulk() for every refresh — this is a hot path
The hash structure allows HMGET for batch lookups and HSET/HDEL for single operations without full cache invalidation
1-day TTL prevents stale entries from orphaned reruns (e.g., if the cancel API was never called)

Trade-off: A cache miss falls back to a DocumentDB query via fill_rerun_cache(). During cache warm-up after a Redis restart, there may be brief periods where rerun numbers are not available until the first bulk query triggers the backfill.
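
The read-through pattern described here can be sketched as below. This is not the real get_rerun_cache_bulk() or fill_rerun_cache(): the function get_rerun_numbers, the load_from_db callback, and the InMemoryHashStore stand-in are hypothetical. The stand-in exposes only the hmget, hset, and expire calls, with the same argument shapes as redis-py, so the example runs without a Redis server.

```python
from typing import Callable, Dict, Iterable, List

ONE_DAY_SECONDS = 24 * 60 * 60


class InMemoryHashStore:
    """Tiny stand-in for the hmget/hset/expire subset of a Redis client."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hmget(self, name: str, keys: List[str]) -> list:
        bucket = self._hashes.get(name, {})
        return [bucket.get(str(k)) for k in keys]  # None marks a missing field

    def hset(self, name: str, key=None, value=None, mapping=None) -> None:
        bucket = self._hashes.setdefault(name, {})
        if key is not None:
            bucket[str(key)] = str(value)
        for k, v in (mapping or {}).items():
            bucket[str(k)] = str(v)

    def expire(self, name: str, seconds: int) -> None:
        pass  # TTL is not simulated in this sketch


def get_rerun_numbers(client, lab_id: int, lab_report_ids: Iterable[int],
                      load_from_db: Callable[[List[int]], Dict[int, int]]) -> Dict[int, int]:
    """Batch lookup from the per-lab hash, backfilling misses from the database."""
    key = f"sample_rerun_{lab_id}"
    ids = list(lab_report_ids)
    if not ids:
        return {}
    cached = client.hmget(key, [str(i) for i in ids])
    found = {i: int(v) for i, v in zip(ids, cached) if v is not None}
    missing = [i for i in ids if i not in found]
    if missing:
        backfill = load_from_db(missing)  # assumed to return only active reruns
        if backfill:
            client.hset(key, mapping={str(i): n for i, n in backfill.items()})
            client.expire(key, ONE_DAY_SECONDS)  # 1-day TTL bounds stale entries
            found.update(backfill)
    return found
```

In this sketch, reports without an active rerun are never cached, so they reach the database on every call; whether the real implementation caches such negative results is not stated here.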

4. Fusion webhook for auto-rerun (instead of direct invocation)

Decision: AutoSampleRerunCheck does not call SampleRerunRequestView directly. Instead, it sends a Fusion webhook that eventually hits the /request endpoint.

Rationale:

Decoupling: The auto-check runs in the context of a device data ingestion request; creating the rerun record in a separate request avoids transaction entanglement
Retry logic: Fusion provides built-in retry and failure handling — if the rerun request fails, it can be retried without re-processing the device data
Audit trail: The webhook creates a clear, traceable boundary between "qualification passed" and "rerun requested"
Internal request marker: The webhook sets x-is-internal-request: True, which the view uses to set login_user = -1 (system-initiated)

Trade-off: Added latency — the auto-rerun is not instantaneous; there is a round-trip through Fusion. In practice, this is acceptable because the rerun request does not need to be synchronous with the device data ingestion.
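
The internal-request marker can be illustrated with a short sketch. The function resolve_login_user and its arguments are hypothetical; only the header name x-is-internal-request and the value -1 come from the description above, and the case-insensitive comparison is an assumption.

```python
from typing import Mapping

SYSTEM_USER_ID = -1  # login_user value for system-initiated reruns
INTERNAL_HEADER = "x-is-internal-request"


def resolve_login_user(headers: Mapping[str, str], authenticated_user_id: int) -> int:
    """Return -1 for webhook-originated requests, else the authenticated user's id."""
    # Header names are case-insensitive, so normalise before looking up.
    normalized = {name.lower(): str(value) for name, value in headers.items()}
    if normalized.get(INTERNAL_HEADER, "").strip().lower() == "true":
        return SYSTEM_USER_ID
    return authenticated_user_id
```

For example, resolve_login_user({"X-Is-Internal-Request": "True"}, 42) yields -1, while the same call without the header yields 42.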

5. Sentinel value -1 for instrument-triggered reruns

Decision: rerun_number = -1 is a sentinel value that indicates the instrument re-sent data without being asked (INSTRUMENT_TRIGGERED_RERUN_NUMBER = -1).

Rationale:

Manual reruns use positive integers (1, 2, 3, …) for iteration tracking
Using -1 creates a clear, non-overlapping domain — there is no ambiguity between a manual rerun #1 and an instrument rerun
The interfacing app's qualify_rerun() function sets this value, and the backend routes to register_machine_triggered_rerun() based on it

Trade-off: Consumers of rerun_number must special-case the sentinel. The interfacing waiting list badge checks rerunNumber !== 0 && rerunNumber !== -1 to exclude instrument-triggered reruns from the visual badge, since those are handled differently (immediate fulfilment, no user-initiated request).
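
A small sketch of how the sentinel separates the two domains is shown below. The functions classify_rerun and shows_rerun_badge are hypothetical names, and treating 0 as "no active rerun" is an assumption inferred from the badge check; the badge function is a Python rendering of the frontend condition, not the frontend code itself.

```python
INSTRUMENT_TRIGGERED_RERUN_NUMBER = -1


def classify_rerun(rerun_number: int) -> str:
    """Map a rerun_number to its origin; positive integers are manual iterations."""
    if rerun_number == INSTRUMENT_TRIGGERED_RERUN_NUMBER:
        return "instrument"
    if rerun_number > 0:
        return "manual"
    if rerun_number == 0:
        return "none"  # assumption: 0 means no active rerun
    raise ValueError(f"unexpected rerun_number: {rerun_number}")


def shows_rerun_badge(rerun_number: int) -> bool:
    """Mirror of the badge check: hide for 0 and for the -1 sentinel."""
    return rerun_number != 0 and rerun_number != INSTRUMENT_TRIGGERED_RERUN_NUMBER
```

Because the manual domain is strictly positive, a manual rerun #1 can never collide with the instrument sentinel.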

6. Partial fulfilment tracking for manual reruns

Decision: Manual reruns track requested_parameters and fulfilled_parameters separately, allowing partial fulfilment when not all requested parameters arrive simultaneously.

Rationale:

Instruments may not process all requested parameters in a single batch — some parameters may require different analysers or run times
The is_rerun_fulfilled() method compares the two arrays and only sets sampleRedrawFlag = 4 when all requested parameters have been received
Until fulfilment is complete, the banner shows "Rerun Requested" (sampleRedrawFlag = 3), not "Rerun Received" (4)

Trade-off: The frontend must handle the intermediate state where some rerun values exist but the rerun is not yet ready for confirmation. This is handled by only showing the Review tab when sampleRedrawFlag ∈ {3, 4}.
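
The fulfilment check can be sketched as a set comparison. This is an assumed implementation, not the actual is_rerun_fulfilled() method: the free-function form, the record_fulfilment helper, the string parameter identifiers, and the decision to treat an empty request as unfulfilled are all illustrative choices.

```python
from typing import List, Sequence, Tuple

RERUN_REQUESTED = 3
RERUN_RECEIVED = 4


def is_rerun_fulfilled(requested: Sequence[str], fulfilled: Sequence[str]) -> bool:
    """True once every requested parameter appears in the fulfilled array."""
    if not requested:
        return False  # assumption: an empty request is never "fulfilled"
    return set(requested) <= set(fulfilled)


def record_fulfilment(requested: Sequence[str], fulfilled: Sequence[str],
                      received: Sequence[str]) -> Tuple[List[str], int]:
    """Add newly received parameters and return the updated array and flag."""
    updated = list(fulfilled)
    for param in received:
        # Ignore parameters that were not requested, and avoid duplicates.
        if param in requested and param not in updated:
            updated.append(param)
    flag = RERUN_RECEIVED if is_rerun_fulfilled(requested, updated) else RERUN_REQUESTED
    return updated, flag
```

With requested parameters HGB and WBC, receiving only HGB leaves the flag at 3; receiving WBC in a later batch moves it to 4.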

7. Value replacement on confirm (delete + insert)

Decision: When confirming rerun values, the system deletes the existing ReportValue rows matching the confirmed indices and inserts new rows — rather than updating in place.

Rationale:

The user may choose the rerun value for some parameters and the original for others — this makes an UPDATE cumbersome (each row needs different logic)
A delete-and-insert approach ensures clean state — no risk of partial updates or stale fields
The CONFIRMATION_REPORT_KEYS list defines exactly which fields are transferred, preventing unintended data carryover

Trade-off: The delete-then-insert is not atomic unless wrapped in a transaction. If the insert failed after the delete, report values would be lost. The implementation wraps both steps in a single database transaction to prevent this.
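
The atomic delete-and-insert can be demonstrated with the standard-library sqlite3 module. This is a sketch of the technique under assumed names: the report_value table, its columns, and the confirm_values function are hypothetical and do not reflect the actual ReportValue schema or CONFIRMATION_REPORT_KEYS. In Django, the equivalent wrapper would typically be transaction.atomic().

```python
import sqlite3
from typing import Dict, Optional


def confirm_values(conn: sqlite3.Connection, report_id: int,
                   confirmed: Dict[int, Optional[str]]) -> None:
    """Replace the rows for the confirmed parameter indices in one transaction."""
    # Using the connection as a context manager commits on success and
    # rolls back if any statement raises, so the delete cannot survive alone.
    with conn:
        conn.executemany(
            "DELETE FROM report_value WHERE report_id = ? AND param_index = ?",
            [(report_id, index) for index in confirmed],
        )
        conn.executemany(
            "INSERT INTO report_value (report_id, param_index, value) VALUES (?, ?, ?)",
            [(report_id, index, value) for index, value in confirmed.items()],
        )
```

If the insert violates a constraint, the rollback restores the deleted rows, so the report never ends up with missing values.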

8. Manual rerun flag at the parameter level (via ReportFormat.meta)

Decision: The manual_rerun flag is stored in the meta JSON field of ReportFormat (per parameter), not at the test or device level.

Rationale:

Different parameters of the same test may have different rerun eligibility — e.g., only haemoglobin may need rerun capability, not all CBC parameters
Storing in meta allows the flag to be set from both LabAdmin (test configuration) and Device Management (device test mapping), converging on the same field
ReportFormat is already the canonical source of parameter-level configuration

Trade-off: The meta field is a JSON text field — there is no schema enforcement. The shouldPreserveMetaForRerun() helper in the frontend explicitly checks for and preserves these fields during report format saves to prevent accidental loss.
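
The preservation idea can be sketched in Python as follows. The real shouldPreserveMetaForRerun() helper lives in the frontend and its logic is not shown on this page; the function merge_meta_preserving_rerun, the RERUN_META_KEYS tuple, and the rule that an explicit incoming value wins are assumptions for illustration.

```python
import json
from typing import Any, Dict, Optional

# Only manual_rerun is named on this page; other preserved keys are unknown.
RERUN_META_KEYS = ("manual_rerun",)


def merge_meta_preserving_rerun(existing_meta_json: Optional[str],
                                incoming_meta: Dict[str, Any]) -> str:
    """Build the meta JSON to save, carrying over rerun keys the save omitted."""
    existing = json.loads(existing_meta_json) if existing_meta_json else {}
    merged = dict(incoming_meta)
    for key in RERUN_META_KEYS:
        # Keep the stored flag only when the incoming payload does not set it.
        if key in existing and key not in merged:
            merged[key] = existing[key]
    return json.dumps(merged, sort_keys=True)
```

Since the field has no schema, a guard of this kind is the only thing that stops an unrelated save from dropping the flag.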

9. Waiting list ES query includes sampleRedrawFlag = 3

Decision: The waiting list Elasticsearch query uses a should clause to include reports where sampleRedrawFlag = 0 OR sampleRedrawFlag = 3 (rerun requested).

Rationale:

Reports with an active rerun must appear in the device waiting list so the interfacing app can send re-test commands to the instrument
Including sampleRedrawFlag = 3 in the query (rather than a separate query) keeps the waiting list unified
The rerun_params metadata enrichment in prepare_reports() attaches instrument machine names to these reports, enabling targeted re-testing

Trade-off: Reports with sampleRedrawFlag = 4 (values received, awaiting confirmation) are excluded from the device waiting list — they no longer need instrument interaction. This is intentional but could confuse users who expect to see all rerun-related reports in one place. The "Active Reruns" sidebar view shows all rerun states.
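
The should clause might look like the following query fragment, expressed as a Python dict. This is a sketch of the filter only, not the full waiting list query: the function names are hypothetical, and the use of term clauses with minimum_should_match is one common way to express an OR in the Elasticsearch query DSL, assumed here rather than taken from the implementation.

```python
from typing import Any, Dict

WAITING_LIST_FLAGS = (0, 3)  # no rerun, or rerun requested


def waiting_list_flag_filter() -> Dict[str, Any]:
    """Bool query matching reports whose sampleRedrawFlag is 0 or 3."""
    return {
        "bool": {
            "should": [
                {"term": {"sampleRedrawFlag": flag}} for flag in WAITING_LIST_FLAGS
            ],
            # Require at least one should clause, making the list a strict OR.
            "minimum_should_match": 1,
        }
    }


def appears_in_waiting_list(sample_redraw_flag: int) -> bool:
    """Plain-Python equivalent of the filter, for reasoning about each flag value."""
    return sample_redraw_flag in WAITING_LIST_FLAGS
```

Under this filter a report with sampleRedrawFlag = 4 does not match, which is the intentional exclusion described above.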
