Design Decisions

Architectural rationale, dual-path reality, transport vs domain responsibilities, and explicit caveats for File Type Report Revamp.

👤 Rucha Mahesh Kulkarni | 📅 Updated: Mar 18, 2026 | 📁 File Type Report Upload

This page states what was decided, what was traded away, and what must not be confused when reading livehealth-frontend, livehealthapp, and crelio-app together. It complements Overview facts with engineering intent and deployment-shape caveats.

Architectural Rationale

The legacy file-type upload path treated the application server as a byte pipe, which brought these costs:

  • large POST bodies
  • memory spikes decoding base64
  • CPU spent validating/restructuring binaries inside the web worker lifecycle
  • tighter effective ceilings on “how big a report PDF can be” because the transport hop was the bottleneck

The revamp’s primary intent is storage-transport decoupling:

move the binary from “HTTP through Django” to “HTTP directly into object storage,” while keeping domain transitions (report value write, completion flags, audit, search, realtime) on trusted backend code.

That is materially different from “make uploads async” or “delete backend work” - the backend still performs the authoritative state mutation in the observed stack.

Robustness, Reusability, and Tradeoffs

| Dimension | How this design behaves |
|---|---|
| Robustness | Explicit stages (presign → S3 → reconcile → finalize) surface partial failures instead of silent inconsistency; reconcile closes the SM log loop; encrypted PDFs fail fast on the client. |
| Reusability | Storage Manager centralizes presigned contracts, logging, path rules, and vendor abstractions so file-type upload is not a one-off S3 hack; FileTypeReport category is reused by device/interfacing flows, not only the modal. |
| Tradeoffs | More moving parts in the browser (orchestration, retries, UX for orphan objects if finalize fails after S3 success); header/footer keeps a legacy high-payload path for parity; PY-3 + Lambda improve decoupling but infra and auth become part of the correctness story. |

Architecture / Design Decisions

Documented choices and implications for the revamp, the dual finalization surfaces (PY-2 vs PY-3/Lambda), and operator-visible forks (plain upload vs header/footer).

1: presigned POST as the default happy-path contract

Choice: Storage Manager issues a short-lived presigned upload (fields + URL) rather than accepting the file on uploadFileTypeReport in the main UI flow.

Rationale:

  • removes app servers from the GB-scale failure domain of buffering uploads
  • aligns with S3’s strengths (high-throughput object ingest)
  • centralizes account selection, extension policy, and logging at SM boundary

Tradeoff:

  • multi-step client orchestration (presign → POST → reconcile → finalize) increases FE complexity and failure surfaces (partial success if finalize fails after S3 success).
  • eventual consistency window requires reconcile semantics and sometimes short delays.
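
To make the "fields + URL" contract more concrete, the sketch below builds the kind of POST policy document that S3 evaluates for a presigned upload. The policy keys (`expiration`, `conditions`, `starts-with`, `content-length-range`) are standard S3 POST policy elements. The function name, bucket name, size ceiling, and five-minute lifetime are illustrative assumptions, not values taken from Storage Manager.

```python
import json
from datetime import datetime, timedelta, timezone


def build_post_policy(bucket, key_prefix, max_bytes, ttl_seconds, now=None):
    """Return an S3 POST policy dict limiting where and how much a client may upload."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=ttl_seconds)
    return {
        # S3 expects an ISO 8601 UTC timestamp here.
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            # The object key must begin with this prefix.
            ["starts-with", "$key", key_prefix],
            # Reject empty bodies and anything above the ceiling.
            ["content-length-range", 1, max_bytes],
        ],
    }


policy = build_post_policy(
    bucket="example-reports-bucket",   # hypothetical bucket name
    key_prefix="FileTypeReports/",     # temporary-path prefix described on this page
    max_bytes=50 * 1024 * 1024,        # assumed 50 MiB ceiling
    ttl_seconds=300,                   # assumed short lifetime
    now=datetime(2026, 3, 18, 12, 0, 0, tzinfo=timezone.utc),
)
print(json.dumps(policy, indent=2))
```

In a real issuer the policy is base64-encoded and signed server-side before the fields are returned to the browser; for S3, boto3's `generate_presigned_post` is one library call that performs that step. Whether Storage Manager uses it is not established by this page.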

2: keep PY-2 as the observed domain finalizer

Choice: Current livehealth-frontend calls uploadFileTypeReport/ on PY-2 with initialFilePath, not exclusively PY-3 report-value updaters.

Rationale:

  • PY-2 already owned Pusher, patient ES reindex, activity logs, and radiology / permission guards in production-hardened code paths.
  • Presigned transport is orthogonal to where the domain transaction commits first in a phased rollout.

Implication for architecture discussions:

  • “We moved to PY-3 for file reports” may be true in some environments, but is not proven by the inspected FE happy path alone.
  • Engineers should speak precisely: “binary off PY-2 HTTP; domain finalize still PY-2 in observed FE.”

3: introduce PY-3 Lambda-facing helpers without requiring Lambda in-repo

Choice: Expose authenticated endpoints that:

  • translate temporary uploaded paths to canonical SM paths
  • update report_results and related lab report state
  • rewrite SM logs across temp → final

Rationale:

  • enables async validation / normalization (virus scan, PDF repair, page rasterization) without blocking the browser session
  • enables path normalization (including extension policy) closer to Storage Manager v2 rules

Explicit non-decision in this repo snapshot:

  • the Lambda handler, IAM, S3 event wiring, and idempotency strategy are out of tree here.
  • documentation describes contracts; runtime behavior requires the infra repo or AWS console truth.

4: retain a legacy header/footer path

Choice: “Apply header and footer” continues to use uploadFileTypeReportWithHeader → backend uploadFileWithHeader with base64 payload semantics.

Rationale:

  • header/footer composition may still depend on server-side PDF manipulation libraries and templates not ported to a pure-presigned flow.
  • feature parity outweighs uniform transport for a lower-volume branch.

Tradeoff:

  • this branch reintroduces payload pressure; large files will still stress the legacy path.
  • operators may report “presigned works for plain upload but not with headers” - that is expected given the fork.
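
The payload pressure on this branch can be estimated with simple arithmetic: padded base64 emits 4 output bytes for every 3 input bytes. The sketch below computes the encoded size for a hypothetical 30 MiB PDF; the file size and function name are assumptions chosen for illustration.

```python
import base64


def base64_encoded_size(raw_bytes: int) -> int:
    """Size in bytes of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


raw = 30 * 1024 * 1024  # hypothetical 30 MiB report PDF
encoded = base64_encoded_size(raw)
print(f"raw={raw} encoded={encoded} overhead={encoded / raw - 1:.1%}")

# Cross-check the formula against the standard library on a small sample.
sample = b"x" * 1000
assert len(base64.b64encode(sample)) == base64_encoded_size(len(sample))
```

A 30 MiB file becomes a 40 MiB request body before any JSON or form wrapping, and the server holds both the encoded and decoded copies while decoding, which is the memory-spike pattern the presigned path avoids.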

5: client-side encrypted-PDF rejection

Choice: reject encrypted PDFs in the browser before presign.

Rationale:

  • downstream storage and print pipelines have historically assumed that operator uploads are not encrypted and need no decryption step.
  • failing fast avoids paying presign + S3 + reconcile costs for guaranteed-bad outcomes.

Tradeoff:

  • false positives/negatives depend on the client PDF probe quality; edge PDFs may disagree with server tooling.
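
The probe-quality tradeoff is easier to see with a deliberately naive example. The sketch below is not the frontend's probe (which runs in the browser and is not shown on this page); it is a Python illustration of one common heuristic, scanning the raw bytes for the `/Encrypt` name that an encrypted PDF carries in its trailer. The function name and the sample byte strings are hypothetical.

```python
def looks_encrypted_pdf(data: bytes) -> bool:
    """Naive probe: treat any PDF whose bytes contain an /Encrypt name as encrypted."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a PDF: missing %PDF- header")
    return b"/Encrypt" in data


# Minimal hand-written fragments, not complete valid PDFs.
plain = b"%PDF-1.4\ntrailer\n<< /Size 1 /Root 1 0 R >>\n%%EOF"
locked = b"%PDF-1.4\ntrailer\n<< /Size 2 /Root 1 0 R /Encrypt 2 0 R >>\n%%EOF"
# The literal text appears in a string, not as a trailer entry.
tricky = b"%PDF-1.4\n(how to set /Encrypt in a trailer)\ntrailer\n<< /Size 1 >>\n%%EOF"

for name, blob in (("plain", plain), ("locked", locked), ("tricky", tricky)):
    print(name, looks_encrypted_pdf(blob))
```

The `tricky` sample is flagged even though it is not encrypted, which is the false-positive case named above. A stricter probe would parse the trailer, at the cost of more client code and a larger surface for disagreement with server tooling.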

6: treat file-type reports as a cross-surface pattern

Choice: reuse file_category="FileTypeReport" outside manual UI (e.g. interfacing/device flows).

Rationale:

  • “file type report” is a data shape (path as result), not only a modal feature.

Implication:

  • SM logs and S3 layout debugging must not assume only UploadFileTypeReportModal writers.

Temporary path vs canonical path (design axis)

Temporary path (frontend-generated FileTypeReports/... style):

  • easy to correlate with a single upload attempt
  • keeps presign logs namespaced away from patient-canonical prefixes until business finalization

Canonical path (IN/<lab>/<patient>/FileTypeReport/... style):

  • aligns with patient document conventions, search, retention, and cross-feature linking

The observed PY-2 path may persist the temporary path as ReportValue.value unless a later process rewrites it; engineers must verify the actual stored strings per deployment.

The PY-3 helper path is explicitly designed to rewrite temp → final and update logs accordingly, but only when Lambda is engaged.
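
When verifying stored strings in a given deployment, a small classifier can help separate the two path styles. The sketch below is an assumption-laden illustration: it treats the `FileTypeReports/` prefix as the temporary marker and assumes the canonical layout is exactly `IN/<lab>/<patient>/FileTypeReport/` followed by a remainder. The function name and the sample values are hypothetical.

```python
import re

TEMP_PREFIX = "FileTypeReports/"
# Assumed canonical layout: IN/<lab>/<patient>/FileTypeReport/<remainder>
CANONICAL_RE = re.compile(r"^IN/[^/]+/[^/]+/FileTypeReport/.+")


def classify_report_path(value: str) -> str:
    """Label a stored report path as 'temporary', 'canonical', or 'unknown'."""
    if value.startswith(TEMP_PREFIX):
        return "temporary"
    if CANONICAL_RE.match(value):
        return "canonical"
    return "unknown"


# Made-up sample values; real stored strings must be read from the deployment.
samples = [
    "FileTypeReports/example-upload.pdf",
    "IN/lab-1/patient-9/FileTypeReport/example-upload.pdf",
    "some/other/layout.pdf",
]
for s in samples:
    print(s, "->", classify_report_path(s))
```

Rows classified as temporary after finalization suggest the PY-2 path stored the value and no rewrite has happened; rows classified as unknown deserve a closer look before drawing conclusions.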

Failure-mode philosophy

The system prefers explicit multi-step failure over silent inconsistency:

  • presign failure → no S3 object; no finalize
  • S3 success + reconcile failure → logs incomplete; operators/debuggers look at SM ES
  • S3 success + finalize failure → orphan object risk; mitigation is operational (cleanup jobs) + client retry UX

Documenting these stages separately (see Workflow Guide) is intentional - it matches observability boundaries.
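
The stage boundaries above can be sketched as a small runner that stops at the first failing stage and reports what that failure leaves behind. This is a minimal illustration, not the frontend's orchestration code: the stage names, the stop-at-first-failure behavior, and the consequence text for a failed S3 upload are assumptions; the other consequences paraphrase the list above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

STAGES = ("presign", "s3_upload", "reconcile", "finalize")

# What a failure at each stage leaves behind.
CONSEQUENCE = {
    "presign": "no S3 object; no finalize",
    "s3_upload": "no usable object; nothing to reconcile or finalize",  # assumed
    "reconcile": "object stored; SM logs incomplete",
    "finalize": "object stored; orphan risk until retry or cleanup",
}


@dataclass
class UploadOutcome:
    ok: bool
    failed_stage: Optional[str] = None
    consequence: Optional[str] = None


def run_upload(steps: Dict[str, Callable[[], None]]) -> UploadOutcome:
    """Run the stages in order and stop at the first one that raises."""
    for stage in STAGES:
        try:
            steps[stage]()
        except Exception:
            return UploadOutcome(False, stage, CONSEQUENCE[stage])
    return UploadOutcome(True)


def fail() -> None:
    raise RuntimeError("simulated failure")


# Simulate S3 success followed by a finalize failure.
steps = {stage: (lambda: None) for stage in STAGES}
steps["finalize"] = fail
outcome = run_upload(steps)
print(outcome)
```

Returning the failed stage explicitly, rather than a generic error, lets the client choose between retrying finalize against the already-stored object and restarting from presign.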

Final design statement

The File Type Report Revamp is best categorized as:

Transport decoupling + preserved domain authority + optional async normalization layer.

It is not “remove Django from file reports.” It is “stop using Django as a CDN for multi-megabyte PDFs on the default operator path, while Django (PY-2 today) still decides what ‘done’ means for the lab report row and its indexes.”
