Design Decisions

Architectural choices, trade-offs, and rationale for MS Word Integration implementation

👤 Aditya Naresh · 📅 Updated: Mar 18, 2026 · 🏷️ reporting

This document outlines the key architectural decisions, trade-offs, and rationale behind the MS Word Integration feature implementation.

Core Architecture Decisions

1. Windows-Only Implementation

Decision: Restrict the feature to Windows operating systems only.

Rationale:

  • Microsoft Word's winword: protocol handler is Windows-specific
  • Word automation APIs are more mature on Windows
  • Primary user base (radiology labs) predominantly uses Windows
  • Cross-platform alternatives (LibreOffice, Google Docs) have compatibility issues

Trade-offs:

  • Pro: Native Word integration, reliable protocol handling
  • Con: Platform limitation, excludes Mac/Linux users
  • Mitigation: Clear system requirements documentation

2. Session-Based Architecture

Decision: Implement session-based editing with exclusive access control.

Rationale:

  • Prevents concurrent edits that could cause data loss
  • Maintains data integrity during external editing
  • Provides clear ownership and responsibility
  • Provides a foundation for future collaboration features, even though editing itself is single-writer

Alternatives Considered:

  • File Locking: Simple, but a lock can be left orphaned when the client disconnects
  • Version Control: Complex for real-time editing
  • Concurrent Editing: Risk of merge conflicts
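The following is a minimal sketch of exclusive session acquisition. It assumes each session is a lease that expires unless the client renews it, which is how a session can recover from the network disconnections that plain file locking does not handle. `SessionRegistry`, `acquire`, `renew`, `release`, and the default 30-minute TTL are all hypothetical. A real implementation would persist sessions in the `external_report_edits` table rather than in memory.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class EditSession:
    lab_report_id: int
    author_id: int
    expires_at: float  # epoch seconds after which the lease counts as abandoned


class SessionRegistry:
    """Hypothetical in-memory registry granting one editing session per lab report."""

    def __init__(self, ttl_seconds: float = 1800, clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._sessions: Dict[int, EditSession] = {}

    def acquire(self, lab_report_id: int, author_id: int) -> Optional[EditSession]:
        now = self.clock()
        current = self._sessions.get(lab_report_id)
        # Another author holds a live lease: refuse rather than risk lost edits
        if current and current.author_id != author_id and current.expires_at > now:
            return None
        # No session, an expired one (client went away), or the same author re-opening
        session = EditSession(lab_report_id, author_id, now + self.ttl_seconds)
        self._sessions[lab_report_id] = session
        return session

    def renew(self, lab_report_id: int, author_id: int) -> bool:
        current = self._sessions.get(lab_report_id)
        if current is None or current.author_id != author_id or current.expires_at <= self.clock():
            return False
        current.expires_at = self.clock() + self.ttl_seconds
        return True

    def release(self, lab_report_id: int, author_id: int) -> None:
        current = self._sessions.get(lab_report_id)
        if current is not None and current.author_id == author_id:
            del self._sessions[lab_report_id]
```

An abandoned lease simply expires, so a crashed Word client cannot block a report indefinitely.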

3. S3-Based File Storage

Decision: Use AWS S3 for all file storage operations.

Rationale:

  • Scalable, durable, and cost-effective storage
  • Built-in security features (encryption, access control)
  • Integration with existing CrelioHealth infrastructure
  • Presigned URLs for secure temporary access

Trade-offs:

  • Pro: Reliable, secure, globally distributed
  • Con: Vendor lock-in, potential latency for large files
  • Mitigation: Abstraction layer allows future storage provider changes
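The mitigation depends on an abstraction layer. Below is a minimal sketch of one possible shape, assuming a `typing.Protocol` interface. `FileStorage`, `put`, `get`, `temporary_url`, `InMemoryStorage`, `save_edited_report`, and the key layout are hypothetical names. An S3-backed class would implement the same three methods with boto3's `put_object`, `get_object`, and `generate_presigned_url`. The in-memory version is the kind of test double this layer makes possible.

```python
from typing import Dict, Protocol


class FileStorage(Protocol):
    """Storage operations the feature depends on; S3 is one implementation."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def temporary_url(self, key: str, expires_in: int) -> str: ...


class InMemoryStorage:
    """Test double that satisfies FileStorage without network access."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def temporary_url(self, key: str, expires_in: int) -> str:
        # A real backend would return a signed, expiring URL here
        return f"memory://{key}?expires_in={expires_in}"


def save_edited_report(storage: FileStorage, report_id: int, docx: bytes) -> str:
    """Callers depend only on FileStorage, so the provider can change later."""
    key = f"external-edits/{report_id}/report.docx"  # hypothetical key layout
    storage.put(key, docx)
    return key
```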

Technical Implementation Choices

1. Placeholder Replacement Strategy

Decision: Use regex-based placeholder replacement with {{variable}} syntax.

Rationale:

  • Simple, readable syntax familiar to users
  • Easy to implement and maintain
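  • Plain-text delimiters survive editing in Word, although Word can split a placeholder across formatting runs, so the replacement code must account for that
  • Values can be formatted or transformed in Python before they are substituted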

Implementation:

```python
import re
PLACEHOLDER_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # group 1: the variable name
```

Alternatives Considered:

  • Template Engines: Jinja2, Django templates (too complex for Word documents)
  • XML Manipulation: Direct OOXML editing (fragile, version-dependent)
  • Custom Syntax: Alternative delimiters (less intuitive)
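This minimal sketch shows how the pattern can drive replacement, assuming a flat `context` dict. `fill_placeholders` is a hypothetical name. Leaving unknown placeholders untouched, rather than blanking them, is an assumed design choice that makes missing data visible in the document.

```python
import re
from typing import Mapping

# Same {{ variable }} syntax as PLACEHOLDER_PATTERN, with the braces escaped
PLACEHOLDER_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_placeholders(text: str, context: Mapping[str, object]) -> str:
    """Replace each {{name}} with str(context[name]); unknown names are left as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in context:
            return match.group(0)  # keep the placeholder so missing data is noticed
        return str(context[name])

    return PLACEHOLDER_PATTERN.sub(substitute, text)
```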

2. JWT-Based Authentication

Decision: Use JWT tokens for session authentication.

Rationale:

  • Stateless authentication suitable for distributed systems
  • Built-in expiration handling
  • Can encode user permissions and session context
  • Industry standard for API authentication

Token Payload:

```json
{
  "user_id": 123,
  "lab_report_id": 456,
  "session_id": 789,
  "permissions": ["read", "write"],
  "exp": 1640995200
}
```
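To illustrate the token format, the payload above can be signed and verified as an HS256 JWT using only the standard library. This is a sketch, not the product's code. `encode_session_token` and `decode_session_token` are hypothetical, and production code would normally use a maintained library such as PyJWT instead of hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use base64url without '=' padding (RFC 7515)
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def encode_session_token(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_session_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    header_b64, payload_b64, signature_b64 = token.split(".")
    # Never let the token choose its own algorithm
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    # compare_digest runs in constant time, so timing does not leak the signature
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(payload_b64))
    # `exp` is seconds since the Unix epoch, as in the example payload
    if payload.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```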

3. Asynchronous File Processing

Decision: Process file uploads asynchronously using background tasks.

Rationale:

  • Large file uploads don't block the API
  • Better user experience with immediate feedback
  • Allows for complex processing (PDF conversion, validation)
  • Scales better under load

Implementation: Celery tasks run the file-processing pipeline.
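The pattern is to accept the upload, return a job identifier immediately, and do the heavy work elsewhere. Running Celery requires a message broker, so this sketch substitutes `concurrent.futures.ThreadPoolExecutor` for the worker pool. `submit_upload`, `process_upload`, `job_status`, and the status strings are hypothetical. In a Celery version, `process_upload` would be a task invoked with `.delay(...)`, and status would live in the database or result backend.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict

executor = ThreadPoolExecutor(max_workers=4)  # stand-in for Celery workers
jobs: Dict[str, Future] = {}


def process_upload(data: bytes) -> dict:
    # Placeholder for the real pipeline (validation, PDF conversion, S3 upload)
    if not data:
        raise ValueError("empty upload")
    return {"size": len(data)}


def submit_upload(data: bytes) -> str:
    """Queue the upload and return at once with a job id the client can poll."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(process_upload, data)
    return job_id


def job_status(job_id: str) -> str:
    future = jobs[job_id]
    if not future.done():
        return "processing"
    return "failed" if future.exception() is not None else "completed"
```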

Data Model Decisions

1. Separate External Report Edits Table

Decision: Create dedicated external_report_edits table instead of extending existing tables.

Rationale:

  • Clean separation of concerns
  • Avoids polluting core lab report tables
  • Enables future extensibility
  • Better audit trail and reporting

Schema Design:

  • Foreign key to labReportRelation
  • File paths stored as text fields
  • Session metadata (author, timestamps)
  • Status flags for session management

2. Denormalized Placeholder Data

Decision: Pre-compute and cache placeholder data during session creation.

Rationale:

  • Faster template processing
  • Reduces database queries during editing
  • Resolves related records (patient, test) once per session instead of on every render
  • Reduces the wait when a report is opened in Word

Data Structure:

```python
FIELDS_FOR_PLACEHOLDERS = [
    ("patient_name", "userDetailsId.fullName"),
    ("test_name", "reportID_id.testName"),
    # ... more mappings
]
```
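This sketch shows how these mappings could be turned into the placeholder context at session creation. It assumes each path is a chain of attribute lookups starting from the lab report object. With Django models, loading that object via `select_related` would keep those hops from issuing extra queries. `resolve_path` and `build_placeholder_context` are hypothetical helpers, and rendering missing values as an empty string is an assumption.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

FIELDS_FOR_PLACEHOLDERS = [
    ("patient_name", "userDetailsId.fullName"),
    ("test_name", "reportID_id.testName"),
    # ... remaining mappings as in the list above
]


def resolve_path(obj: Any, dotted_path: str) -> Optional[Any]:
    """Follow attribute names left to right; return None if any hop is missing."""
    current = obj
    for attr in dotted_path.split("."):
        if current is None:
            return None
        current = getattr(current, attr, None)
    return current


def build_placeholder_context(
    report: Any, fields: Iterable[Tuple[str, str]] = FIELDS_FOR_PLACEHOLDERS
) -> Dict[str, str]:
    """Compute every placeholder value once, when the editing session is created."""
    context: Dict[str, str] = {}
    for placeholder, path in fields:
        value = resolve_path(report, path)
        context[placeholder] = "" if value is None else str(value)
    return context
```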

Security Decisions

1. Token-Based File Access

Decision: Use presigned S3 URLs with short expiration for file access.

Rationale:

  • No permanent public URLs
  • Time-limited access reduces attack surface
  • No server-side file serving overhead
  • Compatible with Word's file opening mechanisms

Security Measures:

  • 15-minute URL expiration
  • User-specific access tokens
  • IP-based restrictions where possible
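With boto3, the 15-minute expiry maps to the `ExpiresIn` argument of the S3 client's `generate_presigned_url`. This sketch assumes the client is passed in, so the function can be tested without AWS credentials. `presign_report_download`, the bucket, and the key are hypothetical. IP restrictions are not encoded in the presigned URL itself. They typically come from a bucket policy condition such as `aws:SourceIp`.

```python
from typing import Any

PRESIGNED_URL_TTL_SECONDS = 15 * 60  # the 15-minute expiry listed above


def presign_report_download(s3_client: Any, bucket: str, key: str,
                            ttl_seconds: int = PRESIGNED_URL_TTL_SECONDS) -> str:
    """Return a time-limited GET URL for one object; no public ACL is needed."""
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=ttl_seconds,
    )


# Usage with a real client (hypothetical bucket and key):
#   import boto3
#   url = presign_report_download(boto3.client("s3"), "reports-bucket", "external-edits/456/report.docx")
```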

2. File Type Validation

Decision: Strict file type validation using magic bytes and extension checking.

Rationale:

  • Prevents malicious file uploads
  • Ensures compatibility with downstream processing
  • Protects against file-based attacks

Validation Rules:

  • DOCX: Check the ZIP signature (PK\x03\x04), since OOXML files are ZIP packages
  • PDF: Check for the %PDF- header
  • Maximum file sizes enforced
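This is a sketch of the validation rules that requires the extension and the content to agree. The source does not state a size limit, so `MAX_UPLOAD_BYTES` is a hypothetical value borrowed from the 100 MB figure under Performance Decisions. Checking for `word/document.xml` is a simplification: strictly, the package relationships define where the main part lives, though Word writes it at that path. `detect_file_type` is a hypothetical name.

```python
import io
import zipfile

MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # hypothetical limit
ZIP_SIGNATURE = b"PK\x03\x04"  # DOCX files are ZIP (OOXML) packages
PDF_SIGNATURE = b"%PDF-"


def detect_file_type(data: bytes, filename: str) -> str:
    """Return 'docx' or 'pdf' only when the extension and the magic bytes agree."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension == "docx" and data.startswith(ZIP_SIGNATURE):
        # Any ZIP starts with PK\x03\x04, so also require the Word main document part
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            pass
    elif extension == "pdf" and data.startswith(PDF_SIGNATURE):
        return "pdf"
    raise ValueError("unsupported or mismatched file type")
```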

Performance Decisions

1. Streaming File Processing

Decision: Use streaming for large file operations.

Rationale:

  • Memory-efficient processing of large documents
  • Handles files up to 100MB without issues
  • Better scalability for concurrent users

Implementation:

```python
from io import BytesIO
def replace_placeholders_in_docx_stream(docx_stream: BytesIO, context: dict) -> BytesIO:
    # Works on in-memory buffers rather than temporary files on disk; body elided
    ...
```
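One way to fill in the function above using only the standard library is sketched here. It is a hypothetical body, not the product's implementation. It assumes placeholders live in `word/document.xml` or the `word/header*.xml` and `word/footer*.xml` parts. Each part is decompressed and rewritten one at a time, but both the input and output buffers are held in memory. Values are XML-escaped before insertion. One limitation: Word may split `{{patient_name}}` across several `<w:r>` runs (for example after partial reformatting), and such split placeholders will not match unless the runs are merged first.

```python
import re
import zipfile
from io import BytesIO
from xml.sax.saxutils import escape

PLACEHOLDER_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def _is_text_part(name: str) -> bool:
    # Main body plus header/footer parts, where report templates usually place fields
    return name == "word/document.xml" or (
        name.startswith(("word/header", "word/footer")) and name.endswith(".xml")
    )


def replace_placeholders_in_docx_stream(docx_stream: BytesIO, context: dict) -> BytesIO:
    """Return a new DOCX buffer with {{name}} placeholders filled from context."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Escape &, <, > so substituted values cannot break the surrounding XML
        return escape(str(context[name])) if name in context else match.group(0)

    output = BytesIO()
    docx_stream.seek(0)
    with zipfile.ZipFile(docx_stream) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():  # preserves the original entry order
            data = source.read(item.filename)
            if _is_text_part(item.filename):
                xml = data.decode("utf-8")
                data = PLACEHOLDER_PATTERN.sub(substitute, xml).encode("utf-8")
            target.writestr(item, data)  # reuse the entry's name and metadata
    output.seek(0)
    return output
```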

2. Database Indexing Strategy

Decision: Composite index on (lab_report_id, author_id) for performance.

Rationale:

  • Frequent queries by lab report
  • Author-based access control
  • Supports session conflict resolution
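The effect of the composite index can be checked with SQLite's `EXPLAIN QUERY PLAN`. This sketch uses the standard-library `sqlite3` module as a stand-in for the production database. The index name and every column other than `lab_report_id` and `author_id` are hypothetical. Because `lab_report_id` is the leading column, the same index also serves queries that filter by lab report alone. A query filtering only by `author_id` generally cannot use it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE external_report_edits (
        id INTEGER PRIMARY KEY,
        lab_report_id INTEGER NOT NULL,       -- FK to labReportRelation in production
        author_id INTEGER NOT NULL,
        docx_path TEXT,                       -- file paths stored as text
        pdf_path TEXT,
        created_at TEXT,
        is_active INTEGER NOT NULL DEFAULT 1  -- session status flag
    );
    CREATE INDEX idx_ere_report_author
        ON external_report_edits (lab_report_id, author_id);
    """
)


def query_plan(sql: str, params: tuple) -> list:
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


by_report_and_author = query_plan(
    "SELECT id FROM external_report_edits WHERE lab_report_id = ? AND author_id = ?", (456, 123)
)
by_report_only = query_plan(
    "SELECT id FROM external_report_edits WHERE lab_report_id = ?", (456,)
)
print(by_report_and_author)
print(by_report_only)
```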

Error Handling Decisions

1. Graceful Degradation

Decision: Fail gracefully with user-friendly error messages.

Rationale:

  • Better user experience during failures
  • Clear guidance for issue resolution
  • Keeps raw exceptions and stack traces from reaching users

Error Categories:

  • Validation Errors: Clear messages for user correction
  • System Errors: Generic messages with support contact
  • Network Errors: Retry mechanisms with user feedback
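The three categories can be sketched as exception types, a retry loop for network errors, and a single function that turns exceptions into user-facing messages. All names, the HTTP status codes, the retry count, and the backoff delays are hypothetical.

```python
import logging
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
logger = logging.getLogger("word_integration")


class ValidationError(Exception):
    """The user can fix this; the message is shown as-is."""


class NetworkError(Exception):
    """Transient failure talking to S3 or another service; worth retrying."""


def with_retries(operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except NetworkError:
            if attempt == attempts:
                raise
            logger.warning("network error, retry %d of %d", attempt, attempts - 1)
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise ValueError("attempts must be at least 1")


def user_message(exc: Exception) -> Tuple[int, str]:
    """Map an exception to (HTTP status, message) without exposing internals."""
    if isinstance(exc, ValidationError):
        return 400, str(exc)
    if isinstance(exc, NetworkError):
        return 503, "The file service is temporarily unavailable. Please try again."
    logger.error("unexpected error", exc_info=exc)  # traceback goes to the logs only
    return 500, "Something went wrong. Please contact support if this persists."
```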

2. Comprehensive Logging

Decision: Log all operations for debugging and audit purposes.

Rationale:

  • Essential for troubleshooting production issues
  • Required for compliance and security audits
  • Enables performance monitoring and optimization

Log Levels:

  • INFO: Normal operations (session creation, file uploads)
  • WARN: Recoverable errors (network timeouts, validation failures)
  • ERROR: System failures requiring attention
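This minimal sketch uses Python's standard `logging` module to attach session identifiers to every record, so audit queries can filter by session or report. `session_logger`, `configure_logging`, and the format string are hypothetical. Python names the WARN level `WARNING`. The `defaults` argument of `logging.Formatter` requires Python 3.10 or later.

```python
import logging

logger = logging.getLogger("word_integration")


def session_logger(session_id: int, lab_report_id: int) -> logging.LoggerAdapter:
    """Attach session identifiers to every record logged through the adapter."""
    return logging.LoggerAdapter(logger, {"session_id": session_id, "lab_report_id": lab_report_id})


def configure_logging() -> None:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s session=%(session_id)s report=%(lab_report_id)s %(message)s",
        defaults={"session_id": "-", "lab_report_id": "-"},  # for records logged without the adapter
    ))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


# Usage:
#   log = session_logger(session_id=789, lab_report_id=456)
#   log.info("session created")                # INFO: normal operation
#   log.warning("upload timed out, retrying")  # WARN: recoverable error
#   log.error("PDF conversion failed")         # ERROR: needs attention
```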

Future-Proofing Decisions

1. Modular Architecture

Decision: Design with clear separation between components.

Rationale:

  • Easy to extend with new features
  • Allows for alternative implementations
  • Supports different file formats in the future

Modular Components:

  • Session management (pluggable storage)
  • File processing (extensible format support)
  • Authentication (configurable providers)

2. API Versioning

Decision: Use /api-v3/ prefix for all endpoints.

Rationale:

  • Consistent with existing API versioning
  • Allows for future API evolution
  • Clear deprecation path for old versions

Migration and Deployment Decisions

1. Zero-Downtime Deployment

Decision: Design for backward compatibility during deployments.

Rationale:

  • Minimizes impact on production operations
  • Allows for gradual feature rollout
  • Supports canary deployments

Implementation:

  • Database migrations with rollback capability
  • Feature flags for gradual enablement
  • Monitoring for performance regression
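Gradual enablement is often implemented by hashing a stable identifier into a percentage bucket. This sketch assumes rollout is done per lab. `rollout_bucket`, `is_feature_enabled`, the flag name, and bucketing by lab rather than by user are assumptions, and a feature-flag service would normally own this logic.

```python
import hashlib


def rollout_bucket(flag_name: str, lab_id: int) -> int:
    """Map a (flag, lab) pair to a stable bucket in [0, 100)."""
    # sha256 is stable across processes, unlike Python's randomized hash()
    digest = hashlib.sha256(f"{flag_name}:{lab_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_feature_enabled(flag_name: str, lab_id: int, rollout_percent: int) -> bool:
    # A lab always lands in the same bucket, so raising the percentage only adds
    # labs; nobody flips back and forth between deployments
    return rollout_bucket(flag_name, lab_id) < rollout_percent
```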

2. Database Migration Strategy

Decision: Use Django migrations with proper rollback scripts.

Rationale:

  • Automated schema changes
  • Version-controlled database evolution
  • Safe rollback in case of issues

Migration Example:

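```python
from django.db import migrations, models


class Migration(migrations.Migration):
    dependencies = [...]  # the app's previous migration (not specified here)

    # Forward: create the table. CreateModel is reversible, so Django performs the
    # equivalent of DeleteModel(name='ExternalReportEdits') automatically when rolling
    # back with `python manage.py migrate <app_label> <previous_migration>`.
    operations = [
        migrations.CreateModel(
            name='ExternalReportEdits',
            fields=[...],  # field definitions elided
        ),
    ]
```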
