Design Decisions
Architectural choices, trade-offs, and rationale for MS Word Integration implementation
This document outlines the key architectural decisions, trade-offs, and rationale behind the MS Word Integration feature implementation.
Core Architecture Decisions
1. Windows-Only Implementation
Decision: Restrict the feature to Windows operating systems only.
Rationale:
- Microsoft Word's winword: protocol handler is Windows-specific
- Word automation APIs are more mature on Windows
- Primary user base (radiology labs) predominantly uses Windows
- Cross-platform alternatives (LibreOffice, Google Docs) have compatibility issues
Trade-offs:
- Pro: Native Word integration, reliable protocol handling
- Con: Platform limitation, excludes Mac/Linux users
- Mitigation: Clear system requirements documentation
2. Session-Based Architecture
Decision: Implement session-based editing with exclusive access control.
Rationale:
- Prevents concurrent edits that could cause data loss
- Maintains data integrity during external editing
- Provides clear ownership and responsibility
- Enables real-time collaboration features
Alternatives Considered:
- File Locking: Simple but doesn't handle network disconnections
- Version Control: Complex for real-time editing
- Concurrent Editing: Risk of merge conflicts
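A minimal sketch of exclusive session access, assuming an in-memory registry in place of the real session table. The names `SessionRegistry`, `EditSession`, and `SessionConflict`, and the 30-minute expiry, are hypothetical and only for illustration. The expiry shows one way a stale session left behind by a network disconnection could stop blocking other authors.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EditSession:
    lab_report_id: int
    author_id: int
    expires_at: float


class SessionConflict(Exception):
    """Raised when another author already holds an unexpired session."""


class SessionRegistry:
    """In-memory stand-in for the session store (hypothetical)."""

    def __init__(self, ttl_seconds: float = 1800.0) -> None:
        self._ttl = ttl_seconds  # assumed expiry, not taken from the source
        self._active: Dict[int, EditSession] = {}

    def acquire(self, lab_report_id: int, author_id: int,
                now: Optional[float] = None) -> EditSession:
        now = time.time() if now is None else now
        current = self._active.get(lab_report_id)
        # Reject only if a different author holds a session that has not expired.
        if current and current.expires_at > now and current.author_id != author_id:
            raise SessionConflict(
                f"report {lab_report_id} is locked by author {current.author_id}"
            )
        session = EditSession(lab_report_id, author_id, now + self._ttl)
        self._active[lab_report_id] = session
        return session

    def release(self, lab_report_id: int, author_id: int) -> None:
        current = self._active.get(lab_report_id)
        if current and current.author_id == author_id:
            del self._active[lab_report_id]
```

A real deployment would need the check-and-insert step to be atomic in the database (for example, a unique constraint on the active session per report); this sketch does not cover that.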
3. S3-Based File Storage
Decision: Use AWS S3 for all file storage operations.
Rationale:
- Scalable, durable, and cost-effective storage
- Built-in security features (encryption, access control)
- Integration with existing CrelioHealth infrastructure
- Presigned URLs for secure temporary access
Trade-offs:
- Pro: Reliable, secure, globally distributed
- Con: Vendor lock-in, potential latency for large files
- Mitigation: Abstraction layer allows future storage provider changes
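The abstraction layer mentioned in the mitigation could look roughly like the sketch below. `FileStorage`, `InMemoryStorage`, `save_report`, and the key layout are hypothetical names chosen for illustration; the actual interface is not described in this document.

```python
from typing import Dict, Protocol


class FileStorage(Protocol):
    """Minimal storage interface that callers depend on (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Test double; an S3-backed class would expose the same two methods."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_report(storage: FileStorage, report_id: int, content: bytes) -> str:
    # The key layout is an assumption made for this example.
    key = f"external-edits/{report_id}.docx"
    storage.put(key, content)
    return key
```

Because `save_report` only relies on the protocol, swapping the provider means writing one new class rather than touching call sites.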
Technical Implementation Choices
1. Placeholder Replacement Strategy
Decision: Use regex-based placeholder replacement with {{variable}} syntax.
Rationale:
- Simple, readable syntax familiar to users
- Easy to implement and maintain
- Compatible with Word's text processing
- Allows for complex data transformations
Implementation:
import re
PLACEHOLDER_PATTERN = re.compile(r"{{\s*(\w+)\s*}}")
Alternatives Considered:
- Template Engines: Jinja2, Django templates (too complex for Word documents)
- XML Manipulation: Direct OOXML editing (fragile, version-dependent)
- Custom Syntax: Alternative delimiters (less intuitive)
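As an illustration of the regex-based approach, the sketch below substitutes `{{variable}}` placeholders in a plain string. The function name `replace_placeholders` and the choice to leave unknown placeholders untouched are assumptions; the pattern is written with escaped braces, which matches the same text as the pattern shown earlier.

```python
import re

PLACEHOLDER_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def replace_placeholders(text: str, context: dict) -> str:
    """Replace each {{name}} with context[name], converted to a string."""

    def substitute(match: "re.Match") -> str:
        name = match.group(1)
        # Assumption: unknown placeholders are left as-is so they stay visible.
        return str(context.get(name, match.group(0)))

    return PLACEHOLDER_PATTERN.sub(substitute, text)


rendered = replace_placeholders(
    "Patient: {{ patient_name }}, Test: {{test_name}}, Ref: {{unknown}}",
    {"patient_name": "Asha Rao", "test_name": "CT Chest"},
)
```

Inside a real DOCX file, Word may split a placeholder across several XML runs, so a plain-text substitution like this would need to run on reassembled paragraph text.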
2. JWT-Based Authentication
Decision: Use JWT tokens for session authentication.
Rationale:
- Stateless authentication suitable for distributed systems
- Built-in expiration handling
- Can encode user permissions and session context
- Industry standard for API authentication
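To make the stateless and expiring properties concrete, here is a minimal HS256 sketch built only on the standard library. It is not the project's implementation: `encode_token`, `decode_token`, and the shared-secret handling are assumptions, and a production system would normally use a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_token(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return payload
```

The server needs no session lookup to validate a request: the signature proves the claims were issued by a holder of the secret, and the `exp` claim bounds the token's lifetime.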
Token Payload:
{
  "user_id": 123,
  "lab_report_id": 456,
  "session_id": 789,
  "permissions": ["read", "write"],
  "exp": 1640995200
}
3. Asynchronous File Processing
Decision: Process file uploads asynchronously using background tasks.
Rationale:
- Large file uploads don't block the API
- Better user experience with immediate feedback
- Allows for complex processing (PDF conversion, validation)
- Scales better under load
Implementation: Celery tasks for file processing pipeline.
Data Model Decisions
1. Separate External Report Edits Table
Decision: Create dedicated external_report_edits table instead of extending existing tables.
Rationale:
- Clean separation of concerns
- Avoids polluting core lab report tables
- Enables future extensibility
- Better audit trail and reporting
Schema Design:
- Foreign key to labReportRelation
- File paths stored as text fields
- Session metadata (author, timestamps)
- Status flags for session management
2. Denormalized Placeholder Data
Decision: Pre-compute and cache placeholder data during session creation.
Rationale:
- Faster template processing
- Reduces database queries during editing
- Handles complex data relationships
- Improves user experience
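Pre-computing placeholder data amounts to resolving each dotted attribute path once and storing the result. The sketch below assumes the two mappings listed in this document and uses `SimpleNamespace` objects in place of real model instances; `resolve_path` and `build_placeholder_context` are hypothetical helper names.

```python
from functools import reduce
from types import SimpleNamespace

FIELDS_FOR_PLACEHOLDERS = [
    ("patient_name", "userDetailsId.fullName"),
    ("test_name", "reportID_id.testName"),
]


def resolve_path(obj, dotted_path: str, default=""):
    """Follow a dotted attribute path, returning default if any link is missing."""
    try:
        return reduce(getattr, dotted_path.split("."), obj)
    except AttributeError:
        return default


def build_placeholder_context(report) -> dict:
    # Computed once at session creation, then cached alongside the session.
    return {name: resolve_path(report, path) for name, path in FIELDS_FOR_PLACEHOLDERS}


report = SimpleNamespace(
    userDetailsId=SimpleNamespace(fullName="Asha Rao"),
    reportID_id=SimpleNamespace(testName="CT Chest"),
)
context = build_placeholder_context(report)
```

Falling back to an empty string for missing relations is an assumption; raising an error at session creation would be an equally reasonable choice.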
FIELDS_FOR_PLACEHOLDERS = [
    ("patient_name", "userDetailsId.fullName"),
    ("test_name", "reportID_id.testName"),
    # ... more mappings
]
Security Decisions
1. Token-Based File Access
Decision: Use presigned S3 URLs with short expiration for file access.
Rationale:
- No permanent public URLs
- Time-limited access reduces attack surface
- No server-side file serving overhead
- Compatible with Word's file opening mechanisms
Security Measures:
- 15-minute URL expiration
- User-specific access tokens
- IP-based restrictions where possible
2. File Type Validation
Decision: Strict file type validation using magic bytes and extension checking.
Rationale:
- Prevents malicious file uploads
- Ensures compatibility with downstream processing
- Protects against file-based attacks
Validation Rules:
- DOCX: Check OOXML signature
- PDF: Check PDF header
- Maximum file sizes enforced
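The validation rules above might be checked along these lines. DOCX files are ZIP archives, so the sketch looks for the ZIP local-file signature and the standard OOXML parts; PDFs are recognized by their `%PDF-` header. The function name `detect_file_type` and the 100 MB limit are assumptions for illustration.

```python
import zipfile
from io import BytesIO

ZIP_MAGIC = b"PK\x03\x04"   # ZIP local file header; DOCX is a ZIP container
PDF_MAGIC = b"%PDF-"
MAX_BYTES = 100 * 1024 * 1024  # assumed limit, not specified in the source


def detect_file_type(filename: str, data: bytes) -> str:
    """Return 'docx' or 'pdf' only when extension and content agree."""
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext == "pdf" and data.startswith(PDF_MAGIC):
        return "pdf"
    if ext == "docx" and data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            raise ValueError("corrupt DOCX archive") from None
        # A Word document must contain these two parts.
        if {"[Content_Types].xml", "word/document.xml"} <= names:
            return "docx"
    raise ValueError("unsupported or mismatched file type")
```

Checking both the extension and the content catches renamed files, such as a PDF uploaded with a `.docx` name.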
Performance Decisions
1. Streaming File Processing
Decision: Use streaming for large file operations.
Rationale:
- Memory-efficient processing of large documents
- Handles files up to 100MB without issues
- Better scalability for concurrent users
from io import BytesIO
def replace_placeholders_in_docx_stream(docx_stream: BytesIO, context: dict) -> BytesIO:
    # Process in memory without loading entire file
    ...
2. Database Indexing Strategy
Decision: Composite index on (lab_report_id, author_id) for performance.
Rationale:
- Frequent queries by lab report
- Author-based access control
- Supports session conflict resolution
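The effect of the composite index can be checked with a query plan. The sketch below uses SQLite purely as a convenient stand-in for the production database; the `is_active` column and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE external_report_edits (
        id INTEGER PRIMARY KEY,
        lab_report_id INTEGER NOT NULL,
        author_id INTEGER NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1
    );
    CREATE INDEX idx_edits_report_author
        ON external_report_edits (lab_report_id, author_id);
    """
)


def plan_for(where_clause: str, params: tuple) -> str:
    """Return SQLite's query-plan description for a lookup on the table."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM external_report_edits WHERE " + where_clause,
        params,
    ).fetchall()
    return " | ".join(row[-1] for row in rows)


# Lookup by both columns, and by the leading column alone.
both_columns_plan = plan_for("lab_report_id = ? AND author_id = ?", (456, 123))
report_only_plan = plan_for("lab_report_id = ?", (456,))
```

Because `lab_report_id` is the leading column, the same index can serve queries that filter by lab report alone; a query filtering only by `author_id` generally cannot use it efficiently.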
Error Handling Decisions
1. Graceful Degradation
Decision: Fail gracefully with user-friendly error messages.
Rationale:
- Better user experience during failures
- Clear guidance for issue resolution
- Prevents system crashes from propagating to users
Error Categories:
- Validation Errors: Clear messages for user correction
- System Errors: Generic messages with support contact
- Network Errors: Retry mechanisms with user feedback
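One way to map these categories onto responses is sketched below. The exception class, status codes, message wording, and retry parameters are all assumptions; the source does not specify them.

```python
import time


class ValidationError(Exception):
    """User-correctable input problem (hypothetical exception type)."""


def user_facing_error(exc: Exception) -> dict:
    if isinstance(exc, ValidationError):
        # Validation errors carry a message the user can act on.
        return {"status": 400, "message": str(exc)}
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return {"status": 503, "message": "Network problem. Please try again."}
    # System errors: hide internals, point to support.
    return {"status": 500, "message": "Something went wrong. Please contact support."}


def with_retries(operation, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Retry network failures with exponential backoff; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Only network errors are retried here; validation and system errors surface immediately because repeating the call would not change the outcome.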
2. Comprehensive Logging
Decision: Log all operations for debugging and audit purposes.
Rationale:
- Essential for troubleshooting production issues
- Required for compliance and security audits
- Enables performance monitoring and optimization
Log Levels:
- INFO: Normal operations (session creation, file uploads)
- WARN: Recoverable errors (network timeouts, validation failures)
- ERROR: System failures requiring attention
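With Python's standard `logging` module, these levels might be used as in the sketch below. The logger name and the `record_upload` helper are hypothetical; note that the module's name for the WARN level is `WARNING`.

```python
import logging

logger = logging.getLogger("msword_integration")  # hypothetical logger name
logger.setLevel(logging.INFO)


def record_upload(session_id: int, ok: bool, reason: str = "") -> None:
    if ok:
        # Normal operation
        logger.info("file uploaded session_id=%s", session_id)
    else:
        # Recoverable problem, such as a validation failure
        logger.warning("upload rejected session_id=%s reason=%s", session_id, reason)
```

Passing values as arguments rather than formatting the string up front lets the logging framework skip the formatting work when a level is disabled.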
Future-Proofing Decisions
1. Modular Architecture
Decision: Design with clear separation between components.
Rationale:
- Easy to extend with new features
- Allows for alternative implementations
- Supports different file formats in the future
Modular Components:
- Session management (pluggable storage)
- File processing (extensible format support)
- Authentication (configurable providers)
2. API Versioning
Decision: Use /api-v3/ prefix for all endpoints.
Rationale:
- Consistent with existing API versioning
- Allows for future API evolution
- Clear deprecation path for old versions
Migration and Deployment Decisions
1. Zero-Downtime Deployment
Decision: Design for backward compatibility during deployments.
Rationale:
- Minimizes impact on production operations
- Allows for gradual feature rollout
- Supports canary deployments
Implementation:
- Database migrations with rollback capability
- Feature flags for gradual enablement
- Monitoring for performance regression
2. Database Migration Strategy
Decision: Use Django migrations with proper rollback scripts.
Rationale:
- Automated schema changes
- Version-controlled database evolution
- Safe rollback in case of issues
Migration Example:
# Forward migration
operations = [
    migrations.CreateModel(
        name='ExternalReportEdits',
        fields=[...]
    ),
]
# Reverse migration (Django applies this automatically when CreateModel is rolled back)
operations = [
    migrations.DeleteModel(
        name='ExternalReportEdits',
    ),
]