Refactor OCR Service Using Factory Pattern In TicketScanModal
In this guide, we refactor the Optical Character Recognition (OCR) service used by the TicketScanModal component. Specifically, we implement the Factory Pattern through an OCRServiceFactory to decouple the modal from concrete OCR service implementations. The article covers the problem, the proposed solution, the implementation plan, the benefits, and the acceptance criteria.
Problem: The Tight Coupling of OCR Service in TicketScanModal
Currently, the TicketScanModal component directly instantiates GeminiVisionOCRService and requires knowledge of implementation details such as API keys and model names. This tight coupling introduces several challenges, making the system less flexible and harder to maintain.
Understanding the Current Implementation
The existing code in TicketScanModal.tsx (lines 23-25) demonstrates this tight coupling:
// Current implementation in TicketScanModal.tsx (lines 23-25)
const apiKey = import.meta.env.VITE_GEMINI_API_KEY
const model = import.meta.env.VITE_GEMINI_MODEL || 'gemini-2.0-flash'
const ocrService = new GeminiVisionOCRService(apiKey, model)
This direct instantiation leads to several issues:
- Tight Coupling: The TicketScanModal is tightly coupled to the concrete OCR service implementation (GeminiVisionOCRService). Any changes to the OCR service directly impact the modal.
- Single Responsibility Principle (SRP) Violation: The modal takes on configuration responsibilities, which do not belong to it. The modal's primary responsibility should be handling the user interface and interactions, not configuring services.
- Testing Difficulties: Testing the modal becomes challenging as tests need to mock environment variables to simulate different configurations. This adds complexity and fragility to the test suite.
- Lack of Portability: Switching OCR providers requires modifying the component's internal implementation. This makes it difficult to adopt new OCR technologies or switch between providers based on cost or performance considerations.
These issues highlight the need for a more flexible and maintainable solution, which leads us to the proposed solution of implementing a Factory Pattern.
Proposed Solution: Implementing the Factory Pattern
To address the issues of tight coupling and inflexibility, we propose implementing the Factory Pattern. This design pattern centralizes the creation and configuration of OCR services, decoupling the TicketScanModal from specific implementations. The Factory Pattern provides an interface for creating objects without specifying their concrete classes. This allows for greater flexibility and maintainability.
The core of this solution is the OCRServiceFactory, which will be responsible for creating instances of different OCR service implementations based on the application's configuration. This approach offers several benefits, including improved testability, easier maintenance, and the ability to switch OCR providers with minimal code changes.
Key Components of the Solution
- OCRService Interface: An interface defining the contract for all OCR service implementations. This ensures that all OCR services provide a consistent API.
- Concrete OCR Service Implementations: Classes such as GeminiVisionOCRService and OllamaVisionOCRService that implement the OCRService interface.
- OCRServiceFactory: The factory class responsible for creating instances of OCR services based on the configured provider.
By introducing the Factory Pattern, we encapsulate the logic for creating OCR service instances, making the TicketScanModal more focused on its primary responsibility: handling the user interface for ticket scanning.
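The article does not show the OCRService contract itself, so the following is only a minimal sketch of what such a port could look like. The method name extractText, its signature, and the two Fake classes are assumptions made for illustration; the real interface in src/application/ports/OCRService may differ.

```typescript
// Hypothetical port: the real OCRService contract is not shown in this
// article, so the method name and signature here are assumptions.
interface OCRService {
  extractText(image: Uint8Array): Promise<string>
}

// Stand-ins for the concrete services. The real classes would call a
// vision API; these only echo the input size so the sketch runs offline.
class FakeGeminiOCRService implements OCRService {
  async extractText(image: Uint8Array): Promise<string> {
    return `gemini:${image.length} bytes`
  }
}

class FakeOllamaOCRService implements OCRService {
  async extractText(image: Uint8Array): Promise<string> {
    return `ollama:${image.length} bytes`
  }
}

// The caller depends only on the interface, so either implementation fits.
async function scanTicket(ocr: OCRService, image: Uint8Array): Promise<string> {
  const text = await ocr.extractText(image)
  return text.trim()
}
```

Because scanTicket accepts any OCRService, swapping providers does not change the calling code, which is the property the factory relies on.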
Implementation Plan: Step-by-Step Guide
To implement the Factory Pattern, we will follow a structured approach, breaking down the task into manageable steps. This plan ensures that the refactoring process is systematic and that each component is correctly implemented and tested.
Step 1: Create OCRServiceFactory
The first step is to create the OCRServiceFactory class. This class will contain the logic for instantiating different OCR service implementations based on the configured provider. The factory will read environment variables to determine which OCR service to create.
// src/infrastructure/factories/OCRServiceFactory.ts
import type { OCRService } from '../../application/ports/OCRService'
import { GeminiVisionOCRService } from '../services/ocr/GeminiVisionOCRService'
import { OllamaVisionOCRService } from '../services/ocr/OllamaVisionOCRService'

export class OCRServiceFactory {
  static create(): OCRService {
    const provider = import.meta.env.VITE_OCR_PROVIDER
    switch (provider) {
      case 'gemini': {
        const apiKey = import.meta.env.VITE_GEMINI_API_KEY
        const model = import.meta.env.VITE_GEMINI_MODEL || 'gemini-2.0-flash'
        if (!apiKey) {
          throw new Error('VITE_GEMINI_API_KEY is required for Gemini provider')
        }
        return new GeminiVisionOCRService(apiKey, model)
      }
      case 'ollama': {
        const baseUrl = import.meta.env.VITE_OLLAMA_BASE_URL || 'http://localhost:11434'
        const model = import.meta.env.VITE_OLLAMA_MODEL || 'llava'
        return new OllamaVisionOCRService(baseUrl, model)
      }
      default:
        throw new Error(`Unknown OCR provider: ${provider}. Supported: 'gemini', 'ollama'`)
    }
  }
}
This factory supports two OCR providers: gemini and ollama. It reads the environment variables VITE_OCR_PROVIDER, VITE_GEMINI_API_KEY, VITE_GEMINI_MODEL, VITE_OLLAMA_BASE_URL, and VITE_OLLAMA_MODEL to configure the service. The model names and the Ollama base URL fall back to defaults when unset. If the provider is not recognized, or if the Gemini provider is selected without an API key, the factory throws an error with a descriptive message. The application therefore fails fast, with a clear message, when the configuration is incorrect.
Step 2: Update TicketScanModal
The next step is to update the TicketScanModal component to use the OCRServiceFactory. This involves removing the direct instantiation of GeminiVisionOCRService and replacing it with a call to the factory.
// src/presentation/components/TicketScanModal.tsx
import { OCRServiceFactory } from '../../infrastructure/factories/OCRServiceFactory'
// Replace lines 23-25 with:
const ocrService = OCRServiceFactory.create()
const productRepository = new LocalStorageProductRepository()
By using the factory, the TicketScanModal no longer needs to know the implementation details of the OCR service. It simply requests an OCRService instance from the factory. This significantly reduces the coupling between the modal and the OCR service.
Step 3: Add Tests for the Factory
Testing the OCRServiceFactory is crucial to ensure that it correctly creates OCR service instances based on the configuration. We need to add tests to verify that the factory creates the correct service for each provider and that it handles missing or invalid configurations gracefully.
// src/test/infrastructure/factories/OCRServiceFactory.test.ts
describe('OCRServiceFactory', () => {
  it('should create GeminiVisionOCRService when provider is gemini', () => {
    // Test implementation
  })
  it('should create OllamaVisionOCRService when provider is ollama', () => {
    // Test implementation
  })
  it('should throw error for unknown provider', () => {
    // Test implementation
  })
  it('should throw error when Gemini API key is missing', () => {
    // Test implementation
  })
})
The tests should cover the following scenarios:
- Creating a GeminiVisionOCRService when the provider is set to gemini.
- Creating an OllamaVisionOCRService when the provider is set to ollama.
- Throwing an error when an unknown provider is specified.
- Throwing an error when the Gemini API key is missing.
These tests ensure that the factory is robust and that it correctly handles different configurations.
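One way to make these four scenarios concrete without touching import.meta.env is to pass the environment in as a plain object. The sketch below assumes that variation and is not the project's actual test code: createOCRService, GeminiStub, OllamaStub, and errorMessageOf are hypothetical stand-ins, and only the branching, defaults, and error messages mirror the OCRServiceFactory listing in Step 1.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical stand-ins for the real service classes.
class GeminiStub {
  apiKey: string
  model: string
  constructor(apiKey: string, model: string) {
    this.apiKey = apiKey
    this.model = model
  }
}

class OllamaStub {
  baseUrl: string
  model: string
  constructor(baseUrl: string, model: string) {
    this.baseUrl = baseUrl
    this.model = model
  }
}

// Same branching as OCRServiceFactory.create(), but reading from an
// env object supplied by the caller (an assumption of this sketch).
function createOCRService(env: Env): GeminiStub | OllamaStub {
  const provider = env.VITE_OCR_PROVIDER
  switch (provider) {
    case 'gemini': {
      const apiKey = env.VITE_GEMINI_API_KEY
      if (!apiKey) {
        throw new Error('VITE_GEMINI_API_KEY is required for Gemini provider')
      }
      return new GeminiStub(apiKey, env.VITE_GEMINI_MODEL || 'gemini-2.0-flash')
    }
    case 'ollama':
      return new OllamaStub(
        env.VITE_OLLAMA_BASE_URL || 'http://localhost:11434',
        env.VITE_OLLAMA_MODEL || 'llava',
      )
    default:
      throw new Error(`Unknown OCR provider: ${provider}. Supported: 'gemini', 'ollama'`)
  }
}

// Returns the message of the error thrown by fn, or null if fn did not throw.
function errorMessageOf(fn: () => unknown): string | null {
  try {
    fn()
    return null
  } catch (e) {
    return e instanceof Error ? e.message : String(e)
  }
}

// The four scenarios, each driven by a different env object.
const gemini = createOCRService({ VITE_OCR_PROVIDER: 'gemini', VITE_GEMINI_API_KEY: 'test-key' })
const ollama = createOCRService({ VITE_OCR_PROVIDER: 'ollama' })
const unknownError = errorMessageOf(() => createOCRService({ VITE_OCR_PROVIDER: 'tesseract' }))
const missingKeyError = errorMessageOf(() => createOCRService({ VITE_OCR_PROVIDER: 'gemini' }))
```

In the real test file, each of these values would be checked inside the matching it(...) block, for example by asserting the instance type or the error message.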
Benefits of the Refactoring
Refactoring the OCR service implementation using the Factory Pattern offers several significant benefits:
- ✅ Single Responsibility: The TicketScanModal is now solely responsible for its primary function: handling the user interface. It no longer needs to handle the configuration of OCR services.
- ✅ Open/Closed Principle: It's now easier to add new OCR providers without modifying the TicketScanModal. This aligns with the Open/Closed Principle, which states that software entities should be open for extension but closed for modification.
- ✅ Testability: The OCRServiceFactory can be tested independently, and the tests for TicketScanModal can mock the factory instead of mocking environment variables. This simplifies testing and makes the test suite more robust.
- ✅ Centralized Configuration: All OCR setup logic is now centralized in the OCRServiceFactory, making it easier to manage and maintain.
- ✅ Better Error Handling: The factory can validate the configuration upfront and throw meaningful errors if something is misconfigured.
These benefits collectively contribute to a more maintainable, flexible, and robust system.
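The Open/Closed benefit above applies to the modal; the switch-based factory itself still has to be edited for each new provider. As an optional variation, and not what this refactoring proposes, a registry-based factory would let a provider be added by registration alone. Every name in this sketch (registerProvider, createFromRegistry, the name field, the 'mock' provider) is hypothetical.

```typescript
// Simplified, hypothetical service shape used only by this sketch.
interface OCRService {
  readonly name: string
}

type Env = Record<string, string | undefined>
type OCRServiceBuilder = (env: Env) => OCRService

// Registry mapping a provider id to a function that builds the service.
const builders = new Map<string, OCRServiceBuilder>()

function registerProvider(id: string, build: OCRServiceBuilder): void {
  builders.set(id, build)
}

// Looks up the configured provider and delegates construction to its builder.
function createFromRegistry(env: Env): OCRService {
  const id = env.VITE_OCR_PROVIDER ?? ''
  const build = builders.get(id)
  if (!build) {
    const supported = Array.from(builders.keys()).join(', ')
    throw new Error(`Unknown OCR provider: ${id}. Supported: ${supported}`)
  }
  return build(env)
}

registerProvider('gemini', (env) => ({
  name: `gemini:${env.VITE_GEMINI_MODEL || 'gemini-2.0-flash'}`,
}))
registerProvider('ollama', (env) => ({
  name: `ollama:${env.VITE_OLLAMA_MODEL || 'llava'}`,
}))

// Adding a provider later touches neither createFromRegistry nor the modal.
registerProvider('mock', () => ({ name: 'mock' }))
```

The trade-off is a little more indirection than a switch statement, which may not be worth it while only two providers exist.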
Acceptance Criteria
To ensure that the refactoring is successful, we need to define clear acceptance criteria. These criteria serve as a checklist to verify that the implementation meets the requirements and that the system functions as expected.
- [ ] Create OCRServiceFactory with support for Gemini and Ollama providers.
- [ ] Update TicketScanModal to use the factory.
- [ ] Add comprehensive tests for the factory.
- [ ] Update existing TicketScanModal tests if needed.
- [ ] All existing tests continue to pass (376 tests).
- [ ] App functionality remains unchanged.
Meeting these criteria ensures that the refactoring is complete and that the system remains stable and functional.
Related Files
- src/presentation/components/TicketScanModal.tsx (lines 23-26)
- src/infrastructure/services/ocr/GeminiVisionOCRService.ts
- src/infrastructure/services/ocr/OllamaVisionOCRService.ts
These files are central to the refactoring effort and should be carefully reviewed and updated.
Priority and Notes
The priority for this refactoring is Low, as it addresses technical debt and improves code quality rather than directly affecting functionality. However, it's an important step towards creating a more maintainable and scalable system.
This task is considered a good-first-issue for contributors familiar with the Factory Pattern. It provides an opportunity to contribute to a real-world codebase and gain experience with refactoring techniques.
Conclusion
Refactoring the OCR service implementation using the Factory Pattern in TicketScanModal is a crucial step towards improving the system's architecture. By decoupling the modal from concrete OCR service implementations, we enhance testability, maintainability, and flexibility. This article has provided a detailed guide on the problem, proposed solution, implementation plan, benefits, and acceptance criteria for this refactoring effort. By following this guide, developers can confidently implement the Factory Pattern and contribute to a more robust and scalable system.
For further reading on design patterns, particularly the Factory Pattern, you might find valuable resources on sites like Refactoring.Guru.