OpenCV Vs. ML Kit: A FOSS Android Scanner

by Alex Johnson

The Quest for True FOSS: Replacing ML Kit Document Scanner with OpenCV

In the world of Free and Open Source Software (FOSS), every line of code matters, and so does every dependency. For our Android application, we've been using the ML Kit Document Scanner API to bring you a seamless document edge detection experience in our Minimum Viable Product (MVP). It's been fantastic, offering top-notch accuracy and a user experience that feels just right. However, this convenience comes with a catch: a dependency on Google Play Services (GMS). This dependency, while powerful, puts a roadblock in our path toward becoming a fully FOSS application. We believe in empowering all users, regardless of their device or privacy preferences, and that means breaking free from proprietary chains.

This isn't just about ticking a box for FOSS compliance; it's about accessibility and user trust. Devices without Google Play Services, like many Huawei devices, or those running custom ROMs, are currently left out. Privacy-conscious users who prefer an entirely offline experience are also affected. Our goal is to ensure that everyone can enjoy the full capabilities of our app, making it truly universal and respecting individual choices about digital privacy. The current reliance on ML Kit, while technically sound, fundamentally clashes with the core FOSS philosophy that guides our development. It's about more than just the code; it's about the freedom and control we want to give back to our users. Therefore, we're embarking on a journey to replace the ML Kit Document Scanner with a robust, open-source alternative: OpenCV.

The Challenge: GMS Dependency and FOSS Principles

The primary hurdle we face is the GMS dependency inherent in using ML Kit. This dependency, though often invisible to the end user, creates significant limitations. It makes the document scanning feature unusable on devices that ship without GMS, including many Huawei devices. Furthermore, users who opt for custom ROMs or privacy-focused Android distributions, which often deliberately exclude Google services, will find this core functionality absent. This fragments the user experience and limits our reach, directly contradicting our aim to be a truly inclusive application.

Beyond device compatibility, this dependency runs counter to the very ethos of FOSS. The principles of free and open-source software champion transparency, user freedom, and the right to modify and distribute software without proprietary encumbrances. By relying on a Google-provided, closed-source component, we compromise these ideals. Our commitment to FOSS isn't merely a technical choice; it's a philosophical one. We want our users to have confidence that the software they use respects their privacy and digital autonomy. The current situation, where a core feature is tied to a proprietary service, erodes that trust. Many users are increasingly concerned about data privacy and prefer solutions that operate entirely offline, without any communication with external services, especially those owned by large tech corporations. Our current implementation, while functional, doesn't fully align with these user expectations or our own vision for a privacy-first, offline-first application.

The Solution: Embracing OpenCV for Edge Detection

Our proposed solution is to implement a document edge detection mechanism using OpenCV, a powerful and widely-used open-source computer vision library. This approach, detailed in our roadmap for Version 1.1, aligns perfectly with our FOSS goals and offers a host of benefits. By leveraging OpenCV, we can achieve robust document scanning capabilities entirely on the device, eliminating the need for Google Play Services and ensuring our app remains GMS-free. This move will make our application compatible with all Android devices, from flagship models to those with minimal pre-installed services, and importantly, it will empower users who prioritize privacy and offline functionality.

The technical implementation will follow a well-established pipeline. First, the image captured by the camera will be preprocessed: converted to grayscale to simplify processing, smoothed with a Gaussian blur to reduce noise, and then binarized with Otsu's thresholding, which picks the threshold automatically from the image histogram and separates the document from the background. Canny edge detection will then locate the document's edges. We'll start from commonly used hysteresis thresholds (e.g., 50 and 150) and tune them to adjust edge detection sensitivity.
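To make the grayscale and Otsu steps of the preprocessing described above more concrete, here is a minimal sketch in plain Java with no OpenCV dependency. It is only an illustration of what those steps compute, not the app's actual implementation: in the app, this work would presumably be handed to OpenCV (for example Imgproc.cvtColor and Imgproc.threshold with the THRESH_OTSU flag). The class and method names (OtsuSketch, toGray, otsuThreshold, binarize) are hypothetical, and the sketch assumes pixels arrive as packed ARGB integers and that gray values lie in the range 0 to 255.

```java
/** Illustrative sketch only: grayscale conversion and Otsu thresholding in plain Java. */
final class OtsuSketch {

    /** Converts a packed 0xAARRGGBB pixel to an 8-bit gray value (BT.601 luma weights). */
    static int toGray(int argb) {
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        // Integer approximation of 0.299*R + 0.587*G + 0.114*B, rounded to nearest.
        return (299 * r + 587 * g + 114 * b + 500) / 1000;
    }

    /**
     * Returns the threshold t (0..255) that maximizes the between-class variance
     * when pixels with value <= t form one class and pixels > t form the other.
     * Assumes every entry of gray is in 0..255.
     */
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) {
            hist[v]++;
        }

        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * hist[i];
        }

        long weightBg = 0;   // pixel count at or below the candidate threshold
        long sumBg = 0;      // sum of gray values at or below the candidate threshold
        double bestVariance = -1.0;
        int bestT = 0;

        for (int t = 0; t < 256; t++) {
            weightBg += hist[t];
            sumBg += (long) t * hist[t];
            if (weightBg == 0) {
                continue;            // no pixels in the lower class yet
            }
            long weightFg = total - weightBg;
            if (weightFg == 0) {
                break;               // all pixels are in the lower class
            }
            double meanBg = (double) sumBg / weightBg;
            double meanFg = (double) (sumAll - sumBg) / weightFg;
            double diff = meanBg - meanFg;
            double between = (double) weightBg * weightFg * diff * diff;
            if (between > bestVariance) {
                bestVariance = between;
                bestT = t;
            }
        }
        return bestT;
    }

    /** Maps gray values above the threshold to 255 and all others to 0. */
    static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = gray[i] > threshold ? 255 : 0;
        }
        return out;
    }
}
```

A caller would convert each camera pixel with toGray, pass the resulting array to otsuThreshold, and feed the returned value to binarize. The Gaussian blur and Canny steps are left out of this sketch, since they are better delegated to the library than reimplemented by hand.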

Next, we'll use findContours() to extract contours from the edge image. These contours represent potential document boundaries. To find the main document, we'll sort the contours by area and examine the largest ones first. We'll then use approxPolyDP() to simplify each contour into a polygon, with a tolerance of about 2% of the contour's perimeter to smooth out jagged edges. Our primary target will be 4-sided polygons, which indicate a rectangular document shape. Once a suitable polygon is identified, a perspective transformation, also known as a 4-point transform, will be applied to