Fixing CSV Import Security: WebSocket Replacement Guide

by Alex Johnson

In this article, we'll dive deep into a critical issue identified within the CSV import workflow. The current implementation uses a custom WebSocket, which unfortunately violates the v2.4.1 API contract in several significant ways. We’re talking security vulnerabilities, protocol breaches, and reliability concerns. Specifically, we'll be focusing on replacing this problematic custom WebSocket with the robust and compliant WebSocketProgressManager. Let’s break down the problem, explore the root causes, and outline the solution to achieve a more secure and reliable system.

The Problem: A Breakdown of Critical Issues

The core issue revolves around the current custom WebSocket implementation used in the CSV import workflow. This custom approach has led to three primary violations of the v2.4.1 API contract, each with its own set of risks and impacts. These violations are not just minor hiccups; they represent significant problems that need immediate attention.

  1. 🔴 SECURITY: Authentication Token Exposure

    The current system exposes the authentication token in the query parameters of the WebSocket connection URL. This seemingly small detail has major security implications. The URL, including the token, can be logged by servers, proxies, and even appear in error reports. Anyone with access to these logs could potentially steal the token and gain unauthorized access. This is a major security hole that needs to be plugged immediately.

  2. 🔴 PROTOCOL: Missing HTTP/1.1 Enforcement

    The custom WebSocket implementation doesn't enforce HTTP/1.1, which the WebSocket upgrade requires. This is not merely a matter of protocol purity; it can lead to actual connection failures. The opening handshake defined in RFC 6455 is an HTTP/1.1 GET request carrying the Upgrade: websocket and Connection: Upgrade headers. If the client negotiates a different HTTP version, the upgrade may fail and the WebSocket connection will not be established.

  3. 🔴 RELIABILITY: Lack of Reconnection Support

    Unstable networks are a fact of life. The current custom WebSocket lacks any reconnection support, leading to a poor user experience. If the connection drops, the import process simply fails. There is no automatic retry mechanism, no exponential backoff, and no attempt to synchronize state with the backend. This means that users are left with a broken import and no clear path to recovery. This lack of reliability is a major usability issue.

Severity: These issues combined result in a HIGH severity rating due to the security vulnerabilities and protocol violations. The impact is broad, ranging from potential token leakage to connection failures and a generally unreliable experience.

Current Grade: Our API contract compliance score stands at a disappointing 5.5/10. This highlights the urgency of addressing these issues.

Root Cause: Why the Custom Implementation Failed

The root cause of these problems lies in the decision to implement a custom WebSocket in GeminiCSVImportView.swift. The relevant code spans lines 328-680. This custom implementation duplicates functionality that already exists in the WebSocketProgressManager, but unfortunately, it does so incorrectly. Instead of leveraging a proven and compliant solution, the custom code introduces vulnerabilities and protocol violations.

A clear comparison highlights the issue:

  • Shelf scan uses WebSocketProgressManager → 8.5/10 compliance ✅
  • CSV import uses custom WebSocket → 5.5/10 compliance ❌

This stark contrast shows the value of using the established WebSocketProgressManager. It’s a battle-tested component designed to handle WebSocket connections securely and reliably.

Diving Deeper: The Critical Issues Explained

Let's break down each critical issue with specific code examples and explanations.

Issue 1: Insecure Authentication (CRITICAL)

Location: GeminiCSVImportView.swift:331-334

The problematic code snippet looks like this:

// ❌ CURRENT: Token in query string (SECURITY VIOLATION)
var components = URLComponents(string: "\(EnrichmentConfig.webSocketBaseURL)/ws/progress")!
components.queryItems = [
    URLQueryItem(name: "jobId", value: jobId),
    URLQueryItem(name: "token", value: token)  // ← Visible in logs!
]

This code constructs the WebSocket URL by embedding the authentication token directly in the query string. While this might seem like a convenient way to pass the token, it's a major security blunder. Even when the connection uses TLS, the full URL (query string included) is visible to the server and to any TLS-terminating proxy, and those systems routinely write request URLs to access logs. Client-side debug logging and crash reports can capture it too. Once the token lands in a log, anyone who can read that log can reuse it.

The risk is that the token becomes visible in server logs, proxy logs, and error reports. This exposure makes it significantly easier for malicious actors to steal the token and gain unauthorized access to the system.
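As a contrast to the query-string approach, here is a minimal sketch of an upgrade request that keeps the token out of the URL. It is not the WebSocketProgressManager implementation: the function name makeProgressRequest, the example host, and the "Bearer.<token>" subprotocol format are assumptions made for illustration, and the real header format is whatever the v2.4.1 contract specifies.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// Builds a WebSocket upgrade request that keeps the token out of the URL.
/// NOTE: the "Bearer.<token>" value is a hypothetical placeholder format,
/// not the documented v2.4.1 wire format.
func makeProgressRequest(baseURL: String, jobId: String, token: String) -> URLRequest? {
    guard var components = URLComponents(string: "\(baseURL)/ws/progress") else {
        return nil
    }
    // Only the non-secret job identifier goes in the query string.
    components.queryItems = [URLQueryItem(name: "jobId", value: jobId)]
    guard let url = components.url else { return nil }

    var request = URLRequest(url: url)
    // The secret travels in a header instead of the request URL.
    request.setValue("Bearer.\(token)", forHTTPHeaderField: "Sec-WebSocket-Protocol")
    return request
}
```

The resulting request could then be handed to URLSession's webSocketTask(with:) overload that accepts a URLRequest. Returning an optional instead of force-unwrapping also avoids a crash if the base URL is ever malformed.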

Issue 2: Missing HTTP/1.1 Enforcement (CRITICAL)

Location: GeminiCSVImportView.swift:349-352

The following code is responsible for creating the WebSocket task:

// ❌ CURRENT: Allows HTTP/2 (breaks WebSocket)
let session = URLSession(configuration: .default)
let webSocketTask = session.webSocketTask(with: wsURL)

The issue here is that the URLSession is created with the default configuration, which allows HTTP/2. While HTTP/2 is generally a good thing, it's incompatible with the RFC 6455 WebSocket upgrade mechanism. That handshake relies on the HTTP/1.1 Upgrade and Connection headers, and HTTP/2 does not support them (WebSocket over HTTP/2 needs the separate bootstrapping mechanism from RFC 8441). If the connection negotiates HTTP/2, the upgrade can fail and leave the WebSocket connection broken. This problem is directly related to Issue #227 in our tracking system.

Issue 3: No Reconnection Support (HIGH)

Location: GeminiCSVImportView.swift:328-441

This issue isn't tied to a specific line of code, but rather to the overall lack of reconnection logic in the custom WebSocket implementation. The current code attempts to establish a WebSocket connection once and gives up on it if the connection fails. Specifically:

  • No reconnect=true parameter for backend state sync
  • No exponential backoff retry mechanism
  • Falls back to HTTP polling instead of retrying the WebSocket, which is less efficient and less reliable

This lack of reconnection support leads to a frustrating user experience, especially on unstable networks. Users may experience import failures without a clear understanding of what went wrong or how to fix it.
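To make the missing pieces concrete, the sketch below shows one way an exponential backoff schedule and a reconnect=true URL could be computed. It is an illustration, not the WebSocketProgressManager code: the helper names backoffDelay and reconnectURL, the 1-second base delay, and the 30-second cap are all assumptions.

```swift
import Foundation

/// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped.
/// The base and cap defaults are assumed values, not taken from the API contract.
func backoffDelay(attempt: Int, base: TimeInterval = 1.0, cap: TimeInterval = 30.0) -> TimeInterval {
    precondition(attempt >= 1, "attempts are 1-based")
    let exponent = Double(attempt - 1)
    return min(cap, base * pow(2.0, exponent))
}

/// Adds reconnect=true to an existing progress URL, replacing any earlier value
/// so the parameter never appears twice.
func reconnectURL(from url: URL) -> URL? {
    guard var components = URLComponents(url: url, resolvingAgainstBaseURL: false) else {
        return nil
    }
    var items = (components.queryItems ?? []).filter { $0.name != "reconnect" }
    items.append(URLQueryItem(name: "reconnect", value: "true"))
    components.queryItems = items
    return components.url
}
```

With these defaults, five consecutive retries would wait 1, 2, 4, 8, and 16 seconds. A production implementation would typically also add random jitter so that many clients do not retry in lockstep.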

The Recommended Solution: Embrace WebSocketProgressManager

The recommended solution is straightforward: replace the custom WebSocket implementation (lines 328-680) with the WebSocketProgressManager. This component is specifically designed to handle WebSocket connections securely and reliably. It already incorporates best practices for authentication, protocol compliance, and reconnection.

The code to implement this change is surprisingly concise:

// ✅ Reuse compliant, battle-tested implementation (~30 lines)
private func startWebSocketProgress(jobId: String, token: String) {
    webSocketTask = Task {
        let wsManager = WebSocketProgressManager()
        
        // Handles auth + HTTP/1.1 + reconnection automatically
        _ = try await wsManager.establishConnection(jobId: jobId, token: token)
        try await wsManager.configureForJob(jobId: jobId)
        try await wsManager.sendReadySignal()
        
        wsManager.setProgressHandler { [weak self] progress in
            // Handle progress updates
        }
    }
}

This snippet demonstrates how easily the WebSocketProgressManager can be integrated. It handles authentication, HTTP/1.1 enforcement, and reconnection automatically, significantly simplifying the code and improving its reliability.

Key Benefits of Using WebSocketProgressManager

Switching to WebSocketProgressManager brings a wealth of benefits, addressing the critical issues we've identified and improving the overall quality of the CSV import workflow.

Security

  • ✅ Secure Sec-WebSocket-Protocol header authentication: The WebSocketProgressManager uses the Sec-WebSocket-Protocol header to securely transmit the authentication token, avoiding the vulnerabilities associated with query string parameters.
  • ✅ Keychain token storage: The component is designed to work with secure token storage mechanisms like the Keychain, further enhancing security.
  • ✅ No token leakage: By using secure authentication methods and avoiding query string parameters, the risk of token leakage is significantly reduced.

Protocol Compliance

  • ✅ HTTP/1.1 enforcement: The WebSocketProgressManager explicitly enforces the HTTP/1.1 protocol, ensuring proper WebSocket upgrades.
  • ✅ Proper WebSocket upgrade: By adhering to the HTTP/1.1 standard, the component performs the upgrade handshake the way RFC 6455 expects, which removes the protocol mismatch as a cause of failed connections.

Reliability

  • ✅ Automatic reconnection: The WebSocketProgressManager includes a robust reconnection mechanism with exponential backoff, ensuring the connection is re-established even on unstable networks. It attempts up to 5 retries with increasing delays between each attempt.
  • ✅ State sync via reconnect=true: The component utilizes the reconnect=true parameter to synchronize the job state with the backend upon reconnection, ensuring data consistency.
  • ✅ Connection limit handling: The WebSocketProgressManager is designed to handle connection limits gracefully, preventing issues caused by excessive connections.

Code Quality

  • ✅ Eliminates ~350 lines of duplicated code: Replacing the custom WebSocket with WebSocketProgressManager removes a significant amount of duplicated code, making the codebase cleaner and easier to maintain.
  • ✅ Consistent with shelf scan workflow: Using the same component as the shelf scan workflow ensures consistency and reduces the risk of introducing new bugs.

Acceptance Criteria: Ensuring a Successful Implementation

To ensure that the solution is implemented correctly and addresses all the identified issues, the following acceptance criteria must be met:

  • [ ] Replace custom WebSocket with WebSocketProgressManager in GeminiCSVImportView.swift (lines 328-680).
  • [ ] Verify that the authentication token is NOT visible in debug logs or in any request URL captured from network traffic.
  • [ ] Thoroughly test reconnection functionality on a poor or unstable network connection.
  • [ ] Verify that assumesHTTP3Capable = false is explicitly set to prevent HTTP/3 usage, ensuring compatibility.
  • [ ] Test the CSV import process from start to finish, verifying that it completes successfully.
  • [ ] Test error handling scenarios to ensure the application behaves gracefully in case of failures.
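The token-visibility criterion lends itself to an automated check. The following is a small sketch of a helper that scans captured log text for the raw token; the name linesLeakingToken and the idea of feeding it a captured debug log are assumptions for illustration, not part of the existing test suite.

```swift
import Foundation

/// Returns the lines of `log` that contain the raw token, if any.
/// An empty token never matches, so a missing token cannot flag every line.
func linesLeakingToken(in log: String, token: String) -> [String] {
    guard !token.isEmpty else { return [] }
    return log
        .split(separator: "\n", omittingEmptySubsequences: true)
        .map(String.init)
        .filter { $0.contains(token) }
}
```

A test could run an import against a staging backend, collect the debug output, and require that this function returns an empty array. It only matches the token verbatim, so a percent-encoded or otherwise transformed copy would need its own check.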

Implementation Estimate and Risk Assessment

Effort: The estimated effort to implement this solution is 4-6 hours. This is a relatively small investment considering the significant security and reliability improvements it brings.

Risk: The risk associated with this change is low. The WebSocketProgressManager is a proven component that has been successfully used in the shelf scan workflow. This reduces the likelihood of introducing new issues.

Conclusion: A Path to a More Secure and Reliable System

Replacing the custom WebSocket implementation with WebSocketProgressManager is a critical step towards improving the security, reliability, and maintainability of the CSV import workflow. By addressing the identified security vulnerabilities, protocol violations, and lack of reconnection support, we can create a more robust and user-friendly experience.

This solution aligns with the recommendations from the API Contract v2.4.1 Compliance Review and is considered a HIGH priority due to the security implications. The target is to move from a 5.5/10 to an 8.5/10 API compliance score.

By following the recommended solution and meeting the acceptance criteria, we can ensure a successful implementation and a significantly improved CSV import process.

For more information on WebSocket security best practices, you can visit the OWASP WebSocket Security Cheat Sheet.