Enhancing Gateway & Core Loop Flows For Better DX

Dec 3, 2025 by Alex Johnson

Developer Experience (DX) has a direct effect on how efficiently a system can be used. This document outlines enhancements to the Gateway/WorldService logging and CLI in the context of paper and Core Loop flows, aiming to make these flows more intuitive and less prone to user confusion. The goal is to address existing pain points so that developers get a smoother and more informative experience when working with these tools.

The Problem: Current DX Artefacts

The current Gateway/WorldService flows, particularly in the beta-factory v0 experience, present several unfriendly DX artefacts that slow developers down and cause unnecessary frustration. First, a GET /worlds/__default__/describe request that returns a 404 produces a full stack trace, especially during the initial setup phase. The condition is usually recoverable, but the trace looks alarming and confusing to new users. Second, the Gateway emits startup warnings about commit-log writers being disabled in local development environments. The behavior is expected, yet the warnings are easily misread as serious errors and lead to needless debugging. Third, naming is inconsistent across snapshots and log messages, specifically for metrics, P&L, and series lengths (e.g., *_sample_count), which makes the data harder to interpret and to integrate with downstream tools.

The Core Loop Roadmap and Evaluation Run

The Core Loop roadmap and the evaluation run/metrics design, in documents such as docs/ko/architecture/core_loop_roadmap.md and docs/ko/design/worldservice_evaluation_runs_and_metrics_api.md, describe what should happen within the system. The focus here is on how those processes are presented through logs and the CLI in paper/backtest flows, so that developers can quickly understand the status and performance of their evaluations.

Goals: Making Flows Visibly Healthy

The primary goal is for first-run and paper-mode flows to look healthy and user-friendly. This involves four improvements. First, 404 errors for worlds that have not been created yet should not appear as critical failures; the system should treat them as a normal part of initialization. Second, expected local-only behaviors, such as the commit-log being disabled, should be clearly labeled so that developers know these messages are not a cause for concern. Third, log and CLI wording should align with the Core Loop and evaluation run model, so that operators can quickly see which world, strategy, and run they are evaluating and whether the evaluation is ongoing or completed. Fourth, terminology for snapshot and metric outputs should be standardized so that downstream tools can rely on predictable keys.

Alignment with Core Loop and Evaluation Run Model

Logging and CLI outputs should align with the Core Loop and evaluation run model so that operators can track the progress and status of their evaluations efficiently. When evaluation runs are created, the logs should record evaluation_run_id, world_id, strategy_id, and status transitions in a structured form, which makes monitoring and debugging easier. The CLI should surface the same identifiers and status vocabulary in outputs like qmtl world run-status and qmtl world snapshot, so that developers can correlate logs with CLI output. Finally, snapshots and metrics should use consistent field names, such as returns_sample_count, pnl_sample_count, and *_head, so that both logs and .qmtl_snapshots/*.json match the design.
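To make the idea of a structured status-transition log concrete, here is a minimal sketch using only Python's standard library. The logger name, the "event" key, the from_status/to_status keys, and the status values are assumptions made for illustration; only evaluation_run_id, world_id, and strategy_id come from the text above.

```python
import json
import logging

# Hypothetical logger name; the real module layout is not shown in this document.
logger = logging.getLogger("qmtl.worldservice.evaluation")


def format_run_event(evaluation_run_id, world_id, strategy_id, old_status, new_status):
    """Build one JSON log line describing an evaluation run status transition."""
    payload = {
        "event": "evaluation_run_status",  # assumed event name
        "evaluation_run_id": evaluation_run_id,
        "world_id": world_id,
        "strategy_id": strategy_id,
        "from_status": old_status,  # assumed key
        "to_status": new_status,    # assumed key
    }
    # sort_keys keeps the line stable so it is easy to grep and diff.
    return json.dumps(payload, sort_keys=True)


def log_run_transition(evaluation_run_id, world_id, strategy_id, old_status, new_status):
    """Emit the structured line at INFO level."""
    logger.info(
        format_run_event(evaluation_run_id, world_id, strategy_id, old_status, new_status)
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    # "running" and "completed" are placeholder status names.
    log_run_transition("run-1", "__default__", "strat-a", "running", "completed")
```

Keeping every identifier in a single JSON object means a log line can be matched against CLI output by evaluation_run_id alone.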

Proposed Direction: A Step-by-Step Approach

To achieve the outlined goals, the following high-level direction is proposed:

World Bootstrap & Default World Ergonomics

The first interaction with the system can be improved by treating a 404 from GET /worlds/__default__/describe as the standard case where a world has not been created yet. This means logging the condition at the INFO/WARN level instead of emitting a full exception stack trace. The system could optionally emit a concise message such as "world __default__ not found; creating with sandbox policy". A CLI helper, such as qmtl world ensure-default --policy sandbox, is also worth considering; documenting it as part of the local paper-mode recipe would streamline setup for new users.
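The bootstrap behavior described above might look roughly like the following sketch. It does not use the real Gateway or WorldService client, whose API is not shown here: WorldNotFound, ensure_world, and InMemoryWorlds are all hypothetical names, and the in-memory store only stands in for the HTTP calls.

```python
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("qmtl.gateway.worlds")


class WorldNotFound(Exception):
    """Hypothetical error standing in for an HTTP 404 from the describe call."""


class InMemoryWorlds:
    """Tiny stand-in for WorldService so the sketch can run without a server."""

    def __init__(self):
        self.worlds = {}

    def describe(self, world_id):
        if world_id not in self.worlds:
            raise WorldNotFound(world_id)
        return self.worlds[world_id]

    def create(self, world_id, policy):
        self.worlds[world_id] = {"id": world_id, "policy": policy}
        return self.worlds[world_id]


def ensure_world(world_id, describe, create, policy="sandbox"):
    """Return the world description, creating the world on a first-run 404."""
    try:
        return describe(world_id)
    except WorldNotFound:
        # Expected on first run: one concise line, no exc_info, no stack trace.
        logger.info("world %s not found; creating with %s policy", world_id, policy)
        return create(world_id, policy)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    store = InMemoryWorlds()
    print(ensure_world("__default__", store.describe, store.create))
```

The key design choice is that the "not found" branch is ordinary control flow, so the log line carries the message from the text and nothing else.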

Logging Levels and Messages for Local Dev

Noisy stack traces and warnings on the happy path need to be triaged, especially those triggered during normal execution in local environments. This includes commit-log writers being disabled in local setups and transient 404/409 conditions expected during first-run evaluation. Messages that are safe to ignore in non-production environments should say so explicitly, with a phrase like "for local dev this is expected". This reduces developer confusion and shortens debugging.
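One possible shape for such a message helper is sketched below. The function name, the is_local_dev flag, and the choice of INFO for local dev and WARNING elsewhere are assumptions; the INFO versus WARN boundary is still listed as an open question in this document.

```python
import logging

# Phrase taken from the text; appended to messages that are safe to ignore locally.
LOCAL_DEV_NOTE = "for local dev this is expected"


def log_disabled_component(logger, component, is_local_dev):
    """Log a disabled component, softening the message in local development.

    How local dev is detected is out of scope here, so it is passed in
    as a plain boolean (an assumption of this sketch).
    """
    if is_local_dev:
        level = logging.INFO
        message = f"{component} disabled ({LOCAL_DEV_NOTE})"
    else:
        level = logging.WARNING
        message = f"{component} disabled; check configuration"
    logger.log(level, message)
    return level, message


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log = logging.getLogger("qmtl.gateway")  # hypothetical logger name
    log_disabled_component(log, "commit-log writers", is_local_dev=True)
    log_disabled_component(log, "commit-log writers", is_local_dev=False)
```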

Align Logs/CLI with Evaluation Run & Metrics Model

Carrying the evaluation run and metrics model through to logging and CLI output would make both easier to use. When evaluation runs are created, the logs should include evaluation_run_id, world_id, strategy_id, and status transitions. The CLI should surface these identifiers and the status vocabulary in commands such as qmtl world run-status and qmtl world snapshot. Snapshots and metrics should adopt consistent field names (e.g., returns_sample_count, pnl_sample_count, *_head) so that logs and .qmtl_snapshots/*.json match the design and are simpler to analyze and integrate with other tools.
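As a rough illustration of how field-name consistency could be checked, the sketch below sorts the keys of a snapshot dictionary into those matching the two patterns named in the text (*_sample_count and *_head) and everything else. The snapshot contents are invented for the example, and the real snapshot schema may contain many legitimate keys outside these patterns, so the "unmatched" list is a prompt for review rather than an error list.

```python
import fnmatch

# Patterns taken from the examples in the text above.
EXPECTED_PATTERNS = ("*_sample_count", "*_head")


def classify_keys(snapshot):
    """Split snapshot keys into per-pattern matches and unmatched keys."""
    matched = {
        pattern: sorted(
            key for key in snapshot if fnmatch.fnmatchcase(key, pattern)
        )
        for pattern in EXPECTED_PATTERNS
    }
    known = {key for keys in matched.values() for key in keys}
    unmatched = sorted(key for key in snapshot if key not in known)
    return matched, unmatched


if __name__ == "__main__":
    # Invented example data; "pnl_len" plays the role of an inconsistent name.
    snapshot = {
        "returns_sample_count": 120,
        "pnl_sample_count": 120,
        "returns_head": [0.01, -0.02],
        "pnl_len": 120,
    }
    matched, unmatched = classify_keys(snapshot)
    print(matched)
    print(unmatched)
```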

Docs Tweaks for Paper/Core Loop Flows

A short section should be added to the ops/guides, such as "First paper run" or "Local dev paper-mode." This section should provide a clear sequence of steps (start WS/GW → submit to __default__/sandbox world → inspect run-status/metrics/snapshot). It should also call out which warnings can be safely ignored and which indicate real misconfiguration. This documentation will make it easier for new users to get started and troubleshoot issues.

Open Questions: Addressing Uncertainties

Several open questions remain that require careful consideration:

The Balancing Act of INFO vs. WARN

Where to draw the line between INFO and WARN for first-run 404s and local-only disabled components is still undecided. The aim is to give developers enough information without overwhelming them with unnecessary alerts.

Separate Logger Category for DX-Related Messages

Should a separate logger category (e.g., qmtl.dx) be introduced for developer-experience-related messages? This could help separate DX-related messages from other system logs, making it easier for developers to filter and focus on relevant information.
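If a dedicated category were adopted, Python's standard logging hierarchy would already support it. The sketch below wires a qmtl.dx logger to its own handler and format; the "[dx]" prefix, the child logger name, and the decision to stop propagation are assumptions for illustration, not existing qmtl behavior.

```python
import io
import logging


def build_dx_logger(stream):
    """Configure the qmtl.dx logger with its own handler writing to `stream`."""
    dx_logger = logging.getLogger("qmtl.dx")
    dx_logger.setLevel(logging.INFO)
    # Keep DX messages out of the root handlers so they can be filtered separately.
    dx_logger.propagate = False
    for handler in list(dx_logger.handlers):
        dx_logger.removeHandler(handler)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[dx] %(levelname)s %(message)s"))
    dx_logger.addHandler(handler)
    return dx_logger


if __name__ == "__main__":
    buffer = io.StringIO()
    build_dx_logger(buffer)
    # Child loggers such as qmtl.dx.bootstrap (hypothetical) inherit the handler.
    logging.getLogger("qmtl.dx.bootstrap").info(
        "world __default__ not found; creating with sandbox policy"
    )
    print(buffer.getvalue(), end="")
```

Because the category is just a logger name, developers could raise or lower its level independently of the rest of the system.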

Exposing Evaluation Run IDs in CLI Output

How should evaluation_run IDs be exposed in CLI output without overwhelming users? The IDs need to be visible enough to be useful without cluttering the output. Options include concise formatting by default, with a way to display more detailed information on request.
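One way to explore this question is to shorten IDs by default and show them in full on request, as in the sketch below. The function names, the 8-character width, the verbose switch, and the output layout are all hypothetical; none of them reflect existing qmtl CLI behavior.

```python
def format_run_id(evaluation_run_id, verbose=False, width=8):
    """Return the full ID when verbose, otherwise a shortened prefix."""
    if verbose or len(evaluation_run_id) <= width:
        return evaluation_run_id
    return evaluation_run_id[:width] + "..."


def format_status_line(world_id, strategy_id, evaluation_run_id, status, verbose=False):
    """Build a one-line status summary in an assumed layout."""
    run_id = format_run_id(evaluation_run_id, verbose=verbose)
    return f"{world_id}/{strategy_id} run={run_id} status={status}"


if __name__ == "__main__":
    run_id = "0123456789abcdef"  # invented ID for the example
    print(format_status_line("__default__", "strat-a", run_id, "running"))
    print(format_status_line("__default__", "strat-a", run_id, "running", verbose=True))
```

A shortened prefix keeps the default output readable, while the full value stays available for correlating with structured logs.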

Acceptance Criteria: Ensuring Improvements

To ensure that the proposed changes are effective, the following acceptance criteria (draft) have been defined:

When executing qmtl submit against a fresh local stack, the system should no longer emit intimidating stack traces for the normal "world does not exist yet" case. Instead, the logs should clearly state that a default world/policy is being created.
Local dev setups with disabled commit-log writers should emit a clear, non-alarming message indicating that this behavior is expected.
Snapshot/metrics outputs should use consistent field names (e.g., *_sample_count) and align with the evaluation run/metrics design document.
The basic docs/guides should reference the improved CLI/log behavior for paper-mode Core Loop flows.

Meeting these acceptance criteria would show that the DX improvements are tangible and beneficial to developers.

In conclusion, improving the Gateway/WorldService logging and CLI for paper and Core Loop flows should make the overall developer workflow friendlier and more efficient. The open questions mark the areas that still need discussion and refinement, and the proposed direction provides a foundation for working through them.

For more in-depth information on the Core Loop, see the Core Loop design documents referenced above, such as docs/ko/architecture/core_loop_roadmap.md, which describe the architecture and how the system works.