Streamlined Thinking UI For Reasoning Models
In the realm of reasoning models and AI-driven interfaces, user experience is paramount. One crucial aspect of this experience is how the model's "thinking" process is presented to the user. This article delves into the implementation of a streamlined, collapsible thinking UI for reasoning models, drawing inspiration from the intuitive design of OpenWebUI. We'll explore the challenges of displaying the model's thought process, the proposed solutions, and the technical details involved in bringing this feature to life.
The Challenge: Displaying the "Thinking" Phase
Reasoning models, which mimic human thought processes to solve complex problems, often go through a "thinking" phase before arriving at a final answer. During this phase, the model explores various possibilities, evaluates different solutions, and refines its reasoning. Traditionally, in GUI chat interfaces, this "thinking" phase isn't streamed to the user. Instead, the system waits for the entire process to complete and then streams the final output. This approach can leave users in the dark, wondering what the model is doing and how it's arriving at its conclusions.
The goal is to mirror the user-friendly experience of platforms like OpenWebUI, where the "thinking" part is displayed in a collapsible "Thinking" section. This allows users to see the model's thought process in real-time, providing valuable insights into its reasoning and building trust in its capabilities. Users can then choose to expand this section if they want to delve deeper into the model's thought process, or keep it collapsed for a cleaner interface.
Implementation Plan: A Step-by-Step Approach
To implement this feature, a multi-faceted approach is required, involving both frontend and backend modifications. Here’s a detailed breakdown of the implementation plan:
1. Update SSE Parser to Extract reasoning_content
The first step involves modifying the parseSSEStream function in useGglibRuntime.ts. Currently, this function parses Server-Sent Events (SSE) streams and extracts the main content. The updated function will need to yield an object containing both content and reasoningContent fields from each SSE chunk, instead of just a string. This will allow the frontend to differentiate between the main output and the reasoning content.
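The updated parser might be sketched as follows. This is an illustrative version, not the project's actual code: the `StreamDelta` shape and the helper `parseSSEChunk` are assumptions, and the payload layout mirrors the OpenAI-style streaming format that llama-server emits (`choices[0].delta.content` / `choices[0].delta.reasoning_content`). For simplicity it assumes the transport already yields decoded text chunks.

```typescript
// Shape yielded per chunk: both fields instead of a bare string.
interface StreamDelta {
  content: string;
  reasoningContent: string;
}

// Parse one SSE "data:" payload. Either field may be absent in a given
// chunk, so both default to empty strings. "[DONE]" ends the stream.
function parseSSEChunk(payload: string): StreamDelta | null {
  if (payload === "[DONE]") return null;
  const parsed = JSON.parse(payload);
  const delta = parsed.choices?.[0]?.delta ?? {};
  return {
    content: delta.content ?? "",
    reasoningContent: delta.reasoning_content ?? "",
  };
}

// Async generator over decoded text chunks: splits on newlines,
// keeps any partial trailing line, and yields one StreamDelta per event.
async function* parseSSEStream(
  chunks: AsyncIterable<string>
): AsyncGenerator<StreamDelta> {
  let buffer = "";
  for await (const chunk of chunks) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // partial line carries over to the next read
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const delta = parseSSEChunk(line.slice(6).trim());
      if (delta) yield delta;
    }
  }
}
```

The key design point is the return type change: callers that previously consumed strings now destructure `{ content, reasoningContent }` and can route each field to a different accumulator.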
Why is this important? SSE is a crucial technology for streaming real-time data from the server to the client. By updating the SSE parser, we can ensure that the reasoning content is extracted and made available to the frontend for display.
2. Modify Runtime Adapter to Track Thinking Separately
Next, the runtime adapter needs to be modified to track the thinking content separately from the main content. This involves accumulating the thinkingContent alongside the mainContent. Additionally, start and end timestamps for the thinking phase need to be tracked to calculate the duration of the process. Finally, the thinking content will be embedded as <think>...</think> tags within the main content upon completion. This ensures that the thinking content is persisted and can be displayed even after the session is reloaded.
Why is this important? By tracking the thinking content separately, we can provide a more granular view of the model's reasoning process. The timestamps allow us to display the duration of the thinking phase, providing users with a sense of the model's processing time. Embedding the content in tags ensures persistence, allowing users to revisit the model's thought process at a later time.
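The adapter's bookkeeping described above can be sketched as a small accumulator. The class and field names here are assumptions for illustration; the article only specifies the behavior (separate accumulation, start/end timestamps, and `<think>` embedding on completion).

```typescript
// Hypothetical accumulator for the runtime adapter.
class ThinkingAccumulator {
  private mainContent = "";
  private thinkingContent = "";
  private thinkingStart: number | null = null;
  private thinkingEnd: number | null = null;

  // Called once per streamed delta.
  push(delta: { content: string; reasoningContent: string }): void {
    if (delta.reasoningContent) {
      if (this.thinkingStart === null) this.thinkingStart = Date.now();
      this.thinkingContent += delta.reasoningContent;
    } else if (delta.content) {
      // The first main-content token marks the end of the thinking phase.
      if (this.thinkingStart !== null && this.thinkingEnd === null) {
        this.thinkingEnd = Date.now();
      }
      this.mainContent += delta.content;
    }
  }

  // Duration of the thinking phase in seconds (0 if none occurred).
  thinkingSeconds(): number {
    if (this.thinkingStart === null || this.thinkingEnd === null) return 0;
    return (this.thinkingEnd - this.thinkingStart) / 1000;
  }

  // On completion, embed the thinking content as <think> tags so it
  // survives session reloads.
  finalize(): string {
    if (!this.thinkingContent) return this.mainContent;
    return `<think>${this.thinkingContent}</think>${this.mainContent}`;
  }
}
```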
3. Create Thinking Parser Utility
A new utility function, parseThinkingContent(text), will be created in src/utils/thinkingParser.ts. This function will be responsible for extracting the <think>...</think> block from stored messages. It will return an object with two properties: thinking (the thinking content as a string or null if not found) and content (the main content). The function will also handle inline <think> tags in streaming content as a fallback mechanism.
Why is this important? This utility function provides a centralized way to parse and extract the thinking content from stored messages. The fallback mechanism ensures that the thinking content is displayed even if the reasoning_content field is absent in the SSE stream.
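A minimal sketch of `parseThinkingContent`, assuming the stored `<think>` block sits at the start of the message (as it would after the adapter embeds it). The streaming fallback treats an unclosed `<think>` tag as "everything so far is still thinking content".

```typescript
interface ParsedThinking {
  thinking: string | null; // thinking content, or null if not found
  content: string;         // the main content
}

function parseThinkingContent(text: string): ParsedThinking {
  // Closed block: <think>...</think> followed by the main content.
  const closed = text.match(/^<think>([\s\S]*?)<\/think>/);
  if (closed) {
    return {
      thinking: closed[1].trim(),
      content: text.slice(closed[0].length).trimStart(),
    };
  }
  // Streaming fallback: an opening tag with no close yet means
  // everything after it is still thinking content.
  const open = text.match(/^<think>([\s\S]*)$/);
  if (open) {
    return { thinking: open[1].trim(), content: "" };
  }
  return { thinking: null, content: text };
}
```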
4. Create ThinkingBlock Component
A new React component, ThinkingBlock, will be created to render the collapsible thinking section. This component will feature a chevron toggle for expanding and collapsing the section. During the streaming phase, it will display a "Thinking..." label. Once the thinking phase is complete, it will display a "Thought for Xs" label, where X represents the duration of the thinking process. The component will also use markdown rendering to display the thinking content inside the collapsible section. By default, the section will start collapsed.
Why is this important? This component provides a reusable and visually appealing way to display the thinking content. The collapsible design allows users to focus on the main output while still having access to the model's reasoning process. The markdown rendering ensures that the thinking content is displayed in a readable and consistent format.
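The component itself is ordinary React (a `useState(false)` flag for the collapsed state, a chevron button toggling it). The one piece of logic worth isolating is the header label, which can live as a pure helper so it is trivially testable; the wording below follows the article, while the rounding behavior is an assumption.

```typescript
// Header label for the ThinkingBlock: "Thinking..." while streaming,
// "Thought for Xs" once the thinking phase has completed.
function thinkingLabel(isStreaming: boolean, durationSeconds: number): string {
  if (isStreaming) return "Thinking...";
  return `Thought for ${Math.round(durationSeconds)}s`;
}
```

In the component, `isStreaming` would come from the runtime adapter's state and `durationSeconds` from the tracked timestamps; the collapsible body renders the thinking content through the same markdown renderer used for the main message.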
5. Update AssistantMessageBubble
The AssistantMessageBubble component, which is responsible for displaying the model's responses, will be updated to incorporate the ThinkingBlock component. The updated component will use the parseThinkingContent function to extract the thinking content from stored messages. If thinking content is present, the ThinkingBlock component will be rendered above the main content.
Why is this important? This ensures that the thinking content is displayed in the correct context, alongside the model's main output. By integrating the ThinkingBlock component into the AssistantMessageBubble, we can provide a seamless user experience.
6. Add CSS Styles
Finally, CSS styles will be added to enhance the visual appearance and user experience of the thinking UI. This includes styles for the collapse/expand animation, the thinking block (muted background, border), and the chevron rotation transition.
Why is this important? CSS styles are crucial for creating a visually appealing and intuitive user interface. The animations and transitions provide visual feedback to the user, making the interaction more engaging and enjoyable.
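As a rough illustration of these styles (class names and exact values are assumptions, not the project's):

```css
/* Illustrative only; class names are hypothetical. */
.thinking-block {
  background: var(--muted-bg, #f5f5f7); /* muted background */
  border: 1px solid var(--border-color, #ddd);
  border-radius: 8px;
  padding: 0.5rem 0.75rem;
}

.thinking-block .chevron {
  transition: transform 0.2s ease; /* chevron rotation transition */
}

.thinking-block.expanded .chevron {
  transform: rotate(90deg);
}

.thinking-block .body {
  overflow: hidden;
  max-height: 0;
  transition: max-height 0.25s ease; /* simple collapse/expand animation */
}

.thinking-block.expanded .body {
  max-height: 24rem;
}
```

The `max-height` transition is one common way to animate a collapse without measuring content height; a grid-rows or JS-measured approach would work equally well.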
Technical Details: Diving Deeper
To ensure a smooth implementation, several technical details need to be considered:
- Llama-server Integration: The llama-server, responsible for serving the reasoning models, extracts thinking content from various model formats (DeepSeek, Qwen3, GPT-OSS, etc.) and provides it in the `delta.reasoning_content` field during streaming.
- Frontend Trust: The frontend trusts the server to handle model-specific format detection. This simplifies the frontend logic and allows for greater flexibility in supporting different models.
- Persistence: Thinking content is persisted as `<think>...</think>` tags in the message content for display on reload. This ensures that the user can revisit the model's thought process even after the session is closed and reopened.
- Fallback Parser: A fallback parser handles `<think>` tags in content when `reasoning_content` is absent (e.g., when the `--reasoning-format none` flag is used). This ensures that the thinking content is displayed even in cases where the server doesn't provide it in the standard format.
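For context, a stream carrying reasoning content looks roughly like the following. The payloads are illustrative (trimmed to the relevant fields), but the `delta.reasoning_content` / `delta.content` split is the behavior the parser relies on:

```
data: {"choices":[{"index":0,"delta":{"reasoning_content":"Let me check the edge cases..."}}]}

data: {"choices":[{"index":0,"delta":{"content":"The answer is 42."}}]}

data: [DONE]
```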
Benefits of a Streamlined Thinking UI
Implementing a streamlined thinking UI for reasoning models offers numerous benefits:
- Improved User Experience: By displaying the model's thought process in real-time, users gain a better understanding of how the model arrives at its conclusions. This transparency builds trust and confidence in the model's capabilities.
- Enhanced Debugging: The thinking UI can be a valuable tool for debugging and understanding the model's behavior. By examining the thinking content, developers can identify potential issues and improve the model's performance.
- Increased Engagement: The collapsible design allows users to delve deeper into the model's thought process when they are interested, while keeping the interface clean and uncluttered when they are not.
- Consistency: By mirroring the user-friendly experience of platforms like OpenWebUI, this feature ensures a consistent and intuitive user experience across different applications.
Conclusion: A Step Towards More Transparent AI
Implementing a streamed, collapsible thinking UI for reasoning models is a significant step towards creating more transparent and user-friendly AI systems. By providing users with insights into the model's thought process, we can build trust, enhance debugging capabilities, and create a more engaging and intuitive user experience. This feature, inspired by the design of OpenWebUI, represents a commitment to creating AI systems that are not only powerful but also understandable and accessible.
For further information on UI/UX design best practices, you can explore resources like the Nielsen Norman Group.