Knowledge-Aware Agent Architecture: Prevent Contamination

by Alex Johnson

Executive Summary

This article describes how to transform QuestFoundry's agent prompt system from static, contamination-prone templates into a dynamic, knowledge-aware architecture. Agents request only the information they need via tools, which prevents example contamination and reduces token usage by an estimated 50-70%.

Background & Context

QuestFoundry Architecture Overview

QuestFoundry is a multi-agent interactive fiction authoring system with a unique "Cartridge Architecture" where specifications ARE executable code. The system has:

  • 15 specialized studio roles (Showrunner, Plotwright, Scene Smith, etc.) implemented as LLM agents
  • 6-layer specification architecture from vision (Layer 0) to runtime (Layer 6)
  • Hot/Cold Source of Truth pattern for managing creative work vs player-safe content
  • 12 production loops orchestrating agent collaboration

Each agent contributes to collaborative storytelling within its specialized role, while the layered specifications give structure from initial vision through runtime execution, and the Hot/Cold pattern separates creative work-in-progress from player-safe content. A system of this complexity needs a disciplined way to manage knowledge and prevent contamination between projects.

Current State (As of 2025-11-24)

  • ✅ All 16 roles migrated to "thin template" architecture (commit 58769ce)
  • ✅ Static role knowledge separated into YAML prompt_content fields
  • ✅ Templates now purely structural, pulling content from YAML
  • ❌ Still includes examples that cause contamination
  • ❌ Prompts include too much context (~2500 tokens)
  • ❌ Agents can't selectively access spec knowledge

Despite the progress on thin templates and YAML separation, three problems remain: examples embedded in prompts leak project-specific details across projects, oversized context inflates token usage, and agents cannot selectively pull the spec knowledge they need. Addressing these is the focus of the architecture described below.

The Contamination Problem

Issue: AI agents interpret examples as instructions, causing project-specific details (e.g., "badge scanners" from the Dock 7 project) to leak into other projects.

Root Cause: Examples mixed with instructions in prompts, without clear separation.

Agents tend to treat examples as strict rules rather than illustrations: a detail like "badge scanners," introduced for one project, reappears in unrelated ones. Because examples sit inline with instructions, agents absorb project quirks instead of the general principles the examples were meant to illustrate. The fix is a hard, enforced boundary between instructional content and illustrative material.

Architectural Solution

Core Concept: Knowledge Registry with Tool-Based Access

Instead of embedding all knowledge in prompts, agents will:

  1. Start with minimal core instructions
  2. Have awareness of available knowledge
  3. Use tools to request specific information when needed
  4. Never see examples mixed with instructions

Rather than embedding all knowledge in prompts, each agent starts from a small instruction core plus a directory of what knowledge exists, then pulls specifics through tools only when a task requires them. Because examples are retrieved separately and never interleaved with instructions, the contamination path is closed while prompts stay short.

Architecture Components

1. Knowledge Registry System

Central repository indexing all spec knowledge, categorized by:

  • Always Loaded: Core principles, spoiler hygiene (~200 tokens)
  • Role Specific: Charters, briefs, owned quality bars (~300 tokens)
  • Loop Catalog: List of available loops (awareness only) (~100 tokens)
  • On Demand: Schemas, protocols, examples, guides (via tools)

The registry is the single index of spec knowledge. The always-loaded and role-specific categories form the small static portion of every prompt (roughly 500 tokens combined, plus about 100 for the loop catalog), while everything else — schemas, protocols, examples, guides — is fetched just in time through tools. This keeps the base prompt minimal and ensures each agent sees only what its current task requires.

2. Knowledge Access Tools

Agents can invoke tools to retrieve specific knowledge:

  • list_available_loops() - Get all loops with descriptions
  • get_loop_details(loop_id) - Full loop definition
  • get_artifact_schema(type) - JSON schema for artifact
  • get_quality_bar_details(bar) - Quality bar criteria
  • search_glossary(term) - Look up terminology
  • find_relevant_examples(task) - Get MARKED reference examples

Each tool returns a narrow slice of the registry. list_available_loops() and get_loop_details() cover process awareness and execution detail; get_artifact_schema() and get_quality_bar_details() supply structural and quality criteria; search_glossary() keeps terminology consistent; and find_relevant_examples() returns reference examples wrapped in explicit markers so they cannot be mistaken for instructions. Together the tools give agents structured, targeted access instead of a monolithic prompt.

3. Dynamic Prompt Builder

Replaces static template rendering with context-aware assembly:

  • Builds minimal base prompt
  • Assigns role-appropriate tools
  • Tracks token usage per section
  • Maintains clear instruction/example boundary

The builder assembles each prompt from the registry at run time: a minimal base of essential instructions, the tool set appropriate to the agent's role, per-section token accounting so prompts stay within budget, and an explicit boundary between instructions and any referenced examples. The result is a prompt tailored to the agent and task rather than a one-size-fits-all template.
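A minimal sketch of such a builder follows. The section layout, the token budget, and the crude character-count token heuristic are all assumptions for illustration; a real implementation would use an actual tokenizer.

```python
# Hedged sketch of a dynamic prompt builder; section names, budget,
# and token accounting are illustrative assumptions.


def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (~4 characters per token).
    return max(1, len(text) // 4)


class KnowledgeAwarePromptBuilder:
    def __init__(self, core: str, budget: int = 800) -> None:
        self.sections: list[tuple[str, str]] = [("core", core)]
        self.budget = budget

    def add_section(self, name: str, text: str) -> None:
        self.sections.append((name, text))

    def token_report(self) -> dict[str, int]:
        # Per-section accounting, so oversized sections are visible.
        return {name: rough_tokens(text) for name, text in self.sections}

    def build(self, tools: list[str]) -> str:
        total = sum(self.token_report().values())
        if total > self.budget:
            raise ValueError(f"prompt over budget: {total} > {self.budget}")
        body = "\n\n".join(f"## {name}\n{text}"
                           for name, text in self.sections)
        # Tools are listed by name only; their payloads arrive on demand.
        return body + "\n\n## available tools\n" + "\n".join(tools)
```

Because examples never pass through `add_section()` as instructions — they only reach the agent through tool results — the instruction/example boundary is structural rather than a matter of prompt wording.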

Implementation Guide

See full implementation guide with code examples in the repository's NewIssue.md file.

Progress Tracking

Phase 1: Knowledge Infrastructure ⬜

  • [ ] Create lib/runtime/src/questfoundry/runtime/knowledge/ directory
  • [ ] Implement registry.py with KnowledgeRegistry class
  • [ ] Implement search.py with KnowledgeSearch class
  • [ ] Test knowledge indexing from spec/

This phase lays the foundation: the knowledge/ directory houses the registry components, registry.py provides the KnowledgeRegistry class that indexes all spec knowledge, search.py provides the KnowledgeSearch class for querying it, and an indexing test against spec/ confirms that every relevant document is discoverable. All later phases build directly on this infrastructure.

Phase 2: Tool Implementation ⬜

  • [ ] Create lib/runtime/src/questfoundry/runtime/tools/knowledge_tools.py
  • [ ] Implement all 7 knowledge access tools
  • [ ] Add example framing logic
  • [ ] Test tool invocation

With the registry in place, knowledge_tools.py exposes it to agents. Each access tool (list_available_loops(), get_loop_details(), get_artifact_schema(), get_quality_bar_details(), search_glossary(), find_relevant_examples(), and the rest) maps to a narrow registry query. The example framing logic wraps any returned example in explicit reference-only markers, and tool invocation tests confirm both correct retrieval and effective framing.
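The example framing logic might take the following shape. The exact marker wording is an assumption; what matters is that framing is applied centrally and that the prompt path can verify it.

```python
# Possible shape of the example framing logic: every quarantined example
# is wrapped so the model sees it as reference material, never as an
# instruction. The marker text is an assumption.

FRAME_HEADER = (
    "[REFERENCE EXAMPLE - from an unrelated project]\n"
    "[Illustrates the FORM of the artifact only; do NOT reuse its "
    "names, settings, or plot details.]\n"
)
FRAME_FOOTER = "\n[END REFERENCE EXAMPLE]"


def frame_example(example_text: str, source_project: str) -> str:
    """Wrap a raw example with reference-only markers and provenance."""
    return (FRAME_HEADER
            + f"[Source project: {source_project}]\n\n"
            + example_text
            + FRAME_FOOTER)


def is_framed(text: str) -> bool:
    # Guard used by the prompt path: refuse to pass unframed examples.
    return (text.startswith("[REFERENCE EXAMPLE")
            and text.endswith("[END REFERENCE EXAMPLE]"))
```

Naming the source project in the frame makes contamination auditable: if "badge scanners" ever shows up outside Dock 7, the provenance line says where it came from.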

Phase 3: Dynamic Prompt Builder ⬜

  • [ ] Create lib/runtime/src/questfoundry/runtime/prompt/builder.py
  • [ ] Implement KnowledgeAwarePromptBuilder class
  • [ ] Update NodeFactory to use new builder
  • [ ] Test prompt generation

builder.py houses the KnowledgeAwarePromptBuilder class, which constructs minimal base prompts, assigns role-appropriate tools, tracks token usage, and enforces the instruction/example boundary. NodeFactory is updated to use the new builder so every generated prompt goes through the dynamic path, and prompt-generation tests verify each of those properties.

Phase 4: Clean Role YAMLs ⬜

  • [ ] Remove examples from all 16 role YAMLs
  • [ ] Add knowledge_access configuration
  • [ ] Add contamination_guards
  • [ ] Validate against schema

Stripping examples from the 16 role YAMLs removes the contamination vector at its source. Each file then gains a knowledge_access block naming the tools the role may invoke, plus contamination_guards that flag project-specific details, and every change is validated against the role profile schema.
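A cleaned role YAML might gain fields along these lines. The key names and values are hypothetical pending the schema updates in Phase 6:

```yaml
# Hypothetical shape of the new fields in a role YAML; key names and
# values are assumptions, not the finalized schema.
role: plotwright
prompt_content:
  charter: "Own the story spine and branch topology."
knowledge_access:
  allowed_tools:
    - list_available_loops
    - get_loop_details
    - get_artifact_schema
    - find_relevant_examples
contamination_guards:
  - "Never reuse named objects, settings, or plot devices from examples."
  - "Treat all retrieved examples as form references only."
```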

Phase 5: Example Quarantine ⬜

  • [ ] Create spec/07-extensions/examples/ structure
  • [ ] Move all examples to quarantine
  • [ ] Create metadata.yaml with warnings
  • [ ] Ensure examples are never directly included

All examples move into a dedicated spec/07-extensions/examples/ tree, out of the agents' direct line of sight. A metadata.yaml in that tree carries warnings about how examples may be used, and the prompt-generation path gains checks guaranteeing that quarantined content is never inlined into a prompt.
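One plausible shape for the quarantine's metadata.yaml, with field names that are purely illustrative:

```yaml
# Possible metadata.yaml for the quarantine tree; all field names here
# are illustrative assumptions.
warning: >
  Examples in this tree are reference material from specific projects.
  They must never be inlined into prompts directly; access them only
  through find_relevant_examples(), which applies reference-only framing.
examples:
  - id: dock7-scene-opening
    source_project: dock7
    artifact_type: scene
    known_contaminants: ["badge scanners"]
```

Listing known contaminants per example would give Phase 7's contamination checks a concrete term list to scan for.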

Phase 6: Schema Updates ⬜

  • [ ] Update role_profile.schema.json
  • [ ] Add example.schema.json
  • [ ] Validate all changes against schemas

role_profile.schema.json is extended to cover the new knowledge_access and contamination_guards fields, a new example.schema.json defines how quarantined examples and their metadata are structured, and all modified configuration is validated against both schemas. Keeping the schemas authoritative is what makes the earlier YAML changes safe to maintain long term.
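In practice this validation would run against the JSON Schema files themselves; as a stand-in, the required-field checks might look like this (field names are assumptions carried over from Phase 4):

```python
# Minimal stand-in for schema validation; the real system would validate
# against role_profile.schema.json. Field names are assumptions.


def validate_role_profile(profile: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty = valid)."""
    errors = []
    for key in ("role", "prompt_content", "knowledge_access",
                "contamination_guards"):
        if key not in profile:
            errors.append(f"missing required field: {key}")
    tools = profile.get("knowledge_access", {}).get("allowed_tools", [])
    if not isinstance(tools, list) or not tools:
        errors.append("knowledge_access.allowed_tools must be a "
                      "non-empty list")
    return errors
```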

Phase 7: Testing & Validation ⬜

  • [ ] Run with multiple test projects
  • [ ] Verify zero contamination between projects
  • [ ] Measure token usage reduction
  • [ ] Check tool invocation patterns
  • [ ] Validate output quality

The final phase runs the full system against multiple test projects and checks four things: no project-specific detail appears in another project's output, token usage drops as projected, agents invoke tools in sensible patterns, and output quality matches or exceeds the old architecture. Only once all four hold is the architecture ready for adoption.
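The zero-contamination check could be as simple as scanning each project's output for terms that belong to other projects. The term lists and outputs below are illustrative; in practice the terms might come from the quarantine metadata.

```python
# Sketch of a zero-contamination check: scan each project's output for
# terms belonging to OTHER projects. Data shapes are illustrative.


def find_contamination(outputs: dict[str, str],
                       project_terms: dict[str, set[str]]) -> list[tuple]:
    """Return (project, foreign_project, term) triples wherever a term
    owned by one project appears in another project's output."""
    hits = []
    for project, text in outputs.items():
        lowered = text.lower()
        for other, terms in project_terms.items():
            if other == project:
                continue  # a project may use its own terms freely
            for term in terms:
                if term.lower() in lowered:
                    hits.append((project, other, term))
    return hits
```

An empty result over all test projects is exactly the "zero contamination" criterion in the checklist above.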

Key Benefits

  1. Zero Contamination: Examples completely isolated from instructions
  2. Minimal Context: Base prompts 50-70% smaller
  3. Intelligent Access: Agents request only what they need
  4. Future-Proof: Easy to add genre guides, style libraries, etc.
  5. Traceable: Every knowledge access is logged via tools
  6. Scalable: Can add unlimited knowledge without prompt bloat

Beyond the immediate contamination fix and the 50-70% prompt reduction, the tool-mediated design has compounding advantages: every knowledge access is logged, so agent behavior can be monitored and optimized, and the knowledge base can grow — genre guides, style libraries, new schemas — without any prompt getting larger.

Implementation Notes

This architecture fundamentally reimagines agents as intelligent knowledge consumers rather than passive prompt recipients: each agent actively seeks out the specific knowledge a task requires, which eliminates contamination and enables sophisticated, context-aware behavior that adapts as tasks and requirements evolve.

The complete implementation guide, with detailed code examples, file structures, and testing procedures, is available in the repository's NewIssue.md file.
