Verifying a Self-Driving State Machine With TLA+
In the rapidly evolving landscape of autonomous driving, ensuring the safety and reliability of self-driving systems is paramount. Formal verification is a rigorous technique for establishing, with mathematical precision, that a system's design satisfies its specification, and it plays a crucial role in achieving this goal. This article applies TLA+, a formal specification language, together with its model checker TLC, to verify the primary engagement/disengagement state machine of an open-source self-driving platform.
Overview: Formal Verification for Self-Driving Systems
Self-driving systems rely on complex state machines to manage transitions between operational modes such as disabled, preEnabled, and enabled. The correctness of these state machines is critical, because an error can lead to unpredictable and potentially dangerous behavior. Formal verification with TLA+ offers a systematic way to check that the state machine adheres to its intended behavior in every state the model can reach, under every sequence of events the model allows.
Target Component: The State Machine
Our focus is the StateMachine class in selfdrive/selfdrived/state.py, which governs the primary engagement and disengagement logic of the self-driving system. The state machine has five states:
- disabled: The system is inactive and not controlling the vehicle.
- preEnabled: A transitional state in which the system is preparing for engagement.
- enabled: The system is actively controlling the vehicle.
- softDisabling: A state triggered by a soft disable event, initiating a controlled disengagement.
- overriding: A state in which the driver is overriding the system's control.
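To make the state list concrete, the following Python sketch models one update of the state machine as a pure function over the five states. It is a simplified, hypothetical model written for illustration, not the code in state.py: the event names PRE_ENABLE, OVERRIDE_LATERAL and OVERRIDE_LONGITUDINAL, the step-based timer, and SOFT_DISABLE_TIME = 3 are assumptions, and disable events are assumed to also block entry from disabled. Writing the update as a function of (state, timer, events) mirrors how a TLA+ action relates the current state to the next one.

```python
from enum import Enum


class State(Enum):
    """The five engagement states (values mirror the names used in the article)."""
    DISABLED = "disabled"
    PRE_ENABLED = "preEnabled"
    ENABLED = "enabled"
    SOFT_DISABLING = "softDisabling"
    OVERRIDING = "overriding"


SOFT_DISABLE_TIME = 3  # hypothetical bound, measured in model steps
DISABLE_EVENTS = ("USER_DISABLE", "IMMEDIATE_DISABLE")  # checked in this order
OVERRIDE_EVENTS = {"OVERRIDE_LATERAL", "OVERRIDE_LONGITUDINAL"}  # hypothetical names


def step(state, timer, events):
    """Apply one update for a set of event names; return (state, timer, alerts)."""
    timer = max(0, timer - 1)  # the countdown advances on every update
    if state is State.DISABLED:
        if "ENABLE" not in events:
            return state, timer, []
        blockers = sorted(set(events) & {"NO_ENTRY", *DISABLE_EVENTS})
        if blockers:  # entry refused: stay disabled and report why
            return state, timer, blockers
        if "PRE_ENABLE" in events:
            return State.PRE_ENABLED, timer, ["ENABLE"]
        if set(events) & OVERRIDE_EVENTS:
            return State.OVERRIDING, timer, ["ENABLE"]
        return State.ENABLED, timer, ["ENABLE"]

    for ev in DISABLE_EVENTS:  # user and immediate disable win in any active state
        if ev in events:
            return State.DISABLED, timer, [ev]

    if state in (State.ENABLED, State.OVERRIDING):
        if "SOFT_DISABLE" in events:  # start the countdown
            return State.SOFT_DISABLING, SOFT_DISABLE_TIME, ["SOFT_DISABLE"]
        overrides = sorted(set(events) & OVERRIDE_EVENTS)
        if overrides:
            return State.OVERRIDING, timer, overrides
        return State.ENABLED, timer, []

    if state is State.SOFT_DISABLING:
        if "SOFT_DISABLE" not in events:  # condition cleared: resume control
            return State.ENABLED, timer, []
        if timer > 0:
            return state, timer, ["SOFT_DISABLE"]
        return State.DISABLED, timer, ["SOFT_DISABLE"]  # countdown expired

    # PRE_ENABLED
    if "NO_ENTRY" in events:
        return State.DISABLED, timer, ["NO_ENTRY"]
    if "PRE_ENABLE" in events:
        return state, timer, ["PRE_ENABLE"]
    return State.ENABLED, timer, []
```

For example, starting from enabled with SOFT_DISABLE held, the first update enters softDisabling with the timer at 3, and three further updates count it down and end in disabled.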
Why TLA+ is Ideal for State Machine Verification
TLA+ stands out as an excellent choice for formally verifying state machines due to its inherent strengths:
- Natural Mapping of State Machines: The concept of state machines, with their discrete states and transitions, aligns seamlessly with TLA+'s action-based approach. This allows for a direct and intuitive representation of the system's behavior.
- Temporal Logic Verification: TLA+ is based on linear-time temporal logic, not on a branching-time logic such as Computation Tree Logic (CTL). Its temporal operators, such as [] (always), <> (eventually) and ~> (leads to), express requirements over every possible behavior of the system, for example that the system eventually reaches a safe state after a specific event. The TLC model checker verifies such properties exhaustively on a finite model.
- Event-Driven Transitions: Self-driving systems are inherently event-driven, with state transitions triggered by various inputs and conditions. TLA+'s ability to model actions and their effects makes it well-suited for capturing these event-driven dynamics.
- Modeling Timers: The softDisabling state uses a countdown timer to manage the disengagement process. TLA+ can represent such a timer as a natural number; for model checking, TLC needs a finite range, so the timer is bounded by a small value of SOFT_DISABLE_TIME. This makes it straightforward to verify that the timer counts down correctly and triggers the appropriate actions.
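The points above can be illustrated with a small Python sketch of explicit-state model checking, the technique TLC uses. Each state is a tuple, a successor function plays the role of the next-state action (the environment chooses whether the soft disable condition holds), and a breadth-first search checks an invariant in every reachable state, returning the shortest trace to any violation. The three-mode fragment, the choice to hold the timer at 0 outside softDisabling (which keeps the state space small), and SOFT_DISABLE_TIME = 3 are assumptions for illustration.

```python
from collections import deque

SOFT_DISABLE_TIME = 3  # hypothetical bound, in model steps


def next_states(state):
    """All successors of (mode, timer); the environment picks whether SOFT_DISABLE holds."""
    mode, timer = state
    successors = set()
    for soft_disable in (False, True):
        if mode == "enabled":
            successors.add(("softDisabling", SOFT_DISABLE_TIME) if soft_disable else ("enabled", 0))
        elif mode == "softDisabling":
            if not soft_disable:
                successors.add(("enabled", 0))  # condition cleared
            elif timer - 1 > 0:
                successors.add(("softDisabling", timer - 1))  # keep counting down
            else:
                successors.add(("disabled", 0))  # countdown expired
        else:
            successors.add(("disabled", 0))  # disabled is absorbing in this fragment
    return successors


def check_invariant(init, next_fn, invariant):
    """Breadth-first search over reachable states, as an explicit-state checker does.

    Returns (visited_states, trace). trace is None if the invariant always holds,
    otherwise the shortest path from init to a violating state.
    """
    parent = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:  # walk parent links back to init
                trace.append(state)
                state = parent[state]
            return set(parent), trace[::-1]
        for succ in next_fn(state):
            if succ not in parent:
                parent[succ] = state
                queue.append(succ)
    return set(parent), None


def timer_bounded(state):
    """State invariant: the timer stays within 0..SOFT_DISABLE_TIME."""
    return 0 <= state[1] <= SOFT_DISABLE_TIME
```

With these definitions the fragment has five reachable states and the timer bound holds in all of them; checking a deliberately false invariant such as "never disabled" returns the four-step trace through softDisabling 3, 2 and 1, much as TLC prints an error trace.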
Safety and Liveness Properties: What We Aim to Verify
To ensure the state machine's correctness, we need to verify both safety and liveness properties. Safety properties assert that the system will never enter an undesirable state, while liveness properties guarantee that the system will eventually reach a desired state.
Invariants: Maintaining Safe States
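Invariants are safety properties that must hold in every reachable state of the system. Several of the properties below constrain transitions rather than individual states (for example, what the next state must be when a given event occurs); in TLA+ these are written as action properties of the form [][A]_vars and checked by TLC as temporal properties, while the rest are ordinary state invariants. We aim to verify the following: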
- Disabled State Persistence: Once the system enters the disabled state, it should remain disabled until an explicit ENABLE event occurs. This prevents unintended system activation.
- Immediate Disable Priority: An IMMEDIATE_DISABLE event should always take precedence and transition the system to the disabled state from any current state. This ensures a rapid and reliable shutdown in critical situations.
- User Disable Priority: A USER_DISABLE event should also take precedence, except in the immediate aftermath of an IMMEDIATE_DISABLE event. This allows the driver to reliably disengage the system.
- Soft Disable Timer Monotonicity: While the system remains in softDisabling, the soft disable timer should count down monotonically, and its value should always stay within the range 0 to SOFT_DISABLE_TIME. This ensures that the timer operates correctly and triggers the disengagement process as intended.
- Override Safety: No state should allow simultaneous lateral and longitudinal override without the appropriate event flags. This prevents conflicting control inputs and ensures safe handling of override scenarios.
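A minimal sketch of how these checks could be run exhaustively, written in Python rather than TLA+ so that it executes on its own. It re-declares a simplified, hypothetical version of the state machine (assumed event names PRE_ENABLE, OVERRIDE_LATERAL and OVERRIDE_LONGITUDINAL, a step-based timer, SOFT_DISABLE_TIME = 3), explores every reachable (state, timer) pair under all 256 combinations of eight event flags, checks the timer bound as a state invariant, and checks the remaining properties as conditions on (current state, events, next state) triples, the Python analogue of TLA+ action properties. The exception in User Disable Priority is not modeled here; both disable events are simply required to lead to disabled.

```python
from collections import deque
from enum import Enum
from itertools import chain, combinations


class State(Enum):
    DISABLED = "disabled"
    PRE_ENABLED = "preEnabled"
    ENABLED = "enabled"
    SOFT_DISABLING = "softDisabling"
    OVERRIDING = "overriding"


SOFT_DISABLE_TIME = 3  # hypothetical bound, in model steps
EVENTS = ("ENABLE", "PRE_ENABLE", "NO_ENTRY", "USER_DISABLE", "IMMEDIATE_DISABLE",
          "SOFT_DISABLE", "OVERRIDE_LATERAL", "OVERRIDE_LONGITUDINAL")
DISABLE_EVENTS = frozenset({"USER_DISABLE", "IMMEDIATE_DISABLE"})
OVERRIDE_EVENTS = frozenset({"OVERRIDE_LATERAL", "OVERRIDE_LONGITUDINAL"})
# Every combination of simultaneous events: 2**8 = 256 sets.
ALL_EVENT_SETS = [frozenset(c) for c in chain.from_iterable(
    combinations(EVENTS, r) for r in range(len(EVENTS) + 1))]


def step(state, timer, events):
    """Simplified, hypothetical update rule; returns (next_state, next_timer)."""
    timer = max(0, timer - 1)
    if state is State.DISABLED:
        if "ENABLE" not in events or events & (DISABLE_EVENTS | {"NO_ENTRY"}):
            return state, timer
        if "PRE_ENABLE" in events:
            return State.PRE_ENABLED, timer
        return (State.OVERRIDING if events & OVERRIDE_EVENTS else State.ENABLED), timer
    if events & DISABLE_EVENTS:
        return State.DISABLED, timer
    if state in (State.ENABLED, State.OVERRIDING):
        if "SOFT_DISABLE" in events:
            return State.SOFT_DISABLING, SOFT_DISABLE_TIME
        return (State.OVERRIDING if events & OVERRIDE_EVENTS else State.ENABLED), timer
    if state is State.SOFT_DISABLING:
        if "SOFT_DISABLE" not in events:
            return State.ENABLED, timer
        return (state if timer > 0 else State.DISABLED), timer
    if "NO_ENTRY" in events:  # PRE_ENABLED from here on
        return State.DISABLED, timer
    return (state if "PRE_ENABLE" in events else State.ENABLED), timer


def check_model(init=(State.DISABLED, 0)):
    """Explore all reachable (state, timer) pairs; return (reachable, violations)."""
    violations = []
    seen, queue = {init}, deque([init])
    while queue:
        state, timer = queue.popleft()
        if not 0 <= timer <= SOFT_DISABLE_TIME:  # state invariant
            violations.append(("TimerBounded", state, timer, frozenset()))
        for ev in ALL_EVENT_SETS:
            nxt, ntimer = step(state, timer, ev)
            # Action properties relate the current state, the events and the next state.
            if state is State.DISABLED and nxt is not State.DISABLED and "ENABLE" not in ev:
                violations.append(("DisabledPersistence", state, timer, ev))
            if ev & DISABLE_EVENTS and nxt is not State.DISABLED:
                violations.append(("DisablePriority", state, timer, ev))
            if (state is State.SOFT_DISABLING and nxt is State.SOFT_DISABLING
                    and ntimer >= timer):
                violations.append(("TimerMonotonic", state, timer, ev))
            if nxt is State.OVERRIDING and not ev & OVERRIDE_EVENTS:
                violations.append(("OverrideNeedsFlag", state, timer, ev))
            if (nxt, ntimer) not in seen:
                seen.add((nxt, ntimer))
                queue.append((nxt, ntimer))
    return seen, violations
```

An empty violation list means the properties hold for this toy model only; the same properties would still need to be stated in StateMachine.tla and checked by TLC against a faithful specification.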
Liveness Properties: Guaranteeing Progress
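Liveness properties ensure that the system makes progress and eventually reaches desired states. In TLA+, liveness holds only under explicit fairness assumptions (for example, weak fairness WF_vars(Next) on the next-state action); without them, a behavior that simply stops taking steps would violate every liveness property. We aim to verify the following liveness properties: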
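- Soft Disable Termination: The soft disabling process should always terminate, with the timer eventually reaching 0, so that the system cannot remain in softDisabling indefinitely. This guarantees that the system will eventually complete the disengagement process.
- Transition to Disabled: From every reachable state, the system should be able to return to the disabled state. This ensures that the system can always be brought to a safe, inactive state. Because "can always return" is a branching-time property that TLA+ cannot express directly, it is usually checked in a stronger linear-time form, for example that a USER_DISABLE or IMMEDIATE_DISABLE event leads to disabled from any state.
- PreEnabled Transition: The preEnabled state should eventually transition to either the enabled or the disabled state. This prevents the system from getting stuck in an intermediate state.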
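The following Python sketch shows the two kinds of graph checks behind these properties on a hypothetical, timer-free abstraction of the five states (the edge set is an assumption for illustration). can_always_reach performs the branching-time "AG EF" check directly on the state graph, which an explicit-state search can do even though TLA+ cannot state it as a formula; can_stay_forever looks for a reachable cycle confined to a set of states, which is exactly the kind of behavior that falsifies an "eventually leaves" liveness property.

```python
# Hypothetical state-level abstraction of the five states (timer and events dropped).
GRAPH = {
    "disabled": {"disabled", "preEnabled", "enabled", "overriding"},
    "preEnabled": {"preEnabled", "enabled", "disabled"},
    "enabled": {"enabled", "softDisabling", "overriding", "disabled"},
    "softDisabling": {"softDisabling", "enabled", "disabled"},
    "overriding": {"overriding", "enabled", "softDisabling", "disabled"},
}


def reachable(graph, start):
    """All states reachable from start (depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def can_always_reach(graph, init, goal):
    """AG EF goal: from every state reachable from init, goal remains reachable."""
    return all(goal in reachable(graph, s) for s in reachable(graph, init))


def can_stay_forever(graph, init, region):
    """True if a reachable cycle lies entirely inside region (a potential non-progress loop)."""
    sub = {s: graph[s] & region for s in region}  # edges that stay inside the region
    for s in reachable(graph, init) & region:
        # s is on a cycle inside the region if some in-region successor leads back to s
        if any(s in reachable(sub, t) for t in sub[s]):
            return True
    return False
```

On this abstraction, disabled remains reachable from every state, but preEnabled can loop on itself, so PreEnabled Transition only holds under an assumption that the pre-enable condition eventually clears (a fairness or environment assumption in TLA+ terms). The softDisabling self-loop would likewise show up as a possible non-progress cycle here; it is the timer, dropped from this abstraction, that rules it out.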
Event Ordering Safety: Prioritizing Critical Events
Ensuring the correct ordering of events is crucial for safety. We aim to verify the following event ordering safety properties:
- Disable Priority Hierarchy: The priority hierarchy for disable events should be USER_DISABLE / IMMEDIATE_DISABLE > SOFT_DISABLE > other events. This ensures that the most critical disable events are handled first.
- Event Type Combination Validation: The system should validate event type combinations, ensuring that incompatible events (e.g., NO_ENTRY preventing ENABLE) are handled correctly. This prevents invalid state transitions.
- Alert Propagation: Alert types should be properly propagated alongside state changes. This ensures that the driver is informed of the system's status and any potential issues.
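A minimal Python sketch of how the ordering rules could be checked exhaustively. It re-declares a simplified, hypothetical step function that also returns the alerts raised (the event names PRE_ENABLE, OVERRIDE_LATERAL and OVERRIDE_LONGITUDINAL, the alert lists, and SOFT_DISABLE_TIME = 3 are assumptions), then tries every state, every timer value and every combination of simultaneous events, including combinations that are not reachable, and reports any case that breaks the hierarchy, lets ENABLE through despite NO_ENTRY, or changes state because of an event without raising an alert.

```python
from enum import Enum
from itertools import chain, combinations


class State(Enum):
    DISABLED = "disabled"
    PRE_ENABLED = "preEnabled"
    ENABLED = "enabled"
    SOFT_DISABLING = "softDisabling"
    OVERRIDING = "overriding"


SOFT_DISABLE_TIME = 3  # hypothetical bound, in model steps
DISABLE_EVENTS = ("USER_DISABLE", "IMMEDIATE_DISABLE")  # checked in this order
OVERRIDE_EVENTS = frozenset({"OVERRIDE_LATERAL", "OVERRIDE_LONGITUDINAL"})
EVENTS = ("ENABLE", "PRE_ENABLE", "NO_ENTRY", "USER_DISABLE", "IMMEDIATE_DISABLE",
          "SOFT_DISABLE", "OVERRIDE_LATERAL", "OVERRIDE_LONGITUDINAL")
ALL_EVENT_SETS = [frozenset(c) for c in chain.from_iterable(
    combinations(EVENTS, r) for r in range(len(EVENTS) + 1))]


def step(state, timer, events):
    """Simplified, hypothetical update; returns (next_state, next_timer, alerts)."""
    timer = max(0, timer - 1)
    if state is State.DISABLED:
        if "ENABLE" not in events:
            return state, timer, []
        blockers = sorted(events & {"NO_ENTRY", *DISABLE_EVENTS})
        if blockers:  # entry refused
            return state, timer, blockers
        if "PRE_ENABLE" in events:
            return State.PRE_ENABLED, timer, ["ENABLE"]
        nxt = State.OVERRIDING if events & OVERRIDE_EVENTS else State.ENABLED
        return nxt, timer, ["ENABLE"]
    for ev in DISABLE_EVENTS:
        if ev in events:
            return State.DISABLED, timer, [ev]
    if state in (State.ENABLED, State.OVERRIDING):
        if "SOFT_DISABLE" in events:
            return State.SOFT_DISABLING, SOFT_DISABLE_TIME, ["SOFT_DISABLE"]
        overrides = sorted(events & OVERRIDE_EVENTS)
        return (State.OVERRIDING if overrides else State.ENABLED), timer, overrides
    if state is State.SOFT_DISABLING:
        if "SOFT_DISABLE" not in events:
            return State.ENABLED, timer, []
        return (state if timer > 0 else State.DISABLED), timer, ["SOFT_DISABLE"]
    if "NO_ENTRY" in events:  # PRE_ENABLED from here on
        return State.DISABLED, timer, ["NO_ENTRY"]
    if "PRE_ENABLE" in events:
        return state, timer, ["PRE_ENABLE"]
    return State.ENABLED, timer, []


def check_event_ordering():
    """Try every state, timer value and event combination; return rule violations."""
    problems = []
    for state in State:
        for timer in range(SOFT_DISABLE_TIME + 1):
            for ev in ALL_EVENT_SETS:
                nxt, _, alerts = step(state, timer, ev)
                disabling = ev & set(DISABLE_EVENTS)
                # 1. USER_DISABLE / IMMEDIATE_DISABLE beat everything else.
                if state is not State.DISABLED and disabling and (
                        nxt is not State.DISABLED or alerts[0] not in disabling):
                    problems.append(("DisableFirst", state, ev))
                # 2. SOFT_DISABLE beats overrides and every other non-disable event.
                if (state in (State.ENABLED, State.OVERRIDING) and "SOFT_DISABLE" in ev
                        and not disabling and nxt is not State.SOFT_DISABLING):
                    problems.append(("SoftDisableNext", state, ev))
                # 3. NO_ENTRY always prevents ENABLE from engaging.
                if state is State.DISABLED and "NO_ENTRY" in ev and nxt is not State.DISABLED:
                    problems.append(("NoEntryBlocks", state, ev))
                # 4. Event-driven changes (anything but a return to enabled) carry an alert.
                changed = nxt is not state and (nxt is not State.ENABLED or state is State.DISABLED)
                if changed and not alerts:
                    problems.append(("AlertMissing", state, ev))
    return problems
```

Checking all combinations rather than only reachable ones is deliberately stronger: it also covers event mixes the rest of the system is assumed never to produce.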
Deliverables: The Outcome of Formal Verification
The formal verification process will produce the following deliverables:
- formal/tla/StateMachine.tla: The TLA+ specification of the state machine, capturing its behavior and properties.
- formal/tla/StateMachine.cfg: The TLC model checker configuration file, specifying the model checking parameters and the properties to be verified.
- Documentation of Verified Properties: Clear and concise documentation of the properties that have been formally verified, providing assurance of the system's correctness.
- Instructions for Running the Model Checker: Step-by-step instructions for running the TLA+ model checker (TLC) to reproduce the verification results.
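As a sketch of what the configuration and run instructions might contain, the Python helpers below render a TLC configuration file and build the TLC command line. SPECIFICATION, CONSTANT, INVARIANT and PROPERTY are standard TLC configuration keywords, and -workers and -config are standard TLC options, but the operator names Spec, TypeOK, TimerBounded and SoftDisableTerminates, the constant value, and the location of tla2tools.jar are hypothetical. Checking a liveness PROPERTY also requires Spec to include a fairness condition.

```python
# Hypothetical operator names: the article does not list what StateMachine.tla defines.
def render_cfg(soft_disable_time=3, invariants=("TypeOK", "TimerBounded"),
               properties=("SoftDisableTerminates",)):
    """Return the text of a TLC configuration file such as StateMachine.cfg."""
    lines = ["SPECIFICATION Spec", f"CONSTANT SOFT_DISABLE_TIME = {soft_disable_time}"]
    lines += [f"INVARIANT {name}" for name in invariants]   # state invariants
    lines += [f"PROPERTY {name}" for name in properties]    # temporal/action properties
    return "\n".join(lines) + "\n"


def tlc_command(jar="tla2tools.jar", spec="formal/tla/StateMachine.tla",
                cfg="formal/tla/StateMachine.cfg", workers=1):
    """Build the argument list that runs TLC on spec with the given config file."""
    return ["java", "-cp", jar, "tlc2.TLC",
            "-workers", str(workers), "-config", cfg, spec]
```

Typical use would be Path("formal/tla/StateMachine.cfg").write_text(render_cfg()) followed by subprocess.run(tlc_command(), check=True), which requires Java and tla2tools.jar to be available locally.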
Getting Started with TLA+
For those new to TLA+, the following resources can be helpful:
- TLA+ Toolbox: This is the official Integrated Development Environment (IDE) for TLA+. It provides a user-friendly interface for writing, editing, and verifying TLA+ specifications. The toolbox can be downloaded from the TLA+ Toolbox website.
- Learning TLA+: There are numerous online resources available for learning TLA+, including tutorials, blog posts, and books. A good starting point is the official TLA+ website, which offers comprehensive documentation and examples.
Conclusion: The Importance of Formal Verification
Formal verification using TLA+ is a powerful tool for ensuring the safety and reliability of self-driving systems. By exhaustively checking a model of the primary state machine against its specified properties, we can gain confidence in the system's behavior and reduce the risk of unexpected failures. This rigorous approach is essential for building trust in autonomous driving technology and paving the way for its widespread adoption. As the complexity of self-driving systems continues to increase, formal verification will play an increasingly critical role in ensuring their safety and dependability.
For further reading on formal verification and TLA+, see The TLA+ Home Page, which covers these concepts in more depth.