Documentation Index
Fetch the complete documentation index at: https://pycrm.xyz/llms.txt
Use this file to discover all available pages before exploring further.
Reward Machines & Counting Reward Machines
Reward Machines provide a structured way to specify complex tasks in reinforcement learning using formal automata that map sequences of symbolic events to rewards. Counting Reward Machines extend Reward Machines with counters that can track occurrences of events.
Introduction to Reward Machines
Reinforcement learning agents typically learn from simple reward signals that are functions of the current state or state-action pair. While effective for basic tasks, this approach struggles with temporally extended goals that depend on complex sequences of events. Reward Machines (RMs) address this limitation by providing a formal framework to specify tasks through automata that map sequences of events to rewards. These machines:- Define rewards based on the history of events, not just the current state
- Enable specification of temporally extended tasks with complex logic
- Allow for structured, interpretable task definitions
Types of Reward Machines
The RM/CRM framework supports two main types of Reward Machines:Vanilla Reward Machines
Vanilla Reward Machines are a special case of Counting Reward Machines and define rewards based on sequences of symbolic events, without an extended automaton memory. They consist of:- States: Different phases of the task
- Transitions: Rules for moving between states based on symbolic events
- Rewards: Values assigned during transitions
Counting Reward Machines (CRMs)
Counting Reward Machines extend Vanilla RMs with counter variables as an extended automaton memory that can track occurrences of events. They add:- Counters: Variables that increment or decrement based on events
- Counter Conditions: Rules that use counter values to determine transitions
Components of a Reward Machine
All Reward Machines in the RM/CRM framework share these core components:1. States and Initial State
States represent different phases or stages of a task. Each RM has:- A finite set of states
- An initial state
- (Optionally) terminal states that end the task
2. Transition Functions
Three key functions define the behavior of a Reward Machine:We discuss the transition functions for a Counting Reward Machine in this section. Reward Machines do not make use of a counter transition function and do not include counter conditions in their transition functions.
State Transition Function (delta_u)
Defines how the machine moves between states:
Counter Transition Function (delta_c)
Defines how counter values change (for CRMs):
Reward Transition Function (delta_r)
Defines rewards given during transitions:
3. Counters (for CRMs)
Counting Reward Machines maintain counter variables:Creating a Vanilla Reward Machine
Vanilla Reward Machines are ideal for tasks that involve sequences of events without counting. Here’s a complete example:Creating a Counting Reward Machine
Counting Reward Machines are used for more complex task specifications. Here’s an example that counts and balances events:- Counts occurrences of A in state 0
- Transitions to state 1 when B is observed
- Decrements the counter for each C in state 1
- Completes when the counter reaches 0 (equal numbers of A and C)
Transition Expression Syntax
Counting Reward Machines use a specific syntax for transition expressions:Reward Machines do not make use of counter conditions in their transition functions. All other syntax is the same.
Event Expressions (Well-Formed Formulas)
The event expression part supports Boolean logic, allowing for complex combinations:- Single event:
"A / (-)" - Logical AND:
"A and B / (-)"(both A and B must occur) - Logical OR:
"A or B / (-)"(either A or B must occur) - Logical NOT:
"not A / (-)"(A must not occur) - Complex expressions:
"A and not B / (-)"(A must occur and B must not occur)"not (A or B) / (-)"(neither A nor B can occur)"(A and B) or C / (-)"(either both A and B, or C must occur)
Counter Conditions
For Counting Reward Machines, counter conditions specify requirements for counters:- Any value:
"A / (-)" - Zero:
"A / (Z)" - Non-zero:
"A / (NZ)" - Multiple counters:
"A / (Z,NZ)"(first counter zero, second non-zero)
Advanced Features
Custom Reward Functions
Instead of constant rewards, RMs/CRMs can use callable functions that compute rewards based on the specific details of the transition:Multiple Counters
For more complex tasks, you can use multiple counters:Complex Event Conditions
The transition expressions support rich Boolean logic on events:- The Boolean logic on events must evaluate to true
- The counter conditions must be satisfied
Best Practices
When creating Reward Machines:- Start Simple: Begin with Vanilla RMs before adding counters
- Use Meaningful States: Design states to represent logical phases of the task
- Consistent Structure: Ensure all three transition functions have the same keys
- Terminal States: Use
-1to indicate task completion or failure - Test Transitions: Verify that all possible event combinations are handled
Summary
RM/CRMs provide a powerful and flexible framework for specifying complex tasks in reinforcement learning:- Vanilla Reward Machines define tasks based on sequences of events
- Counting Reward Machines add counters to enable more complex task specifications
- The formalism cleanly separates task specification from environment dynamics
- Boolean logic allows for complex event conditions
- Custom reward functions enable sophisticated reward shaping