Defining rewards based on symbolic event sequences
CountingRewardMachine base class:
u_0: The initial state is 0
c_0: The initial counter configuration is (0,), meaning we start with a single counter set to 0
encoded_configuration_size: Specifies how many bits are needed to encode the counter values
A / (-): When event A occurs, with any counter value
C / (NZ): When event C occurs and the counter is Non-Zero
C / (Z): When event C occurs and the counter is Zero
/ (-): Default transition when no event occurs
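To make the transition notation concrete, here is a minimal, self-contained sketch of how an expression such as C / (NZ) could be checked against an event and a counter configuration. It does not use the CountingRewardMachine class itself: the helper names counter_matches and expression_matches are hypothetical, and the parsing rules (an empty event part means "no event", one condition per counter) are assumptions drawn only from the descriptions above.

```python
from typing import Optional, Tuple


def counter_matches(condition: str, counter: int) -> bool:
    """Check one counter value against '-' (any), 'Z' (zero) or 'NZ' (non-zero)."""
    if condition == "-":
        return True
    if condition == "Z":
        return counter == 0
    if condition == "NZ":
        return counter != 0
    raise ValueError(f"unknown counter condition: {condition!r}")


def expression_matches(
    expression: str, event: Optional[str], counters: Tuple[int, ...]
) -> bool:
    """Hypothetical matcher for expressions of the form 'EVENT / (COND, ...)'."""
    event_part, counter_part = expression.split("/", 1)
    # Assumption: an empty event part (as in '/ (-)') stands for "no event".
    expected_event = event_part.strip() or None
    conditions = [c.strip() for c in counter_part.strip().strip("()").split(",")]
    if len(conditions) != len(counters):
        raise ValueError("expected one condition per counter")
    if expected_event != event:
        return False
    return all(counter_matches(c, n) for c, n in zip(conditions, counters))


# Values taken from the text: initial state 0 and a single counter set to 0.
u_0 = 0
c_0 = (0,)

print(expression_matches("A / (-)", "A", c_0))    # A fires for any counter value
print(expression_matches("C / (NZ)", "C", c_0))   # counter is zero, so NZ does not hold
print(expression_matches("C / (Z)", "C", c_0))    # counter is zero, so Z holds
print(expression_matches("/ (-)", None, c_0))     # default case: no event occurred
```

The sketch covers only the matching side of a transition. How a matched transition updates the state, the counters, or the reward is not described in this section, so it is left out here.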