Agent implementations that leverage Counting Reward Machines
The crm.agents module provides reinforcement learning algorithms that integrate with Counting Reward Machines to efficiently learn task policies. These agents are designed to exploit the counterfactual experience generation that the CRM framework provides.
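The framework includes two main types of agent implementations: tabular agents (crm.agents.tabular) and deep reinforcement learning agents built on Stable Baselines 3 (crm.agents.sb3).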
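Standard Q-Learning is implemented in the crm.agents.tabular.ql module. This provides a baseline implementation that uses the standard Q-learning update rule:

Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]

where α is the learning rate, γ is the discount factor, r is the observed reward, and s' is the next state.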
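The update rule can be sketched in a few lines of Python. This is a minimal illustration, not the code in crm.agents.tabular.ql: the function name q_learning_update, the dictionary-based Q-table, and the default values of alpha and gamma are all assumptions made for the example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one standard Q-learning update in place and return the new value."""
    # Bootstrap from the greedy next-state value unless the episode has ended.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: unseen (state, action) pairs default to a value of 0.0.
q_table = defaultdict(float)
new_value = q_learning_update(q_table, state=0, action=1, reward=1.0,
                              next_state=2, done=False, n_actions=2)
```

With an empty table the bootstrap term is zero, so the first update moves the entry a fraction alpha of the way toward the reward.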
The crm.agents.tabular.cql module implements Counterfactual Q-Learning, which extends standard Q-Learning to take advantage of the counterfactual experience generation capabilities of Counting Reward Machines.
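One way to picture the counterfactual variant is to apply the same Q-learning update once per counterfactual experience generated from a single environment step. The sketch below assumes that idea and uses a toy counting machine invented for the example; toy_counterfactuals and counterfactual_q_update are hypothetical names, not part of crm.agents.tabular.cql.

```python
from collections import defaultdict


def toy_counterfactuals(obs, action, next_obs, max_count=2):
    """Hypothetical generator: replay one ground transition under every counter value.

    The toy machine counts down each time the agent observes "token" and
    pays a reward of 1.0 when the counter reaches zero.
    """
    for count in range(1, max_count + 1):
        next_count = count - 1 if next_obs == "token" else count
        done = next_count == 0
        reward = 1.0 if done else 0.0
        yield (obs, count), action, reward, (next_obs, next_count), done


def counterfactual_q_update(q, obs, action, next_obs, n_actions,
                            alpha=0.1, gamma=0.99):
    """Apply the Q-learning update to every counterfactual experience."""
    n_updates = 0
    for state, act, reward, next_state, done in toy_counterfactuals(obs, action, next_obs):
        best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
        q[(state, act)] += alpha * (reward + gamma * best_next - q[(state, act)])
        n_updates += 1
    return n_updates


q_table = defaultdict(float)
n_updates = counterfactual_q_update(q_table, obs="start", action=0,
                                    next_obs="token", n_actions=2)
```

A single environment step here updates two Q-table entries, one per counter value, which is the source of the sample-efficiency gain the counterfactual approach aims for.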
The crm.agents.sb3.sac.csac module implements Counterfactual Soft Actor-Critic (C-SAC), extending the SAC algorithm from Stable Baselines 3 to learn from counterfactual experiences.
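Because SAC is an off-policy algorithm, a natural way to feed it counterfactual experiences is to store them in the replay buffer alongside the real transition. The sketch below illustrates that idea under this assumption, using a plain Python buffer rather than the Stable Baselines 3 one; ToyReplayBuffer, store_with_counterfactuals, and toy_step_machine are hypothetical and do not reflect the actual C-SAC code.

```python
import random
from collections import deque


class ToyReplayBuffer:
    """Minimal FIFO replay buffer; a stand-in for an off-policy agent's buffer."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.storage), min(batch_size, len(self.storage)))


def toy_step_machine(u, next_obs):
    """Hypothetical machine: returns (next machine state, reward, done)."""
    if u == 0 and next_obs >= 1.0:
        return 1, 0.0, False
    if u == 1 and next_obs <= -1.0:
        return 2, 1.0, True
    return u, 0.0, False


def store_with_counterfactuals(buffer, obs, action, next_obs,
                               machine_states, step_machine):
    """Store one buffer entry per machine state for a single ground transition."""
    for u in machine_states:
        next_u, reward, done = step_machine(u, next_obs)
        buffer.add(((obs, u), action, reward, (next_obs, next_u), done))


replay = ToyReplayBuffer(capacity=1000)
store_with_counterfactuals(replay, obs=0.8, action=0.5, next_obs=1.2,
                           machine_states=(0, 1), step_machine=toy_step_machine)
batch = replay.sample(batch_size=2, rng=random.Random(0))
```

The actor and critic updates are unchanged in this picture; only the amount of data written to the buffer per environment step grows.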
Vectorized environment utilities are provided by the crm.agents.sb3.vec module, which includes:
DispatchSubprocVecEnv: An extension of Stable Baselines 3’s SubprocVecEnv that enables efficient parallel generation of counterfactual experiences.