Counterfactual Q-Learning leverages the structured nature of Counting Reward Machines (CRMs) to generate additional “what-if” experiences that the agent can learn from without actually having to explore those states. By utilizing the symbolic representation in the CRM, we can:
Counterfactual Q-Learning accelerates learning in environments with structured symbolic representations. Because the CRM’s symbolic structure is known to the learner, a single real transition can be re-evaluated under other machine configurations, yielding additional learning experiences that improve sample efficiency and convergence speed.
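To make the idea concrete, here is a minimal tabular sketch of a counterfactual update. It is an illustration under stated assumptions, not the implementation of any particular library: the names `counterfactual_update`, `toy_crm_step`, and the `(machine_state, counters)` configuration format are hypothetical, and the toy machine (a single counter that a `"tick"` label decrements, with a reward when it reaches zero) is invented purely for the example. The point it shows is that one observed environment transition yields one Q-learning update per candidate machine configuration.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Tuple

# Hypothetical format: a machine configuration is (machine_state, counter_values).
Config = Tuple[int, Tuple[int, ...]]
# Hypothetical CRM step: (config, label) -> (next_config, reward, done).
CrmStep = Callable[[Config, Hashable], Tuple[Config, float, bool]]


def counterfactual_update(
    q: defaultdict,
    obs: Hashable,
    action: int,
    next_obs: Hashable,
    label: Hashable,
    configs: Iterable[Config],
    crm_step: CrmStep,
    n_actions: int,
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> int:
    """Apply one Q-learning update per candidate machine configuration.

    The environment transition (obs, action, next_obs) and its label were
    observed once; the machine transition and reward are recomputed
    symbolically for every configuration in `configs`.
    """
    n_updates = 0
    for config in configs:
        next_config, reward, done = crm_step(config, label)
        state = (obs, config)
        next_state = (next_obs, next_config)
        # Standard Q-learning target; no bootstrapping from terminal states.
        best_next = 0.0 if done else max(
            q[(next_state, a)] for a in range(n_actions)
        )
        target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])
        n_updates += 1
    return n_updates


def toy_crm_step(config: Config, label: Hashable) -> Tuple[Config, float, bool]:
    """Toy machine (assumption): 'tick' decrements the counter; zero is terminal."""
    machine_state, (count,) = config
    if label == "tick" and count > 0:
        count -= 1
    done = count == 0
    reward = 1.0 if done else 0.0
    return (machine_state, (count,)), reward, done


# One real transition, replayed under two machine configurations.
q = defaultdict(float)
n_updates = counterfactual_update(
    q, obs=0, action=1, next_obs=1, label="tick",
    configs=[(0, (1,)), (0, (2,))],
    crm_step=toy_crm_step, n_actions=2, alpha=0.5, gamma=0.9,
)
```

Here the agent took a single step in the environment, yet the Q-table received two updates: one for the configuration with counter value 1 (which reaches the terminal, rewarded configuration) and one for counter value 2 (which does not). In practice the set of candidate configurations would come from the CRM itself rather than a hand-written list.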