Letter World: The Ground Environment

Overview

The ground environment in our framework refers to the base environment that defines the world dynamics, independent of any reward structure. For Letter World, this is a grid-based environment that implements the standard Gymnasium interface (Gymnasium is the maintained successor to OpenAI Gym).

Environment Description

Letter World consists of:
  • A 3×7 grid where the agent can move in four directions
  • Special positions labeled with the letters ‘A’ and ‘C’ (the ‘A’ position may later display ‘B’)
  • Stochastic behavior: when the agent visits position ‘A’, the letter changes to ‘B’ with 50% probability
  • A simple observation containing the agent’s position and a flag indicating whether the symbol has been seen
Here’s a visualization of the environment:
+-------------+
|. . . . . . .|
|. A . x . C .|
|. . . . . . .|
+-------------+
Where:
  • x represents the agent (starting at position [1,3])
  • A represents the position of letter A (which may change to B)
  • C represents the position of letter C
  • . represents empty spaces

Implementation Details

The Letter World environment is implemented as a subclass of gymnasium.Env, following the standard Gym interface.
import gymnasium as gym
import numpy as np

class LetterWorld(gym.Env):
    """The Letter World environment."""

    RIGHT = 0
    LEFT = 1
    UP = 2
    DOWN = 3
    A_POSITION = np.array([1, 1])
    C_POSITION = np.array([1, 5])
    N_ROWS = 3
    N_COLS = 7

    def __init__(self) -> None:
        """Initialize the environment."""
        self.action_space = gym.spaces.Discrete(4)  # Four movement directions
        self.observation_space = gym.spaces.Box(
            low=0, high=100, shape=(3,), dtype=np.int32
        )  # [symbol_seen, row, col]
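The constructor above only defines the spaces; the usage example later in this section also calls reset(), which is not listed here. A minimal sketch of such a method, assuming the agent starts at [1, 3] as described above and reusing the _get_obs() helper that appears in step() below, might look like:
def reset(self, *, seed=None, options=None):
    """Place the agent at its start position and clear the symbol flag."""
    super().reset(seed=seed)
    self.agent_position = np.array([1, 3])  # Start position from the grid above
    self.symbol_seen = False
    return self._get_obs(), {}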

Action Space

The environment supports four actions for movement:
  • 0: Move right
  • 1: Move left
  • 2: Move up
  • 3: Move down
These actions are defined as constants in the LetterWorld class for better code readability.
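For example, the constants can be used in place of the raw action integers (this assumes the reset() sketch shown earlier):
env = LetterWorld()
env.reset()
env.step(LetterWorld.LEFT)  # Equivalent to env.step(1)
env.step(LetterWorld.UP)    # Equivalent to env.step(2)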

Observation Space

The observation is a 3-dimensional vector:
  • First element: Binary flag (0/1) indicating if the symbol has been seen
  • Second element: Row position of the agent
  • Third element: Column position of the agent
This is implemented as a Box space with shape (3,) in the code.
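The step() method shown later builds this vector through a _get_obs() helper that is not listed in this section; a minimal sketch consistent with the layout above would be:
def _get_obs(self) -> np.ndarray:
    """Return the observation as [symbol_seen, row, col]."""
    row, col = self.agent_position
    return np.array([int(self.symbol_seen), row, col], dtype=np.int32)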

Movement Mechanics

When the agent attempts to move outside the grid boundaries, it remains in the same position. Here’s the code that handles movement:
def _update_agent_position(self, action: int) -> None:
    """Moves that take agent out of the grid leave it in the same position."""
    row, col = self.agent_position

    if action == self.RIGHT:
        n_row = row
        n_col = col + 1 if col < self.N_COLS - 1 else col
    elif action == self.LEFT:
        n_row = row
        n_col = col - 1 if col > 0 else col
    elif action == self.UP:
        n_col = col
        n_row = row - 1 if row > 0 else row
    elif action == self.DOWN:
        n_col = col
        n_row = row + 1 if row < self.N_ROWS - 1 else row
    else:
        raise ValueError(f"Invalid action {action}.")
    self.agent_position = (n_row, n_col)
The method includes boundary checks for each direction to prevent the agent from moving outside the grid.
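As a quick illustration of this clamping, repeatedly moving left from the start position pins the agent against the left wall (again assuming the reset() sketch above):
env = LetterWorld()
obs, info = env.reset()  # Agent starts at [1, 3]
for _ in range(5):
    obs, *_ = env.step(LetterWorld.LEFT)
print(obs[1], obs[2])  # Prints "1 0": the column never goes below 0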

Stochastic Behavior

The environment includes a stochastic element: when the agent visits position A, there’s a 50% chance that the symbol changes from ‘A’ to ‘B’.
def step(self, action: int) -> tuple[np.ndarray, float, bool, bool, dict]:
    """Take a step in the environment."""
    self._update_agent_position(action)

    if not self.symbol_seen and np.array_equal(
        self.agent_position, self.A_POSITION
    ):
        self.symbol_seen = np.random.random() < 0.5
    return self._get_obs(), 0, False, False, {}
This stochastic behavior creates an interesting challenge for reinforcement learning algorithms, as they must adapt to uncertain outcomes. The change in symbol is tracked by the symbol_seen flag in the observation.
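A small experiment makes the 50% figure visible: two left moves from the start position [1, 3] land the agent on A, and the symbol_seen flag ends up set in roughly half of the runs (this again assumes the reset() sketch from earlier):
env = LetterWorld()
flips = 0
for _ in range(1_000):
    env.reset()
    env.step(LetterWorld.LEFT)            # [1, 3] -> [1, 2]
    obs, *_ = env.step(LetterWorld.LEFT)  # [1, 2] -> [1, 1], i.e. position A
    flips += int(obs[0])
print(flips / 1_000)  # Approximately 0.5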

Using the Environment

The environment follows the standard Gym interface, making it easy to use with existing RL algorithms:
# Create the environment
env = LetterWorld()

# Reset to get initial observation
observation, info = env.reset()

# Take random actions
for _ in range(10):
    action = env.action_space.sample()  # Random action
    observation, reward, terminated, truncated, info = env.step(action)
    
    # Visualize the state
    env.render()
    
    if terminated or truncated:
        observation, info = env.reset()

Environment Visualization

When you call render(), the environment prints a colored grid representation to the console. Here’s what it looks like at different states:

Initial State

+-------------+
|. . . . . . .|
|. A . x . C .|
|. . . . . . .|
+-------------+

After Moving to Position A

+-------------+
|. . . . . . .|
|. x . . . C .|
|. . . . . . .|
+-------------+

After Symbol Change (A to B)

+-------------+
|. . . . . . .|
|. B . . . C .|
|. . . . . . .|
+-------------+
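
The render() method itself is not listed in this section; a plain sketch that prints grids in the shape shown above could look like this (the colouring in the real implementation is omitted):
def render(self) -> None:
    """Print an ASCII view of the grid to the console."""
    letter = "B" if self.symbol_seen else "A"
    lines = []
    for r in range(self.N_ROWS):
        cells = []
        for c in range(self.N_COLS):
            if np.array_equal((r, c), self.agent_position):
                cells.append("x")  # The agent is drawn on top of any letter
            elif np.array_equal((r, c), self.A_POSITION):
                cells.append(letter)
            elif np.array_equal((r, c), self.C_POSITION):
                cells.append("C")
            else:
                cells.append(".")
        lines.append("|" + " ".join(cells) + "|")
    border = "+" + "-" * (2 * self.N_COLS - 1) + "+"
    print("\n".join([border, *lines, border]))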

Key Points

  • The base environment is reward-free - it only defines the dynamics of the world
  • Navigation is deterministic, but the state transition at position A is stochastic
  • The environment follows the standard Gym interface for compatibility with RL algorithms
  • The agent’s task will be defined by a Counting Reward Machine, which will assign rewards based on the sequence of visited letters

Next Steps