Martin Kersner
01 October 2018
The following sections describe how to install and use the Checkers environment from Seoul AI Gym. You can always find the latest documentation on GitHub.
If you encounter any issue, reach out for help through GitHub issues and we will try to resolve your problem as soon as possible.
The Checkers environment requires Python 3.6+.
We try to keep the PyPI package up to date, but the latest version of the code can always be found on GitHub.
Install through PyPI
pip install seoulai_gym
or install from source.
git clone https://github.com/seoulai/gym.git
cd gym
pip3 install -e .
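To verify that the installation succeeded, you can try importing the package; this only checks that Python can locate seoulai_gym:
python3 -c "import seoulai_gym; print(seoulai_gym.__file__)"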
The main game component is the Environment. The Environment stores the state of the game and allows Agents to perform moves.
import seoulai_gym as gym
# Create environment
env = gym.make("Checkers")
# Reset environment - run before every new game
obs = env.reset()
from_row = 2  # row location of agent's piece (example value)
from_col = 1  # column location of agent's piece (example value)
to_row = 3  # new row location of agent's piece (example value)
to_col = 2  # new column location of agent's piece (example value)
# Agent makes move from (from_row, from_col)
# position to (to_row, to_col) position using environment
obs, rew, done, info = env.step(agent, from_row, from_col, to_row, to_col)
# Display the state of the game graphically
env.render()
# Clean termination of environment
env.close()
The Environment enables two Agents to play checkers against each other. Agents take turns, and if an Agent attempts to make an invalid move, no move is performed. In that case, the opponent effectively makes two moves in a row.
import seoulai_gym as gym
from seoulai_gym.envs.checkers.agents import RandomAgentLight
from seoulai_gym.envs.checkers.agents import RandomAgentDark
env = gym.make("Checkers")
a1 = RandomAgentLight("Agent 1")
a2 = RandomAgentDark("Agent 2")
obs = env.reset()
current_agent = a1
next_agent = a2
while True:
    from_row, from_col, to_row, to_col = current_agent.act(obs)
    obs, rew, done, info = env.step(current_agent, from_row, from_col, to_row, to_col)
    current_agent.consume(obs, rew, done)

    if done:
        print(f"Game over! {current_agent} agent wins.")
        obs = env.reset()

    # switch agents
    current_agent, next_agent = next_agent, current_agent

    env.render()

env.close()
There are 4 important environment variables returned by the step method: obs, rew, done, and info.
obs
State of the game (obs) is represented as List[List[Piece]].
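As a quick illustration, you can iterate over obs directly. The sketch below assumes that empty squares are stored as None and occupied squares as Piece objects; check the repository for the exact Piece interface.
obs = env.reset()
for row in obs:
    for piece in row:
        if piece is not None:  # assumption: empty squares are None
            print(type(piece).__name__)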
To enable easier manipulation of the game state (e.g. as input to a convolutional network), we provide the board_list2numpy function that converts the state into a 2D NumPy array.
>>> from seoulai_gym.envs.checkers.utils import board_list2numpy
>>> help(board_list2numpy)
Help on function board_list2numpy in module seoulai_gym.envs.checkers.utils:

board_list2numpy(board_list, encoding)
    Convert the state of game (`board_list`) into 2D NumPy Array using `encoding`.

    Args:
        board_list: (List[List[Piece]]) State of the game.
        encoding: (BoardEncoding) Optional argument.
            If not given default encoding will be utilized.

    Returns:
        board_numpy: (np.array)
>>> from seoulai_gym.envs.checkers.utils import board_list2numpy
>>> board_numpy = board_list2numpy(obs)
>>> board_numpy
array([[10., 0., 10., 0., 10., 0., 10., 0.],
[ 0., 10., 0., 10., 0., 10., 0., 10.],
[10., 0., 10., 0., 10., 0., 10., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 20., 0.],
[ 0., 20., 0., 20., 0., 20., 0., 0.],
[20., 0., 20., 0., 20., 0., 20., 0.],
[ 0., 20., 0., 20., 0., 20., 0., 20.]])
>>> from seoulai_gym.envs.checkers.utils import board_list2numpy
>>> from seoulai_gym.envs.checkers.utils import BoardEncoding
>>> enc = BoardEncoding()
>>> enc.dark = 99
>>> enc.light = 33
>>> board_numpy = board_list2numpy(obs, enc)
>>> board_numpy
array([[99., 0., 99., 0., 99., 0., 99., 0.],
[ 0., 99., 0., 99., 0., 99., 0., 99.],
[99., 0., 99., 0., 99., 0., 99., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 33., 0.],
[ 0., 33., 0., 33., 0., 33., 0., 0.],
[33., 0., 33., 0., 33., 0., 33., 0.],
[ 0., 33., 0., 33., 0., 33., 0., 33.]])
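Because the encoded board is an ordinary NumPy array, it can be reshaped for use as network input. The scaling below is purely illustrative and not something the library prescribes:
import numpy as np

from seoulai_gym.envs.checkers.utils import board_list2numpy

board = board_list2numpy(obs)  # 8x8 array of encoding values
x = board.astype(np.float32) / board.max()  # illustrative scaling into [0, 1]
x = x[np.newaxis, np.newaxis, :, :]  # add batch and channel dimensions
print(x.shape)  # (1, 1, 8, 8)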
rew
Rewards for different situations in the game are predefined within the environment. There are 7 different reward situations, and they are mutually exclusive. Some rewards should be considered as “punishments” (invalid_move, move_opponent_piece).
default - agent performed a valid move
invalid_move - agent attempted to make an invalid move
move_opponent_piece - agent attempted to move with opponent’s piece
remove_opponent_piece - agent removed opponent’s piece
become_king - agent made a move with a piece that became a king
opponent_no_pieces - opponent has no pieces left, current agent won the game
opponent_no_valid_move - opponent cannot move, current agent won the game

In case you want to set your own rewards, you can do so as follows:
env = gym.make("Checkers")
rewards_map = {
    "default": 1.0,
    "invalid_move": 0.0,
}
env.update_rewards(rewards_map)
To get valid moves for a given obs, from_row, and from_col, use get_valid_moves.
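A minimal sketch of calling it is below; the import path is an assumption, not confirmed by this document (check seoulai_gym/envs/checkers in the repository for where get_valid_moves actually lives):
# NOTE: the import path below is an assumption.
from seoulai_gym.envs.checkers.rules import Rules

valid_moves = Rules.get_valid_moves(obs, from_row, from_col)
print(valid_moves)  # e.g. a list of reachable (to_row, to_col) positions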
done
When the game is finished, the environment returns True from the step method, otherwise False. The agent that receives True won the game.
info
The last value returned from the step method contains additional information about the performed move. This is useful for debugging purposes or as a simple way of exploring the current move.
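For instance, you can print info after a step to see what the environment recorded; the exact contents depend on the environment version, so treat this as exploratory:
obs, rew, done, info = env.step(a1, from_row, from_col, to_row, to_col)
print(info)  # contents depend on the environment version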