{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "6a7ea65e2f19d811f1a48145be4a29dd", "grade": false, "grade_id": "cell-84617f606b66d110", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Artificial Intelligence UE\n", "## Exercises 3 - Game Playing\n", "\n", "In this series of exercises you are looking at game playing - more precisely, at the Minimax algorithm, Alpha-Beta pruning and Q-Learning. \n", "\n", "The algorithms have been explained in the lecture (VO) and we gave you some additional information in the exercise (UE). Please refer to the lecture slides (VO) for the pseudo algorithms and the exercise slides (UE) for additional hints.\n", "\n", "
Practical hints:\n",
"\n",
"- Write your code in the places marked `# YOUR CODE HERE`, replacing `raise NotImplementedError()` with your own code.\n",
"- Use `float('-Inf')` and `float('Inf')` to represent negative and positive infinity.\n",
"- Use `state = env.reset()` to reset the environment at the start of an episode.\n",
"- Use `state, reward, done = env.step(action)` to tell the environment that your agent decided to take `action`. The environment then tells you which state you actually ended up in (state), what the immediate reward was (reward), and whether or not the episode ended (done). A minimal interaction loop is sketched after this list.