All things AI related – ML, Deep Learning, Reinforcement Learning.

  • Monte Carlo Tree Search (Part 2): A Complete Explanation with Code

    In the last post we discussed the problem of acting optimally in an episodic environment by estimating the value of a state. Monte Carlo Tree Search (MCTS) naturally fits the problem by incorporating intelligent exploration into decision-time multi-step planning. Give that post a read if you haven’t checked it out yet, but it isn’t necessary…

    Continue reading …
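The tree-traversal step that the post builds up to can be sketched with the standard UCT selection rule (a minimal illustration, not the post’s own code; the `(total_value, visits)` pair representation and the exploration constant `c` are assumptions made here):

```python
import math

def uct_child(children, c=1.4):
    """Pick a child during MCTS tree traversal with the UCT rule:
    average value plus an exploration bonus based on visit counts.
    `children` is a list of (total_value, visits) pairs."""
    parent_visits = sum(n for _, n in children)

    def score(child):
        w, n = child
        if n == 0:
            return float("inf")  # always try unvisited children first
        # exploitation term (w / n) + exploration term that shrinks with n
        return w / n + c * math.sqrt(math.log(parent_visits) / n)

    return max(range(len(children)), key=lambda i: score(children[i]))
```

A child with a slightly lower average value but far fewer visits can still be selected, which is exactly the “intelligent exploration” the post describes.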

  • Monte Carlo Tree Search (Part 1): Introduction to MDPs

Following on from the idea of learning to make an optimal single decision, we can expand this to making multiple sequential decisions in an optimal way. To do this we’ll be exploring Monte Carlo Tree Search (MCTS), an algorithm that combines ideas from traditional tree search algorithms and reinforcement learning (RL). Today we’re going to…

    Continue reading …

  • Multi-Armed Bandits 3: UCB and some exploration tricks

In this post we’ll walk through some neat tricks to make ε-greedy more effective, and then we’ll dig into a smarter way to handle exploration: upper confidence bound (UCB) action selection. We’ll be building on what we learned in my last post, and as always the code can be found in this colab notebook so you…

    Continue reading …
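The core of UCB action selection can be sketched in a few lines (a minimal version for illustration, not the notebook’s code; the list-based bookkeeping and the constant `c` are assumptions made here):

```python
import math

def ucb_select(counts, estimates, t, c=2.0):
    """UCB action selection: estimated value plus an exploration bonus
    that shrinks as an arm is pulled more often."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # pull every arm once before using the bonus
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(estimates, counts)]
    return scores.index(max(scores))
```

With equal value estimates, the arm pulled least gets the larger bonus and is selected next, so exploration is directed at the arms we are most uncertain about rather than chosen at random.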

  • Multi-Armed Bandits 2: ε-Greedy and Non-Stationary Problems

Today we’re going to address some of the problems with an ε-first exploration approach for multi-armed bandit problems. In the last post we saw how ε-first can perform very well on stationary problems where the true value of each bandit arm (slot machine in our example) never changes. But in the real world we are…

    Continue reading …
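The usual fix for non-stationary arms is to swap the sample-average update for a constant step size, which weights recent rewards more heavily (a minimal sketch of both update rules; the value `alpha=0.1` is an assumption for illustration, not taken from the post):

```python
def sample_average_update(estimate, reward, n):
    # step size 1/n: every reward weighted equally
    # (fine when the arm's true value never changes)
    return estimate + (reward - estimate) / n

def constant_step_update(estimate, reward, alpha=0.1):
    # fixed step size: exponentially decaying weight on old rewards,
    # so the estimate keeps tracking an arm whose true value drifts
    return estimate + alpha * (reward - estimate)
```

If an arm’s true value suddenly shifts, the constant-step estimate converges toward the new value while the sample-average estimate adjusts ever more slowly as `n` grows.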

  • Multi-Armed Bandits 1: ε-first

In this post we’re going to discuss the age-old problem of making decisions under uncertainty. To illustrate what I mean, we’re going to dive right into multi-armed bandit problems and what they are exactly. You can follow along and run the code yourself using this Google Colab notebook. I’ll be updating the…

    Continue reading …
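An ε-first strategy splits the run into a fixed exploration phase followed by pure exploitation. A minimal sketch under assumed details (Gaussian rewards with unit noise, incremental-mean estimates; none of the parameter values are taken from the post):

```python
import random

def run_epsilon_first(arm_means, steps=1000, explore_steps=100, seed=0):
    """epsilon-first: explore uniformly for a fixed budget,
    then exploit the best estimate greedily for the rest of the run."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    estimates = [0.0] * k
    total_reward = 0.0
    for t in range(steps):
        if t < explore_steps:
            arm = rng.randrange(k)                 # pure exploration phase
        else:
            arm = estimates.index(max(estimates))  # pure exploitation phase
        reward = rng.gauss(arm_means[arm], 1.0)    # noisy reward from chosen arm
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return estimates, total_reward
```

The weakness the next post picks apart is visible in the structure: once `explore_steps` have passed, the agent never explores again, so a mistaken estimate (or a drifting arm) is never corrected.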
