All things AI-related – ML, Deep Learning, Reinforcement Learning.

Monte Carlo Tree Search (Part 2): A Complete Explanation with Code
In the last post we discussed the problem of acting optimally in an episodic environment by estimating the value of a state. Monte Carlo Tree Search (MCTS) naturally fits the problem by incorporating intelligent exploration into decision-time multi-step planning. Give that post a read if you haven’t checked it out yet, but it isn’t necessary…
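The "intelligent exploration" at the heart of MCTS is usually the UCT rule, which scores each child node by its average value plus an exploration bonus. A minimal sketch of that score (function and argument names are illustrative, not taken from the post):

```python
import math

def uct_score(value_sum, visits, parent_visits, c=1.41):
    """UCT: average value (exploitation) + confidence bonus (exploration)."""
    if visits == 0:
        return float("inf")  # always expand unvisited children first
    exploit = value_sum / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```

During the selection phase, MCTS descends the tree by repeatedly picking the child with the highest such score; `c` trades off exploration against exploitation.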

Monte Carlo Tree Search (Part 1): Introduction to MDPs
Following on from the idea of learning to make an optimal single decision, we can expand this to making multiple sequential decisions in an optimal way. To do this we’ll be exploring Monte Carlo Tree Search (MCTS): an algorithm that combines ideas from traditional tree search algorithms and reinforcement learning (RL). Today we’re going to…

Multi-Armed Bandits 3: UCB and Some Exploration Tricks
In this post we’ll walk through some neat tricks to make greedy action selection more effective, and then we’ll dig into a smarter way to handle exploration: upper confidence bound action selection. We’ll be building on what we learned in my last post, and as always the code can be found in this colab notebook so you…
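Upper confidence bound (UCB) action selection picks the arm with the highest value estimate plus an uncertainty bonus that shrinks as an arm is pulled more often. A minimal sketch, assuming simple per-arm counts and value estimates (names are illustrative):

```python
import math

def ucb_select(counts, values, t, c=2.0):
    """Pick the arm maximizing Q(a) + c * sqrt(ln(t) / N(a))."""
    best_arm, best_score = 0, float("-inf")
    for arm, (n, q) in enumerate(zip(counts, values)):
        # An unpulled arm has maximal uncertainty, so try it first.
        score = float("inf") if n == 0 else q + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm
```

Arms that look good get pulled often; arms that are rarely pulled keep a large bonus, so no arm is starved of exploration forever.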

Multi-Armed Bandits 2: ε-Greedy and Non-Stationary Problems
Today we’re going to address some of the problems with an ε-first exploration approach for multi-armed bandits problems. In the last post we saw how ε-first can perform very well on stationary problems where the true value of each bandit arm (slot machine in our example) never changes. But in the real world we are…

Multi-Armed Bandits 1: ε-first
In this post we’re going to discuss the age-old problem of making decisions when there is uncertainty. To illustrate what I mean, we’re going to dive right into multi-armed bandits problems and what they are exactly. You can follow along and run the code yourself using this Google Colab notebook. I’ll be updating the…