Bokeh Visualisation Using Pokemon Go Data – Part 2!

Bokeh Visualisation Using Pokemon Go Data – Part 2!

Python and Bokeh

Introducing Part 2 of my “let’s mess about with Python and Bokeh” series.


So last time, I created a simple bar chart showing the top Pokemon by Combat Power (CP). I used this Pokemon Go dataset from Kaggle. I want to keep it fairly simple again, but still learn something and show the data in a different way. I thought a scatter graph might be a good shout.

The Data

Scatter graphs show trends in two numbers right? We already have CP from last time so that’ll do for the Y axis again. For the X axis, I think it might be worthwhile using Hit/Health Points (HP). I would imagine there’ll be a positive correlation between the two but it’d be cool to confirm and any outliers might be interesting to note.

Looking at the dataset again, it seems like there are a few other variables we should include:

  • Pokemon Name – scatter points are kind of useless unless you know what they refer to.
  • Type 1 – this is the Pokemon’s main type. I figure we might get some trends out of it.

The Code


Just using pandas and bokeh again:

from bokeh.charts import Scatter, output_file, show
import pandas as pd


We’ve use dataframe’s read_csv() method again to grab the data and specify our columns (including our new ones). Notice I’ve also used the rename() function to get rid of spaces in some of the columns names. This is because the tooltips parameter in the graph builder needs it. NOTE: you’ll need to latest Bokeh to get the tooltips in the builder to even work. Older versions vomit. If we were to use the lower level Bokeh components this wouldn’t be an issue because we wouldn’t be using chart builders.

data = pd.read_csv('pokemonGo.csv', header=0, usecols=['Name', 'Max CP', 'Max HP', 'Type 1'])
data = data.rename(columns={'Name': 'name', 'Type 1': 'type1'})
scatter = Scatter(data, x='Max HP', y='Max CP', color='type1', marker='type1', tooltips=['name', 'type1'])


What did we find out? Well the graph confirms a positive correlation between HP and CP:

Luckily the code all worked too. We’ve got a key showing the primary types and if we hover on a point we can see the name and type appearing as they should. Great! Using the scatter has definitely added a bit more value to the previous post. We can see that normal Pokemon lead the field when it comes to HP.

We can also see that bug Pokemon don’t have great stats:
But anyone who has played Pokemon knows that doesn’t tell the whole story. Maybe I’ve got an excuse to create more parts to this series!

Basic Visualisation With Bokeh Using Pokemon Go Data

Basic Visualisation With Bokeh Using Pokemon Go Data

The Problem

So, I want to use Bokeh to display some data. I’ve had a look at it and it seems like it could be a very powerful (and beautiful) tool to get to grips with. Right now though, I just want to see something on screen. I’ve decided to be topical and have grabbed this Pokemon Go dataset from Kaggle. While we’re on it, take this opportunity to explore Kaggle if you haven’t recently. They’ve got a whole load of user submitted datasets these days and there’s a lot of fun (and learning) to be had.

The Data

There are a few variables included in the dataset, however I just want a minimum viable solution. Why don’t we use the name and max CP fields. If we create a bar chart we could see the strongest Pokemon at a glance. Since there’s 151 of the critters in the data, we probably just want the best of the best too: let’s say anything over 3000 CP.


I’m not going to talk a lot about pandas because I don’t know enough. I do know that it’s widely used, especially in the data science community, and it makes working with Bokeh charts easier. My understanding of a pandas dataframe is it is a tabular data structure. It also has manipulation functions (e.g. filter and grouping) that can make it behave much like a database. Why is this useful? Apart from the fact that Bokeh accepts it as data input (among others), we can use the filtering capabilities to apply our CP restriction and also using pandas can specify which columns we’re interested in:

import pandas as pd
data = pd.read_csv('pokemonGo.csv', header=0, usecols=['Name', 'Max CP'])
data = data[data['Max CP'] > 3000]

Also note the read_csv() function that automatically creates a dataframe from a csv file; fecking awesome!


Now that we have the data in the right format, we need to display it. Bokeh seems to have a lot of layers to it, to allow for customisation and building your own components but since this is our first foray into the library, we’ll use the high level bar chart builder to do most of the work for us:

bar = Bar(data, label='Name', xlabel='Pokemon', values='Max CP', ylabel='Max CP', legend=None)

Hopefully it’s pretty self explanatory. This is idiomatic Python we’re talking about! label will be our x axis and values is our y. You’ll notice I removed the legend. I think for a chart this straightforward, the legend will only get in the way of simplicity. That parameter is optional and if not specified the legend will show. Alternatively, you can specify its position e.g. legend='top_right'.


So what does it give us? We get an html file that displays the chart. The chart comes with a few option buttons for navigation as well as one to export the chart as a picture, which I’ve included below:


Mewtwo kicks ass, although I guess we already knew that. Pretty straightforward and under 10 lines of code. Bokeh and I are off to a good start.