So, I want to use Bokeh to display some data. I’ve had a look at it and it seems like it could be a very powerful (and beautiful) tool to get to grips with. Right now, though, I just want to see something on screen. I’ve decided to be topical and have grabbed this Pokemon Go dataset from Kaggle. While we’re on it, take this opportunity to explore Kaggle if you haven’t recently. They’ve got a whole load of user-submitted datasets these days and there’s a lot of fun (and learning) to be had.
There are a few variables included in the dataset, but I just want a minimum viable solution. Why don’t we use the name and max CP fields? If we create a bar chart, we can see the strongest Pokemon at a glance. Since there are 151 of the critters in the data, we probably just want the best of the best too: let’s say anything over 3000 CP.
I’m not going to talk a lot about pandas because I don’t know enough. I do know that it’s widely used, especially in the data science community, and it makes working with Bokeh charts easier. My understanding is that a pandas dataframe is a tabular data structure. It also has manipulation functions (e.g. filtering and grouping) that can make it behave much like a database. Why is this useful? Apart from the fact that Bokeh accepts a dataframe as data input (among other formats), we can use the filtering capabilities to apply our CP restriction, and we can also tell pandas which columns we’re interested in:
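import pandas as pd
# Load only the two columns we need from the Kaggle CSV
data = pd.read_csv('pokemonGo.csv', header=0, usecols=['Name', 'Max CP'])
# Keep only the Pokemon whose max CP is above 3000
data = data[data['Max CP'] > 3000]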
Also note the read_csv() function, which automatically creates a dataframe from a CSV file; fecking awesome!
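To make the filtering step a bit more concrete, here’s a minimal sketch you can run without downloading anything. It swaps the real file for a tiny inline CSV; the `Type` column and all of the numbers are made up for illustration and are not taken from the Kaggle dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for pokemonGo.csv: the 'Type' column and every value
# here are invented for this sketch, not copied from the real dataset.
csv_text = """Name,Type,Max CP
Mewtwo,Psychic,4100
Dragonite,Dragon,3500
Pidgey,Normal,700
"""

# usecols drops everything except the two columns we care about.
data = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Max CP'])

# Comparing a column to a number gives a True/False value per row...
mask = data['Max CP'] > 3000

# ...and indexing the dataframe with that mask keeps only the True rows.
strong = data[mask]

print(strong)
```

The two-step version (build the mask, then index with it) does the same thing as the one-liner above; I’ve only split it up so you can see what the comparison produces on its own.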
Now that we have the data in the right format, we need to display it. Bokeh seems to have a lot of layers to it, to allow for customisation and building your own components. Since this is our first foray into the library, though, we’ll use the high-level bar chart builder to do most of the work for us:
from bokeh.charts import Bar  # the high-level charts interface
bar = Bar(data, label='Name', xlabel='Pokemon', values='Max CP', ylabel='Max CP', legend=None)
Hopefully it’s pretty self-explanatory. This is idiomatic Python we’re talking about! label will be our x axis and values is our y. You’ll notice I removed the legend. I think for a chart this straightforward, the legend will only get in the way of simplicity. That parameter is optional; if it isn’t specified, the legend will show. Alternatively, you can specify its position e.g.
So what does it give us? We get an HTML file that displays the chart. The chart comes with a few option buttons for navigation as well as one to export the chart as a picture, which I’ve included below:
Mewtwo kicks ass, although I guess we already knew that. Pretty straightforward and under 10 lines of code. Bokeh and I are off to a good start.
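One caveat if you’re following along on a newer Bokeh release: as far as I know, the bokeh.charts module is no longer bundled with the library, so the Bar builder above may not import. Here’s a rough sketch of a similar chart using the lower-level bokeh.plotting interface instead. It’s an approximation rather than a drop-in replacement, and the dataframe values are placeholders I’ve made up so the snippet stands on its own.

```python
import pandas as pd
from bokeh.plotting import figure

# Placeholder rows invented for this sketch; the real numbers come from the CSV.
data = pd.DataFrame({
    'Name': ['Mewtwo', 'Dragonite', 'Pidgey'],
    'Max CP': [4100, 3500, 700],
})

# Same restriction as before: only the heavy hitters.
strong = data[data['Max CP'] > 3000]
names = strong['Name'].tolist()

# Passing a list of strings as x_range makes the x axis categorical.
p = figure(x_range=names, title='Max CP over 3000',
           x_axis_label='Pokemon', y_axis_label='Max CP')

# One vertical bar per Pokemon; no legend is added unless we ask for one.
p.vbar(x=names, top=strong['Max CP'].tolist(), width=0.8)
```

From there, calling output_file('pokemon.html') and then show(p) (both importable from bokeh.plotting) should write the HTML page and open it in a browser, much like the chart builder did.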