Archive for Python

From this year to next with Python’s Arrow

Date/time done (almost) right

Anyone who has dealt with timezones in their software knows how much of a nightmare they can be. Anyone who has used Python for time handling will also know that it’s a bit of a minefield. Since we’re almost in 2017, I thought I’d (belatedly) add to the Arrow hype train. Arrow offers a more Python-like experience when writing code that deals with time or timezones. It’s simple and concise*.

The code

As always, a demo is better than a thousand words.
Hopefully the code doesn’t need much explanation. We call now() if we want a local time arrow instance, or now(‘%timezone%’) if we want the same in a different timezone. humanize() is a very cool sweetener that translates the arrow object into a more human-readable format:
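For reference, here is a minimal sketch of the calls described above. It assumes the arrow package is installed; the 'US/Pacific' timezone and the variable names are picked purely for illustration, and shift() is only there to give humanize() something to describe. No output is listed because it depends on when the code runs.

```python
import arrow

# Current time in the machine's local timezone.
local = arrow.now()

# Current time expressed in another timezone (illustrative choice).
pacific = arrow.now('US/Pacific')

# An Arrow one hour in the past, so humanize() has a gap to describe.
earlier = local.shift(hours=-1)

print(local)
print(pacific)
print(earlier.humanize())
```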

Here’s what we’d get as output:

Search and find instance

A nice little touch is the ability to grab a time from a string. The given example from the docs shows this off better than I could:
found_time = arrow.get('June was born in May 1980', 'MMMM YYYY')

Arrow will correctly extract “May 1980” based on the pattern and ignore “June”. There’s more over at the API docs.

Unfortunately this awesome library won’t stop me from writing 2016 everywhere for the next few weeks but for everything else it does the job.

* As with all timezone libs, they made a few mistakes that will draw criticism (some dodgy naming and the overly ambitious “get()”)

Python 3.6 brings smarter text formatting

Python 3.6 is out!

Today’s new release of Python brings a few handy upgrades.

I particularly fancy the f-string formatting. I’ve always liked how simple formatting of strings is in Python, but it just got even easier. I’ll let you RTFM if interested, but essentially you stick an ‘f’ in front of a string literal to perform a more succinct version of .format(). Pretty cool.
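A small sketch of the difference, needing Python 3.6 or later. The variable names and strings are made up for the example.

```python
name = 'Python'
version = 3.6

# The older way: placeholders filled in by .format().
old_style = 'Hello from {} {}'.format(name, version)

# The 3.6 way: expressions go straight inside the braces.
new_style = f'Hello from {name} {version}'

# Format specs after the colon work the same as with .format().
two_places = f'{3.14159:.2f}'

print(old_style)
print(new_style)
print(two_places)
```

Both greeting strings come out identical; the f-string just keeps each value next to the spot where it is used.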

Bokeh Visualisation Using Pokemon Go Data – Part 2!

Python and Bokeh

Introducing Part 2 of my “let’s mess about with Python and Bokeh” series.

Scatter

So last time, I created a simple bar chart showing the top Pokemon by Combat Power (CP). I used this Pokemon Go dataset from Kaggle. I want to keep it fairly simple again, but still learn something and show the data in a different way. I thought a scatter graph might be a good shout.

The Data

Scatter graphs show the relationship between two numbers, right? We already have CP from last time so that’ll do for the Y axis again. For the X axis, I think it might be worthwhile using Hit/Health Points (HP). I would imagine there’ll be a positive correlation between the two, but it’d be cool to confirm, and any outliers might be interesting to note.
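One quick way to put a number on that hunch, separate from drawing the chart, is the Pearson correlation coefficient that pandas exposes as Series.corr(). The sketch below uses made-up placeholder numbers rather than the Kaggle data, so it only shows the mechanics.

```python
import pandas as pd

# Made-up placeholder numbers, NOT values from the Kaggle dataset.
sample = pd.DataFrame({
    'Max HP': [50, 80, 120, 160, 200],
    'Max CP': [600, 1100, 1500, 2300, 2900],
})

# Pearson correlation: values close to +1 mean a strong positive trend.
correlation = sample['Max HP'].corr(sample['Max CP'])
print(round(correlation, 3))
```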

Looking at the dataset again, it seems like there are a few other variables we should include:

  • Pokemon Name – scatter points are kind of useless unless you know what they refer to.
  • Type 1 – this is the Pokemon’s main type. I figure we might get some trends out of it.

The Code

Imports

Just using pandas and bokeh again:

from bokeh.charts import Scatter, output_file, show
import pandas as pd

Visualise

We’ve used pandas’ read_csv() function again to grab the data and specify our columns (including our new ones). Notice I’ve also used the rename() method to get rid of spaces in some of the column names. This is because the tooltips parameter in the graph builder needs it. NOTE: you’ll need the latest Bokeh to get the tooltips in the builder to even work. Older versions vomit. If we were to use the lower level Bokeh components this wouldn’t be an issue because we wouldn’t be using chart builders.

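# Load only the columns we need from the Kaggle CSV
data = pd.read_csv('pokemonGo.csv', header=0, usecols=['Name', 'Max CP', 'Max HP', 'Type 1'])
# Rename the columns used in the tooltips so they contain no spaces
data = data.rename(columns={'Name': 'name', 'Type 1': 'type1'})
# Colour and marker both vary by primary type; tooltips show name and type on hover
scatter = Scatter(data, x='Max HP', y='Max CP', color='type1', marker='type1', tooltips=['name', 'type1'])
output_file('scatter.html')
show(scatter)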
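For comparison, here is a rough sketch of the same idea with the lower level bokeh.plotting components mentioned earlier. It is not the code used for this post: it assumes a reasonably recent Bokeh release, and the rows, column names (max_hp, max_cp) and output filename are placeholders invented so the snippet runs without the CSV.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category10
from bokeh.plotting import figure, output_file, show
from bokeh.transform import factor_cmap

# Placeholder rows standing in for pokemonGo.csv; the values are made up.
data = pd.DataFrame({
    'name': ['Example A', 'Example B', 'Example C', 'Example D'],
    'type1': ['normal', 'bug', 'water', 'bug'],
    'max_hp': [160, 60, 110, 75],
    'max_cp': [2300, 450, 1600, 700],
})
source = ColumnDataSource(data)
types = sorted(data['type1'].unique())

plot = figure(title='Max CP against Max HP',
              x_axis_label='Max HP', y_axis_label='Max CP')

# One glyph call; the colour is looked up per row from the type1 column.
plot.scatter('max_hp', 'max_cp', source=source, size=10,
             color=factor_cmap('type1', palette=Category10[3], factors=types),
             legend_field='type1')

# Hover tooltips: '@column' pulls the value from the data source.
plot.add_tools(HoverTool(tooltips=[('name', '@name'), ('type', '@type1')]))

output_file('scatter_sketch.html')
# show(plot)  # uncomment to open the chart in a browser
```

It is more typing than the chart builder, but the tooltips are configured explicitly through HoverTool rather than through a builder parameter.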

Results

What did we find out? Well the graph confirms a positive correlation between HP and CP:
[Image: snorlax]

Luckily the code all worked too. We’ve got a key showing the primary types and if we hover on a point we can see the name and type appearing as they should. Great! Using the scatter has definitely added a bit more value to the previous post. We can see that normal Pokemon lead the field when it comes to HP.

We can also see that bug Pokemon don’t have great stats:
[Image: bug]
But anyone who has played Pokemon knows that doesn’t tell the whole story. Maybe I’ve got an excuse to create more parts to this series!

Tell your friends to learn CS with edX’s (MITx) course on learning to program

There’s a great course doing the rounds for anyone looking to learn CS/programming. It uses Python and is taught by MIT. Most people who stumble across this blog probably won’t use it as it’s fairly basic, but it’d be a good idea to share it with any friends/relatives who have been looking to learn a bit of programming. It’s currently in week 2, so the discussion boards etc. will be a hive of activity for anyone who needs help as they go.

You can check it out here.

Basic Visualisation With Bokeh Using Pokemon Go Data

The Problem

So, I want to use Bokeh to display some data. I’ve had a look at it and it seems like it could be a very powerful (and beautiful) tool to get to grips with. Right now though, I just want to see something on screen. I’ve decided to be topical and have grabbed this Pokemon Go dataset from Kaggle. While we’re on it, take this opportunity to explore Kaggle if you haven’t recently. They’ve got a whole load of user submitted datasets these days and there’s a lot of fun (and learning) to be had.

The Data

There are a few variables included in the dataset, but I just want a minimum viable solution. Why don’t we use the name and max CP fields? If we create a bar chart we could see the strongest Pokemon at a glance. Since there are 151 of the critters in the data, we probably just want the best of the best too: let’s say anything over 3000 CP.

pandas

I’m not going to talk a lot about pandas because I don’t know enough. I do know that it’s widely used, especially in the data science community, and it makes working with Bokeh charts easier. My understanding is that a pandas dataframe is a tabular data structure. It also has manipulation functions (e.g. filtering and grouping) that can make it behave much like a database. Why is this useful? Apart from the fact that Bokeh accepts it as data input (among others), we can use the filtering capabilities to apply our CP restriction, and pandas also lets us specify which columns we’re interested in:

import pandas as pd
data = pd.read_csv('pokemonGo.csv', header=0, usecols=['Name', 'Max CP'])
data = data[data['Max CP'] > 3000]

Also note the read_csv() function that automatically creates a dataframe from a csv file; fecking awesome!
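To see what the filter line is doing, it helps to split it in two. The sketch below swaps the real file for a tiny in-memory CSV; the three rows are invented for illustration and are not from the Kaggle dataset.

```python
import io

import pandas as pd

# Stand-in for pokemonGo.csv; these rows are invented for illustration.
csv_text = io.StringIO(
    'Name,Max CP,Max HP\n'
    'Example A,3500,160\n'
    'Example B,450,60\n'
    'Example C,3100,140\n'
)

# usecols keeps only the columns we ask for, so Max HP is dropped here.
data = pd.read_csv(csv_text, header=0, usecols=['Name', 'Max CP'])

# The comparison builds a True/False mask with one entry per row;
# indexing with that mask keeps only the rows marked True.
mask = data['Max CP'] > 3000
strong = data[mask]
print(strong)
```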

Bokeh

Now that we have the data in the right format, we need to display it. Bokeh seems to have a lot of layers to it, to allow for customisation and building your own components but since this is our first foray into the library, we’ll use the high level bar chart builder to do most of the work for us:

from bokeh.charts import Bar, output_file, show
bar = Bar(data, label='Name', xlabel='Pokemon', values='Max CP', ylabel='Max CP', legend=None)
output_file('bar.html')
show(bar)

Hopefully it’s pretty self explanatory. This is idiomatic Python we’re talking about! label will be our x axis and values is our y. You’ll notice I removed the legend. I think for a chart this straightforward, the legend will only get in the way of simplicity. That parameter is optional and if not specified the legend will show. Alternatively, you can specify its position e.g. legend='top_right'.
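One caveat for anyone following along later: as far as I know, the bokeh.charts module was split out of Bokeh into a separate bkcharts package, so the Bar builder is not available in recent releases. A rough equivalent using bokeh.plotting might look like the sketch below. The two rows and the output filename are placeholders, not the real data.

```python
import pandas as pd
from bokeh.plotting import figure, output_file, show

# Placeholder rows; in the post this would be the filtered dataframe.
data = pd.DataFrame({'Name': ['Example A', 'Example C'],
                     'Max CP': [3500, 3100]})

names = data['Name'].tolist()

# A list of strings as x_range makes the x axis categorical.
plot = figure(x_range=names, title='Max CP by Pokemon',
              x_axis_label='Pokemon', y_axis_label='Max CP')

# No legend arguments are passed, so no legend is drawn.
plot.vbar(x=names, top=data['Max CP'].tolist(), width=0.8)

output_file('bar_sketch.html')
# show(plot)  # uncomment to open the chart in a browser
```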

Result

So what does it give us? We get an html file that displays the chart. The chart comes with a few option buttons for navigation as well as one to export the chart as a picture, which I’ve included below:

[Image: bokeh_plot]

Mewtwo kicks ass, although I guess we already knew that. Pretty straightforward and under 10 lines of code. Bokeh and I are off to a good start.