Creating an Anki flashcard deck from Twitter feed to learn sign language

TLDR: getting sign language signs from a Twitter feed into Anki.


The problem

Right, so I’m trying to learn some sign language. For anyone who’s not read the blog before: my hearing is really bad. So now I’m trying to get ahead of it getting worse, just in case. I’m going to do a course at the local uni next year but before that I figured it’d be nice to memorise some vocab. The great thing about being able to program is that you can improve your life in lots of little ways with some imagination and some help from Google.

Memorisation (Anki)

I’ve mentioned before that a really useful skill for devs is the ability to memorise. There are loads of techniques, but something like a new language (spoken or sign) is a perfect use case for spaced repetition. I use Anki for my spaced repetition needs, so I need to get some signs into a new deck before I can start learning.

If you’re not familiar with spaced repetition, it’s a technique based on research showing that the best way to retain something you’re trying to learn is to recall it just before you would forget it. Then, each time you successfully recall it from memory, you’ll be able to go a lot longer before needing to be reminded again.
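To make the "go a lot longer each time" idea concrete, here’s a toy model. It assumes every recall succeeds and that the gap simply doubles after each review; Anki’s real scheduler is more involved (it adjusts each card’s interval based on how you rate your answer), and review_schedule is just a hypothetical helper for illustration:

```python
def review_schedule(num_reviews, first_interval=1, multiplier=2.0):
    """Toy spaced-repetition schedule (NOT Anki's algorithm).

    Returns the gap in days before each review, assuming every recall
    succeeds and the gap grows by a constant factor each time.
    """
    intervals = []
    interval = first_interval
    for _ in range(num_reviews):
        intervals.append(round(interval))
        interval *= multiplier
    return intervals


print(review_schedule(6))  # [1, 2, 4, 8, 16, 32]
```

Six successful reviews already stretch the gap to about a month, which is why a handful of well-timed reviews beats cramming.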

Anki flashcards are similar to physical flashcards in that there is a front and a back. On one side you’ll put a prompt, e.g. a word to learn, and on the other you’ll put the “answer”, e.g. the corresponding sign for that word.

The solution

So the steps to get what I need are fairly simple:

  1. Get British Sign Language signs from somewhere
  2. Iterate over the signs
  3. Get the word and store it
  4. Get the sign and store it with the word
  5. The storage format should be importable by Anki

A quick bit of research and we’ve got a wealth of signs from a great Twitter feed by British Sign! I also know from reading somewhere before that Anki can import CSVs, and that you can even reference images in them if you add the images to your Anki media folder, so that sorts the unknowns in steps 1 and 5. As with most programming tasks, once the steps are clear the implementation is fairly trivial.
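In this deck, each CSV row is one card: the first column is the front (the English word) and the second is the back, an HTML img tag whose src is a file in Anki’s media folder. A minimal sketch of writing such rows with Python’s csv module (write_anki_rows and the filename are hypothetical, for illustration only):

```python
import csv
import io


def write_anki_rows(rows, out):
    """Write (word, image_filename) pairs as two-column CSV rows.

    Front = the word; back = an <img> tag. The image file itself must be
    copied into Anki's media folder for the reference to resolve.
    """
    writer = csv.writer(out)
    for word, filename in rows:
        writer.writerow([word, f'<img src="{filename}" />'])


# An in-memory buffer stands in for a real file here.
buf = io.StringIO()
write_anki_rows([('hello', 'bsl-hello.jpg')], buf)
print(buf.getvalue())
```

Because the img tag contains double quotes, the csv module wraps that field in quotes and doubles the inner ones (standard CSV quoting), so reading the row back gives the original tag unchanged. A quick test import into Anki is still worth doing before generating hundreds of rows.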


Python (3) is the logical choice here. It’s quick and easy and I don’t need anything sexy. Just need it to do its job. So here’s the code with a few comments:

import twitter  # the python-twitter package (pip install python-twitter)
import re
import csv
import urllib.request

# The Twitter API is awesome. First you have to get your
# API keys (removed my actual ones!) and create an instance of
# the client
api = twitter.Api(consumer_key='MY_CONSUMER_KEY',
                  consumer_secret='MY_CONSUMER_SECRET',
                  access_token_key='MY_ACCESS_TOKEN_KEY',
                  access_token_secret='MY_ACCESS_TOKEN_SECRET')

# Searching the last 1k tweets will be plenty (note: Twitter's
# user_timeline endpoint returns at most 200 tweets per request)
timeline = api.GetUserTimeline(screen_name="BritishSignBSL", count=1000)
tweets = [status.AsDict() for status in timeline]

with open('bsl_signs.csv', 'w', newline='') as csvfile:
    signwriter = csv.writer(csvfile, delimiter=',')
    for tweet in tweets:
        tweet_text = tweet['text']

        # The sign of the day tweets always have the same format so Regex
        # is handy for getting the word
        m = re.search(r'sign is: (.+?) - http', tweet_text)
        if m:
            english_word = m.group(1)
            if 'media' in tweet:
                media = tweet['media']
                image_url = media[0]['media_url']
                filename = ('bsl-' +
                            english_word.replace(' ', '-').lower() + '.jpg')

                # Downloading the image. Once copied to Anki's
                # folder you just need to ref them in the CSV with img src
                urllib.request.urlretrieve(image_url, filename)

                # The important line. Write a row to the CSV, first column is
                # the word pulled from the tweet and second is the ref to the image
                signwriter.writerow([english_word,
                                     f'<img src="{filename}" />'])
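The network calls make the script awkward to test, so one option is to pull the regex and filename logic into a pure function and check it on its own. A sketch: parse_sign_tweet is a hypothetical name, and the sample tweet text is invented to match the pattern, not a real British Sign tweet:

```python
import re

# Same pattern the script uses for the "sign of the day" tweets.
SIGN_PATTERN = re.compile(r'sign is: (.+?) - http')


def parse_sign_tweet(tweet_text):
    """Return (english_word, image_filename) for a sign tweet, or None."""
    m = SIGN_PATTERN.search(tweet_text)
    if not m:
        return None
    english_word = m.group(1)
    # Same naming scheme as the script: bsl-<word-with-dashes>.jpg
    filename = 'bsl-' + english_word.replace(' ', '-').lower() + '.jpg'
    return english_word, filename


# Made-up tweet text in the expected format.
print(parse_sign_tweet("Today's sign is: Thank You - http://example.com/abc"))
# ('Thank You', 'bsl-thank-you.jpg')
print(parse_sign_tweet("No sign in this one"))
# None
```

The lazy (.+?) matters: it stops at the first " - http", so a word containing a hyphen further along the tweet doesn’t swallow the link.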

Copy the downloaded images to Anki’s media folder, import the CSV and you’re done. I can run the script again in a few weeks or months to add any new signs that get released, but there’s plenty to learn in the meantime!

Thanks to British Sign for kindly agreeing to let me use their content.

From this year to next with Python’s Arrow

Date/time done (almost) right

Anyone who has dealt with timezones in their software knows how much of a nightmare it can be. Anyone who has used Python for time handling will also find out that it’s a bit of a minefield. Since we’re almost in 2017, I thought I’d (belatedly) add to the Arrow hype train. Arrow offers a more Pythonic experience when writing code that deals with time or timezones. It’s simple and concise*.

The code

As always, a demo is better than a thousand words.
Hopefully the code doesn’t need much explanation. We call arrow.now() if we want a local-time Arrow instance, and arrow.now('%timezone%') if we want the same in a different timezone. humanize() is a very cool sweetener that turns the Arrow object into a human-readable description relative to now (e.g. “an hour ago”):
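A minimal sketch of those three calls, assuming a recent Arrow release (pip install arrow). The 'US/Pacific' zone and the use of shift() to build a time in the past are my choices for illustration, not necessarily what the original demo used:

```python
import arrow

# Local time as an Arrow instance
local_now = arrow.now()

# The same moment in another timezone ('US/Pacific' is just an example zone name)
pacific_now = arrow.now('US/Pacific')

# humanize() describes an Arrow relative to the current time
an_hour_ago = arrow.utcnow().shift(hours=-1)

print(local_now)
print(pacific_now)
print(an_hour_ago.humanize())  # an hour ago
```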

Here’s what we’d get as output:

Search and find instance

A nice little touch is the ability to grab a time from a string. The given example from the docs shows this off better than I could:
import arrow
found_time = arrow.get('June was born in May 1980', 'MMMM YYYY')

Arrow will correctly extract “May 1980” based on the pattern and ignore “June”. There’s more over at the API docs.

Unfortunately this awesome library won’t stop me from writing 2016 everywhere for the next few weeks but for everything else it does the job.

* As with all timezone libs, Arrow has made a few choices that draw criticism (some dodgy naming and the overly ambitious “get()”).

Python 3.6 brings smarter text formatting

Python 3.6 is out!

A new major release of Python landed today with a few handy upgrades.

I particularly fancy the f-string formatting. I’ve always liked how simple string formatting is in Python, but it just got even easier. I’ll let you RTFM if interested, but essentially you stick an ‘f’ in front of a string literal and put expressions straight inside the braces: a more succinct version of .format(). Pretty cool.
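A quick side-by-side, since it shows the difference better than words (the variable names are just for illustration; this needs Python 3.6 or later):

```python
name = 'Python'
version = 3.6

# The str.format() way
old_style = 'Hello from {} {}!'.format(name, version)

# The f-string way: the expressions go straight into the braces
new_style = f'Hello from {name} {version}!'

print(old_style)  # Hello from Python 3.6!
print(new_style)  # Hello from Python 3.6!

# Any expression works inside the braces, and format specs still apply
print(f'{version * 2:.1f}')  # 7.2
```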

Bokeh Visualisation Using Pokemon Go Data – Part 2!

Python and Bokeh

Introducing Part 2 of my “let’s mess about with Python and Bokeh” series.


So last time, I created a simple bar chart showing the top Pokemon by Combat Power (CP). I used this Pokemon Go dataset from Kaggle. I want to keep it fairly simple again, but still learn something and show the data in a different way. I thought a scatter graph might be a good shout.

The Data

Scatter graphs show the relationship between two numeric variables, right? We already have CP from last time so that’ll do for the Y axis again. For the X axis, I think it might be worthwhile using Hit/Health Points (HP). I would imagine there’ll be a positive correlation between the two, but it’d be cool to confirm, and any outliers might be interesting to note.

Looking at the dataset again, it seems like there are a few other variables we should include:

  • Pokemon Name – scatter points are kind of useless unless you know what they refer to.
  • Type 1 – this is the Pokemon’s main type. I figure we might get some trends out of it.

The Code


Just using pandas and bokeh again:

from bokeh.charts import Scatter, output_file, show
import pandas as pd


We’ve used pandas’ read_csv() function again to grab the data and specify our columns (including our new ones). Notice I’ve also used the rename() method to get rid of spaces in some of the column names. This is because the tooltips parameter in the chart builder needs it. NOTE: you’ll need the latest Bokeh to get the tooltips in the builder to even work. Older versions vomit. If we were to use the lower-level Bokeh components this wouldn’t be an issue because we wouldn’t be using chart builders.

data = pd.read_csv('pokemonGo.csv', header=0, usecols=['Name', 'Max CP', 'Max HP', 'Type 1'])
data = data.rename(columns={'Name': 'name', 'Type 1': 'type1'})  # tooltips need space-free names
scatter = Scatter(data, x='Max HP', y='Max CP', color='type1', marker='type1', tooltips=['name', 'type1'])
output_file('pokemon_scatter.html')  # hypothetical output filename
show(scatter)


What did we find out? Well the graph confirms a positive correlation between HP and CP:

Luckily the code all worked too. We’ve got a key showing the primary types and if we hover on a point we can see the name and type appearing as they should. Great! Using the scatter has definitely added a bit more value to the previous post. We can see that normal Pokemon lead the field when it comes to HP.
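The chart shows the trend by eye; to put a number on it, pandas can compute Pearson’s correlation coefficient directly with Series.corr(). A sketch assuming the same 'Max HP' and 'Max CP' column names as above; hp_cp_correlation is a hypothetical helper, and the rows below are made-up numbers, not the Kaggle data:

```python
import pandas as pd


def hp_cp_correlation(df):
    """Pearson correlation between the Max HP and Max CP columns (-1 to 1)."""
    return df['Max HP'].corr(df['Max CP'])


# Made-up rows for illustration only, NOT the real dataset.
toy = pd.DataFrame({'Max HP': [50, 80, 120, 160],
                    'Max CP': [600, 1100, 1900, 2600]})
print(round(hp_cp_correlation(toy), 3))
```

On the real file it would be hp_cp_correlation(pd.read_csv('pokemonGo.csv')); a value close to 1 backs up what the scatter shows, while the outliers are still easier to spot on the chart.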

We can also see that bug Pokemon don’t have great stats:
But anyone who has played Pokemon knows that doesn’t tell the whole story. Maybe I’ve got an excuse to create more parts to this series!
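Eyeballing the chart works, but averaging by primary type gives a quick numeric check of claims like “bug Pokemon don’t have great stats”. A sketch using pandas groupby; mean_stats_by_type is a hypothetical helper, it assumes the renamed columns from the code above ('type1', 'Max CP', 'Max HP'), and the rows are toy values:

```python
import pandas as pd


def mean_stats_by_type(df):
    """Average Max CP and Max HP per primary type, lowest average CP first."""
    return (df.groupby('type1')[['Max CP', 'Max HP']]
              .mean()
              .sort_values('Max CP'))


# Made-up rows for illustration only, NOT the real dataset.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'type1': ['Bug', 'Bug', 'Normal', 'Normal'],
    'Max CP': [500, 700, 1500, 2500],
    'Max HP': [60, 80, 150, 250],
})
print(mean_stats_by_type(toy))
```

Averages hide the spread within each type, though, so this complements the scatter rather than replacing it.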

Tell your friends to learn CS with edX’s (MITx) course on learning to program

There’s a great course doing the rounds for anyone looking to learn CS/programming. It uses Python and is taught by MIT. Most people who stumble across this blog probably won’t use it as it’s fairly basic, but it’d be a good idea to share it with any friends or relatives who have been looking to learn a bit of programming. It’s currently in week 2, so the discussion boards etc. will be a hive of activity for anyone who needs help as they go.

You can check it out here.

Basic Visualisation With Bokeh Using Pokemon Go Data

The Problem

So, I want to use Bokeh to display some data. I’ve had a look at it and it seems like it could be a very powerful (and beautiful) tool to get to grips with. Right now though, I just want to see something on screen. I’ve decided to be topical and have grabbed this Pokemon Go dataset from Kaggle. While we’re on it, take this opportunity to explore Kaggle if you haven’t recently. They’ve got a whole load of user submitted datasets these days and there’s a lot of fun (and learning) to be had.

The Data

There are a few variables in the dataset, but I just want a minimum viable solution. Why don’t we use the name and max CP fields? If we create a bar chart we can see the strongest Pokemon at a glance. Since there are 151 of the critters in the data, we probably just want the best of the best too: let’s say anything over 3000 CP.


I’m not going to talk a lot about pandas because I don’t know enough. I do know that it’s widely used, especially in the data science community, and it makes working with Bokeh charts easier. My understanding of a pandas dataframe is that it’s a tabular data structure. It also has manipulation functions (e.g. filtering and grouping) that can make it behave much like a database. Why is this useful? Apart from the fact that Bokeh accepts it as data input (among others), we can use the filtering capabilities to apply our CP restriction, and pandas also lets us specify which columns we’re interested in:

import pandas as pd
data = pd.read_csv('pokemonGo.csv', header=0, usecols=['Name', 'Max CP'])
data = data[data['Max CP'] > 3000]

Also note the read_csv() function that automatically creates a dataframe from a csv file; fecking awesome!
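To see read_csv(), usecols and the filter working without the Kaggle file, here’s a small sketch that reads a few rows from an in-memory string. io.StringIO stands in for pokemonGo.csv, and the names and numbers are invented, not the real data. The comparison data['Max CP'] > 3000 produces a True/False Series, and indexing the dataframe with it keeps only the rows where it’s True:

```python
import io

import pandas as pd

# Stand-in for pokemonGo.csv: made-up rows, NOT the real dataset.
csv_text = io.StringIO(
    "Name,Type 1,Max CP\n"
    "Alpha,Grass,1000\n"
    "Beta,Fire,3200\n"
    "Gamma,Water,3500\n"
)

# usecols keeps only the columns we ask for ('Type 1' is dropped)
data = pd.read_csv(csv_text, header=0, usecols=['Name', 'Max CP'])

# Boolean mask: True where Max CP is over 3000
mask = data['Max CP'] > 3000
strong = data[mask]
print(strong)
```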


Now that we have the data in the right format, we need to display it. Bokeh seems to have a lot of layers to it, allowing for customisation and for building your own components, but since this is our first foray into the library, we’ll use the high-level bar chart builder to do most of the work for us:

from bokeh.charts import Bar
bar = Bar(data, label='Name', xlabel='Pokemon', values='Max CP', ylabel='Max CP', legend=None)

Hopefully it’s pretty self-explanatory. This is idiomatic Python we’re talking about! label will be our x axis and values our y axis. You’ll notice I removed the legend. For a chart this straightforward, I think the legend would only get in the way. The parameter is optional, and if it’s not specified the legend will show. Alternatively, you can specify its position, e.g. legend='top_right'.


So what does it give us? We get an HTML file that displays the chart. The chart comes with a few option buttons for navigation as well as one to export the chart as a picture, which I’ve included below:


Mewtwo kicks ass, although I guess we already knew that. Pretty straightforward and under 10 lines of code. Bokeh and I are off to a good start.