Archive for August 2016

Basic Visualisation With Bokeh Using Pokemon Go Data

The Problem

So, I want to use Bokeh to display some data. I’ve had a look at it and it seems like it could be a very powerful (and beautiful) tool to get to grips with. Right now though, I just want to see something on screen. I’ve decided to be topical and have grabbed this Pokemon Go dataset from Kaggle. While we’re on it, take this opportunity to explore Kaggle if you haven’t recently. They’ve got a whole load of user submitted datasets these days and there’s a lot of fun (and learning) to be had.

The Data

There are a few variables included in the dataset, but I just want a minimum viable solution. Why don’t we use the name and max CP fields? If we create a bar chart we can see the strongest Pokemon at a glance. Since there are 151 of the critters in the data, we probably just want the best of the best too: let’s say anything over 3000 CP.

pandas

I’m not going to talk a lot about pandas because I don’t know enough. I do know that it’s widely used, especially in the data science community, and it makes working with Bokeh charts easier. My understanding is that a pandas dataframe is a tabular data structure with manipulation functions (e.g. filtering and grouping) that make it behave much like a database table. Why is this useful? Apart from the fact that Bokeh accepts it as data input (among other formats), we can use pandas to specify which columns we’re interested in and then filter the rows to apply our CP restriction:

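import pandas as pd
# Load only the two columns we need; the first row of the file holds the headers.
data = pd.read_csv('pokemonGo.csv', header=0, usecols=['Name', 'Max CP'])
# Keep only the Pokemon whose max CP is above 3000.
data = data[data['Max CP'] > 3000]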

Also note the read_csv() function that automatically creates a dataframe from a csv file; fecking awesome!

Bokeh

Now that we have the data in the right format, we need to display it. Bokeh seems to have a lot of layers to allow for customisation and building your own components, but since this is our first foray into the library we’ll use the high-level bar chart builder to do most of the work for us:

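# Bar comes from bokeh.charts, Bokeh's high-level charts API at the time of writing (2016).
from bokeh.charts import Bar, output_file, show
bar = Bar(data, label='Name', xlabel='Pokemon', values='Max CP', ylabel='Max CP', legend=None)
output_file('bar.html')
show(bar)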

Hopefully it’s pretty self-explanatory. This is idiomatic Python we’re talking about! label is our x axis and values is our y. You’ll notice I removed the legend: for a chart this straightforward, it would only get in the way. The legend parameter is optional and the legend is shown if it isn’t specified. Alternatively, you can specify its position, e.g. legend='top_right'.

Result

So what does it give us? We get an html file that displays the chart. The chart comes with a few option buttons for navigation as well as one to export the chart as a picture, which I’ve included below:

bokeh_plot

Mewtwo kicks ass, although I guess we already knew that. Pretty straightforward and under 10 lines of code. Bokeh and I are off to a good start.

Vim – “Change in quotes” trick

Changing quotes in Vim

One of my most used Vim (or IdeaVim in IntelliJ) commands when I’m editing code is ci".
I’ve talked before about text objects in Vim-like environments and even mentioned how I use c (change) on the command line in vi mode. I think the Vim verb “change” offers a lot of power that’s lacking in other editors, but I just want to talk about ci" today.

What you might not know

Hopefully if you read the last post you’ve adopted ci" in your daily usage. However, you might not have realised that it also jumps to the first set of quotes on the line if your cursor isn’t currently inside quotes. Example:
System.out.println("My printed statement");
Imagine your cursor is currently on the S. (Remember: ^ goes to the start of the line.) If you do ci" then your cursor will jump to the quotes, delete the contents and enter insert mode, ready to type:
System.out.println("");
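
To make the jump behaviour concrete, here is a rough Python model of what ci" does on a single line. It is a simplified sketch, not how Vim is implemented: the function name change_in_quotes is made up for illustration, and it ignores escaped quotes and some of Vim's subtler rules for lines that contain several strings.

```python
from __future__ import annotations


def change_in_quotes(line: str, cursor: int, new_text: str = "") -> tuple[str, int]:
    """Simplified model of ci" on one line (hypothetical helper, not Vim's code).

    Returns the edited line and the index where insert mode would start.
    """
    # Find every double quote and pair them up from left to right.
    quotes = [i for i, ch in enumerate(line) if ch == '"']
    pairs = list(zip(quotes[0::2], quotes[1::2]))
    # Pick the first pair that closes at or after the cursor. This covers both
    # "cursor already inside the quotes" and "cursor somewhere before them".
    for start, end in pairs:
        if cursor <= end:
            edited = line[:start + 1] + new_text + line[end:]
            return edited, start + 1 + len(new_text)
    # No quoted text at or after the cursor: leave everything alone.
    return line, cursor


line = 'System.out.println("My printed statement");'
print(change_in_quotes(line, cursor=0))
```

With the cursor on the S (index 0), the model empties the quotes and leaves the cursor just after the opening quote, which matches the example above.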

LINK: AI vs ML vs DL

AI, machine learning, and deep learning are terms that are often used interchangeably. But they are not the same things.

Source: The Difference Between AI, Machine Learning, and Deep Learning? | NVIDIA Blog

Vim text objects in zsh

Vim, not Vi, on the command line

“Vi mode” is well known. In bash or zsh, it’s possible to move around in much the same way you would in vi (e.g. b goes back a word). What isn’t as well known is that zsh 5.0.8 or later supports visual mode and text objects. For anyone who hasn’t yet seen the Vim light this might not mean much, but read on…

Example

Say I have a command to create a new git branch but upon copy/paste I notice an error:
git checkout -b CREATURE-my-new-feature
I’m an idiot. That should say FEATURE as per my team’s branching convention. OK, so in emacs or vi mode I can move back to the “C” in “CREATURE” easily enough using M-b or b respectively, but then what? zsh’s Vim text objects allow me to make the correction in a Vim-like manner:

  • ciwFEATURE rewrites CREATURE as FEATURE.
  • Even better, ctEF changes everything up to (but not including) the first E into an F. This makes the same amendment more succinctly.
  • If I wanted to rename the whole branch to something completely different then I could use ciWnew-name. (Notice the capital W, which changes the whole whitespace-delimited WORD rather than just the word under the cursor.)
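
If it helps to see what those three edits do to the command line, here is a small Python sketch that mimics them on a plain string. The function names are invented for illustration, and the word rules are an approximation (letters, digits and underscores for a word, anything that isn't whitespace for a WORD) rather than zsh's or Vim's exact definitions.

```python
import re


def change_inner_word(line, cursor, new_text):
    """Mimic ciw: replace the run of word characters under the cursor."""
    for match in re.finditer(r"\w+", line):
        if match.start() <= cursor < match.end():
            return line[:match.start()] + new_text + line[match.end():]
    return line  # cursor is not on a word


def change_inner_big_word(line, cursor, new_text):
    """Mimic ciW: replace the whitespace-delimited WORD under the cursor."""
    for match in re.finditer(r"\S+", line):
        if match.start() <= cursor < match.end():
            return line[:match.start()] + new_text + line[match.end():]
    return line  # cursor is on whitespace


def change_till(line, cursor, target, new_text):
    """Mimic ct{char}: replace from the cursor up to, not including, target."""
    index = line.find(target, cursor + 1)
    if index == -1:
        return line  # target not found, nothing changes
    return line[:cursor] + new_text + line[index:]


command = "git checkout -b CREATURE-my-new-feature"
cursor = command.index("CREATURE")  # cursor sits on the "C"

print(change_inner_word(command, cursor, "FEATURE"))
print(change_till(command, cursor, "E", "F"))
print(change_inner_big_word(command, cursor, "new-name"))
```

The first two calls both produce the corrected branch name; the third replaces the whole branch name in one go.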

Shell settings

If bash is your shell of choice, then I still recommend trying vi mode for a while, even if it isn’t as fully featured as zsh’s version:
bash setting: set -o vi
zsh: bindkey -v

WTF is a text object

As usual, I’m jumping straight to the point. If you haven’t stumbled across text objects in Vim yet then I recommend opening it up and trying the command:
:help text-objects
Alternatively, there’s a great post here that explains text objects in detail!

GET request and query parameter limits

RESTful

Anyone who’s worked on a REST API for any length of time will hear people talk about being RESTful. RESTful is simply trying to abide by REST standards. People will often argue about whether to use PUT or POST in a given situation. Usually there’s a right answer but not always.

GET it?

One grey area I’ve come across is GET requests with a lot of query parameters. If we’re trying to be RESTful then a GET request shouldn’t have a body. Also, if we need to pass some criteria to a GET request then it’s usually done via query parameters, right? (e.g. http://www.mysite.com/?myParam=myvalue). Now, what happens if we have lots of criteria, say a list of things that need to be included in a DB search for that particular GET to return the right info? Depending on the backend implementation this might be fine; however, certain frameworks will vomit because they have a maximum URI size. If you try the above with a Java service using Spring REST you will likely get this error:
org.springframework.web.client.HttpClientErrorException: 414 Request-URI Too Long

OK my code breaks, now what?!

There’s debate about this online, however I like this answer on Stack Overflow. TL;DR: send a POST. The commenter goes into more detail, but essentially if we think of the query as a resource that’s being created then we still please the REST gods while performing a glorified GET.
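
As a rough illustration of that idea, here is a Python sketch of a client that uses GET while the URL is short enough and falls back to POSTing the criteria as a body when it isn't. Everything specific in it is an assumption for the example: the 2000 character limit (real servers and frameworks differ), the /searches resource name and the build_search_request helper are all made up.

```python
import json
from urllib.parse import urlencode

# Assumed limit for illustration only; check what your own stack allows.
MAX_URL_LENGTH = 2000


def build_search_request(base_url, ids):
    """Return (method, url, body) for a search over ids (hypothetical helper)."""
    # doseq=True turns the list into repeated parameters: id=1&id=2&...
    query = urlencode({"id": ids}, doseq=True)
    url = f"{base_url}?{query}"
    if len(url) <= MAX_URL_LENGTH:
        return "GET", url, None
    # Too long for a URI: treat the query as a resource being created and
    # send the criteria in the body of a POST to a made-up /searches endpoint.
    return "POST", f"{base_url}/searches", json.dumps({"ids": ids})


print(build_search_request("http://www.mysite.com/things", [1, 2]))
method, url, body = build_search_request("http://www.mysite.com/things", list(range(1000)))
print(method, url, len(body))
```

The caller doesn’t need to care which verb was chosen; it just sends whatever comes back.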

Vim fu

Becoming a Vim master starts with being a Vim beginner

One of my tech passions is Vim and its philosophy. I’m even writing this post using it. To become better at something, whether it be martial arts, playing an instrument or programming, you need to be consciously striving to improve. The old cliché “Perfect practice makes perfect” is no less correct just because it’s overstated. Just repeating a movement or series of movements with the hope that one day you’ll wake up the best at it is not the way to do it.

I think using Vim (or an offshoot like IdeaVim in IntelliJ IDEA) is going to make for a more efficient and productive developer. I could think of a million reasons why, but I’m planning on just writing a series of posts with simple Vim tips that might demonstrate my reasoning.

Basic macros

q starts recording a macro and is followed by the register that names it, e.g. qa records a macro named “a”. Press q again to stop recording and @a to play the macro at any time. Why is this useful? If I have a file with 1000 lines and I need to format each line in the same way, I can record a macro that formats one line and moves down a line, then finish the recording. 1000@a will play the macro a thousand times and the file is done. Magic. The Vim ethos of making commands repeatable, combined with macros, can create some very powerful functionality. Bonus tip: @@ plays the last macro again.
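
For anyone who thinks better in code, the record-once, replay-many idea can be sketched in Python. This is only an analogy, not how Vim stores macros: the macro here is an ordinary function that edits the current line and moves down, and play and quote_and_move_down are names I’ve made up for the example.

```python
def quote_and_move_down(lines, row):
    """The 'recorded' macro: wrap the current line in quotes, add a comma, move down."""
    lines[row] = f'"{lines[row]}",'
    return row + 1


def play(macro, lines, row, count):
    """Replay macro up to count times, like 1000@a, stopping at the end of the buffer."""
    for _ in range(count):
        if row >= len(lines):
            break  # ran out of lines, so stop replaying
        row = macro(lines, row)
    return row


buffer = ["bulbasaur", "charmander", "squirtle"]
play(quote_and_move_down, buffer, row=0, count=1000)
print(buffer)
```

Asking for a thousand replays on a three line buffer is fine because the loop stops once it runs out of lines.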

More to come…

Note that this post and future ones in this series are in the category “Vim”. Keep an eye out for updates. If you have no idea what this Vim thing is then have a look over at The Official Vim Site.

Mini Book Review: Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks

The book

Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks. I read this one a good while back but have been meaning to revisit it to see if there’s anything I missed.

Contents

Jeff Heaton’s book provides an overview of the major Neural Network models in common use as well as a summary of deep learning. It also gives a simple explanation of training algorithms and where they are appropriate. As a relative beginner to the topic, I felt like I was the exact target audience.

Where it goes right

Although 345 pages can be made to seem long if the topic is dry, this one is definitely an easy read. As opposed to other materials on the subject, the author kept the maths very light. If you’re looking to be able to hold a conversation about what a neural net does and why you might use one then this is the book for you. Although I had been studying neural nets and machine learning before picking it up, I felt like certain concepts were explained very well and hearing them in plain English helped reinforce my understanding.
Another way I personally benefited was Heaton’s discussion of trends. As a hobbyist, it’s great to see why certain ideas or algorithms have gone out of favour with the people who use this stuff day in, day out.

Where it goes wrong

It’s all a little too simplistic. I came away from it understanding everything I had been told, but was left wanting. A few more chapters that took the detail to the next level would have been great. I think anyone who already has a good fundamental understanding of neural nets (i.e. has implemented a few) and machine learning is not going to get much out of this volume. More concrete examples would have been a bonus. Although it’s in the title, I also felt that deep learning could have been covered in more depth, although I’d let that slide given how much it’s evolving.

Score

Solid 4.0/5. Like I said, I’ve been meaning to re-read it which is a good sign. The “ELI5” approach helped certain concepts that I had already come across sink in, but it wasn’t revolutionary.

IBM’s Watson and one step closer to AGI

About to hit the hay but wanted to share a link from the news: Watson just diagnosed leukemia where human doctors failed. Awesome.

Code Review Tips – make the dreaded less dreadful

I’m a massive fan of peer review. I think when it’s done right, i.e. with everybody spurred on by willingness to improve, code reviews are a very powerful tool. I’ve laid out a few points I like to keep in mind. Most of it is common sense but there’s no harm in a reminder every once in a while:

Size matters

It’s not the size of the review that counts, it’s the way – whoops wrong blog. Keep the size of the review to a few hundred lines maximum. If it’s too big then split it up into multiple reviews, preferably divided logically e.g. review 1 is the business layer logic and review 2 is the persistence logic. This also means you can have one bit being reviewed while you’re finishing the next bit of work.

Die young

Related to the last point. Ideally, you don’t want everyone reviewing code in the last 2 days of the sprint and then fixing a whole load of issues on the last day. Just like with your code, you want to fail early so you can recover. If your user story is huge (ignoring the potential smell) and is going to take the remainder of the sprint to finish then try to get a minimum viable product before that, get it reviewed, and then use what you’ve learned to improve the next iterations. Light reviews early and often.

Diversity

Get a mix of reviewers and expertise. Obviously it’s great to get the resident SQL expert in when your review has a load of SQL queries so that potential issues are discovered, but code reviews are just as much (or more, in my opinion) about people learning what’s going on and how to improve their own approach. Get the new guy/girl involved; their unique perspective might find issues that others would miss, and it’s also a valuable learning experience for them.

LIMIT 3 OFFSET 0

Diversity doesn’t mean everybody on all reviews. 2-3 people should be fine, any more than that and it either gets hectic or some people won’t participate and quality suffers.

“True wisdom is knowing what you don’t know” – Confucius

Leave your ego at the door. Everyone writes dodgy code and it happens for a multitude of reasons. The purpose of the review isn’t to have a go at you for not writing the optimal solution or to find gaps in your knowledge; it’s to provide opportunities to learn from each other and hopefully to improve the code base for everyone.

Stand your ground

Almost contrary to the last point, don’t assume the reviewer is correct. If someone comes along and tells you to use a hashmap on line 20, don’t just say “yes, sir, would you like a cup of tea with that?” and change it without asking why. As a reviewer, it can be hard to know the full context for why things were done a certain way, and that custom data structure you wrote might have some benefit the other person missed.

Questions > direct criticism

Related to the last point (again!), make use of the Socratic method when reviewing another dev’s code. Don’t tell them to change things but instead ask them why a certain thing was done, so that an understanding of the best solution arises from both parties instead of one person bowing to the other’s demands.

Resistance is futile

Don’t get bogged down in personal preference. If the developer likes to name variables a certain way because that’s their style and it still falls within company conventions then let them; however if it might affect maintainability then feel free to ask if it’s the best way to do things.

One small step

Lastly, as with all process changes, I recommend making a step in the right direction. Be the review advocate your team needs and lead by example. Don’t worry about spending all day in code reviews. As I said, informal, light reviews are best.

Hello World

I’m not usually a fan of tradition but I can’t help but kick things off with:

Hello World

As per the “About” section, this blog is a collection of various musings, thoughts and rants about various software topics. I’ll try to provide as much insightful material as possible in short, digestible nuggets.

If a topic warrants more depth, I’ll try to spread it across multiple posts. Each part shouldn’t exceed “coffee length”. I encourage discussion, debate, questions or anything else that might help me or others learn.