Revising Documents

December 14, 2019

In this post, I share my notes on revising documents. As an engineer working at a company with more than 1,000 people, I often read and write documents. Used in moderation, documents can be useful for getting input on a design spec, describing a problem and its context, or describing how a project was carried out, e.g. how we built our first-ever automatic lunch place rotation system to avoid having the same debate every week. ...

Encoding in Python 2 and 3

November 21, 2019

In this post, I describe encoding behavior in Python 2 and how it differs in Python 3. If you’re still using Python 2, as many folks are, you may run into encoding issues when processing data. Let’s say we have a file called translations.txt that contains translations between English and Mandarin (from the Oxford dictionary): "The book has 500 pages of text.","这本书正文有500页。" "I'll send you a text as soon as I have any news. ...
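The excerpt stops before the examples. The Python 3 sketch below shows the key distinction: a `str` holds decoded text, and `.encode("utf-8")` produces the bytes. It assumes translations.txt is UTF-8 encoded CSV. The helper `read_translations` is hypothetical and is not taken from the post. By contrast, in Python 2 a plain `"..."` literal is a byte string, so `len()` counts bytes unless you use a `u"..."` literal or `.decode("utf-8")`.

```python
import csv
import os
import tempfile

def read_translations(path):
    # Hypothetical helper. In Python 3, text-mode open() decodes bytes to str;
    # naming the encoding avoids depending on the platform default.
    with open(path, encoding="utf-8", newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

mandarin = "这本书正文有500页。"
print(len(mandarin))                  # 11 code points in the str
print(len(mandarin.encode("utf-8")))  # 27 bytes once encoded as UTF-8

# Round-trip one pair through a temporary file shaped like translations.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "translations.txt")
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow(["The book has 500 pages of text.", mandarin])
    pairs = read_translations(path)

print(pairs[0][1] == mandarin)        # True: the text survives encode/decode
```

The Chinese characters and the full-width period each take 3 bytes in UTF-8, while the digits take 1. That is why the byte count and the character count diverge.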

Simulating Context-Free Bandits

July 22, 2019

In this post, I describe a framework and an experiment for simulating context-free bandits. The explore-exploit dilemma shows up in many aspects of everyday life. To put this in concrete terms, imagine a person receives a free 30 meal gift card from a new breakfast restaurant that just opened in their city. The restaurant may be well known for its good breakfast options, and as a breakfast lover, the person wants to find the best breakfast option on the menu (note that best here means personally favored, not categorically best as defined by a food critic or by social media popularity). ...
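The excerpt doesn't show the post's framework, so here is a minimal epsilon-greedy sketch of the breakfast scenario. It is one common strategy for context-free bandits and not necessarily the one the post uses. It assumes that each dish (arm) has a hypothetical, fixed probability of being enjoyed (a Bernoulli reward), and that the card covers a fixed budget of visits. The function name, the probabilities, and `epsilon=0.1` are illustrative choices.

```python
import random

def epsilon_greedy(true_probs, budget, epsilon=0.1, seed=None):
    """Simulate an epsilon-greedy agent for `budget` rounds.

    true_probs: hypothetical probability that each arm (dish) yields a reward.
    Returns how often each arm was pulled and its estimated mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(budget):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore: random dish
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best so far
        reward = 1 if rng.random() < true_probs[arm] else 0    # liked it or not
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    return counts, values

counts, values = epsilon_greedy([0.2, 0.5, 0.8], budget=30, seed=42)
print(counts, [round(v, 2) for v in values])
```

A larger `epsilon` samples more of the menu. A smaller one commits sooner to whatever looked best early, which may not be the truly favored dish.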

Concept Drift: Notes for the practitioner

October 20, 2018

In this article, I share notes on handling concept drift for machine learning models. Concept drift occurs in an online supervised learning setting when the relationship between the input data X and the output data y changes to the extent that a model mapping X to y can no longer do so with the same efficacy. In online supervised learning, three types of drift can occur: (1) feature drift, i. ...

The Monty Hall Problem

August 18, 2018

In this post, I explain how I came to reason about the Monty Hall problem and provide a tool for running experiments that show the outcome of different strategies for playing the game. The Monty Hall problem is an interesting probability teaser. The premise is this: suppose you are on a game show with three doors, one of which hides a prize. As a guest, you get two chances to choose a door to win the prize. ...
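The post's own tool is not shown in the excerpt. The sketch below is a minimal simulation that compares "stay" and "switch", assuming the standard rules. The guest picks a door. The host, who knows where the prize is, opens a different door with no prize behind it. The guest then either keeps the first pick or switches to the remaining closed door. The names `play` and `win_rate` are hypothetical.

```python
import random

def play(switch, rng):
    """Play one round under the standard rules; return True if the guest wins."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host opens a door that is neither the guest's pick nor the prize.
    opened = rng.choice([d for d in doors if d != first_pick and d != prize])
    if switch:
        # Second chance: move to the only other door still closed.
        final_pick = next(d for d in doors if d != first_pick and d != opened)
    else:
        final_pick = first_pick
    return final_pick == prize

def win_rate(switch, trials=100_000, seed=None):
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

print(f"stay:   {win_rate(False, seed=1):.3f}")  # close to 1/3
print(f"switch: {win_rate(True, seed=1):.3f}")   # close to 2/3
```

Switching wins exactly when the first pick was wrong, which happens 2/3 of the time. That is why the two rates settle near 1/3 and 2/3.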

Understanding Data

April 3, 2018

I have been learning many things about dealing with data at a large scale. At first, I kept using the term "quality" to describe the state of data. However, it quickly became clear that the term had several dimensions and could not summarise all the issues one can observe. I have come to use the expression "understanding data" instead because (1) it captures the state I wish to describe and (2) it speaks to the scientific and functional purposes of that state. ...

Property Based Testing: Describing expected behaviour in terms of properties

March 19, 2018

In my daily work, I often need to write data pipelines to produce metrics, create datasets for machine learning models, or just clean up logs. I had usually tested my code the traditional way, verifying that it does what I expect by checking the normal and edge cases I could think of. But over the past couple of months, I started using property based testing, and I feel my code quality has improved dramatically. ...
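The idea can be sketched with only the standard library. This is not the post's own code. Instead of hand-picking cases, the test generates many random inputs and asserts properties that must hold for every one of them. The function `dedupe` is a hypothetical log-cleaning step, and the chosen properties are illustrative. Libraries such as Hypothesis for Python automate input generation and shrink failing inputs to minimal examples.

```python
import random

def dedupe(records):
    """Hypothetical pipeline step: drop duplicates, keeping first occurrences in order."""
    seen = set()
    out = []
    for r in records:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

def check_dedupe_properties(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        # Random input of random length, with small values so duplicates are common.
        records = [rng.randint(0, 10) for _ in range(rng.randint(0, 20))]
        result = dedupe(records)
        assert len(result) == len(set(result)), records  # no duplicates remain
        assert set(result) == set(records), records      # nothing lost or invented
        assert dedupe(result) == result, records         # idempotent
        # First occurrences keep their original relative order.
        assert result == sorted(set(records), key=records.index), records
    return True

print(check_dedupe_properties())
```

Each property describes expected behaviour in general terms rather than for one specific input. A single failing random input is enough to reveal a bug.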

Language Style Guide

March 18, 2018

The first programming language I learned was Pascal. I was in math class in secondary school when my teacher told us about a programming competition held every year between schools. At that point, everything I knew about programming was purely a construct of my imagination. I knew that you were supposed to type something into a machine and it would do things. But I was into computer hardware at the time, so I figured, why not? ...

Task Paralysis

February 26, 2018

What do you do when you have several things you think you should be doing, but aren’t sure which to do next, either right now or at some point in the near future? This is called task paralysis, and it happens to all of us at one point or another. You look up from your screen and suddenly realize your desk is covered in sticky notes listing things you need or want to do, from checking out that presentation about marketplaces to reviewing a pull request. ...

Relatively Painless Technical Excellence

December 20, 2017

I have been thinking about excellence for the past couple of months, since I started working full-time again: excellence in writing code, designing systems, and formulating problems. Last week I had the opportunity to attend a talk titled Relatively Painless Technical Excellence by J.B. Rainsberger. It was one of the best talks I’ve heard on agile and software engineering. I should say up front that I didn’t have many expectations going in, and I was pleasantly surprised by the content. ...