Cognitive Dissonance: Type hinting and linting in Python

At work, the adoption of Python 3 was finally moving at warp speed; the end of support for Python 2 might have had something to do with it. As a result, there was a lot of code to migrate. One of the things I did during this migration was add type hints and linter checks to the codebase. And I... was not... ready for that! When I first read about type hinting, I thought it would be a neat way to help both newcomers to the language and existing users navigate code. After all, the language wasn’t becoming statically typed. I figured the hints were, as their name suggested, slight indications. However, combined with static analysis tools, they can actually help you identify bugs and other issues in your code, and I’m all in for that. ...
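To make the point about static analysis concrete, here is a minimal sketch. The function names and data are hypothetical and not taken from the post. It shows the kind of bug a checker such as mypy can flag before the code runs: passing a value that may be None where an int is required.

```python
from typing import Dict, Optional


def find_user_age(ages: Dict[str, int], name: str) -> Optional[int]:
    """Return the age for `name`, or None when the name is unknown."""
    return ages.get(name)


def years_until_retirement(age: int, retirement_age: int = 65) -> int:
    """Return the number of years left before retirement, never negative."""
    return max(retirement_age - age, 0)


age = find_user_age({"ana": 30}, "bob")

# A static checker would reject the call below, because `age` is Optional[int]
# while the parameter is annotated as int. At runtime it raises a TypeError.
# years_until_retirement(age)

# Narrowing the type first satisfies the checker and avoids the runtime error.
if age is not None:
    print(years_until_retirement(age))
```

The hints change nothing at runtime; the interpreter ignores them, and the value comes from running a checker over the annotated code.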

February 12, 2020 · guidj

Revising Documents

In this post, I share my notes on doing document revision. As an engineer working in a company with more than 1000 people, I am often in a position to read and write documents. Documents, when used in moderation, can be useful for getting input on a design spec, describing a problem and its context, or describing how a project was carried out, e.g. this is how we built our first ever automatic lunch place rotation system to avoid the same debate we have every week. ...

December 14, 2019 · guidj

Encoding in Python 2 and 3

In this post I describe encoding behavior in Python 2 and how it differs in Python 3. If you’re still using Python 2, which many folks are, you may run into encoding issues when processing data. Let’s say we have a file called translations.txt that contains translations between English and Mandarin (from the Oxford dictionary):

"The book has 500 pages of text.","这本书正文有500页。"
"I'll send you a text as soon as I have any news.","我一得到任何消息,就立刻给你发短信。"

Now, say we try to read the file in Python 2: ...

November 21, 2019 · guidj

Simulating Context-Free Bandits

In this post I describe a framework and experiment in simulating context-free bandits. The explore-exploit dilemma can be found in many aspects of everyday life. To put this in concrete terms, imagine a person receives a free 30-meal gift card from a new breakfast restaurant that just opened up in their city. The restaurant may be well known for having good breakfast options, and as a breakfast lover, the person wants to find the best breakfast option on the menu (note that best here means personally favored, not categorically best as defined by a food critic or social media popularity). There are dozens of breakfast options, yet not all of them are equally good according to the person’s preferences. The resource constraint here is the number of meals, which in this case is limited to 30. Assuming a limit of a single meal per day, that gives 30 days as slots to try out the meals. The goal is to maximize total reward by spending as many free meals as possible on the most favored menu option. For instance, if the person tries menu item four and really enjoys it, then the reward can be 1. If they don’t, then the reward can be 0 - this is the lost opportunity in getting a positive experience, which is also called regret. ...
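The breakfast scenario can be sketched as a small simulation. This is not the framework from the post. It assumes an epsilon-greedy strategy, Bernoulli rewards (1 for an enjoyed meal, 0 otherwise), and made-up enjoyment probabilities per menu item. Regret is computed here as the expected reward of always ordering the best item minus the reward actually collected, which is one common definition.

```python
import random


def run_epsilon_greedy(true_probs, num_meals=30, epsilon=0.1, seed=0):
    """Simulate spending `num_meals` meals across menu items.

    true_probs[i] is the (hypothetical) chance the person enjoys item i.
    Returns (total_reward, regret, counts_per_item).
    """
    rng = random.Random(seed)
    num_items = len(true_probs)
    counts = [0] * num_items      # how often each item was ordered
    values = [0.0] * num_items    # running mean reward per item
    total_reward = 0

    for _ in range(num_meals):
        if rng.random() < epsilon:
            # Explore: try a random menu item.
            item = rng.randrange(num_items)
        else:
            # Exploit: order the item with the best estimate so far.
            item = max(range(num_items), key=lambda i: values[i])

        reward = 1 if rng.random() < true_probs[item] else 0
        counts[item] += 1
        # Incremental update of the mean reward for the chosen item.
        values[item] += (reward - values[item]) / counts[item]
        total_reward += reward

    # Expected reward had every meal been spent on the best item.
    best_expected = num_meals * max(true_probs)
    regret = best_expected - total_reward
    return total_reward, regret, counts


total, regret, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
print(total, regret, counts)
```

With only 30 meals, the value of epsilon matters a lot: every exploratory meal is one that cannot be spent on the current favorite.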

July 22, 2019 · guidj

Concept Drift: Notes for the practitioner

In this article, I share notes on handling concept drift for machine learning models. Introduction: Concept drift occurs in an online supervised learning setting when the relationship between the input data X and output data y is altered to the extent that a model mapping X to y can no longer do so with the same efficacy. In online supervised learning, there are three types of drift that can occur: (1) feature drift, i.e. a change in the distribution of X, (2) real concept drift, i.e. a change in the relation between X and y, or p(y|X), and (3) a change in the prior distribution p(y), e.g. new classes arriving. While both feature and prior distribution changes may be interesting to monitor, for purposes that extend beyond understanding changes in the problem space, it is real concept drift that we are chiefly concerned with. Consider the following scenario: Google decides to increase the price of their flagship Android devices by 20%, making them more appealing to certain segments and less so to others, who will move to lower-end versions of the brand or switch brands altogether to acquire devices within the original price range. As a result, the distribution of users signing up from specific mobile platforms may change. This would be a feature drift. If, however, this change is not sufficient to degrade the model’s ability to, for example, predict user retention or quality of experience, because despite the shifts in demographics most users remain within the same device price range and therefore have a similar initial experience, there may very well be no real concept drift. ...

October 20, 2018 · guidj

The Monty Hall Problem

In this post I explain the way I came to reason about the Monty Hall problem and provide a tool for you to run experiments to see the outcome of different strategies for playing the game. The Monty Hall problem is an interesting probability teaser. The premise is this: suppose you are at a game show with three doors, one of which has a prize. You, as a guest, have two chances to choose a door to win the prize. After your first choice, the host eliminates one of the other doors, leaving you with two. So suppose you pick door number 1; the host could then open door number 2 to show you there is nothing behind it, leaving you with 1 and 3. At this point, you get to make your second choice, choosing between 1 and 3. The question posed in the puzzle is: do you stick with your first choice or switch? ...
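The game described above is easy to simulate. The sketch below is not the tool from the post. It is a minimal stand-in that assumes the host always opens an empty door other than the guest's pick, and the function names are made up for illustration.

```python
import random


def play_round(switch, rng):
    """Play one three-door game and return True if the guest wins."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    first_choice = rng.choice(doors)

    # The host opens a door that is neither the guest's pick nor the prize.
    openable = [d for d in doors if d != first_choice and d != prize]
    opened = rng.choice(openable)

    if switch:
        # Exactly one door is left that is neither picked nor opened.
        final_choice = next(
            d for d in doors if d != first_choice and d != opened
        )
    else:
        final_choice = first_choice
    return final_choice == prize


def win_rate(switch, rounds=10_000, seed=0):
    """Estimate the fraction of games won with a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(rounds))
    return wins / rounds


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

Over many rounds, the estimate for staying should land near 1/3 and the one for switching near 2/3, since switching wins exactly when the first pick was wrong.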

August 18, 2018 · guidj

Understanding Data

Rich Metadata on Data: I have been learning many things about dealing with data at a large scale. At first, I kept using the term quality to describe the state of data. However, it quickly became clear that the term had various dimensions to it, and it could not summarise the issues one can observe. I have come to use the expression understanding data instead because (1) it captures the state I wish to describe and (2) it speaks to the scientific and functional purposes of that state. ...

April 3, 2018 · guidj

Property Based Testing: Describing expected behaviour in terms of properties

In my daily work, I often need to write data pipelines to produce metrics, create datasets for machine learning models, or just clean up logs. I had usually done testing the traditional way, verifying my code does what I expect by checking for normal and edge cases I could think of. But over the past couple of months, I started using property based testing, and I feel like my code quality has improved dramatically. ...

March 19, 2018 · guidj

Language Style Guide

The first programming language I learned was Pascal. I was in my math class in secondary school, and my teacher told us about a programming competition that was held every year between schools. At that point, everything I knew about programming was a pure construct of my imagination. I knew that you were supposed to type something into a machine and it would do things. But I was into computer hardware at the time, so I figured why not. To be honest, I don’t remember much about the language, or those days for that matter. But I do recall trying to grasp the little quirks of the language. ...

March 18, 2018 · guidj

Task Paralysis

What do you do when you have several things you think you should be doing, but aren’t sure which of them you should do next, either right now or at some other point in the near future? This is what is called task paralysis. And it happens to all of us, at one point or another. You look up from your screen and you suddenly realize there are sticky notes filled with things you need or want to do on your desk, from checking out that presentation about market places to reviewing a pull request. It started off small with three items. Just reminders, so you wouldn’t forget anything important you wanted to do. ...

February 26, 2018 · guidj