A fable on an Aggregator

June 13, 2020

Many parallel data computing tasks can be solved with one abstract data type (ADT). We will describe how an Aggregator does that by walking through a problem we want to solve with parallelism and uncovering the ideal properties of an ADT that enable us to do so.

Relevance of Aggregations: The Desirable ADT

In the world of analytics and machine learning, data processing makes up a significant chunk of the plumbing required to do both.
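As a rough illustration of the idea, here is a minimal sketch of an aggregator for computing a mean in parallel. The class name `MeanAggregator`, its methods, and the choice of properties (an empty starting value plus an associative `merge`) are assumptions made for this example; they are not taken from the post itself.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class MeanAggregator:
    """Hypothetical aggregator: running sum and count, from which a mean is derived."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        # Fold one element into the partial state.
        return MeanAggregator(self.total + value, self.count + 1)

    def merge(self, other):
        # Combine two partial states. This is associative, so partials
        # can be merged in any grouping.
        return MeanAggregator(self.total + other.total, self.count + other.count)

    def result(self):
        # Final answer derived from the merged state.
        return self.total / self.count if self.count else float("nan")


def aggregate(chunk):
    # Aggregate one partition of the data, starting from the empty state.
    return reduce(lambda acc, v: acc.add(v), chunk, MeanAggregator())


# Each chunk could be processed by a separate worker; here they run in sequence.
chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
partials = [aggregate(c) for c in chunks]
combined = reduce(lambda a, b: a.merge(b), partials, MeanAggregator())
```

Because the partial states merge in any grouping, the chunks can be handed to separate workers and combined afterwards without changing the result.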

The Best of...

April 24, 2020

In this life, people create models of the world in which they live. Some of these models are simple and codify basic yet critical rules, e.g. if it's cold, you search for warmth. Other models are more complex and express things we might not always be able to explain, like who we feel comfortable being around. What both types of models, and others along the spectrum, can have in common is a propensity for bias.


March 15, 2020

We are in the middle of a viral outbreak. There are many things that we're learning as we go along about this new coronavirus. In the meantime, health organizations around the world are scrambling to learn as much as they can from the known cases. In Stockholm, where I live, the number of cases has been growing over the past two weeks. During this time, I have found many tools and dashboards that were created to track the total number of infections.

Fashion, Trees, and Convolutions: Part III - Convolutions

March 10, 2020

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. I wanted to test out a guided and easy way to run hyper-parameter tuning. In Part II, I described setting up the end-to-end pipeline with a baseline, and running hyper-parameter tuning with the hyperopt package. In this third and final chapter, I describe my target models, a convolutional neural network trained from scratch and a transfer learning model.

Fashion, Trees, and Convolutions: Part II - Baseline

March 8, 2020

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. In Part I, I described the workflow to create the data for my experiments. In this post, I describe creating the baseline and a guided hyper-parameter tuning method.

The Baseline

For any modeling task, I always like to create a baseline model as a starting point. Typically, this will be a relatively basic model.

Fashion, Trees, and Convolutions: Part I - Data Crunch

March 6, 2020

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. Hyper-parameter tuning is a very important aspect of training models in machine learning, particularly with neural networks, where the architecture, optimizer, and data can each be subject to different parameters. When developing machine learning solutions, there is an iterative cycle that can be adopted to enable fast iteration, continuous targeted improvements, and testing of the solution, as with any other software system.

Cognitive Dissonance: Type hinting and linting in Python

February 12, 2020

At work, the adoption of Python 3 was finally moving at warp speed (the end of support for Python 2 might have had something to do with it). As a result, there was a lot of code to migrate over. One of the things I did during this migration was add type hinting as well as linter checks to the codebase. And I... was not... ready for that! When I first read about type hinting, I thought it would be a neat way to help both people new to the language and existing users navigate through code.

Revising Documents

December 14, 2019

In this post, I share my notes on doing document revision. As an engineer working in a company with more than 1000 people, I am often in a position to read and write documents. Documents, when used in moderation, can be useful for getting input on a design spec, describing a problem and its context, or describing how a project was carried out, e.g. this is how we built our first ever automatic lunch place rotation assignment system to avoid the same debate we have every week.

Encoding in Python 2 and 3

November 21, 2019

In this post I describe an encoding behavior in Python 2 and how it differs in Python 3. If you're still using Python 2, which many folks are, you may run into encoding issues when processing data. Let's say we have a file called translations.txt that contains translations between English and Mandarin (from the Oxford dictionary): "The book has 500 pages of text.","这本书正文有500页。" "I'll send you a text as soon as I have any news.

Simulating Context-Free Bandits

July 22, 2019

In this post I describe a framework and experiment in simulating context-free bandits. The explore-exploit dilemma can be found in many aspects of everyday life. To put this in concrete terms, imagine a person receives a free 30-meal gift card from a new breakfast restaurant that just opened in their city. The restaurant may be well known for having good breakfast options, and as a breakfast lover, the person wants to find the best breakfast option on the menu. Note that best here means personally favored, not categorically best as defined by a food critic or social media popularity.
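To make the dilemma concrete, here is a minimal sketch of a simulation using an epsilon-greedy strategy. The strategy, the function name `simulate_epsilon_greedy`, and the satisfaction probabilities are all assumptions chosen for illustration; the post's own framework may work differently. Each menu item is an arm with a hidden chance of satisfying the diner, and the 30 meals on the gift card are the rounds.

```python
import random


def simulate_epsilon_greedy(true_means, n_rounds=30, epsilon=0.1, seed=0):
    """Simulate a context-free Bernoulli bandit with epsilon-greedy selection.

    true_means: hypothetical probability that each menu item satisfies the diner.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each item was ordered
    estimates = [0.0] * n_arms   # running mean reward per item
    total_reward = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: order a random item.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: order the item with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Reward is 1 if the meal was satisfying, 0 otherwise.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for the chosen item.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


# Three menu items with made-up satisfaction probabilities.
counts, estimates, total_reward = simulate_epsilon_greedy([0.3, 0.5, 0.8])
```

A larger `epsilon` spends more of the gift card trying other items, while a smaller one commits earlier to whatever currently looks best.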