Generating Samples using Synthetic Multivariate Distributions

March 11, 2023

The problem I address in this post is generating samples from multivariate distributions without having any data. Motivation: Generative models are capable of generating new data. Unlike discriminative models, which estimate the likelihood of an outcome given a set of input features, $P(Y|X)$, a generative model learns the joint distribution of the variables, $P(X,Y)$. In product development, they can be used for various use cases, including imputing missing data (e.g. with conditional models), determining the likelihood of an observed sample, or creating random samples of data. ... Read more
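As a rough illustration of sampling without any data, here is a minimal sketch that draws correlated pairs from a bivariate normal distribution whose parameters are chosen by hand. It is not necessarily the method used in the post; the function names, the means, the standard deviations and the correlation `rho` are all assumptions made up for this example.

```python
import math
import random


def sample_bivariate_normal(n, mean_x, mean_y, std_x, std_y, rho, seed=0):
    """Draw n (x, y) pairs from a bivariate normal with correlation rho.

    All parameters are hand-picked (hypothetical); no data is fitted.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Two independent standard normal draws.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Mix them so that x and y have correlation rho
        # (this is the 2x2 Cholesky factor written out by hand).
        x = mean_x + std_x * z1
        y = mean_y + std_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    cov = sum((x - mx) * (y - my) for x, y in samples)
    var_x = sum((x - mx) ** 2 for x, _ in samples)
    var_y = sum((y - my) ** 2 for _, y in samples)
    return cov / math.sqrt(var_x * var_y)


samples = sample_bivariate_normal(
    n=20000, mean_x=10.0, mean_y=-3.0, std_x=2.0, std_y=0.5, rho=0.8
)
print(round(sample_correlation(samples), 2))
```

With enough samples, the empirical correlation should land close to the chosen `rho`, which is a quick sanity check that the synthetic distribution behaves as specified.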

Engineering Log: Speed

May 14, 2021

There is a meme in tech culture: that doing more things leads to more successful outcomes. It’s a concept worth unpacking, and that’s exactly what we’re going to do in this post. The myth of more: Let’s define “things” as a variable $\tau$, which can express experiments, iterations, studies or any task one would consider to be meaningful. If we increase the number of things we’re doing tomorrow, we get: ... Read more

Machine Learning needs a Protocol

March 12, 2021

The internet uses a protocol: it’s called HTTP. A protocol, in this sense, is an agreement. It gives any party wishing to partake in an exchange a clear expectation of how to do things. For the internet, it tells interested parties how to communicate, that is, how to send and receive data. It has occurred to me that machine learning currently lacks a protocol. Instead, we have frameworks, which implement protocols defined by the people who implemented the frameworks. ... Read more

Computing Variance Online

September 5, 2020

In my previous post, A Fable of an Aggregator, I described the properties of an abstract data type (ADT) that enables concurrent execution of aggregations such as sum, mean, and max. For example, if we want the mean of a collection of values, it suffices for us to accumulate its sum and count; dividing the former by the latter gives us the answer. More importantly, this accumulation can be done concurrently, and hence it’s parallelizable. ... Read more
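The sum-and-count idea can be sketched as follows. This is a minimal illustration, not the ADT from the post: the class name `MeanAggregator`, its methods and the helper `aggregate` are hypothetical. The point it shows is that partial results computed on separate chunks can be merged into the same answer as a single pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeanAggregator:
    """Hypothetical aggregator that tracks a running sum and count."""

    total: float = 0.0
    count: int = 0

    def add(self, value):
        # Fold one value into the running state.
        return MeanAggregator(self.total + value, self.count + 1)

    def merge(self, other):
        # Combine two partial states; the order of merging does not matter.
        return MeanAggregator(self.total + other.total, self.count + other.count)

    def result(self):
        if self.count == 0:
            raise ValueError("mean of an empty collection is undefined")
        return self.total / self.count


def aggregate(values):
    """Accumulate a chunk of values into a single aggregator state."""
    agg = MeanAggregator()
    for value in values:
        agg = agg.add(value)
    return agg


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Each chunk could be processed by a different worker.
left = aggregate(values[:2])
right = aggregate(values[2:])

merged = left.merge(right)
print(merged.result())
```

Because `merge` only adds sums and counts, the chunks can be accumulated independently and combined afterwards, which is what makes the computation parallelizable.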

A Fable of an Aggregator

June 13, 2020

Many parallel data computing tasks can be solved with one abstract data type (ADT). We will describe how an Aggregator does that by walking through a problem we want to solve with parallelism and uncovering the ideal properties of an ADT that enable us to do so. Relevance of Aggregations: The Desirable ADT In the world of analytics and machine learning, data processing makes up a significant chunk of the plumbing required to do both. ... Read more

The Best of...

April 24, 2020

In this life, people create models of the world in which they live. Some of these models are simple and codify critical rules, e.g. if it’s cold, you search for warmth. Other models are more complex and express things we might not always be able to explain, like who we feel comfortable being around. What both types of models, and others along the spectrum, can have in common is a propensity for bias. ... Read more

COVID19

March 15, 2020

We are in the middle of a viral outbreak. There are many things that we’re learning as we go along about this new coronavirus. In the meantime, health organizations around the world are scrambling to learn as much as they can from the known cases. In Stockholm, where I live, the number of cases has been growing over the past two weeks. During this time, I have found many tools and dashboards that have been created to track the total number of infections. ... Read more

Fashion, Trees, and Convolutions: Part III - Convolutions

March 10, 2020

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. I wanted to test out a guided and easy way to run hyper-parameter tuning. In Part II, I described setting up the end-to-end pipeline with a baseline, and running hyper-parameter tuning with the hyperopt package. In this third and final chapter, I describe my target models, a convolutional neural network trained from scratch and a transfer learning model. ... Read more

Fashion, Trees, and Convolutions: Part II - Baseline

March 8, 2020

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. In Part I, I described the workflow to create the data for my experiments. In this post, I describe creating the baseline and a guided hyper-parameter tuning method. The Baseline: For any modeling task, I always like to create a baseline model as a starting point. Typically, this will be a relatively basic model. ... Read more

Fashion, Trees, and Convolutions: Part I - Data Crunch

March 6, 2020

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. Hyper-parameter tuning is a very important aspect of training models in machine learning, particularly with neural networks, where the architecture, optimizer and data can be subject to different parameters. When developing machine learning solutions, there is an iterative cycle that can be adopted to enable fast iteration, continuous targeted improvements, and testing of the solution, as with any other software systems problem. ... Read more