Generating Samples using Synthetic Multivariate Distributions

The problem I address in this post is generating samples from multivariate distributions, without having any data. Motivation Generative models are capable of generating new data. Unlike discriminative models, which determine the likelihood of an outcome given a set of input features $P(Y|X)$, a generative model learns the joint distribution between variables $P(X,Y)$. In product development, they can be used for various use cases, including imputing missing data (e.g. with conditional models), determining the likelihood of an observed sample, or creating random samples of data. The last use case is the focus of this post. ...
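As a minimal sketch of the idea — generating samples with no observed data — one can hand-specify the parameters of a joint distribution and draw from it. This example assumes NumPy and a 2D Gaussian; the post may use a different family of distributions:

```python
import numpy as np

# Hand-specify a joint distribution P(X, Y) as a bivariate Gaussian:
# no observed data is needed, only chosen parameters.
mean = np.array([0.0, 5.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])  # off-diagonal terms encode the X-Y dependence

rng = np.random.default_rng(seed=42)
samples = rng.multivariate_normal(mean, cov, size=1000)

print(samples.shape)         # (1000, 2)
print(samples.mean(axis=0))  # close to [0.0, 5.0]
```

The drawn samples can then stand in for real data, e.g. to stress-test a pipeline or to estimate the likelihood of a new observation under the assumed model.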

March 11, 2023 · guidj

Engineering Log: Speed

There is a meme in tech culture - that doing more things amounts to more successful outcomes. It’s a concept worth unpacking, and that’s exactly what we’re going to do in this post. The myth of more Let’s define “things” as a variable $\tau$, which can express experiments, iterations, studies, or any task one would consider meaningful. If we increase the number of things we’re doing tomorrow, we get: ...

May 14, 2021 · guidj

Machine Learning needs a Protocol

The internet uses a protocol - it’s called HTTP. A protocol, in this sense, is an agreement. It allows any party wishing to partake in an exchange to have a clear expectation of how to do things. For the internet, it tells interested parties how to communicate - how to send and receive data. It has occurred to me that machine learning currently lacks a protocol. Instead, we have frameworks, which implement protocols defined by the people who built them. And while there is nothing wrong with these “framework protocols”, the fact that there are several of them is a problem. It implies that there are a number of ways to achieve the same thing - read data, transform it, train a model, export it, load it, and get predictions out of it. ...

March 12, 2021 · guidj

Computing Variance Online

In my previous post, A Fable of an Aggregator, I described the properties of an abstract data type (ADT) that enables concurrent execution of aggregations, such as sum, mean, and max. For example, if we want the mean of a collection of values, it suffices for us to accumulate its sum and count - dividing the former by the latter gives us the answer. More importantly, this accumulation can be done concurrently - and hence, it’s parallelizable. ...
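Variance can be accumulated online in the same one-pass spirit. The sketch below uses Welford’s algorithm - a standard approach, though not necessarily the exact formulation the post describes:

```python
class OnlineVariance:
    """Accumulates count, mean, and M2 (the sum of squared deviations
    from the running mean) using Welford's algorithm. The variance is
    recoverable at any point without storing the values themselves."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # population variance; divide by (count - 1) for the sample estimate
        return self.m2 / self.count if self.count else float("nan")


acc = OnlineVariance()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    acc.update(x)
print(acc.variance)  # 4.0
```

Unlike the naive sum-of-squares formula, this update is numerically stable when the mean is large relative to the spread of the values.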

September 5, 2020 · guidj

A fable of an Aggregator

Many parallel data computing tasks can be solved with one abstract data type (ADT). We will describe how an Aggregator does that by walking through a problem we want to solve with parallelism and uncovering the ideal properties of an ADT that enable us to do so. Relevance of Aggregations: The Desirable ADT In the world of analytics and machine learning, data processing makes up a significant chunk of the plumbing required to do both. In the world of big data, or medium-sized data for that matter, parallel processing enables efficient usage of disparate computing resources. Quite frequently, the data we’re referring to is represented by a collection of records. ...
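The essential property such an ADT needs is an associative merge, so partial results computed on separate chunks (or separate machines) can be combined in any grouping. A hypothetical sketch for the mean, assuming nothing about the post’s actual interface:

```python
from dataclasses import dataclass


@dataclass
class MeanAgg:
    """Accumulator for a mean: carries a running total and count."""
    total: float = 0.0
    count: int = 0

    def add(self, x: float) -> "MeanAgg":
        # Fold one record into this accumulator.
        return MeanAgg(self.total + x, self.count + 1)

    def merge(self, other: "MeanAgg") -> "MeanAgg":
        # Associative and commutative: safe to combine partials in any order.
        return MeanAgg(self.total + other.total, self.count + other.count)

    @property
    def result(self) -> float:
        return self.total / self.count


# Each "worker" aggregates its own chunk; merge combines the partials.
chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
partials = []
for chunk in chunks:
    agg = MeanAgg()
    for x in chunk:
        agg = agg.add(x)
    partials.append(agg)

combined = MeanAgg()
for p in partials:
    combined = combined.merge(p)
print(combined.result)  # 3.5
```

The same shape - an identity element, a per-record fold, and an associative merge - covers sum, count, max, and, as the follow-up post on online variance shows, more involved statistics as well.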

June 13, 2020 · guidj

The Best of...

In this life, people create models of the world in which they live. Some of these models can be simple and codify critical yet simple rules, e.g. if it’s cold, you search for warmth. Other models can be rather more complex and express things we might not even be able to always explain, like who we feel comfortable being around. What both types of models and others along the spectrum can have in common is a propensity for bias. And by bias, I’m referring to assumptions that are made and taken for granted. Some of them can be harmless, and others can be quite damaging. ...

April 24, 2020 · guidj

COVID19

We are in the middle of a viral outbreak. There are many things that we’re learning as we go along about this new coronavirus. In the meantime, health organizations from around the world are scrambling to learn as much as they can from the known cases. In Stockholm, where I live, the number of cases has been growing over the past two weeks. During this time, I have found many tools and dashboards that have been created to track the number of total infections. ...

March 15, 2020 · guidj

Fashion, Trees, and Convolutions: Part III - Convolutions

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. I wanted to test out a guided and easy way to run hyper-parameter tuning. In Part II, I described setting up the end-to-end pipeline with a baseline, and running hyper-parameter tuning with the hyperopt package. In this third and final chapter, I describe my target models, a convolutional neural network trained from scratch and a transfer learning model. ...

March 10, 2020 · guidj

Fashion, Trees, and Convolutions: Part II - Baseline

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. In Part I, I described the workflow to create the data for my experiments. In this post, I describe creating the baseline and a guided hyper-parameter tuning method. The Baseline For any modeling task, I always like to create a baseline model as a starting point. Typically, this will be a relatively basic model. This serves three very important purposes: ...

March 8, 2020 · guidj

Fashion, Trees, and Convolutions: Part I - Data Crunch

In this mini-series of posts, I will describe a hyper-parameter tuning experiment on the fashion-mnist dataset. Hyper-parameter tuning is a very important aspect of training models in machine learning, particularly with neural networks, where the architecture, optimizer, and data can be subject to different parameters. When developing machine learning solutions, there is an iterative cycle that can be adopted to enable fast iteration, continuous targeted improvements, and testing of the solution - as with any other software systems problem. However, in practice, those iterations can cease as soon as a workable solution is in place. And more often than not, that workable solution runs on an initial guess of the right parameters. Thus, little to no hyper-parameter tuning takes place. ...

March 6, 2020 · guidj