Revising Documents

In this post, I share my notes on doing document revision. As an engineer working in a company with more than 1000 people, I am often in a position to read and write documents. Documents, when used in moderation, can be useful for getting input on a design spec, describing a problem and its context, or describing how a project was carried out, e.g. this is how we built our first ever automatic lunch place rotation system to avoid the same debate we have every week. ...

December 14, 2019 · guidj

Concept Drift: Notes for the practitioner

In this article, I share notes on handling concept drift for machine learning models. Introduction: Concept drift occurs in an online supervised learning setting, when the relationship between the input data X and output data y is altered to the extent that a model mapping X to y can no longer do so with the same efficacy. In online supervised learning, there are three types of drift that can occur: (1) feature drift, i.e. a change in the distribution of X, (2) real concept drift, i.e. a change in the relation between X and y, or p(y|X), and (3) a change in the prior distribution p(y), e.g. new classes arriving. While both feature and prior distribution changes may be interesting to monitor, for purposes that extend beyond understanding changes in the problem space, it is real concept drift that we are chiefly concerned with. Consider the following scenario: Google decides to increase the price of their flagship Android devices by 20%, making them more appealing to certain segments and less appealing to others, who will move to lower-end versions of the brand or switch brands altogether to acquire devices within the original price range. As a result, the distribution of users signing up from specific mobile platforms may change. This would be a feature drift. If, however, this change is not sufficient to cause the model to err in its ability to, for example, predict user retention or quality of experience, because despite the shifts in demographics most users will remain within the same device price range and therefore have a similar initial experience, there may very well be no real concept drift. ...
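To make the distinction between the three types of drift concrete, here is a minimal sketch in Python. It is not taken from the article: the function names (`distribution`, `total_variation`, `conditional_rates`, `drift_report`), the use of total variation distance, and the synthetic device and retention numbers are all assumptions chosen for illustration. It compares a reference window with a new window, where X is a single categorical feature (device tier) and y is a binary label (user retained or not).

```python
from collections import Counter, defaultdict


def distribution(values):
    """Empirical distribution of a sequence of discrete values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def conditional_rates(xs, ys):
    """Estimate p(y = 1 | x) for every category x (y must be 0 or 1)."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for x, y in zip(xs, ys):
        totals[x] += 1
        positives[x] += y
    return {x: positives[x] / totals[x] for x in totals}


def drift_report(ref_x, ref_y, new_x, new_y):
    """Compare a reference window with a new window (illustrative only)."""
    ref_rates = conditional_rates(ref_x, ref_y)
    new_rates = conditional_rates(new_x, new_y)
    shared = set(ref_rates) & set(new_rates)
    # Largest change in p(y = 1 | x) over categories seen in both windows.
    concept = max((abs(ref_rates[x] - new_rates[x]) for x in shared), default=0.0)
    return {
        "feature_drift": total_variation(distribution(ref_x), distribution(new_x)),
        "prior_shift": total_variation(distribution(ref_y), distribution(new_y)),
        "concept_drift": concept,
    }


# Hypothetical data: before the price change, sign-ups split 50/50 by tier.
ref_x = ["high"] * 50 + ["low"] * 50
ref_y = [1] * 40 + [0] * 10 + [1] * 20 + [0] * 30  # retention: 0.8 high, 0.4 low

# After the price change, more users sign up on low-end devices,
# but retention within each tier is unchanged.
new_x = ["high"] * 30 + ["low"] * 70
new_y = [1] * 24 + [0] * 6 + [1] * 28 + [0] * 42  # retention: 0.8 high, 0.4 low

report = drift_report(ref_x, ref_y, new_x, new_y)
print(report)
```

With this synthetic data, the feature distribution shifts (a distance of about 0.2) and the overall retention rate moves with it, while the per-tier retention rates, and so the estimate of p(y|X), stay the same. That is the situation described in the scenario: feature drift without real concept drift. In practice y is only observed with a delay, and a proper statistical test would replace the raw distances used here.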

October 20, 2018 · guidj

Understanding Data

Rich Metadata on Data: I have been learning many things about dealing with data at a large scale. At first, I kept using the term quality to describe the state of data. However, it quickly became clear that the term had several dimensions to it, and that it could not summarise the issues one can observe. I have come to use the expression understanding data instead because (1) it captures the state I wish to describe and (2) it speaks to the scientific and functional purposes of that state. ...

April 3, 2018 · guidj