Understanding Data

April 3, 2018

Rich Metadata on Data I have been learning many things about dealing with data at a large scale. At first, I kept using the term quality to describe the state of data. However, it quickly became clear that the term had several dimensions, and that it could not summarise the issues one can observe. I have come to use the expression understanding data instead because (1) it captures the state I wish to describe and (2) it speaks to the scientific and functional purposes of that state. ... Read more

Property Based Testing: Describing expected behaviour in terms of properties

March 19, 2018

In my daily work, I often need to write data pipelines to produce metrics, create datasets for machine learning models, or just clean up logs. I had usually done testing the traditional way, verifying my code does what I expect by checking for normal and edge cases I could think of. But over the past couple of months, I started using property based testing, and I feel like my code quality has improved dramatically. ... Read more

Language Style Guide

March 18, 2018

The first programming language I learned was Pascal. I was in my math class in secondary school, and my teacher told us about a programming competition that was held every year between schools. At that point, everything I knew about programming was a pure construct of my imagination. I knew that you were supposed to type something into a machine and it would do things. But I was into computer hardware at the time, so I figured why not. ... Read more

Task Paralysis

February 26, 2018

What do you do when you have several things you think you should be doing, but aren’t sure which of them you should do next, either right now or at some other point in the near future? This is what is called task paralysis. And it happens to all of us, at one point or another. You look up from your screen and you suddenly realize there are sticky notes filled with things you need or want to do on your desk, from checking out that presentation about market places to reviewing a pull request. ... Read more

Relatively Painless Technical Excellence

December 20, 2017

I have been thinking about excellence for the past couple of months, since I started working full time again. Excellence in writing code, designing systems, and formulating problems. Last week I had the opportunity to attend a talk titled Relatively Painless Technical Excellence by J.B. Rainsberger. It was one of the best talks I’ve heard on agile and software engineering. I have to start by saying I didn’t have high expectations for it, and I was positively surprised by the content. ... Read more

fsql: search through your file system like a database

May 20, 2017

fsql is a tool I came across recently, while searching for Go projects on GitHub. It’s a command line tool that lets you run SQL-like queries on your file system. You can search through your files based on their name, size, mode, and time. It’ll take you 10 minutes to get set up and start using it like a (semi-)pro. It’s definitely one of my favourite command line tools now. You’ll find everything you need to know about it on its GitHub page. ... Read more

These Past Weeks in Science & Tech - 002

May 2, 2017

In this May’s edition of “These Past Weeks in Science & Tech”, I’ll be discussing Biological Data Stores. Our bodies are a walking library of information. From chemical processes that regulate our biological functions, to memories imprinted in our minds through experience, our cells somehow manage to store, access, and use information. So, it is no surprise that we should turn to biology to find the next generation of storage. ... Read more

Distill: Clearing Research Debt

April 8, 2017

Recently, a friend referred me to a website called Distill. Its stated purpose is to reduce research debt: a phenomenon that occurs when so much prior work in a field is hidden behind explanations that are more complicated than they need to be. This slows down scientific progress, because researchers must spend a large amount of energy just to understand previous work. ... Read more

These Past Weeks in Science & Tech - 001

April 2, 2017

I tend to read quite a bit. I read journals, blogs, and sites on technology, AI, and learning (with and without machines). These include renowned ones, like ACM Tech News, MIT News and Technology Review, and aleatory ones that I find on the web through platforms like Medium. As someone who enjoys reading and writing, I figured I could use summary notes on the things I learn about. And so, I decided to do that on this blog. ... Read more

Probabilistic Reasoning

March 16, 2017

It’s Monday morning, and you just got up. You look at the clock, and it tells you the time is now 7.45am. You have a presentation that starts at 8.30am, so that gives you exactly 45 minutes to get to the office. Now, you have a few decisions to make here. Breakfast: yes or no? Mode of transportation: bus, metro, or bike? Clothing preparation: throwing on last week’s shirt or ironing new clothes? ... Read more